diff --git a/docs/superpowers/plans/2026-03-26-structural-refactoring.md b/docs/superpowers/plans/2026-03-26-structural-refactoring.md new file mode 100644 index 0000000..a2d25b8 --- /dev/null +++ b/docs/superpowers/plans/2026-03-26-structural-refactoring.md @@ -0,0 +1,438 @@ +# Structural Refactoring — Implementation Plan + +> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking. + +**Goal:** 5 structural refactoring items from the code audit: decompose synthesis.rs, eliminate SettingsResponse, decompose Settings.tsx, extract shared LLM error mapping, replace trace_article with struct. + +**Architecture:** Pure refactoring — no behavioral changes. Each task is independently committable. Tasks 1 and 5 both modify `synthesis.rs` so Task 5 should run after Task 1. + +**Tech Stack:** Rust (Axum, sqlx), SolidJS, TypeScript + +**Spec:** `docs/superpowers/specs/2026-03-26-structural-refactoring-design.md` + +--- + +### Task 1: Extract shared helpers from `synthesis.rs` + +**Files:** +- Modify: `backend/src/services/synthesis.rs` + +This is the highest-impact refactoring. The `run_generation_inner` function (~650 lines) has the scrape+classify batch loop duplicated in Phase 1 (lines ~390-540) and Phase 2 Brave (lines ~610-740), plus filtering logic duplicated in Phase 2 Brave and Phase 2 LLM. + +- [ ] **Step 1: Read the full file and understand the structure** + +Read `backend/src/services/synthesis.rs` entirely. Identify: +- Phase 1 batch loop (starts around "1b. Scrape, classify, summarize in batches") +- Phase 2 Brave batch loop (inside `if settings.use_brave_search`) +- Phase 2 LLM filtering (inside the `else` branch) +- The category assignment logic (appears after each classify result) +- The `trace_article` calls (will be refactored in Task 5, leave as-is for now) + +- [ ] **Step 2: Extract `assign_category` helper** + +This logic is duplicated identically in Phase 1 (~lines 497-530) and Phase 2 Brave (~lines 700-740). Extract it into a private helper: + +```rust +/// Assign an article to a category based on LLM classification response. +/// +/// Returns `(category_key, category_name)` — e.g. `("category_0", "AI News")`. +/// Handles overflow to "Autre" when the target category is full. +/// Returns `None` if both the target category and "Autre" are full (article should be skipped). +fn assign_category( + llm_response: &serde_json::Value, + page_title: &str, + user_categories: &[String], + classification_categories: &[String], + filled_counts: &HashMap, + max_items_per_category: usize, +) -> Option<(String, String, String, String)> { + // Returns (cat_key, cat_name, llm_title, llm_summary) + let llm_title = llm_response.get("title").and_then(|t| t.as_str()).unwrap_or(page_title).to_string(); + let llm_summary = llm_response.get("summary").and_then(|s| s.as_str()).unwrap_or("").to_string(); + let mut llm_category = llm_response.get("category").and_then(|c| c.as_str()).unwrap_or("Autre").to_string(); + + if !classification_categories.iter().any(|c| c.to_lowercase() == llm_category.to_lowercase()) { + llm_category = "Autre".to_string(); + } + + let cat_key = if llm_category.to_lowercase() == "autre" { + "category_autre".to_string() + } else { + user_categories.iter().position(|c| c.to_lowercase() == llm_category.to_lowercase()) + .map(|i| format!("category_{}", i)) + .unwrap_or_else(|| "category_autre".to_string()) + }; + + let cat_filled = filled_counts.get(&llm_category).copied().unwrap_or(0); + if cat_filled >= max_items_per_category && llm_category.to_lowercase() != "autre" { + let autre_filled = filled_counts.get("Autre").copied().unwrap_or(0); + if autre_filled >= max_items_per_category { + return None; // Skip article + } + Some(("category_autre".to_string(), "Autre".to_string(), llm_title, llm_summary)) + } else { + Some((cat_key, llm_category, llm_title, llm_summary)) + } +} +``` + +Replace both Phase 1 and Phase 2 Brave category assignment blocks with calls to this helper. + +- [ ] **Step 3: Extract `filter_phase2_url` helper** + +The filtering logic (homepage, cross-phase dedup, history dedup, source diversity) is duplicated between Phase 2 Brave (~lines 564-595) and Phase 2 LLM (~lines 580-616). Extract: + +```rust +/// Check if a Phase 2 URL passes all filters. +/// Returns the filter reason if rejected, None if accepted. +async fn filter_phase2_url( + pool: &sqlx::PgPool, + user_id: Uuid, + url: &str, + seen_urls: &std::collections::HashSet, + source_counts: &HashMap, + article_history_days: i32, + max_articles_per_source: usize, +) -> Option<&'static str> { + // Homepage filter + if let Ok(parsed_url) = url::Url::parse(url) { + let path = parsed_url.path(); + if path.is_empty() || path == "/" { + return Some("filtered_homepage"); + } + } + + // Cross-phase dedup + if seen_urls.contains(&url.to_lowercase()) { + return Some("filtered_cross_phase_dedup"); + } + + // History dedup + if article_history_days > 0 { + let hash = hash_article_url(url); + let exists = db::article_history::check_urls_exist(pool, user_id, std::slice::from_ref(&hash)).await.unwrap_or_default(); + if exists.contains(&hash) { + return Some("filtered_history"); + } + } + + // Source diversity + if let Some(domain) = extract_domain(url) { + let count = source_counts.get(&domain).copied().unwrap_or(0); + if count >= max_articles_per_source { + return Some("filtered_diversity"); + } + } + + None // Accepted +} +``` + +Replace both Phase 2 Brave and Phase 2 LLM inline filtering with calls to this helper. Each call site still handles `trace_article` with its own `source_type`. + +- [ ] **Step 4: Build and test** + +Run: `cd backend && cargo build && cargo test --lib` +Expected: All 369 tests pass, no behavioral change + +- [ ] **Step 5: Commit** + +```bash +git add backend/src/services/synthesis.rs +git commit -m "refactor: extract assign_category and filter_phase2_url helpers from synthesis pipeline" +``` + +--- + +### Task 2: Eliminate `SettingsResponse` struct + +**Files:** +- Modify: `backend/src/models/settings.rs` +- Modify: `backend/src/handlers/settings.rs` + +- [ ] **Step 1: Add `#[serde(skip_serializing)]` to `UserSettings`** + +In `backend/src/models/settings.rs`, add `#[serde(skip_serializing)]` to the `user_id` and `updated_at` fields of `UserSettings`: + +```rust +#[derive(Debug, Clone, Serialize)] +pub struct UserSettings { + #[serde(skip_serializing)] + pub user_id: Uuid, + pub theme: String, + // ... all other fields unchanged ... + #[serde(skip_serializing)] + pub updated_at: DateTime, +} +``` + +- [ ] **Step 2: Delete `SettingsResponse` and its `From` impl** + +Delete the entire `SettingsResponse` struct (lines ~29-46) and the `impl From for SettingsResponse` block (lines ~48-66). + +- [ ] **Step 3: Update handlers** + +In `backend/src/handlers/settings.rs`: +- Remove `use crate::models::settings::SettingsResponse` (or the path it's imported from — check the actual import) +- In `get_settings`: change `Ok(Json(SettingsResponse::from(settings)))` to `Ok(Json(settings))` +- In `update_settings`: change `Ok(Json(SettingsResponse::from(settings)))` to `Ok(Json(settings))` + +- [ ] **Step 4: Check for other usages** + +Run: `cd backend && grep -r "SettingsResponse" src/` — should return no results. + +- [ ] **Step 5: Build and test** + +Run: `cd backend && cargo build && cargo test --lib` +Expected: All pass + +- [ ] **Step 6: Commit** + +```bash +git add backend/src/models/settings.rs backend/src/handlers/settings.rs +git commit -m "refactor: eliminate SettingsResponse struct, serialize UserSettings directly" +``` + +--- + +### Task 3: Decompose `Settings.tsx` + +**Files:** +- Create: `frontend/src/components/settings/SettingsBraveSearch.tsx` +- Create: `frontend/src/components/settings/SettingsRateLimit.tsx` +- Create: `frontend/src/components/settings/SettingsAdvanced.tsx` +- Modify: `frontend/src/pages/Settings.tsx` + +- [ ] **Step 1: Read Settings.tsx and identify section boundaries** + +Read `frontend/src/pages/Settings.tsx` entirely. Identify: +- Brave Search section (~lines 572-670) — key management + toggle +- Rate Limit section (~lines 908-980) — two number inputs + effective rate display + reset +- Advanced extraction section (~lines 546-570) — checkbox + history days + batch size + search behavior + +- [ ] **Step 2: Create `SettingsBraveSearch.tsx`** + +Create `frontend/src/components/settings/SettingsBraveSearch.tsx`. Extract the Brave Search section into a component that receives: + +```tsx +interface SettingsBraveSearchProps { + settings: () => UserSettings; + setSettings: SetStoreFunction; // or whatever the setter type is + onKeyChanged?: () => void; // to refetch api keys in parent +} +``` + +Move the Brave-specific signals (`braveKeyInput`, `braveSaving`, `braveTesting`), the `braveKey()` derived accessor, and the handler functions (`handleBraveKeySave`, `handleBraveKeyTest`, `handleBraveKeyDelete`) into this component. The component loads its own API keys via `apiKeysApi.list()`. + +- [ ] **Step 3: Create `SettingsRateLimit.tsx`** + +Create `frontend/src/components/settings/SettingsRateLimit.tsx`. Extract the rate limit section. Props: + +```tsx +interface SettingsRateLimitProps { + settings: () => UserSettings; + setSettings: SetStoreFunction; +} +``` + +- [ ] **Step 4: Create `SettingsAdvanced.tsx`** + +Create `frontend/src/components/settings/SettingsAdvanced.tsx`. Extract the advanced extraction section (checkbox, history days, batch size, search behavior textarea). Props same pattern. + +- [ ] **Step 5: Update Settings.tsx to use sub-components** + +Replace the inline sections with component imports: +```tsx +import SettingsBraveSearch from '~/components/settings/SettingsBraveSearch'; +import SettingsRateLimit from '~/components/settings/SettingsRateLimit'; +import SettingsAdvanced from '~/components/settings/SettingsAdvanced'; +``` + +The parent keeps: general settings (theme, categories, max_age/items/articles), provider/model selection, API key manager, and the save button. + +- [ ] **Step 6: TypeScript check** + +Run: `cd frontend && npx tsc --noEmit` +Expected: No errors + +- [ ] **Step 7: Commit** + +```bash +git add frontend/src/components/settings/ frontend/src/pages/Settings.tsx +git commit -m "refactor: decompose Settings.tsx into sub-components" +``` + +--- + +### Task 4: Extract shared LLM error mapping + +**Files:** +- Modify: `backend/src/services/llm/mod.rs` +- Modify: `backend/src/services/llm/openai.rs` +- Modify: `backend/src/services/llm/gemini.rs` +- Modify: `backend/src/services/llm/anthropic.rs` + +- [ ] **Step 1: Read all three error mapping functions** + +Read the `map_*_error` functions in all three provider files. Note the differences: +- OpenAI: extracts `error.message` + `error.type`, handles 400/401/403/404/429 +- Gemini: extracts `error.message` + `error.status`, merges 401+403, handles 400/401|403/404/429 +- Anthropic: extracts `error.message` + `error.type`, handles 400/401/403/404/429/529 + +- [ ] **Step 2: Add shared mapper in `mod.rs`** + +In `backend/src/services/llm/mod.rs`, add: + +```rust +/// Shared HTTP error mapping for LLM provider responses. +/// +/// Maps common HTTP status codes to `AppError` variants. +/// Provider-specific logging should happen before calling this. +pub fn map_provider_http_error(status: u16, provider_name: &str) -> AppError { + match status { + 400 => AppError::BadRequest("Invalid request to LLM provider".into()), + 401 => AppError::BadRequest("Invalid or unauthorized API key".into()), + 403 => AppError::BadRequest("Access denied by LLM provider".into()), + 404 => AppError::BadRequest("Model not found or not available".into()), + 429 | 529 => AppError::RateLimited( + "LLM provider rate limit exceeded. Please try again later.".into(), + ), + _ => AppError::Internal(anyhow::anyhow!( + "{} returned HTTP {}", provider_name, status + )), + } +} +``` + +Note: 529 (Anthropic overloaded) is included in the shared mapper as it's semantically equivalent to 429 for any provider. + +- [ ] **Step 3: Replace each provider's error mapper** + +In each provider file, replace the `map_*_error` function with a thinner version that logs provider-specific details, then delegates to the shared mapper: + +**OpenAI:** +```rust +fn map_openai_error(status: u16, body: &Value) -> AppError { + let error_message = body.get("error").and_then(|e| e.get("message")).and_then(|m| m.as_str()).unwrap_or("Unknown error"); + let error_type = body.get("error").and_then(|e| e.get("type")).and_then(|t| t.as_str()).unwrap_or(""); + tracing::error!("OpenAI API error (HTTP {}): {} (type: {})", status, error_message, error_type); + super::map_provider_http_error(status, "OpenAI") +} +``` + +**Gemini:** +```rust +fn map_gemini_error(status: u16, body: &Value) -> AppError { + let error_message = body.get("error").and_then(|e| e.get("message")).and_then(|m| m.as_str()).unwrap_or("Unknown error"); + let error_status = body.get("error").and_then(|e| e.get("status")).and_then(|s| s.as_str()).unwrap_or(""); + tracing::error!("Gemini API error (HTTP {}): {} (status: {})", status, error_message, error_status); + super::map_provider_http_error(status, "Gemini") +} +``` + +**Anthropic:** +```rust +fn map_anthropic_error(status: u16, body: &Value) -> AppError { + let error_message = body.get("error").and_then(|e| e.get("message")).and_then(|m| m.as_str()).unwrap_or("Unknown error"); + let error_type = body.get("error").and_then(|e| e.get("type")).and_then(|t| t.as_str()).unwrap_or(""); + tracing::error!("Anthropic API error (HTTP {}): {} (type: {})", status, error_message, error_type); + super::map_provider_http_error(status, "Anthropic") +} +``` + +- [ ] **Step 4: Build and test** + +Run: `cd backend && cargo build && cargo test --lib` +Expected: All pass + +- [ ] **Step 5: Commit** + +```bash +git add backend/src/services/llm/mod.rs backend/src/services/llm/openai.rs backend/src/services/llm/gemini.rs backend/src/services/llm/anthropic.rs +git commit -m "refactor: extract shared LLM error mapping to reduce duplication" +``` + +--- + +### Task 5: Replace `trace_article` parameters with `ArticleTrace` struct + +**Files:** +- Modify: `backend/src/services/synthesis.rs` + +This task should run AFTER Task 1, since both modify `synthesis.rs`. + +- [ ] **Step 1: Define `ArticleTrace` struct** + +Add near the top of `synthesis.rs` (in the helper functions section): + +```rust +/// Structured parameters for article history tracing. +struct ArticleTrace<'a> { + url: &'a str, + title: &'a str, + source_type: &'a str, + source_url: Option<&'a str>, + category: Option<&'a str>, + synthesis_id: Option, + status: &'a str, + scraped_ok: bool, +} +``` + +- [ ] **Step 2: Update `trace_article` signature** + +Change from 11 parameters to 4: + +```rust +async fn trace_article( + pool: &sqlx::PgPool, + user_id: Uuid, + job_id: Uuid, + trace: &ArticleTrace<'_>, +) { + let entry = db::article_history::ArticleHistoryEntry { + user_id, + url: trace.url.to_string(), + url_hash: hash_article_url(trace.url), + title: trace.title.to_string(), + source_type: trace.source_type.to_string(), + source_url: trace.source_url.map(|s| s.to_string()), + category: trace.category.map(|s| s.to_string()), + synthesis_id: trace.synthesis_id, + status: trace.status.to_string(), + scraped_ok: trace.scraped_ok, + job_id, + }; + db::article_history::insert_entry(pool, &entry).await.ok(); +} +``` + +- [ ] **Step 3: Update all call sites** + +Find every `trace_article(` call in the file. Each one changes from positional args to struct literal. Example: + +```rust +// Before: +trace_article(&state.pool, user_id, job_id, &url, "", "personalized_source", Some(&source_url), None, None, "filtered_diversity", false).await; + +// After: +trace_article(&state.pool, user_id, job_id, &ArticleTrace { + url: &url, title: "", source_type: "personalized_source", + source_url: Some(&source_url), category: None, synthesis_id: None, + status: "filtered_diversity", scraped_ok: false, +}).await; +``` + +There are approximately 15-20 call sites. Update all of them. Use `grep -n "trace_article(" backend/src/services/synthesis.rs` to find them all. + +- [ ] **Step 4: Build and test** + +Run: `cd backend && cargo build && cargo test --lib` +Expected: All pass + +- [ ] **Step 5: Commit** + +```bash +git add backend/src/services/synthesis.rs +git commit -m "refactor: replace trace_article 11 parameters with ArticleTrace struct" +```