From fd1130fd5aa204c906a9de9018b93aa096d49ac7 Mon Sep 17 00:00:00 2001 From: oabrivard Date: Thu, 26 Mar 2026 01:02:47 +0100 Subject: [PATCH] docs: add spec for structural refactoring (audit sub-project B) Co-Authored-By: Claude Opus 4.6 (1M context) --- ...026-03-26-structural-refactoring-design.md | 117 ++++++++++++++++++ 1 file changed, 117 insertions(+) create mode 100644 docs/superpowers/specs/2026-03-26-structural-refactoring-design.md diff --git a/docs/superpowers/specs/2026-03-26-structural-refactoring-design.md b/docs/superpowers/specs/2026-03-26-structural-refactoring-design.md new file mode 100644 index 0000000..2ff9e06 --- /dev/null +++ b/docs/superpowers/specs/2026-03-26-structural-refactoring-design.md @@ -0,0 +1,117 @@ +# Design: Structural Refactoring (Audit Sub-project B) + +**Date**: 2026-03-26 +**Scope**: 5 structural refactoring items from the code audit +**Source**: `docs/audit/00-consolidated-summary.md` + +--- + +## 1. Decompose `synthesis.rs` + +### Problem +`run_generation_inner` is ~650 lines with the scrape+classify batch loop duplicated 3 times (Phase 1, Phase 2 Brave, Phase 2 LLM) and filtering logic duplicated across paths. + +### Refactoring +Extract private helper functions within `synthesis.rs`: + +- **`process_article_batch()`** — takes a batch of URLs, scrapes in parallel (JoinSet), classifies in parallel (JoinSet), updates `filled_counts` / `article_scraped` / `source_counts`. Used by Phase 1 and Phase 2 Brave paths. +- **`filter_candidate_url()`** — homepage check, cross-phase dedup, history dedup, source diversity check. Returns whether the URL passes. Used by Phase 2 LLM and Phase 2 Brave paths. +- **`assign_category()`** — maps LLM classification response to category key, handles overflow to "Autre". Used in every classify result handler. + +No new files — these are private helpers within the pipeline context. + +### Expected result +`run_generation_inner` shrinks from ~650 to ~300 lines. No behavioral change. + +### Files +- **Modify:** `backend/src/services/synthesis.rs` + +--- + +## 2. Eliminate `SettingsResponse` struct + +### Problem +Every new setting requires synchronized changes to 4 Rust structs (`UserSettings`, `SettingsResponse`, `UpdateSettingsRequest`, `SettingsRow`). `SettingsResponse` is identical to `UserSettings` minus `user_id` and `updated_at`. + +### Refactoring +- Add `#[serde(skip_serializing)]` to `user_id` and `updated_at` fields on `UserSettings` +- Delete `SettingsResponse` struct and its `From` impl +- Update the settings GET handler to return `Json(settings)` directly + +### Expected result +Future settings changes need 3 structs instead of 4. + +### Files +- **Modify:** `backend/src/models/settings.rs` — remove `SettingsResponse`, add skip attributes +- **Modify:** `backend/src/handlers/settings.rs` — return `UserSettings` directly + +--- + +## 3. Decompose `Settings.tsx` + +### Problem +1,025-line component mixing 7+ concerns. + +### Refactoring +Extract sub-components that receive `settings()` accessor + `setSettings()` setter as props: + +- **`SettingsBraveSearch`** — Brave key input + toggle + auto-disable on delete +- **`SettingsRateLimit`** — rate limit fields + reset button +- **`SettingsAdvanced`** — `use_llm_for_source_links`, `article_history_days`, `batch_size`, search behavior + +The general settings (theme, categories, max_age_days, max_items_per_category, max_articles_per_source) and provider/model selection remain in the main `Settings.tsx` — they are the core of the page and don't benefit from extraction. The goal is to extract the self-contained sections, not to empty the parent. + +The main `Settings.tsx` remains the container: loads data, holds the settings signal, renders sub-components, owns the save button. + +### Files +- **Create:** `frontend/src/components/settings/SettingsBraveSearch.tsx` +- **Create:** `frontend/src/components/settings/SettingsRateLimit.tsx` +- **Create:** `frontend/src/components/settings/SettingsAdvanced.tsx` +- **Modify:** `frontend/src/pages/Settings.tsx` — import and render sub-components + +--- + +## 4. Extract shared LLM error mapping + +### Problem +`map_openai_error`, `map_gemini_error`, `map_anthropic_error` are near-identical functions mapping HTTP status codes to `AppError`. + +### Refactoring +- Add `pub fn map_provider_http_error(status: u16, body: &serde_json::Value, provider_name: &str) -> AppError` in `backend/src/services/llm/mod.rs` +- The shared function handles the common status codes: 401 → `BadRequest("Invalid API key")`, 429 → `RateLimited(...)`, 500+ → `Internal(...)`, and a default fallback +- Provider-specific status codes (e.g. Anthropic's 529 for overloaded) are handled as pre-processing before calling the shared mapper, or the shared function includes them (529 → `RateLimited` is reasonable for all providers) +- Each provider's `map_*_error` function is replaced with a call to the shared function, with any provider-specific JSON error extraction done inline before the call + +### Files +- **Modify:** `backend/src/services/llm/mod.rs` — add shared function +- **Modify:** `backend/src/services/llm/openai.rs` — use shared mapper +- **Modify:** `backend/src/services/llm/gemini.rs` — use shared mapper +- **Modify:** `backend/src/services/llm/anthropic.rs` — use shared mapper + +--- + +## 5. Replace `trace_article` 11-parameter function with struct + +### Problem +`trace_article` has 11 positional parameters, most `Option`. Call sites are hard to read. + +### Refactoring +Create an `ArticleTrace` struct: +```rust +struct ArticleTrace<'a> { + url: &'a str, + title: &'a str, + source_type: &'a str, + source_url: Option<&'a str>, + category: Option<&'a str>, + synthesis_id: Option, + status: &'a str, + scraped_ok: bool, +} +``` + +Change `trace_article` signature to `(pool, user_id, job_id, trace: &ArticleTrace)`. +Update all call sites to use struct literal syntax. + +### Files +- **Modify:** `backend/src/services/synthesis.rs`