docs: add spec for structural refactoring (audit sub-project B)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
master
oabrivard 3 months ago
parent 5a66443a94
commit fd1130fd5a

@ -0,0 +1,117 @@
# Design: Structural Refactoring (Audit Sub-project B)
**Date**: 2026-03-26
**Scope**: 5 structural refactoring items from the code audit
**Source**: `docs/audit/00-consolidated-summary.md`
---
## 1. Decompose `synthesis.rs`
### Problem
`run_generation_inner` is ~650 lines with the scrape+classify batch loop duplicated 3 times (Phase 1, Phase 2 Brave, Phase 2 LLM) and filtering logic duplicated across paths.
### Refactoring
Extract private helper functions within `synthesis.rs`:
- **`process_article_batch()`** — takes a batch of URLs, scrapes in parallel (JoinSet), classifies in parallel (JoinSet), updates `filled_counts` / `article_scraped` / `source_counts`. Used by Phase 1 and Phase 2 Brave paths.
- **`filter_candidate_url()`** — homepage check, cross-phase dedup, history dedup, source diversity check. Returns whether the URL passes. Used by Phase 2 LLM and Phase 2 Brave paths.
- **`assign_category()`** — maps LLM classification response to category key, handles overflow to "Autre". Used in every classify result handler.
No new files — these are private helpers within the pipeline context.
### Expected result
`run_generation_inner` shrinks from ~650 to ~300 lines. No behavioral change.
### Files
- **Modify:** `backend/src/services/synthesis.rs`
---
## 2. Eliminate `SettingsResponse` struct
### Problem
Every new setting requires synchronized changes to 4 Rust structs (`UserSettings`, `SettingsResponse`, `UpdateSettingsRequest`, `SettingsRow`). `SettingsResponse` is identical to `UserSettings` minus `user_id` and `updated_at`.
### Refactoring
- Add `#[serde(skip_serializing)]` to `user_id` and `updated_at` fields on `UserSettings`
- Delete `SettingsResponse` struct and its `From<UserSettings>` impl
- Update the settings GET handler to return `Json(settings)` directly
### Expected result
Future settings changes need 3 structs instead of 4.
### Files
- **Modify:** `backend/src/models/settings.rs` — remove `SettingsResponse`, add skip attributes
- **Modify:** `backend/src/handlers/settings.rs` — return `UserSettings` directly
---
## 3. Decompose `Settings.tsx`
### Problem
1,025-line component mixing 7+ concerns.
### Refactoring
Extract sub-components that receive `settings()` accessor + `setSettings()` setter as props:
- **`SettingsBraveSearch`** — Brave key input + toggle + auto-disable on delete
- **`SettingsRateLimit`** — rate limit fields + reset button
- **`SettingsAdvanced`** — `use_llm_for_source_links`, `article_history_days`, `batch_size`, search behavior
The general settings (theme, categories, max_age_days, max_items_per_category, max_articles_per_source) and provider/model selection remain in the main `Settings.tsx` — they are the core of the page and don't benefit from extraction. The goal is to extract the self-contained sections, not to empty the parent.
The main `Settings.tsx` remains the container: loads data, holds the settings signal, renders sub-components, owns the save button.
### Files
- **Create:** `frontend/src/components/settings/SettingsBraveSearch.tsx`
- **Create:** `frontend/src/components/settings/SettingsRateLimit.tsx`
- **Create:** `frontend/src/components/settings/SettingsAdvanced.tsx`
- **Modify:** `frontend/src/pages/Settings.tsx` — import and render sub-components
---
## 4. Extract shared LLM error mapping
### Problem
`map_openai_error`, `map_gemini_error`, `map_anthropic_error` are near-identical functions mapping HTTP status codes to `AppError`.
### Refactoring
- Add `pub fn map_provider_http_error(status: u16, body: &serde_json::Value, provider_name: &str) -> AppError` in `backend/src/services/llm/mod.rs`
- The shared function handles the common status codes: 401 → `BadRequest("Invalid API key")`, 429 → `RateLimited(...)`, 500+ → `Internal(...)`, and a default fallback
- Provider-specific status codes (e.g. Anthropic's 529 for overloaded) are handled as pre-processing before calling the shared mapper, or the shared function includes them (529 → `RateLimited` is reasonable for all providers)
- Each provider's `map_*_error` function is replaced with a call to the shared function, with any provider-specific JSON error extraction done inline before the call
### Files
- **Modify:** `backend/src/services/llm/mod.rs` — add shared function
- **Modify:** `backend/src/services/llm/openai.rs` — use shared mapper
- **Modify:** `backend/src/services/llm/gemini.rs` — use shared mapper
- **Modify:** `backend/src/services/llm/anthropic.rs` — use shared mapper
---
## 5. Replace `trace_article` 11-parameter function with struct
### Problem
`trace_article` has 11 positional parameters, most `Option`. Call sites are hard to read.
### Refactoring
Create an `ArticleTrace` struct:
```rust
struct ArticleTrace<'a> {
url: &'a str,
title: &'a str,
source_type: &'a str,
source_url: Option<&'a str>,
category: Option<&'a str>,
synthesis_id: Option<Uuid>,
status: &'a str,
scraped_ok: bool,
}
```
Change `trace_article` signature to `(pool, user_id, job_id, trace: &ArticleTrace)`.
Update all call sites to use struct literal syntax.
### Files
- **Modify:** `backend/src/services/synthesis.rs`
Loading…
Cancel
Save