docs: add spec for structural refactoring (audit sub-project B)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>master
parent
5a66443a94
commit
fd1130fd5a
@ -0,0 +1,117 @@
|
||||
# Design: Structural Refactoring (Audit Sub-project B)
|
||||
|
||||
**Date**: 2026-03-26
|
||||
**Scope**: 5 structural refactoring items from the code audit
|
||||
**Source**: `docs/audit/00-consolidated-summary.md`
|
||||
|
||||
---
|
||||
|
||||
## 1. Decompose `synthesis.rs`
|
||||
|
||||
### Problem
|
||||
`run_generation_inner` is ~650 lines with the scrape+classify batch loop duplicated 3 times (Phase 1, Phase 2 Brave, Phase 2 LLM) and filtering logic duplicated across paths.
|
||||
|
||||
### Refactoring
|
||||
Extract private helper functions within `synthesis.rs`:
|
||||
|
||||
- **`process_article_batch()`** — takes a batch of URLs, scrapes in parallel (JoinSet), classifies in parallel (JoinSet), updates `filled_counts` / `article_scraped` / `source_counts`. Used by Phase 1 and Phase 2 Brave paths.
|
||||
- **`filter_candidate_url()`** — homepage check, cross-phase dedup, history dedup, source diversity check. Returns whether the URL passes. Used by Phase 2 LLM and Phase 2 Brave paths.
|
||||
- **`assign_category()`** — maps LLM classification response to category key, handles overflow to "Autre". Used in every classify result handler.
|
||||
|
||||
No new files — these are private helpers within the pipeline context.
|
||||
|
||||
### Expected result
|
||||
`run_generation_inner` shrinks from ~650 to ~300 lines. No behavioral change.
|
||||
|
||||
### Files
|
||||
- **Modify:** `backend/src/services/synthesis.rs`
|
||||
|
||||
---
|
||||
|
||||
## 2. Eliminate `SettingsResponse` struct
|
||||
|
||||
### Problem
|
||||
Every new setting requires synchronized changes to 4 Rust structs (`UserSettings`, `SettingsResponse`, `UpdateSettingsRequest`, `SettingsRow`). `SettingsResponse` is identical to `UserSettings` minus `user_id` and `updated_at`.
|
||||
|
||||
### Refactoring
|
||||
- Add `#[serde(skip_serializing)]` to `user_id` and `updated_at` fields on `UserSettings`
|
||||
- Delete `SettingsResponse` struct and its `From<UserSettings>` impl
|
||||
- Update the settings GET handler to return `Json(settings)` directly
|
||||
|
||||
### Expected result
|
||||
Future settings changes need 3 structs instead of 4.
|
||||
|
||||
### Files
|
||||
- **Modify:** `backend/src/models/settings.rs` — remove `SettingsResponse`, add skip attributes
|
||||
- **Modify:** `backend/src/handlers/settings.rs` — return `UserSettings` directly
|
||||
|
||||
---
|
||||
|
||||
## 3. Decompose `Settings.tsx`
|
||||
|
||||
### Problem
|
||||
1,025-line component mixing 7+ concerns.
|
||||
|
||||
### Refactoring
|
||||
Extract sub-components that receive `settings()` accessor + `setSettings()` setter as props:
|
||||
|
||||
- **`SettingsBraveSearch`** — Brave key input + toggle + auto-disable on delete
|
||||
- **`SettingsRateLimit`** — rate limit fields + reset button
|
||||
- **`SettingsAdvanced`** — `use_llm_for_source_links`, `article_history_days`, `batch_size`, search behavior
|
||||
|
||||
The general settings (theme, categories, max_age_days, max_items_per_category, max_articles_per_source) and provider/model selection remain in the main `Settings.tsx` — they are the core of the page and don't benefit from extraction. The goal is to extract the self-contained sections, not to empty the parent.
|
||||
|
||||
The main `Settings.tsx` remains the container: loads data, holds the settings signal, renders sub-components, owns the save button.
|
||||
|
||||
### Files
|
||||
- **Create:** `frontend/src/components/settings/SettingsBraveSearch.tsx`
|
||||
- **Create:** `frontend/src/components/settings/SettingsRateLimit.tsx`
|
||||
- **Create:** `frontend/src/components/settings/SettingsAdvanced.tsx`
|
||||
- **Modify:** `frontend/src/pages/Settings.tsx` — import and render sub-components
|
||||
|
||||
---
|
||||
|
||||
## 4. Extract shared LLM error mapping
|
||||
|
||||
### Problem
|
||||
`map_openai_error`, `map_gemini_error`, `map_anthropic_error` are near-identical functions mapping HTTP status codes to `AppError`.
|
||||
|
||||
### Refactoring
|
||||
- Add `pub fn map_provider_http_error(status: u16, body: &serde_json::Value, provider_name: &str) -> AppError` in `backend/src/services/llm/mod.rs`
|
||||
- The shared function handles the common status codes: 401 → `BadRequest("Invalid API key")`, 429 → `RateLimited(...)`, 500+ → `Internal(...)`, and a default fallback
|
||||
- Provider-specific status codes (e.g. Anthropic's 529 for overloaded) are handled as pre-processing before calling the shared mapper, or the shared function includes them (529 → `RateLimited` is reasonable for all providers)
|
||||
- Each provider's `map_*_error` function is replaced with a call to the shared function, with any provider-specific JSON error extraction done inline before the call
|
||||
|
||||
### Files
|
||||
- **Modify:** `backend/src/services/llm/mod.rs` — add shared function
|
||||
- **Modify:** `backend/src/services/llm/openai.rs` — use shared mapper
|
||||
- **Modify:** `backend/src/services/llm/gemini.rs` — use shared mapper
|
||||
- **Modify:** `backend/src/services/llm/anthropic.rs` — use shared mapper
|
||||
|
||||
---
|
||||
|
||||
## 5. Replace `trace_article` 11-parameter function with struct
|
||||
|
||||
### Problem
|
||||
`trace_article` has 11 positional parameters, most `Option`. Call sites are hard to read.
|
||||
|
||||
### Refactoring
|
||||
Create an `ArticleTrace` struct:
|
||||
```rust
|
||||
struct ArticleTrace<'a> {
|
||||
url: &'a str,
|
||||
title: &'a str,
|
||||
source_type: &'a str,
|
||||
source_url: Option<&'a str>,
|
||||
category: Option<&'a str>,
|
||||
synthesis_id: Option<Uuid>,
|
||||
status: &'a str,
|
||||
scraped_ok: bool,
|
||||
}
|
||||
```
|
||||
|
||||
Change `trace_article` signature to `(pool, user_id, job_id, trace: &ArticleTrace)`.
|
||||
Update all call sites to use struct literal syntax.
|
||||
|
||||
### Files
|
||||
- **Modify:** `backend/src/services/synthesis.rs`
|
||||
Loading…
Reference in New Issue