# Structural Refactoring — Implementation Plan
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Implement the 5 structural refactoring items from the code audit: decompose `synthesis.rs`, eliminate `SettingsResponse`, decompose `Settings.tsx`, extract shared LLM error mapping, and replace the positional parameters of `trace_article` with a struct.

**Architecture:** Pure refactoring with no behavioral changes. Each task is independently committable. Tasks 1 and 5 both modify `synthesis.rs`, so Task 5 should run after Task 1.

**Tech Stack:** Rust (Axum, sqlx), SolidJS, TypeScript

**Spec:** `docs/superpowers/specs/2026-03-26-structural-refactoring-design.md`

---
### Task 1: Extract shared helpers from `synthesis.rs`
**Files:**
- Modify: `backend/src/services/synthesis.rs`
This is the highest-impact refactoring. The `run_generation_inner` function (~650 lines) has the scrape+classify batch loop duplicated in Phase 1 (lines ~390-540) and Phase 2 Brave (lines ~610-740), plus filtering logic duplicated in Phase 2 Brave and Phase 2 LLM.
- [ ] **Step 1: Read the full file and understand the structure**
Read `backend/src/services/synthesis.rs` entirely. Identify:
- Phase 1 batch loop (starts around "1b. Scrape, classify, summarize in batches")
- Phase 2 Brave batch loop (inside `if settings.use_brave_search`)
- Phase 2 LLM filtering (inside the `else` branch)
- The category assignment logic (appears after each classify result)
- The `trace_article` calls (will be refactored in Task 5, leave as-is for now)
- [ ] **Step 2: Extract `assign_category` helper**
This logic is duplicated identically in Phase 1 (~lines 497-530) and Phase 2 Brave (~lines 700-740). Extract it into a private helper:
```rust
/// Assign an article to a category based on the LLM classification response.
///
/// Returns `(category_key, category_name, llm_title, llm_summary)`,
/// e.g. `("category_0", "AI News", ..)`.
/// Handles overflow to "Autre" when the target category is full.
/// Returns `None` if both the target category and "Autre" are full (article should be skipped).
fn assign_category(
    llm_response: &serde_json::Value,
    page_title: &str,
    user_categories: &[String],
    classification_categories: &[String],
    filled_counts: &HashMap<String, usize>,
    max_items_per_category: usize,
) -> Option<(String, String, String, String)> {
    // Fall back to the scraped page title / empty summary when the LLM omits them.
    let llm_title = llm_response
        .get("title")
        .and_then(|t| t.as_str())
        .unwrap_or(page_title)
        .to_string();
    let llm_summary = llm_response
        .get("summary")
        .and_then(|s| s.as_str())
        .unwrap_or("")
        .to_string();
    let mut llm_category = llm_response
        .get("category")
        .and_then(|c| c.as_str())
        .unwrap_or("Autre")
        .to_string();

    // Unknown categories collapse to "Autre" (case-insensitive match).
    if !classification_categories
        .iter()
        .any(|c| c.to_lowercase() == llm_category.to_lowercase())
    {
        llm_category = "Autre".to_string();
    }

    let cat_key = if llm_category.to_lowercase() == "autre" {
        "category_autre".to_string()
    } else {
        user_categories
            .iter()
            .position(|c| c.to_lowercase() == llm_category.to_lowercase())
            .map(|i| format!("category_{}", i))
            .unwrap_or_else(|| "category_autre".to_string())
    };

    // Overflow: a full target category spills into "Autre" unless that is full too.
    // A direct "Autre" classification is not capacity-checked here.
    let cat_filled = filled_counts.get(&llm_category).copied().unwrap_or(0);
    if cat_filled >= max_items_per_category && llm_category.to_lowercase() != "autre" {
        let autre_filled = filled_counts.get("Autre").copied().unwrap_or(0);
        if autre_filled >= max_items_per_category {
            return None; // Skip article
        }
        Some(("category_autre".to_string(), "Autre".to_string(), llm_title, llm_summary))
    } else {
        Some((cat_key, llm_category, llm_title, llm_summary))
    }
}
```
Replace both Phase 1 and Phase 2 Brave category assignment blocks with calls to this helper.
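
The overflow rule is the part of this helper most likely to drift during extraction, so it may be worth pinning down in isolation first. The following is a minimal std-only sketch, not the project's code: `overflow_target`, `counts`, and `overflow_examples` are hypothetical names, and the sketch models only the capacity check (no JSON parsing, no `category_N` keys).

```rust
use std::collections::HashMap;

/// Simplified model of the overflow rule: given the category the LLM chose,
/// return the category the article lands in, or `None` if it should be skipped.
fn overflow_target<'a>(
    llm_category: &'a str,
    filled_counts: &HashMap<String, usize>,
    max_items_per_category: usize,
) -> Option<&'a str> {
    let filled = |name: &str| filled_counts.get(name).copied().unwrap_or(0);
    let is_autre = llm_category.to_lowercase() == "autre";
    if filled(llm_category) >= max_items_per_category && !is_autre {
        if filled("Autre") >= max_items_per_category {
            return None; // Both the target and "Autre" are full.
        }
        Some("Autre")
    } else {
        Some(llm_category)
    }
}

/// Hypothetical fixture helper: build a counts map from `(name, count)` pairs.
fn counts(pairs: &[(&str, usize)]) -> HashMap<String, usize> {
    pairs.iter().map(|(name, n)| (name.to_string(), *n)).collect()
}

/// The four cases a regression test would want to cover.
fn overflow_examples() {
    let max = 3;
    // Target has room: article stays in its category.
    assert_eq!(overflow_target("AI News", &counts(&[("AI News", 1)]), max), Some("AI News"));
    // Target full, "Autre" has room: article overflows.
    assert_eq!(overflow_target("AI News", &counts(&[("AI News", 3)]), max), Some("Autre"));
    // Target and "Autre" both full: article is skipped.
    assert_eq!(
        overflow_target("AI News", &counts(&[("AI News", 3), ("Autre", 3)]), max),
        None
    );
    // Direct "Autre" classification is passed through without a capacity check.
    assert_eq!(overflow_target("Autre", &counts(&[("Autre", 3)]), max), Some("Autre"));
}
```

Note the last case: a direct "Autre" classification is passed through even when "Autre" is already full, which matches the condition in `assign_category` as written. Since the plan calls for no behavioral change, the extraction should keep that behavior rather than tighten it.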
- [ ] **Step 3: Extract `filter_phase2_url` helper**
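The filtering logic (homepage, cross-phase dedup, history dedup, source diversity) is duplicated between Phase 2 Brave (~lines 564-595) and Phase 2 LLM (~lines 580-616). The line ranges are approximate (the two given here overlap), so locate the blocks by the `if settings.use_brave_search` branch and its `else` rather than by number. Extract: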
```rust
/// Check if a Phase 2 URL passes all filters.
/// Returns the filter reason if rejected, None if accepted.
async fn filter_phase2_url(
    pool: &sqlx::PgPool,
    user_id: Uuid,
    url: &str,
    seen_urls: &std::collections::HashSet<String>,
    source_counts: &HashMap<String, usize>,
    article_history_days: i32,
    max_articles_per_source: usize,
) -> Option<&'static str> {
    // Homepage filter
    if let Ok(parsed_url) = url::Url::parse(url) {
        let path = parsed_url.path();
        if path.is_empty() || path == "/" {
            return Some("filtered_homepage");
        }
    }

    // Cross-phase dedup
    if seen_urls.contains(&url.to_lowercase()) {
        return Some("filtered_cross_phase_dedup");
    }

    // History dedup
    if article_history_days > 0 {
        let hash = hash_article_url(url);
        let exists = db::article_history::check_urls_exist(
            pool,
            user_id,
            std::slice::from_ref(&hash),
        )
        .await
        .unwrap_or_default();
        if exists.contains(&hash) {
            return Some("filtered_history");
        }
    }

    // Source diversity
    if let Some(domain) = extract_domain(url) {
        let count = source_counts.get(&domain).copied().unwrap_or(0);
        if count >= max_articles_per_source {
            return Some("filtered_diversity");
        }
    }

    None // Accepted
}
```
Replace both Phase 2 Brave and Phase 2 LLM inline filtering with calls to this helper. Each call site still handles `trace_article` with its own `source_type`.
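
Because the helper returns on the first matching filter, the order of the checks determines which reason gets traced. A synchronous, std-only sketch of that ordering follows; it is an illustration under assumptions, not the project's code. `is_homepage` and `host_of` are hypothetical stand-ins for `url::Url::parse(..).path()` and `extract_domain`, the history check is omitted because it needs the database, and `demo_reason` is a made-up fixture.

```rust
use std::collections::{HashMap, HashSet};

/// Stand-in for the homepage check: true when the URL has no path beyond "/".
/// URLs without a scheme are not treated as homepages.
fn is_homepage(url: &str) -> bool {
    match url.split_once("://") {
        Some((_, rest)) => {
            let no_query = rest.split(|c: char| matches!(c, '?' | '#')).next().unwrap_or(rest);
            match no_query.split_once('/') {
                None => true,
                Some((_, path)) => path.is_empty(),
            }
        }
        None => false,
    }
}

/// Stand-in for `extract_domain`: the host part of the URL, if any.
fn host_of(url: &str) -> Option<&str> {
    let rest = url.split_once("://")?.1;
    let host = rest.split(|c: char| matches!(c, '/' | '?' | '#')).next()?;
    if host.is_empty() { None } else { Some(host) }
}

/// Same check order as the helper: homepage, cross-phase dedup, then diversity.
fn filter_reason(
    url: &str,
    seen_urls: &HashSet<String>,
    source_counts: &HashMap<String, usize>,
    max_articles_per_source: usize,
) -> Option<&'static str> {
    if is_homepage(url) {
        return Some("filtered_homepage");
    }
    // The lookup lowercases the URL, so `seen_urls` must hold lowercased entries.
    if seen_urls.contains(&url.to_lowercase()) {
        return Some("filtered_cross_phase_dedup");
    }
    if let Some(host) = host_of(url) {
        if source_counts.get(host).copied().unwrap_or(0) >= max_articles_per_source {
            return Some("filtered_diversity");
        }
    }
    None
}

/// Made-up fixture: one URL already seen, one source already at its limit of 2.
fn demo_reason(url: &str) -> Option<&'static str> {
    let mut seen_urls = HashSet::new();
    seen_urls.insert("https://example.com/a".to_string());
    let mut source_counts = HashMap::new();
    source_counts.insert("news.example.org".to_string(), 2);
    filter_reason(url, &seen_urls, &source_counts, 2)
}
```

With this fixture, `https://example.com/` is rejected as a homepage, `https://Example.com/A` as a cross-phase duplicate (the lookup is case-insensitive only because the set stores lowercased URLs), and `https://news.example.org/story` for source diversity. A test along these lines, run before and after the extraction, would catch an accidental reordering of the checks.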
- [ ] **Step 4: Build and test**
Run: `cd backend && cargo build && cargo test --lib`
Expected: All 369 tests pass, no behavioral change
- [ ] **Step 5: Commit**
```bash
git add backend/src/services/synthesis.rs
git commit -m "refactor: extract assign_category and filter_phase2_url helpers from synthesis pipeline"
```
---
### Task 2: Eliminate `SettingsResponse` struct
**Files:**
- Modify: `backend/src/models/settings.rs`
- Modify: `backend/src/handlers/settings.rs`
- [ ] **Step 1: Add `#[serde(skip_serializing)]` to `UserSettings`**
In `backend/src/models/settings.rs`, add `#[serde(skip_serializing)]` to the `user_id` and `updated_at` fields of `UserSettings`:
```rust
// Keep any other derives the existing struct has; only the two field attributes are new.
#[derive(Debug, Clone, Serialize)]
pub struct UserSettings {
    #[serde(skip_serializing)]
    pub user_id: Uuid,
    pub theme: String,
    // ... all other fields unchanged ...
    #[serde(skip_serializing)]
    pub updated_at: DateTime<Utc>,
}
```
- [ ] **Step 2: Delete `SettingsResponse` and its `From` impl**
Delete the entire `SettingsResponse` struct (lines ~29-46) and the `impl From<UserSettings> for SettingsResponse` block (lines ~48-66).
- [ ] **Step 3: Update handlers**
In `backend/src/handlers/settings.rs`:
- Remove `use crate::models::settings::SettingsResponse` (or the path it's imported from — check the actual import)
- In `get_settings`: change `Ok(Json(SettingsResponse::from(settings)))` to `Ok(Json(settings))`
- In `update_settings`: change `Ok(Json(SettingsResponse::from(settings)))` to `Ok(Json(settings))`
- [ ] **Step 4: Check for other usages**
Run: `cd backend && grep -r "SettingsResponse" src/` — should return no results.
- [ ] **Step 5: Build and test**
Run: `cd backend && cargo build && cargo test --lib`
Expected: All pass
- [ ] **Step 6: Commit**
```bash
git add backend/src/models/settings.rs backend/src/handlers/settings.rs
git commit -m "refactor: eliminate SettingsResponse struct, serialize UserSettings directly"
```
---
### Task 3: Decompose `Settings.tsx`
**Files:**
- Create: `frontend/src/components/settings/SettingsBraveSearch.tsx`
- Create: `frontend/src/components/settings/SettingsRateLimit.tsx`
- Create: `frontend/src/components/settings/SettingsAdvanced.tsx`
- Modify: `frontend/src/pages/Settings.tsx`
- [ ] **Step 1: Read Settings.tsx and identify section boundaries**
Read `frontend/src/pages/Settings.tsx` entirely. Identify:
- Brave Search section (~lines 572-670) — key management + toggle
- Rate Limit section (~lines 908-980) — two number inputs + effective rate display + reset
- Advanced extraction section (~lines 546-570) — checkbox + history days + batch size + search behavior
- [ ] **Step 2: Create `SettingsBraveSearch.tsx`**
Create `frontend/src/components/settings/SettingsBraveSearch.tsx`. Extract the Brave Search section into a component that receives:
```tsx
interface SettingsBraveSearchProps {
  settings: () => UserSettings;
  setSettings: SetStoreFunction<UserSettings>; // or whatever the setter type is
  onKeyChanged?: () => void; // to refetch api keys in parent
}
```
Move the Brave-specific signals (`braveKeyInput`, `braveSaving`, `braveTesting`), the `braveKey()` derived accessor, and the handler functions (`handleBraveKeySave`, `handleBraveKeyTest`, `handleBraveKeyDelete`) into this component. The component loads its own API keys via `apiKeysApi.list()`.
- [ ] **Step 3: Create `SettingsRateLimit.tsx`**
Create `frontend/src/components/settings/SettingsRateLimit.tsx`. Extract the rate limit section. Props:
```tsx
interface SettingsRateLimitProps {
  settings: () => UserSettings;
  setSettings: SetStoreFunction<UserSettings>;
}
```
- [ ] **Step 4: Create `SettingsAdvanced.tsx`**
Create `frontend/src/components/settings/SettingsAdvanced.tsx`. Extract the advanced extraction section (checkbox, history days, batch size, search behavior textarea). Props follow the same pattern as `SettingsRateLimitProps` (`settings` and `setSettings`).
- [ ] **Step 5: Update Settings.tsx to use sub-components**
Replace the inline sections with component imports:
```tsx
import SettingsBraveSearch from '~/components/settings/SettingsBraveSearch';
import SettingsRateLimit from '~/components/settings/SettingsRateLimit';
import SettingsAdvanced from '~/components/settings/SettingsAdvanced';
```
The parent keeps: general settings (theme, categories, max_age/items/articles), provider/model selection, API key manager, and the save button.
- [ ] **Step 6: TypeScript check**
Run: `cd frontend && npx tsc --noEmit`
Expected: No errors
- [ ] **Step 7: Commit**
```bash
git add frontend/src/components/settings/ frontend/src/pages/Settings.tsx
git commit -m "refactor: decompose Settings.tsx into sub-components"
```
---
### Task 4: Extract shared LLM error mapping
**Files:**
- Modify: `backend/src/services/llm/mod.rs`
- Modify: `backend/src/services/llm/openai.rs`
- Modify: `backend/src/services/llm/gemini.rs`
- Modify: `backend/src/services/llm/anthropic.rs`
- [ ] **Step 1: Read all three error mapping functions**
Read the `map_*_error` functions in all three provider files. Note the differences:
- OpenAI: extracts `error.message` + `error.type`, handles 400/401/403/404/429
- Gemini: extracts `error.message` + `error.status`, merges 401+403, handles 400/401|403/404/429
- Anthropic: extracts `error.message` + `error.type`, handles 400/401/403/404/429/529
- [ ] **Step 2: Add shared mapper in `mod.rs`**
In `backend/src/services/llm/mod.rs`, add:
```rust
/// Shared HTTP error mapping for LLM provider responses.
///
/// Maps common HTTP status codes to `AppError` variants.
/// Provider-specific logging should happen before calling this.
pub fn map_provider_http_error(status: u16, provider_name: &str) -> AppError {
    match status {
        400 => AppError::BadRequest("Invalid request to LLM provider".into()),
        401 => AppError::BadRequest("Invalid or unauthorized API key".into()),
        403 => AppError::BadRequest("Access denied by LLM provider".into()),
        404 => AppError::BadRequest("Model not found or not available".into()),
        429 | 529 => AppError::RateLimited(
            "LLM provider rate limit exceeded. Please try again later.".into(),
        ),
        _ => AppError::Internal(anyhow::anyhow!(
            "{} returned HTTP {}", provider_name, status
        )),
    }
}
```
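Note: 529 (Anthropic's "overloaded" status) is handled in the shared mapper alongside 429, since both tell the caller to back off and retry later. Because the shared mapper gives 401 and 403 separate messages, Gemini's previously merged 401/403 branch may surface a different message than before; confirm that no test or client depends on the old text.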
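
Since the three provider mappers will all depend on this one table, a small table-driven check can guard it. The sketch below is std-only and illustrative: the `AppError` enum is a stand-in with the three variants the mapper uses (the real type wraps `anyhow::Error` in `Internal`, replaced here by a `String`), and `check_mapping_table` is a hypothetical name.

```rust
/// Stand-in for the application's error type; only the variants used by the mapper.
#[derive(Debug, PartialEq)]
enum AppError {
    BadRequest(String),
    RateLimited(String),
    Internal(String),
}

/// Same status table as the shared mapper, with `Internal` holding a plain message.
fn map_provider_http_error(status: u16, provider_name: &str) -> AppError {
    match status {
        400 => AppError::BadRequest("Invalid request to LLM provider".into()),
        401 => AppError::BadRequest("Invalid or unauthorized API key".into()),
        403 => AppError::BadRequest("Access denied by LLM provider".into()),
        404 => AppError::BadRequest("Model not found or not available".into()),
        429 | 529 => AppError::RateLimited(
            "LLM provider rate limit exceeded. Please try again later.".into(),
        ),
        _ => AppError::Internal(format!("{} returned HTTP {}", provider_name, status)),
    }
}

/// Table-driven check: each status class maps to the expected variant.
fn check_mapping_table() {
    for &status in &[400u16, 401, 403, 404] {
        assert!(matches!(
            map_provider_http_error(status, "OpenAI"),
            AppError::BadRequest(_)
        ));
    }
    for &status in &[429u16, 529] {
        assert!(matches!(
            map_provider_http_error(status, "Anthropic"),
            AppError::RateLimited(_)
        ));
    }
    // Anything else falls through to an internal error naming the provider.
    assert_eq!(
        map_provider_http_error(500, "Gemini"),
        AppError::Internal("Gemini returned HTTP 500".to_string())
    );
}
```

In the real codebase the same loop could live in a `#[cfg(test)]` module next to the mapper, matching on the actual `AppError` variants instead of comparing with `PartialEq`.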
- [ ] **Step 3: Replace each provider's error mapper**
In each provider file, replace the `map_*_error` function with a thinner version that logs provider-specific details, then delegates to the shared mapper:
**OpenAI:**
```rust
fn map_openai_error(status: u16, body: &Value) -> AppError {
    let error_message = body.get("error").and_then(|e| e.get("message")).and_then(|m| m.as_str()).unwrap_or("Unknown error");
    let error_type = body.get("error").and_then(|e| e.get("type")).and_then(|t| t.as_str()).unwrap_or("");
    tracing::error!("OpenAI API error (HTTP {}): {} (type: {})", status, error_message, error_type);
    super::map_provider_http_error(status, "OpenAI")
}
```
**Gemini:**
```rust
fn map_gemini_error(status: u16, body: &Value) -> AppError {
    let error_message = body.get("error").and_then(|e| e.get("message")).and_then(|m| m.as_str()).unwrap_or("Unknown error");
    let error_status = body.get("error").and_then(|e| e.get("status")).and_then(|s| s.as_str()).unwrap_or("");
    tracing::error!("Gemini API error (HTTP {}): {} (status: {})", status, error_message, error_status);
    super::map_provider_http_error(status, "Gemini")
}
```
**Anthropic:**
```rust
fn map_anthropic_error(status: u16, body: &Value) -> AppError {
    let error_message = body.get("error").and_then(|e| e.get("message")).and_then(|m| m.as_str()).unwrap_or("Unknown error");
    let error_type = body.get("error").and_then(|e| e.get("type")).and_then(|t| t.as_str()).unwrap_or("");
    tracing::error!("Anthropic API error (HTTP {}): {} (type: {})", status, error_message, error_type);
    super::map_provider_http_error(status, "Anthropic")
}
```
- [ ] **Step 4: Build and test**
Run: `cd backend && cargo build && cargo test --lib`
Expected: All pass
- [ ] **Step 5: Commit**
```bash
git add backend/src/services/llm/mod.rs backend/src/services/llm/openai.rs backend/src/services/llm/gemini.rs backend/src/services/llm/anthropic.rs
git commit -m "refactor: extract shared LLM error mapping to reduce duplication"
```
---
### Task 5: Replace `trace_article` parameters with `ArticleTrace` struct
**Files:**
- Modify: `backend/src/services/synthesis.rs`
This task should run AFTER Task 1, since both modify `synthesis.rs`.
- [ ] **Step 1: Define `ArticleTrace` struct**
Add near the top of `synthesis.rs` (in the helper functions section):
```rust
/// Structured parameters for article history tracing.
struct ArticleTrace<'a> {
    url: &'a str,
    title: &'a str,
    source_type: &'a str,
    source_url: Option<&'a str>,
    category: Option<&'a str>,
    synthesis_id: Option<Uuid>,
    status: &'a str,
    scraped_ok: bool,
}
```
- [ ] **Step 2: Update `trace_article` signature**
Change from 11 parameters to 4:
```rust
async fn trace_article(
    pool: &sqlx::PgPool,
    user_id: Uuid,
    job_id: Uuid,
    trace: &ArticleTrace<'_>,
) {
    let entry = db::article_history::ArticleHistoryEntry {
        user_id,
        url: trace.url.to_string(),
        url_hash: hash_article_url(trace.url),
        title: trace.title.to_string(),
        source_type: trace.source_type.to_string(),
        source_url: trace.source_url.map(|s| s.to_string()),
        category: trace.category.map(|s| s.to_string()),
        synthesis_id: trace.synthesis_id,
        status: trace.status.to_string(),
        scraped_ok: trace.scraped_ok,
        job_id,
    };
    // Tracing is best-effort: insert failures are deliberately ignored.
    db::article_history::insert_entry(pool, &entry).await.ok();
}
```
- [ ] **Step 3: Update all call sites**
Find every `trace_article(` call in the file. Each one changes from positional args to struct literal. Example:
```rust
// Before:
trace_article(&state.pool, user_id, job_id, &url, "", "personalized_source", Some(&source_url), None, None, "filtered_diversity", false).await;
// After:
trace_article(&state.pool, user_id, job_id, &ArticleTrace {
    url: &url, title: "", source_type: "personalized_source",
    source_url: Some(&source_url), category: None, synthesis_id: None,
    status: "filtered_diversity", scraped_ok: false,
}).await;
```
There are approximately 15-20 call sites. Update all of them. Use `grep -n "trace_article(" backend/src/services/synthesis.rs` to find them all.
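
With this many call sites, several of which leave fields at an empty or `None` value, it may help to let a call site spell out only the fields that differ. One possible approach, offered as a sketch and not as part of the plan, is to derive `Default` and use struct update syntax. `Uuid` below is a `u128` stand-in so the example compiles without the `uuid` crate, and `filtered_trace` is a hypothetical helper.

```rust
/// Stand-in for `uuid::Uuid` so the sketch has no external dependencies.
type Uuid = u128;

/// Same fields as the plan's `ArticleTrace`, plus a derived `Default`
/// ("" for `&str`, `None` for options, `false` for `scraped_ok`).
#[derive(Debug, Default, Clone, Copy, PartialEq)]
struct ArticleTrace<'a> {
    url: &'a str,
    title: &'a str,
    source_type: &'a str,
    source_url: Option<&'a str>,
    category: Option<&'a str>,
    synthesis_id: Option<Uuid>,
    status: &'a str,
    scraped_ok: bool,
}

/// Hypothetical helper for the "filtered before scraping" case: only the
/// fields that carry information are named; the rest come from `Default`.
fn filtered_trace<'a>(url: &'a str, source_url: &'a str) -> ArticleTrace<'a> {
    ArticleTrace {
        url,
        source_type: "personalized_source",
        source_url: Some(source_url),
        status: "filtered_diversity",
        ..Default::default()
    }
}
```

Explicit struct literals, as the plan prescribes, keep every field visible at the call site; the `Default` variant trades that visibility for brevity, so whether to adopt it is a judgment call.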
- [ ] **Step 4: Build and test**
Run: `cd backend && cargo build && cargo test --lib`
Expected: All pass
- [ ] **Step 5: Commit**
```bash
git add backend/src/services/synthesis.rs
git commit -m "refactor: replace trace_article 11 parameters with ArticleTrace struct"
```