Structural Refactoring — Implementation Plan

For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.

Goal: Implement the five structural refactoring items from the code audit: decompose synthesis.rs, eliminate SettingsResponse, decompose Settings.tsx, extract shared LLM error mapping, and replace the positional parameters of trace_article with a struct.

Architecture: Pure refactoring with no behavioral changes. Each task is independently committable. Tasks 1 and 5 both modify synthesis.rs, so Task 5 should run after Task 1.

Tech Stack: Rust (Axum, sqlx), SolidJS, TypeScript

Spec: docs/superpowers/specs/2026-03-26-structural-refactoring-design.md


Task 1: Extract shared helpers from synthesis.rs

Files:

  • Modify: backend/src/services/synthesis.rs

This is the highest-impact refactoring. The run_generation_inner function (~650 lines) has the scrape+classify batch loop duplicated in Phase 1 (lines ~390-540) and Phase 2 Brave (lines ~610-740), plus filtering logic duplicated in Phase 2 Brave and Phase 2 LLM.

  • Step 1: Read the full file and understand the structure

Read backend/src/services/synthesis.rs entirely. Identify:

  • Phase 1 batch loop (starts around "1b. Scrape, classify, summarize in batches")

  • Phase 2 Brave batch loop (inside if settings.use_brave_search)

  • Phase 2 LLM filtering (inside the else branch)

  • The category assignment logic (appears after each classify result)

  • The trace_article calls (will be refactored in Task 5, leave as-is for now)

  • Step 2: Extract assign_category helper

This logic is duplicated identically in Phase 1 (~lines 497-530) and Phase 2 Brave (~lines 700-740). Extract it into a private helper:

/// Assign an article to a category based on LLM classification response.
///
/// Returns `(cat_key, cat_name, llm_title, llm_summary)`, where the first two are e.g. `"category_0"` and `"AI News"`.
/// Handles overflow to "Autre" when the target category is full.
/// Returns `None` if both the target category and "Autre" are full (article should be skipped).
fn assign_category(
    llm_response: &serde_json::Value,
    page_title: &str,
    user_categories: &[String],
    classification_categories: &[String],
    filled_counts: &HashMap<String, usize>,
    max_items_per_category: usize,
) -> Option<(String, String, String, String)> {
    // Returns (cat_key, cat_name, llm_title, llm_summary)
    let llm_title = llm_response.get("title").and_then(|t| t.as_str()).unwrap_or(page_title).to_string();
    let llm_summary = llm_response.get("summary").and_then(|s| s.as_str()).unwrap_or("").to_string();
    let mut llm_category = llm_response.get("category").and_then(|c| c.as_str()).unwrap_or("Autre").to_string();

    if !classification_categories.iter().any(|c| c.to_lowercase() == llm_category.to_lowercase()) {
        llm_category = "Autre".to_string();
    }

    let cat_key = if llm_category.to_lowercase() == "autre" {
        "category_autre".to_string()
    } else {
        user_categories.iter().position(|c| c.to_lowercase() == llm_category.to_lowercase())
            .map(|i| format!("category_{}", i))
            .unwrap_or_else(|| "category_autre".to_string())
    };

    let cat_filled = filled_counts.get(&llm_category).copied().unwrap_or(0);
    if cat_filled >= max_items_per_category && llm_category.to_lowercase() != "autre" {
        let autre_filled = filled_counts.get("Autre").copied().unwrap_or(0);
        if autre_filled >= max_items_per_category {
            return None; // Skip article
        }
        Some(("category_autre".to_string(), "Autre".to_string(), llm_title, llm_summary))
    } else {
        Some((cat_key, llm_category, llm_title, llm_summary))
    }
}

Replace both Phase 1 and Phase 2 Brave category assignment blocks with calls to this helper.
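
Because assign_category takes filled_counts by shared reference, each call site presumably still has to increment the count for the category the article actually landed in. The sketch below isolates that capacity rule using only the standard library. Both pick_slot and record are hypothetical stand-ins for illustration, not functions in synthesis.rs; the real helper also resolves cat_key and the title and summary.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the capacity check in `assign_category`.
/// Returns the category name the article lands in, or `None` to skip it.
fn pick_slot(category: &str, filled: &HashMap<String, usize>, max: usize) -> Option<String> {
    let is_autre = category.to_lowercase() == "autre";
    let used = filled.get(category).copied().unwrap_or(0);
    if used >= max && !is_autre {
        // Target category is full: overflow to "Autre" if it still has room.
        let autre_used = filled.get("Autre").copied().unwrap_or(0);
        if autre_used >= max {
            return None;
        }
        return Some("Autre".to_string());
    }
    Some(category.to_string())
}

/// The helper only reads the counts, so the caller records each placement.
fn record(filled: &mut HashMap<String, usize>, category: &str) {
    *filled.entry(category.to_string()).or_insert(0) += 1;
}
```

With a limit of one item per category, the first "AI News" article keeps its category, the second overflows to "Autre", and the third is skipped. Note that, as in the helper above, an article classified directly as "Autre" is not capacity-checked by this branch; the sketch mirrors that so the refactoring preserves existing behavior.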

  • Step 3: Extract filter_phase2_url helper

The filtering logic (homepage, cross-phase dedup, history dedup, source diversity) is duplicated between Phase 2 Brave (~lines 564-595) and Phase 2 LLM (~lines 580-616). Extract:

/// Check if a Phase 2 URL passes all filters.
/// Returns the filter reason if rejected, None if accepted.
async fn filter_phase2_url(
    pool: &sqlx::PgPool,
    user_id: Uuid,
    url: &str,
    seen_urls: &std::collections::HashSet<String>,
    source_counts: &HashMap<String, usize>,
    article_history_days: i32,
    max_articles_per_source: usize,
) -> Option<&'static str> {
    // Homepage filter
    if let Ok(parsed_url) = url::Url::parse(url) {
        let path = parsed_url.path();
        if path.is_empty() || path == "/" {
            return Some("filtered_homepage");
        }
    }

    // Cross-phase dedup
    if seen_urls.contains(&url.to_lowercase()) {
        return Some("filtered_cross_phase_dedup");
    }

    // History dedup
    if article_history_days > 0 {
        let hash = hash_article_url(url);
        let exists = db::article_history::check_urls_exist(pool, user_id, std::slice::from_ref(&hash)).await.unwrap_or_default();
        if exists.contains(&hash) {
            return Some("filtered_history");
        }
    }

    // Source diversity
    if let Some(domain) = extract_domain(url) {
        let count = source_counts.get(&domain).copied().unwrap_or(0);
        if count >= max_articles_per_source {
            return Some("filtered_diversity");
        }
    }

    None // Accepted
}

Replace both Phase 2 Brave and Phase 2 LLM inline filtering with calls to this helper. Each call site still handles trace_article with its own source_type.
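
Since the helper borrows seen_urls and source_counts immutably, the call site remains responsible for recording accepted URLs. A minimal std-only sketch of that pattern follows. Here filter_stub is a hypothetical synchronous stand-in that covers only the dedup and diversity checks (no database, no URL parsing), triage is likewise hypothetical, and the domain is passed in rather than derived with extract_domain.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical stand-in for `filter_phase2_url`: dedup and diversity only.
fn filter_stub(
    url: &str,
    domain: &str,
    seen_urls: &HashSet<String>,
    source_counts: &HashMap<String, usize>,
    max_articles_per_source: usize,
) -> Option<&'static str> {
    if seen_urls.contains(&url.to_lowercase()) {
        return Some("filtered_cross_phase_dedup");
    }
    if source_counts.get(domain).copied().unwrap_or(0) >= max_articles_per_source {
        return Some("filtered_diversity");
    }
    None
}

/// Call-site pattern: reject with a reason, or accept and update the shared state.
/// Input is `(url, domain)` pairs; returns accepted URLs and `(url, reason)` rejections.
fn triage(
    candidates: &[(&str, &str)],
    max_articles_per_source: usize,
) -> (Vec<String>, Vec<(String, &'static str)>) {
    let mut seen_urls = HashSet::new();
    let mut source_counts: HashMap<String, usize> = HashMap::new();
    let (mut accepted, mut rejected) = (Vec::new(), Vec::new());
    for &(url, domain) in candidates {
        match filter_stub(url, domain, &seen_urls, &source_counts, max_articles_per_source) {
            // In the real pipeline this is where `trace_article` records the reason.
            Some(reason) => rejected.push((url.to_string(), reason)),
            None => {
                seen_urls.insert(url.to_lowercase());
                *source_counts.entry(domain.to_string()).or_insert(0) += 1;
                accepted.push(url.to_string());
            }
        }
    }
    (accepted, rejected)
}
```

The check order matters for which reason is reported: a URL that is both a duplicate and over the per-source limit is traced as a duplicate, matching the order of the checks in the helper.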

  • Step 4: Build and test

Run: cd backend && cargo build && cargo test --lib

Expected: All 369 tests pass, no behavioral change

  • Step 5: Commit
git add backend/src/services/synthesis.rs
git commit -m "refactor: extract assign_category and filter_phase2_url helpers from synthesis pipeline"

Task 2: Eliminate SettingsResponse struct

Files:

  • Modify: backend/src/models/settings.rs

  • Modify: backend/src/handlers/settings.rs

  • Step 1: Add #[serde(skip_serializing)] to UserSettings

In backend/src/models/settings.rs, add #[serde(skip_serializing)] to the user_id and updated_at fields of UserSettings:

#[derive(Debug, Clone, Serialize)]
pub struct UserSettings {
    #[serde(skip_serializing)]
    pub user_id: Uuid,
    pub theme: String,
    // ... all other fields unchanged ...
    #[serde(skip_serializing)]
    pub updated_at: DateTime<Utc>,
}
  • Step 2: Delete SettingsResponse and its From impl

Delete the entire SettingsResponse struct (lines ~29-46) and the impl From<UserSettings> for SettingsResponse block (lines ~48-66).

  • Step 3: Update handlers

In backend/src/handlers/settings.rs:

  • Remove use crate::models::settings::SettingsResponse (or the path it's imported from — check the actual import)

  • In get_settings: change Ok(Json(SettingsResponse::from(settings))) to Ok(Json(settings))

  • In update_settings: change Ok(Json(SettingsResponse::from(settings))) to Ok(Json(settings))

  • Step 4: Check for other usages

Run: cd backend && grep -r "SettingsResponse" src/

Expected: No results

  • Step 5: Build and test

Run: cd backend && cargo build && cargo test --lib

Expected: All pass

  • Step 6: Commit
git add backend/src/models/settings.rs backend/src/handlers/settings.rs
git commit -m "refactor: eliminate SettingsResponse struct, serialize UserSettings directly"

Task 3: Decompose Settings.tsx

Files:

  • Create: frontend/src/components/settings/SettingsBraveSearch.tsx

  • Create: frontend/src/components/settings/SettingsRateLimit.tsx

  • Create: frontend/src/components/settings/SettingsAdvanced.tsx

  • Modify: frontend/src/pages/Settings.tsx

  • Step 1: Read Settings.tsx and identify section boundaries

Read frontend/src/pages/Settings.tsx entirely. Identify:

  • Brave Search section (~lines 572-670) — key management + toggle

  • Rate Limit section (~lines 908-980) — two number inputs + effective rate display + reset

  • Advanced extraction section (~lines 546-570) — checkbox + history days + batch size + search behavior

  • Step 2: Create SettingsBraveSearch.tsx

Create frontend/src/components/settings/SettingsBraveSearch.tsx. Extract the Brave Search section into a component that receives:

interface SettingsBraveSearchProps {
  settings: () => UserSettings;
  setSettings: SetStoreFunction<UserSettings>; // assumes a store; if settings is a signal, use Setter<UserSettings> from 'solid-js' instead
  onKeyChanged?: () => void; // to refetch api keys in parent
}

Move the Brave-specific signals (braveKeyInput, braveSaving, braveTesting), the braveKey() derived accessor, and the handler functions (handleBraveKeySave, handleBraveKeyTest, handleBraveKeyDelete) into this component. The component loads its own API keys via apiKeysApi.list().

  • Step 3: Create SettingsRateLimit.tsx

Create frontend/src/components/settings/SettingsRateLimit.tsx. Extract the rate limit section. Props:

interface SettingsRateLimitProps {
  settings: () => UserSettings;
  setSettings: SetStoreFunction<UserSettings>;
}
  • Step 4: Create SettingsAdvanced.tsx

Create frontend/src/components/settings/SettingsAdvanced.tsx. Extract the advanced extraction section (checkbox, history days, batch size, search behavior textarea). Props same pattern.

  • Step 5: Update Settings.tsx to use sub-components

Replace the inline sections with component imports:

import SettingsBraveSearch from '~/components/settings/SettingsBraveSearch';
import SettingsRateLimit from '~/components/settings/SettingsRateLimit';
import SettingsAdvanced from '~/components/settings/SettingsAdvanced';

The parent keeps: general settings (theme, categories, max_age/items/articles), provider/model selection, API key manager, and the save button.

  • Step 6: TypeScript check

Run: cd frontend && npx tsc --noEmit

Expected: No errors

  • Step 7: Commit
git add frontend/src/components/settings/ frontend/src/pages/Settings.tsx
git commit -m "refactor: decompose Settings.tsx into sub-components"

Task 4: Extract shared LLM error mapping

Files:

  • Modify: backend/src/services/llm/mod.rs

  • Modify: backend/src/services/llm/openai.rs

  • Modify: backend/src/services/llm/gemini.rs

  • Modify: backend/src/services/llm/anthropic.rs

  • Step 1: Read all three error mapping functions

Read the map_*_error functions in all three provider files. Note the differences:

  • OpenAI: extracts error.message + error.type, handles 400/401/403/404/429

  • Gemini: extracts error.message + error.status, merges 401+403, handles 400/401|403/404/429

  • Anthropic: extracts error.message + error.type, handles 400/401/403/404/429/529

  • Step 2: Add shared mapper in mod.rs

In backend/src/services/llm/mod.rs, add:

/// Shared HTTP error mapping for LLM provider responses.
///
/// Maps common HTTP status codes to `AppError` variants.
/// Provider-specific logging should happen before calling this.
pub fn map_provider_http_error(status: u16, provider_name: &str) -> AppError {
    match status {
        400 => AppError::BadRequest("Invalid request to LLM provider".into()),
        401 => AppError::BadRequest("Invalid or unauthorized API key".into()),
        403 => AppError::BadRequest("Access denied by LLM provider".into()),
        404 => AppError::BadRequest("Model not found or not available".into()),
        429 | 529 => AppError::RateLimited(
            "LLM provider rate limit exceeded. Please try again later.".into(),
        ),
        _ => AppError::Internal(anyhow::anyhow!(
            "{} returned HTTP {}", provider_name, status
        )),
    }
}

Note: 529 (Anthropic overloaded) is included in the shared mapper as it's semantically equivalent to 429 for any provider.
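
Since the three provider mappers previously handled slightly different status sets, a table-style test can pin down the shared mapping before the per-provider functions are thinned. The sketch below uses only the standard library: MappedError is a hypothetical stand-in for AppError (whose real definition is not shown in this plan), and map_status mirrors the match arms above.

```rust
/// Hypothetical stand-in for `AppError`, reduced to the variants the mapper uses.
#[derive(Debug, PartialEq)]
enum MappedError {
    BadRequest(String),
    RateLimited(String),
    Internal(String),
}

/// Mirrors the arms of `map_provider_http_error`, using the stand-in type.
fn map_status(status: u16, provider_name: &str) -> MappedError {
    match status {
        400 => MappedError::BadRequest("Invalid request to LLM provider".into()),
        401 => MappedError::BadRequest("Invalid or unauthorized API key".into()),
        403 => MappedError::BadRequest("Access denied by LLM provider".into()),
        404 => MappedError::BadRequest("Model not found or not available".into()),
        // 529 is treated like 429 regardless of which provider returned it.
        429 | 529 => MappedError::RateLimited(
            "LLM provider rate limit exceeded. Please try again later.".into(),
        ),
        _ => MappedError::Internal(format!("{} returned HTTP {}", provider_name, status)),
    }
}
```

One thing worth checking against the no-behavioral-change goal: Gemini's old mapper merged 401 and 403 into a single arm, so after delegation those two codes may yield different messages than before.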

  • Step 3: Replace each provider's error mapper

In each provider file, replace the map_*_error function with a thinner version that logs provider-specific details, then delegates to the shared mapper:

OpenAI:

fn map_openai_error(status: u16, body: &Value) -> AppError {
    let error_message = body.get("error").and_then(|e| e.get("message")).and_then(|m| m.as_str()).unwrap_or("Unknown error");
    let error_type = body.get("error").and_then(|e| e.get("type")).and_then(|t| t.as_str()).unwrap_or("");
    tracing::error!("OpenAI API error (HTTP {}): {} (type: {})", status, error_message, error_type);
    super::map_provider_http_error(status, "OpenAI")
}

Gemini:

fn map_gemini_error(status: u16, body: &Value) -> AppError {
    let error_message = body.get("error").and_then(|e| e.get("message")).and_then(|m| m.as_str()).unwrap_or("Unknown error");
    let error_status = body.get("error").and_then(|e| e.get("status")).and_then(|s| s.as_str()).unwrap_or("");
    tracing::error!("Gemini API error (HTTP {}): {} (status: {})", status, error_message, error_status);
    super::map_provider_http_error(status, "Gemini")
}

Anthropic:

fn map_anthropic_error(status: u16, body: &Value) -> AppError {
    let error_message = body.get("error").and_then(|e| e.get("message")).and_then(|m| m.as_str()).unwrap_or("Unknown error");
    let error_type = body.get("error").and_then(|e| e.get("type")).and_then(|t| t.as_str()).unwrap_or("");
    tracing::error!("Anthropic API error (HTTP {}): {} (type: {})", status, error_message, error_type);
    super::map_provider_http_error(status, "Anthropic")
}
  • Step 4: Build and test

Run: cd backend && cargo build && cargo test --lib

Expected: All pass

  • Step 5: Commit
git add backend/src/services/llm/mod.rs backend/src/services/llm/openai.rs backend/src/services/llm/gemini.rs backend/src/services/llm/anthropic.rs
git commit -m "refactor: extract shared LLM error mapping to reduce duplication"

Task 5: Replace trace_article parameters with ArticleTrace struct

Files:

  • Modify: backend/src/services/synthesis.rs

This task should run AFTER Task 1, since both modify synthesis.rs.

  • Step 1: Define ArticleTrace struct

Add near the top of synthesis.rs (in the helper functions section):

/// Structured parameters for article history tracing.
struct ArticleTrace<'a> {
    url: &'a str,
    title: &'a str,
    source_type: &'a str,
    source_url: Option<&'a str>,
    category: Option<&'a str>,
    synthesis_id: Option<Uuid>,
    status: &'a str,
    scraped_ok: bool,
}
  • Step 2: Update trace_article signature

Change from 11 parameters to 4:

async fn trace_article(
    pool: &sqlx::PgPool,
    user_id: Uuid,
    job_id: Uuid,
    trace: &ArticleTrace<'_>,
) {
    let entry = db::article_history::ArticleHistoryEntry {
        user_id,
        url: trace.url.to_string(),
        url_hash: hash_article_url(trace.url),
        title: trace.title.to_string(),
        source_type: trace.source_type.to_string(),
        source_url: trace.source_url.map(|s| s.to_string()),
        category: trace.category.map(|s| s.to_string()),
        synthesis_id: trace.synthesis_id,
        status: trace.status.to_string(),
        scraped_ok: trace.scraped_ok,
        job_id,
    };
    db::article_history::insert_entry(pool, &entry).await.ok();
}
  • Step 3: Update all call sites

Find every trace_article( call in the file. Each one changes from positional args to struct literal. Example:

// Before:
trace_article(&state.pool, user_id, job_id, &url, "", "personalized_source", Some(&source_url), None, None, "filtered_diversity", false).await;

// After:
trace_article(&state.pool, user_id, job_id, &ArticleTrace {
    url: &url, title: "", source_type: "personalized_source",
    source_url: Some(&source_url), category: None, synthesis_id: None,
    status: "filtered_diversity", scraped_ok: false,
}).await;

There are approximately 15-20 call sites. Update all of them. Use grep -n "trace_article(" backend/src/services/synthesis.rs to find them all.
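
Many call sites likely trace a filtered URL with an empty title and no category or synthesis ID, as in the example above. If that repetition becomes noisy, struct update syntax over a small constructor is one option. This is an optional idea and not part of the plan: ArticleTrace::filtered is a hypothetical helper, and Uuid is aliased to u128 here only so the sketch compiles without the uuid crate.

```rust
/// Stand-in for `uuid::Uuid`, used only to keep this sketch std-only.
type Uuid = u128;

/// Same fields as the struct defined in Step 1.
struct ArticleTrace<'a> {
    url: &'a str,
    title: &'a str,
    source_type: &'a str,
    source_url: Option<&'a str>,
    category: Option<&'a str>,
    synthesis_id: Option<Uuid>,
    status: &'a str,
    scraped_ok: bool,
}

impl<'a> ArticleTrace<'a> {
    /// Hypothetical convenience constructor for URLs rejected before scraping:
    /// empty title, no source URL, no category, no synthesis, `scraped_ok = false`.
    fn filtered(url: &'a str, source_type: &'a str, status: &'a str) -> Self {
        ArticleTrace {
            url,
            title: "",
            source_type,
            source_url: None,
            category: None,
            synthesis_id: None,
            status,
            scraped_ok: false,
        }
    }
}
```

A call site would then override only what differs, for example `ArticleTrace { source_url: Some(&source_url), ..ArticleTrace::filtered(&url, "personalized_source", "filtered_diversity") }`. Writing out every field, as the plan does, is more verbose but keeps each trace fully explicit, so either form is defensible.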

  • Step 4: Build and test

Run: cd backend && cargo build && cargo test --lib

Expected: All pass

  • Step 5: Commit
git add backend/src/services/synthesis.rs
git commit -m "refactor: replace trace_article 11 parameters with ArticleTrace struct"