docs: add algorithm rewrite implementation plan (7 tasks)

3 months ago · d3b63295f6
parent 1d5dc0596c
commit d3b63295f6
1 changed file with 688 additions and 0 deletions
--- a/docs/superpowers/plans/2026-03-25-algorithm-rewrite.md
+++ b/docs/superpowers/plans/2026-03-25-algorithm-rewrite.md
@@ -0,0 +1,688 @@
 # Algorithm Rewrite — Implementation Plan
 > **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
 **Goal:** Rewrite the synthesis generation pipeline: per-article LLM classify/summarize, source rotation, no rewrite pass, remove deprecated settings.
 **Architecture:** Complete rewrite of `synthesis.rs` with a simpler two-phase pipeline. Phase 1: scrape personalized sources sequentially, classify/summarize each article with one LLM call. Phase 2: LLM search for gaps, scrape for validation. No batch classification, no rewrite pass.
 **Tech Stack:** Rust (sqlx, reqwest, scraper), existing LLM providers
 **Spec:** `docs/superpowers/specs/2026-03-25-algorithm-rewrite-design.md`
 **Algorithm:** `docs/algorithm.md`
 ---
 ### Task 1: Migration — drop deprecated settings columns
 **Files:**
 - Create: `backend/migrations/20260325000018_drop_deprecated_settings.sql`
 - Modify: `backend/src/models/settings.rs`
 - Modify: `backend/src/db/settings.rs`
 - Modify: `backend/src/services/prompts.rs` (test fixture)
 - Modify: `CLAUDE.md`
 - [ ] **Step 1: Create migration**
 ```sql
 ALTER TABLE settings DROP COLUMN source_diversity_window;
 ALTER TABLE settings DROP COLUMN use_llm_for_article_extraction;
 ```
 - [ ] **Step 2: Remove from settings model**
 In `models/settings.rs`, remove `source_diversity_window: i32` and `use_llm_for_article_extraction: bool` from `UserSettings`, `SettingsResponse`, `UpdateSettingsRequest`, `From` impl, `Default` impl, and validation.
 - [ ] **Step 3: Remove from DB queries**
 In `db/settings.rs`, remove both fields from `SettingsRow`, `TryFrom`, and both SQL queries (column lists, VALUES, RETURNING, ON CONFLICT SET, `.bind()` calls). Renumber the remaining `$N` placeholders so they stay contiguous and still match the order of the `.bind()` calls.
 - [ ] **Step 4: Update test fixtures**
 Remove both fields from `valid_request()` in settings tests and `test_settings()` in prompts tests. Remove any validation tests for these fields.
 - [ ] **Step 5: Update CLAUDE.md migration count to 18**
 - [ ] **Step 6: Verify + commit**
 ```bash
 cd backend && cargo test --lib
 git add backend/migrations/20260325000018_drop_deprecated_settings.sql backend/src/models/settings.rs backend/src/db/settings.rs backend/src/services/prompts.rs CLAUDE.md
 git commit -m "feat: drop source_diversity_window and use_llm_for_article_extraction settings"
 ```
 ---
 ### Task 2: New prompt + schema for per-article classify/summarize
 **Files:**
 - Modify: `backend/src/services/prompts.rs`
 - Modify: `backend/src/services/llm/schema.rs`
 - [ ] **Step 1: Add `build_article_classify_prompt` to prompts.rs**
 ```rust
 /// Build a prompt for per-article classification and summarization.
 ///
 /// The LLM classifies the article into a category and generates a title + summary.
 pub fn build_article_classify_prompt(
    title: &str,
    body_snippet: &str,
    categories: &[String], // includes "Autre"
 ) -> (String, String) {
    let system_prompt =
        "Tu es un assistant qui analyse des articles d'actualite. \
         Tu dois classer l'article dans une categorie et generer un titre et un resume. \
         Reponds uniquement au format JSON demande."
            .to_string();
    let categories_list = categories
        .iter()
        .map(|c| format!("- \"{}\"", c))
        .collect::<Vec<_>>()
        .join("\n");
    let user_prompt = format!(
        "Voici un article d'actualite.\n\n\
         Titre : {title}\n\n\
         Contenu (extrait) :\n{body}\n\n\
         Categories disponibles :\n{categories}\n\n\
         Classe cet article dans la categorie la plus appropriee.\n\
         Si aucune categorie ne correspond, utilise \"Autre\".\n\
         Genere un titre clair et un resume de 4 a 5 lignes.\n\
         Si le titre fourni est vide, genere un titre a partir du contenu.",
        title = if title.is_empty() { "(pas de titre)" } else { title },
        body = body_snippet,
        categories = categories_list,
    );
    (system_prompt, user_prompt)
 }
 ```
 - [ ] **Step 2: Add `build_article_classify_schema` to schema.rs**
 ```rust
 /// Build a JSON Schema for per-article classification and summarization.
 pub fn build_article_classify_schema() -> Value {
    serde_json::json!({
        "type": "object",
        "properties": {
            "title": { "type": "string", "description": "Article title" },
            "summary": { "type": "string", "description": "4-5 line summary of the article" },
            "category": { "type": "string", "description": "Category name from the provided list" }
        },
        "required": ["title", "summary", "category"],
        "additionalProperties": false
    })
 }
 ```
 - [ ] **Step 3: Add tests**
 In prompts.rs tests:
 ```rust
    #[test]
    fn article_classify_prompt_includes_content() {
        let (sys, user) = build_article_classify_prompt("GPT-5 Released", "OpenAI released GPT-5", &["AI News".into(), "Autre".into()]);
        assert!(user.contains("GPT-5 Released"));
        assert!(user.contains("AI News"));
        assert!(user.contains("Autre"));
        assert!(sys.contains("classer"));
    }
    #[test]
    fn article_classify_prompt_handles_empty_title() {
        let (_, user) = build_article_classify_prompt("", "Some content", &["Tech".into(), "Autre".into()]);
        assert!(user.contains("(pas de titre)"));
    }
 ```
 In schema.rs tests:
 ```rust
    #[test]
    fn article_classify_schema_has_all_fields() {
        let schema = build_article_classify_schema();
        let props = schema["properties"].as_object().unwrap();
        assert!(props.contains_key("title"));
        assert!(props.contains_key("summary"));
        assert!(props.contains_key("category"));
        assert_eq!(schema["additionalProperties"], false);
    }
 ```
 - [ ] **Step 4: Verify + commit**
 ```bash
 cd backend && cargo test --lib
 git add backend/src/services/prompts.rs backend/src/services/llm/schema.rs
 git commit -m "feat: add per-article classify/summarize prompt and schema"
 ```
 ---
 ### Task 3: Add `get_last_source_url` to article_history DB + simplify ScrapedContent
 **Files:**
 - Modify: `backend/src/db/article_history.rs`
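 - Modify: `backend/src/services/scraper.rs`
 - Modify: `backend/src/services/synthesis.rs` (only if a `head_html` reference must be patched, see Step 2)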
 - [ ] **Step 1: Add `get_last_source_url`**
 ```rust
 /// Get the source_url from the most recent 'used' entry for source rotation.
 pub async fn get_last_source_url(
    pool: &PgPool,
    user_id: Uuid,
 ) -> Result<Option<String>, AppError> {
    let result = sqlx::query_scalar::<_, String>(
        "SELECT source_url FROM article_history WHERE user_id = $1 AND status = 'used' AND source_url IS NOT NULL ORDER BY created_at DESC LIMIT 1",
    )
    .bind(user_id)
    .fetch_optional(pool)
    .await?;
    Ok(result)
 }
 ```
 - [ ] **Step 2: Remove `head_html` from `ScrapedContent`**
 In `scraper.rs`, remove `pub head_html: String` from the `ScrapedContent` struct. Remove the `head_html` extraction code in `scrape_url` (the block that finds `<head>...</head>`). Remove `head_html` from the return struct construction.
 This may break compilation wherever `content.head_html` is still read. `extract_article_links_with_llm` in `source_scraper.rs` is expected to be unaffected, because it relies on its own `extract_head_and_body` function rather than on `ScrapedContent.head_html`. Confirm this and fix any remaining references.
 Also check `scrape_single_article_with_llm` in `synthesis.rs`, which references `content.head_html`. This function is removed in Task 5, but the crate must still compile after this task. Either replace `content.head_html` with `String::new()` temporarily, or remove the function now.
 - [ ] **Step 3: Verify + commit**
 ```bash
 cd backend && cargo test --lib
 git add backend/src/db/article_history.rs backend/src/services/scraper.rs backend/src/services/synthesis.rs
 git commit -m "feat: add get_last_source_url + remove head_html from ScrapedContent"
 ```
 ---
 ### Task 4: Remove old prompts, schemas, and unused code
 **Files:**
 - Modify: `backend/src/services/prompts.rs`
 - Modify: `backend/src/services/llm/schema.rs`
 - [ ] **Step 1: Remove old prompts from prompts.rs**
 Remove these functions and their tests:
 - `build_rewrite_prompt`
 - `build_classification_prompt`
 - `build_article_extraction_prompt`
 Keep `build_link_extraction_prompt`: it is still used by the LLM link extraction in `source_scraper`.
 Keep the `category_gaps: Option<&[(String, i32)]>` parameter of `build_search_prompt`: Phase 2 still uses gap-aware search.
 Remove `use crate::models::synthesis::ScrapedNewsItem;` if it's no longer needed (check if `build_classification_prompt` was the only user).
 - [ ] **Step 2: Remove old schemas from schema.rs**
 Remove: `build_classification_schema`, `build_article_extraction_schema`
 Keep: `build_category_schema` (Phase 2 search), `build_link_extraction_schema` (source scraper), `build_article_classify_schema` (new)
 - [ ] **Step 3: Verify + commit**
 ```bash
 cd backend && cargo test --lib
 git add backend/src/services/prompts.rs backend/src/services/llm/schema.rs
 git commit -m "refactor: remove old classification, rewrite, and article extraction prompts/schemas"
 ```
 ---
 ### Task 5: Rewrite `synthesis.rs` — the core pipeline
 **Files:**
 - Modify: `backend/src/services/synthesis.rs`
 This is the largest task. The entire `run_generation_inner` function is rewritten. Many helper functions are removed.
 - [ ] **Step 1: Remove dead helper functions**
 Delete these functions and their tests from `synthesis.rs`:
 - `scrape_single_article_with_llm`
 - `scrape_flat_urls`
 - `scrape_articles`
 - `filter_empty_scraped_articles`
 - `build_rewrite_schema`
 - `build_final_sections`
 - `restore_scraped_urls`
 - `parse_classification_response`
 - `limit_articles_per_source`
 - `dedup_by_url`
 - `filter_homepage_urls`
 - `SYNTHESIS_MIN_FILL_RATIO` constant
 - All associated tests for these functions
 Keep:
 - `scrape_single_article` (used for per-article scraping in Phase 1 and for validation scraping in Phase 2)
 - `emit_progress`
 - `trace_article`
 - `log_llm_call`
 - `normalize_article_url` / `hash_article_url`
 - `extract_domain`
 - `resolve_provider_and_key` / `resolve_model`
 - `check_rate_limit` / `get_user_rate_limiter`
 - `sanitize_json_null_bytes`
 - `sanitize_error_message`
 - `get_iso_week_string`
 - `parse_llm_output` (used in Phase 2)
 - [ ] **Step 2: Add `rotate_sources` helper**
 ```rust
 /// Rotate the sources list so that the source after the last-used source comes first.
 fn rotate_sources(mut sources: Vec<Source>, last_source_url: Option<&str>) -> Vec<Source> {
    let Some(last_url) = last_source_url else {
        return sources;
    };
    // If the last-used source is no longer in the list, keep the original order.
    if let Some(idx) = sources.iter().position(|s| s.url == last_url) {
        // `position` found a match, so `sources` is non-empty and the modulo is safe.
        let next = (idx + 1) % sources.len();
        sources.rotate_left(next);
    }
    sources
 }
 ```
 - [ ] **Step 3: Rewrite `run_generation_inner`**
 Replace the entire function body with the new algorithm. The new flow:
 ```rust
 async fn run_generation_inner(
    job_id: Uuid,
    state: &AppState,
    user_id: Uuid,
    tx: &watch::Sender<ProgressEvent>,
 ) -> Result<Uuid, AppError> {
    // === INITIALIZATION ===
    emit_progress(tx, "settings", "Chargement des parametres...", 5);
    let settings = db::settings::get_or_create_default(&state.pool, user_id).await?;
    // Cleanup
    if settings.article_history_days > 0 {
        db::article_history::cleanup_old(&state.pool, user_id, settings.article_history_days).await.unwrap_or(0);
        db::llm_call_log::truncate_old(&state.pool, user_id, settings.article_history_days).await.ok();
    }
    // Categories: "Autre" is always appended below, so an empty list still leaves one bucket
    let user_categories = settings.categories.clone();
    let mut classification_categories = user_categories.clone();
    classification_categories.push("Autre".to_string());
    // Load sources
    emit_progress(tx, "sources", "Chargement des sources...", 10);
    let sources = db::sources::list_for_user(&state.pool, user_id).await?;
    // Resolve provider
    emit_progress(tx, "provider", "Configuration du fournisseur IA...", 12);
    let (provider_name, api_key) = resolve_provider_and_key(state, user_id, &settings).await?;
    let provider = create_provider(&provider_name, api_key)?;
    let model_research = if !settings.ai_model.is_empty() { settings.ai_model.clone() } else { resolve_model(state, &provider_name).await? };
    let model_writing = if !settings.ai_model_writing.is_empty() { settings.ai_model_writing.clone() } else { model_research.clone() };
    let user_rate_limiter = get_user_rate_limiter(state, &settings, user_id);
    // Tracking structures
    let mut article_scraped: HashMap<String, Vec<NewsItem>> = HashMap::new();
    let mut source_counts: HashMap<String, usize> = HashMap::new();
    let mut url_source: HashMap<String, String> = HashMap::new(); // url → source_url
    let mut filled_counts: HashMap<String, usize> = HashMap::new();
    let mut seen_urls: std::collections::HashSet<String> = std::collections::HashSet::new();
    let max_total = (user_categories.len() + 1) * settings.max_items_per_category as usize;
    let classify_schema = build_article_classify_schema();
    // === PHASE 1: Personalized Sources ===
    if !sources.is_empty() {
        emit_progress(tx, "sources_scrape", "Analyse des sources personnalisees...", 15);
        // 1a. Rotate sources
        let last_source = db::article_history::get_last_source_url(&state.pool, user_id).await.unwrap_or(None);
        let rotated_sources = rotate_sources(sources.clone(), last_source.as_deref());
        let max_sources = rotated_sources.len().min(10);
        let max_links = 10usize;
        let mut candidate_urls: Vec<(String, String)> = Vec::new(); // (article_url, source_url)
        for source in rotated_sources.iter().take(max_sources) {
            let links = if settings.use_llm_for_source_links {
                source_scraper::extract_article_links_with_llm(
                    &state.http_client, &source.url, max_links, &provider, &model_research,
                ).await
            } else {
                source_scraper::extract_article_links(
                    &state.http_client, &source.url, max_links,
                ).await
            };
            if let Ok(links) = links {
                for link in links {
                    if seen_urls.insert(link.to_lowercase()) {
                        candidate_urls.push((link, source.url.clone()));
                    }
                }
            }
        }
        // Filter against article history
        if settings.article_history_days > 0 && !candidate_urls.is_empty() {
            let hashes: Vec<String> = candidate_urls.iter().map(|(url, _)| hash_article_url(url)).collect();
            let existing = db::article_history::check_urls_exist(&state.pool, user_id, &hashes).await.unwrap_or_default();
            if !existing.is_empty() {
                // Trace filtered articles
                for (url, source_url) in &candidate_urls {
                    if existing.contains(&hash_article_url(url)) {
                        trace_article(&state.pool, user_id, job_id, url, "", "personalized_source", Some(source_url), None, None, "filtered_history", false).await;
                    }
                }
                candidate_urls.retain(|(url, _)| !existing.contains(&hash_article_url(url)));
            }
        }
        // Track url → source
        for (url, source_url) in &candidate_urls {
            url_source.insert(url.clone(), source_url.clone());
        }
        // 1b. Scrape, classify, summarize each article
        emit_progress(tx, "processing", "Traitement des articles...", 25);
        let total_candidates = candidate_urls.len();
        for (idx, (url, source_url)) in candidate_urls.into_iter().enumerate() {
            // Progress
            let pct = 25 + ((idx as u32 * 40) / total_candidates.max(1) as u32).min(40);
            emit_progress(tx, "processing", &format!("Article {}/{}...", idx + 1, total_candidates), pct as u8);
            // Check source limit
            let source_domain = extract_domain(&source_url).unwrap_or_default();
            let source_count = source_counts.get(&source_domain).copied().unwrap_or(0);
            if source_count >= settings.max_articles_per_source as usize {
                trace_article(&state.pool, user_id, job_id, &url, "", "personalized_source", Some(&source_url), None, None, "filtered_diversity", false).await;
                continue;
            }
            // Scrape
            let (body_text, page_title, final_url) = scrape_single_article(&state.http_client, &url, settings.max_age_days as i64).await;
            if body_text.trim().is_empty() {
                trace_article(&state.pool, user_id, job_id, &final_url, &page_title, "personalized_source", Some(&source_url), None, None, "filtered_empty", false).await;
                continue;
            }
            // LLM classify + summarize
            check_rate_limit(state, &user_rate_limiter, &provider_name)?;
            let body_snippet: String = body_text.chars().take(500).collect();
            let (class_sys, class_user) = prompts::build_article_classify_prompt(&page_title, &body_snippet, &classification_categories);
            let llm_start = std::time::Instant::now();
            let class_response = provider.call_llm(&model_research, &class_sys, &class_user, &classify_schema).await?;
            let llm_duration = llm_start.elapsed().as_millis() as u64;
            log_llm_call(&state.pool, user_id, job_id, "classify_summarize", &model_research, &class_sys, &class_user, &class_response, llm_duration).await;
            // Parse response
            let llm_title = class_response.get("title").and_then(|t| t.as_str()).unwrap_or(&page_title).to_string();
            let llm_summary = class_response.get("summary").and_then(|s| s.as_str()).unwrap_or("").to_string();
            let mut llm_category = class_response.get("category").and_then(|c| c.as_str()).unwrap_or("Autre").to_string();
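            // Validate category: canonicalize to the configured spelling, or fall back to "Autre".
            // Canonical names keep `filled_counts` keys consistent with the Phase 2 gap lookup.
            llm_category = classification_categories
                .iter()
                .find(|c| c.to_lowercase() == llm_category.to_lowercase())
                .cloned()
                .unwrap_or_else(|| "Autre".to_string());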
            // Map category to key
            let cat_key = if llm_category == "Autre" {
                "category_autre".to_string()
            } else {
                user_categories.iter().position(|c| c.to_lowercase() == llm_category.to_lowercase())
                    .map(|i| format!("category_{}", i))
                    .unwrap_or_else(|| "category_autre".to_string())
            };
            // Check if category is full: overflow to "Autre", and skip when "Autre" is full too
            let max_per_cat = settings.max_items_per_category as usize;
            let cat_filled = filled_counts.get(&llm_category).copied().unwrap_or(0);
            let autre_filled = filled_counts.get("Autre").copied().unwrap_or(0);
            let (final_cat_key, final_cat_name) = if cat_filled < max_per_cat {
                (cat_key, llm_category)
            } else if llm_category != "Autre" && autre_filled < max_per_cat {
                ("category_autre".to_string(), "Autre".to_string())
            } else {
                // Target category and "Autre" are both full: skip article
                continue;
            };
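            // Add article. Also key `url_source` by the final URL, so that an article reached
            // through a redirect is still attributed to its source when history is recorded.
            url_source.insert(final_url.clone(), source_url.clone());
            article_scraped.entry(final_cat_key).or_default().push(NewsItem {
                title: llm_title,
                url: final_url.clone(),
                summary: llm_summary,
            });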
            *filled_counts.entry(final_cat_name).or_insert(0) += 1;
            *source_counts.entry(source_domain).or_insert(0) += 1;
            // Check if we've reached the maximum
            let total: usize = article_scraped.values().map(|v| v.len()).sum();
            if total >= max_total {
                break;
            }
        }
    }
    // === PHASE 2: Web Search Fallback ===
    let category_gaps: Vec<(String, i32)> = user_categories.iter().filter_map(|cat| {
        let filled = filled_counts.get(cat).copied().unwrap_or(0);
        let needed = (settings.max_items_per_category as usize).saturating_sub(filled);
        if needed > 0 { Some((cat.clone(), needed as i32)) } else { None }
    }).collect();
    if !category_gaps.is_empty() {
        emit_progress(tx, "search", "Recherche d'actualites complementaires...", 70);
        check_rate_limit(state, &user_rate_limiter, &provider_name)?;
        let search_schema = build_category_schema(&user_categories, settings.max_items_per_category);
        let current_date = Utc::now().format("%A %d %B %Y").to_string();
        let (sys_prompt, usr_prompt) = prompts::build_search_prompt(&settings, &sources, &current_date, &[], Some(&category_gaps));
        let llm_start = std::time::Instant::now();
        let raw_results = provider.call_llm(&model_research, &sys_prompt, &usr_prompt, &search_schema).await?;
        let llm_duration = llm_start.elapsed().as_millis() as u64;
        log_llm_call(&state.pool, user_id, job_id, "search", &model_research, &sys_prompt, &usr_prompt, &raw_results, llm_duration).await;
        // Parse and filter
        emit_progress(tx, "parsing", "Analyse des resultats...", 75);
        let parsed = parse_llm_output(&raw_results, &user_categories)?;
        // Filter: homepage, cross-phase dedup, url dedup, source limit, history
        let mut phase2_articles: Vec<(String, NewsItem)> = Vec::new(); // (cat_key, item)
        for (cat_key, items) in parsed {
            for item in items {
                let url_lower = item.url.to_lowercase();
                // Homepage filter
                if let Ok(parsed_url) = url::Url::parse(&item.url) {
                    let path = parsed_url.path();
                    if path.is_empty() || path == "/" {
                        trace_article(&state.pool, user_id, job_id, &item.url, &item.title, "web_search", None, None, None, "filtered_homepage", false).await;
                        continue;
                    }
                }
                // Cross-phase dedup
                if seen_urls.contains(&url_lower) {
                    trace_article(&state.pool, user_id, job_id, &item.url, &item.title, "web_search", None, None, None, "filtered_cross_phase_dedup", false).await;
                    continue;
                }
                // History dedup
                if settings.article_history_days > 0 {
                    let hash = hash_article_url(&item.url);
                    let exists = db::article_history::check_urls_exist(&state.pool, user_id, &[hash.clone()]).await.unwrap_or_default();
                    if exists.contains(&hash) {
                        trace_article(&state.pool, user_id, job_id, &item.url, &item.title, "web_search", None, None, None, "filtered_history", false).await;
                        continue;
                    }
                }
                // Source limit
                if let Some(domain) = extract_domain(&item.url) {
                    let count = source_counts.get(&domain).copied().unwrap_or(0);
                    if count >= settings.max_articles_per_source as usize {
                        trace_article(&state.pool, user_id, job_id, &item.url, &item.title, "web_search", None, None, None, "filtered_diversity", false).await;
                        continue;
                    }
                }
                seen_urls.insert(url_lower);
                phase2_articles.push((cat_key.clone(), item));
            }
        }
        // Scrape Phase 2 articles for validation
        emit_progress(tx, "scraping", "Verification des sources web...", 80);
        for (cat_key, item) in phase2_articles {
            let (body_text, _, final_url) = scrape_single_article(&state.http_client, &item.url, settings.max_age_days as i64).await;
            if body_text.trim().is_empty() {
                trace_article(&state.pool, user_id, job_id, &final_url, &item.title, "web_search", None, None, None, "filtered_empty", false).await;
                continue;
            }
            // Use the LLM-provided title and summary (Phase 2 summaries are final)
            article_scraped.entry(cat_key).or_default().push(NewsItem {
                title: item.title,
                url: final_url,
                summary: item.summary,
            });
            if let Some(domain) = extract_domain(&item.url) {
                *source_counts.entry(domain).or_insert(0) += 1;
            }
        }
    }
    // === SAVE ===
    if article_scraped.values().all(|items| items.is_empty()) {
        return Err(AppError::BadRequest("Aucun article valide trouve. Verifiez vos sources et categories.".into()));
    }
    emit_progress(tx, "saving", "Sauvegarde de la synthese...", 90);
    // Build final sections
    let mut final_sections: Vec<NewsSection> = Vec::new();
    for (i, cat_name) in user_categories.iter().enumerate() {
        let key = format!("category_{}", i);
        if let Some(items) = article_scraped.get(&key) {
            if !items.is_empty() {
                final_sections.push(NewsSection { title: cat_name.clone(), items: items.clone() });
            }
        }
    }
    if let Some(autre_items) = article_scraped.get("category_autre") {
        if !autre_items.is_empty() {
            final_sections.push(NewsSection { title: "Autre".to_string(), items: autre_items.clone() });
        }
    }
    let sections_json = serde_json::to_value(&final_sections).map_err(|e| AppError::Internal(anyhow::anyhow!("Failed to serialize: {}", e)))?;
    let sections_json = sanitize_json_null_bytes(sections_json);
    let synthesis = db::syntheses::create(&state.pool, user_id, &get_iso_week_string(Utc::now().date_naive()), &sections_json, job_id).await?;
    // Record used articles
    if settings.article_history_days > 0 {
        for section in &final_sections {
            for item in &section.items {
                let source_url = url_source.get(&item.url).map(|s| s.as_str());
                trace_article(&state.pool, user_id, job_id, &item.url, &item.title,
                    if source_url.is_some() { "personalized_source" } else { "web_search" },
                    source_url, Some(&section.title), Some(synthesis.id), "used", true).await;
            }
        }
    }
    Ok(synthesis.id)
 }
 ```
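The category bookkeeping in `run_generation_inner` (case-insensitive matching, overflow to "Autre", and the Phase 2 gap list) may be easier to unit-test if it is pulled out into pure functions. The following is only a sketch under that assumption: `assign_slot` and `category_gaps` are hypothetical helper names that do not exist in the codebase, and `max_per_category` stands in for `settings.max_items_per_category as usize`.

```rust
use std::collections::HashMap;

const AUTRE: &str = "Autre";

/// Hypothetical helper: decide where a classified article lands.
/// Returns `Some((section_key, category_name))`, or `None` when both the
/// target category and "Autre" are full and the article should be skipped.
fn assign_slot(
    llm_category: &str,
    user_categories: &[String],
    filled_counts: &HashMap<String, usize>,
    max_per_category: usize,
) -> Option<(String, String)> {
    let filled = |name: &str| filled_counts.get(name).copied().unwrap_or(0);

    // Case-insensitive match against the configured categories.
    let wanted = llm_category.to_lowercase();
    let matched = user_categories
        .iter()
        .enumerate()
        .find(|(_, c)| c.to_lowercase() == wanted);

    if let Some((i, name)) = matched {
        if filled(name.as_str()) < max_per_category {
            // Return the configured spelling so counters share one key per category.
            return Some((format!("category_{}", i), name.clone()));
        }
    }
    // Unknown category, explicit "Autre", or a full category: try "Autre".
    if filled(AUTRE) < max_per_category {
        Some(("category_autre".to_string(), AUTRE.to_string()))
    } else {
        None
    }
}

/// Hypothetical helper: how many items each user category still needs (Phase 2 input).
fn category_gaps(
    user_categories: &[String],
    filled_counts: &HashMap<String, usize>,
    max_per_category: usize,
) -> Vec<(String, i32)> {
    user_categories
        .iter()
        .filter_map(|cat| {
            let filled = filled_counts.get(cat).copied().unwrap_or(0);
            let needed = max_per_category.saturating_sub(filled);
            (needed > 0).then(|| (cat.clone(), needed as i32))
        })
        .collect()
}
```

With helpers of this shape, the Phase 1 loop would only need to call `assign_slot`, push the article, and increment `filled_counts` under the returned category name. "Autre" never appears in the gap list because it is not a user category.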
 - [ ] **Step 4: Add `rotate_sources` unit tests**
 ```rust
    #[test]
    fn rotate_sources_after_last_used() {
        // Create mock sources — need Source struct with url field
        // Test that rotation works correctly
    }
 ```
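The test stub above could be filled in along these lines. This is a self-contained sketch, not the project's actual test module: the `Source` struct here is a hypothetical stand-in with only a `url` field (the real model presumably has more fields), `make_sources` and `urls` are illustrative helpers, and `rotate_sources` is reproduced locally so the example compiles on its own.

```rust
// Hypothetical stand-in for the real `Source` model; only `url` matters for rotation.
#[derive(Clone, Debug, PartialEq)]
struct Source {
    url: String,
}

/// Same contract as `rotate_sources` in Step 2: the source following the last-used
/// one moves to the front; a missing or unknown URL leaves the order unchanged.
fn rotate_sources(mut sources: Vec<Source>, last_source_url: Option<&str>) -> Vec<Source> {
    let Some(last_url) = last_source_url else {
        return sources;
    };
    if let Some(idx) = sources.iter().position(|s| s.url == last_url) {
        // A match was found, so the list is non-empty and the modulo is safe.
        let next = (idx + 1) % sources.len();
        sources.rotate_left(next);
    }
    sources
}

/// Illustrative helper: build sources from plain URLs.
fn make_sources(urls: &[&str]) -> Vec<Source> {
    urls.iter().map(|u| Source { url: u.to_string() }).collect()
}

/// Illustrative helper: project back to URLs for compact assertions.
fn urls(sources: &[Source]) -> Vec<&str> {
    sources.iter().map(|s| s.url.as_str()).collect()
}

#[test]
fn rotate_sources_after_last_used() {
    let rotated = rotate_sources(make_sources(&["a", "b", "c"]), Some("a"));
    assert_eq!(urls(&rotated), vec!["b", "c", "a"]);
}

#[test]
fn rotate_sources_wraps_around_after_final_source() {
    let rotated = rotate_sources(make_sources(&["a", "b", "c"]), Some("c"));
    assert_eq!(urls(&rotated), vec!["a", "b", "c"]);
}

#[test]
fn rotate_sources_keeps_order_when_last_is_unknown_or_missing() {
    let unknown = rotate_sources(make_sources(&["a", "b"]), Some("zzz"));
    assert_eq!(urls(&unknown), vec!["a", "b"]);
    let missing = rotate_sources(make_sources(&["a", "b"]), None);
    assert_eq!(urls(&missing), vec!["a", "b"]);
}

#[test]
fn rotate_sources_handles_empty_list() {
    assert!(rotate_sources(Vec::new(), Some("a")).is_empty());
}
```

The wrap-around and empty-list cases are worth covering explicitly, since they are the two places where the `(idx + 1) % sources.len()` arithmetic could go wrong.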
 - [ ] **Step 5: Verify + commit**
 ```bash
 cd backend && cargo test --lib
 git add backend/src/services/synthesis.rs
 git commit -m "feat: rewrite synthesis pipeline — per-article classify/summarize, no rewrite pass"
 ```
 ---
 ### Task 6: Frontend — remove deprecated settings
 **Files:**
 - Modify: `frontend/src/types.ts`
 - Modify: `frontend/src/pages/Settings.tsx`
 - Modify: `frontend/src/i18n/fr.ts`
 - [ ] **Step 1: Remove fields from types**
 Remove `source_diversity_window: number` and `use_llm_for_article_extraction: boolean` from `UserSettings` and `DEFAULT_SETTINGS`.
 - [ ] **Step 2: Remove from Settings page**
 Remove the diversity window number input and the LLM extraction checkbox from `Settings.tsx`.
 - [ ] **Step 3: Remove i18n labels**
 Remove `settings.diversityWindow` and `settings.useLlmForArticleExtraction` labels.
 - [ ] **Step 4: Verify + commit**
 ```bash
 cd frontend && npx tsc --noEmit && npx vitest run
 git add frontend/src/types.ts frontend/src/pages/Settings.tsx frontend/src/i18n/fr.ts
 git commit -m "feat: remove deprecated settings from frontend"
 ```
 ---
 ### Task 7: Update E2E test
 **Files:**
 - Modify: `e2e/tests/generation-live.spec.ts`
 - [ ] **Step 1: Update settings payload**
 Remove `source_diversity_window` and `use_llm_for_article_extraction` from the PUT settings body.
 - [ ] **Step 2: Commit**
 ```bash
 git add e2e/tests/generation-live.spec.ts
 git commit -m "test: update E2E test for new pipeline (remove deprecated settings)"
 ```