diff --git a/docs/superpowers/plans/2026-03-25-algorithm-rewrite.md b/docs/superpowers/plans/2026-03-25-algorithm-rewrite.md new file mode 100644 index 0000000..82f7aab --- /dev/null +++ b/docs/superpowers/plans/2026-03-25-algorithm-rewrite.md @@ -0,0 +1,688 @@ +# Algorithm Rewrite — Implementation Plan + +> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking. + +**Goal:** Rewrite the synthesis generation pipeline: per-article LLM classify/summarize, source rotation, no rewrite pass, remove deprecated settings. + +**Architecture:** Complete rewrite of `synthesis.rs` with a simpler two-phase pipeline. Phase 1: scrape personalized sources sequentially, classify/summarize each article with one LLM call. Phase 2: LLM search for gaps, scrape for validation. No batch classification, no rewrite pass. + +**Tech Stack:** Rust (sqlx, reqwest, scraper), existing LLM providers + +**Spec:** `docs/superpowers/specs/2026-03-25-algorithm-rewrite-design.md` +**Algorithm:** `docs/algorithm.md` + +--- + +### Task 1: Migration — drop deprecated settings columns + +**Files:** +- Create: `backend/migrations/20260325000018_drop_deprecated_settings.sql` +- Modify: `backend/src/models/settings.rs` +- Modify: `backend/src/db/settings.rs` +- Modify: `backend/src/services/prompts.rs` (test fixture) +- Modify: `CLAUDE.md` + +- [ ] **Step 1: Create migration** + +```sql +ALTER TABLE settings DROP COLUMN source_diversity_window; +ALTER TABLE settings DROP COLUMN use_llm_for_article_extraction; +``` + +- [ ] **Step 2: Remove from settings model** + +In `models/settings.rs`, remove `source_diversity_window: i32` and `use_llm_for_article_extraction: bool` from `UserSettings`, `SettingsResponse`, `UpdateSettingsRequest`, `From` impl, `Default` impl, and validation. 
+ +- [ ] **Step 3: Remove from DB queries** + +In `db/settings.rs`, remove both fields from `SettingsRow`, `TryFrom`, and both SQL queries (column lists, VALUES, RETURNING, ON CONFLICT SET, .bind() calls). Decrement $N placeholders carefully. + +- [ ] **Step 4: Update test fixtures** + +Remove both fields from `valid_request()` in settings tests and `test_settings()` in prompts tests. Remove any validation tests for these fields. + +- [ ] **Step 5: Update CLAUDE.md migration count to 18** + +- [ ] **Step 6: Verify + commit** + +```bash +cd backend && cargo test --lib +git add backend/migrations/20260325000018_drop_deprecated_settings.sql backend/src/models/settings.rs backend/src/db/settings.rs backend/src/services/prompts.rs CLAUDE.md +git commit -m "feat: drop source_diversity_window and use_llm_for_article_extraction settings" +``` + +--- + +### Task 2: New prompt + schema for per-article classify/summarize + +**Files:** +- Modify: `backend/src/services/prompts.rs` +- Modify: `backend/src/services/llm/schema.rs` + +- [ ] **Step 1: Add `build_article_classify_prompt` to prompts.rs** + +```rust +/// Build a prompt for per-article classification and summarization. +/// +/// The LLM classifies the article into a category and generates a title + summary. +pub fn build_article_classify_prompt( + title: &str, + body_snippet: &str, + categories: &[String], // includes "Autre" +) -> (String, String) { + let system_prompt = + "Tu es un assistant qui analyse des articles d'actualite. \ + Tu dois classer l'article dans une categorie et generer un titre et un resume. \ + Reponds uniquement au format JSON demande." 
+ .to_string(); + + let categories_list = categories + .iter() + .map(|c| format!("- \"{}\"", c)) + .collect::<Vec<_>>() + .join("\n"); + + let user_prompt = format!( + "Voici un article d'actualite.\n\n\ + Titre : {title}\n\n\ + Contenu (extrait) :\n{body}\n\n\ + Categories disponibles :\n{categories}\n\n\ + Classe cet article dans la categorie la plus appropriee.\n\ + Si aucune categorie ne correspond, utilise \"Autre\".\n\ + Genere un titre clair et un resume de 4 a 5 lignes.\n\ + Si le titre fourni est vide, genere un titre a partir du contenu.", + title = if title.is_empty() { "(pas de titre)" } else { title }, + body = body_snippet, + categories = categories_list, + ); + + (system_prompt, user_prompt) +} +``` + +- [ ] **Step 2: Add `build_article_classify_schema` to schema.rs** + +```rust +/// Build a JSON Schema for per-article classification and summarization. +pub fn build_article_classify_schema() -> Value { + serde_json::json!({ + "type": "object", + "properties": { + "title": { "type": "string", "description": "Article title" }, + "summary": { "type": "string", "description": "4-5 line summary of the article" }, + "category": { "type": "string", "description": "Category name from the provided list" } + }, + "required": ["title", "summary", "category"], + "additionalProperties": false + }) +} +``` + +- [ ] **Step 3: Add tests** + +In prompts.rs tests: +```rust + #[test] + fn article_classify_prompt_includes_content() { + let (sys, user) = build_article_classify_prompt("GPT-5 Released", "OpenAI released GPT-5", &["AI News".into(), "Autre".into()]); + assert!(user.contains("GPT-5 Released")); + assert!(user.contains("AI News")); + assert!(user.contains("Autre")); + assert!(sys.contains("classer")); + } + + #[test] + fn article_classify_prompt_handles_empty_title() { + let (_, user) = build_article_classify_prompt("", "Some content", &["Tech".into(), "Autre".into()]); + assert!(user.contains("(pas de titre)")); + } +``` + +In schema.rs tests: +```rust + #[test] + 
fn article_classify_schema_has_all_fields() { + let schema = build_article_classify_schema(); + let props = schema["properties"].as_object().unwrap(); + assert!(props.contains_key("title")); + assert!(props.contains_key("summary")); + assert!(props.contains_key("category")); + assert_eq!(schema["additionalProperties"], false); + } +``` + +- [ ] **Step 4: Verify + commit** + +```bash +cd backend && cargo test --lib +git add backend/src/services/prompts.rs backend/src/services/llm/schema.rs +git commit -m "feat: add per-article classify/summarize prompt and schema" +``` + +--- + +### Task 3: Add `get_last_source_url` to article_history DB + simplify ScrapedContent + +**Files:** +- Modify: `backend/src/db/article_history.rs` +- Modify: `backend/src/services/scraper.rs` + +- [ ] **Step 1: Add `get_last_source_url`** + +```rust +/// Get the source_url from the most recent 'used' entry for source rotation. +pub async fn get_last_source_url( + pool: &PgPool, + user_id: Uuid, +) -> Result<Option<String>, AppError> { + let result = sqlx::query_scalar::<_, String>( + "SELECT source_url FROM article_history WHERE user_id = $1 AND status = 'used' AND source_url IS NOT NULL ORDER BY created_at DESC LIMIT 1", + ) + .bind(user_id) + .fetch_optional(pool) + .await?; + Ok(result) +} +``` + +- [ ] **Step 2: Remove `head_html` from `ScrapedContent`** + +In `scraper.rs`, remove `pub head_html: String` from the `ScrapedContent` struct. Remove the `head_html` extraction code in `scrape_url` (the block that finds `...`). Remove `head_html` from the return struct construction. + +This may cause compilation errors in `source_scraper.rs` if `extract_article_links_with_llm` reads `content.head_html`. That function is expected to use source_scraper's own `extract_head_and_body` rather than `ScrapedContent.head_html`, so check and fix any remaining references. + +Also check `scrape_single_article_with_llm` in `synthesis.rs`, which references `content.head_html`. This function will be removed in Task 5, but it needs to compile now. 
Temporarily replace `content.head_html` with `String::new()` if needed, or remove the function now. + +- [ ] **Step 3: Verify + commit** + +```bash +cd backend && cargo test --lib +git add backend/src/db/article_history.rs backend/src/services/scraper.rs backend/src/services/synthesis.rs +git commit -m "feat: add get_last_source_url + remove head_html from ScrapedContent" +``` + +--- + +### Task 4: Remove old prompts, schemas, and unused code + +**Files:** +- Modify: `backend/src/services/prompts.rs` +- Modify: `backend/src/services/llm/schema.rs` + +- [ ] **Step 1: Remove old prompts from prompts.rs** + +Remove these functions and their tests: +- `build_rewrite_prompt` +- `build_classification_prompt` +- `build_article_extraction_prompt` + +Keep `build_link_extraction_prompt`: it is still used by the source_scraper LLM link extraction. + +Keep the `category_gaps: Option<&[(String, i32)]>` parameter of `build_search_prompt`: Phase 2 still uses gap-aware search, so do not simplify back to always using `max_items_per_category`. + +Remove `use crate::models::synthesis::ScrapedNewsItem;` if it is no longer needed (check whether `build_classification_prompt` was the only user). 
+ +- [ ] **Step 2: Remove old schemas from schema.rs** + +Remove: `build_classification_schema`, `build_article_extraction_schema` +Keep: `build_category_schema` (Phase 2 search), `build_link_extraction_schema` (source scraper), `build_article_classify_schema` (new) + +- [ ] **Step 3: Verify + commit** + +```bash +cd backend && cargo test --lib +git add backend/src/services/prompts.rs backend/src/services/llm/schema.rs +git commit -m "refactor: remove old classification, rewrite, and article extraction prompts/schemas" +``` + +--- + +### Task 5: Rewrite `synthesis.rs` — the core pipeline + +**Files:** +- Modify: `backend/src/services/synthesis.rs` + +This is the largest task. The entire `run_generation_inner` function is rewritten. Many helper functions are removed. + +- [ ] **Step 1: Remove dead helper functions** + +Delete these functions and their tests from `synthesis.rs`: +- `scrape_single_article_with_llm` +- `scrape_flat_urls` +- `scrape_articles` +- `filter_empty_scraped_articles` +- `build_rewrite_schema` +- `build_final_sections` +- `restore_scraped_urls` +- `parse_classification_response` +- `limit_articles_per_source` +- `dedup_by_url` +- `filter_homepage_urls` +- `SYNTHESIS_MIN_FILL_RATIO` constant +- All associated tests for these functions + +Keep: +- `scrape_single_article` (used for Phase 1 per-article scraping) +- `emit_progress` +- `trace_article` +- `log_llm_call` +- `normalize_article_url` / `hash_article_url` +- `extract_domain` +- `resolve_provider_and_key` / `resolve_model` +- `check_rate_limit` / `get_user_rate_limiter` +- `sanitize_json_null_bytes` +- `sanitize_error_message` +- `get_iso_week_string` +- `parse_llm_output` (used in Phase 2) + +- [ ] **Step 2: Add `rotate_sources` helper** + +```rust +/// Rotate the sources list so that the source after the last-used source comes first. 
+fn rotate_sources(sources: Vec<Source>, last_source_url: Option<&str>) -> Vec<Source> { + let Some(last_url) = last_source_url else { + return sources; + }; + + let pos = sources.iter().position(|s| s.url == last_url); + match pos { + Some(idx) => { + let next = (idx + 1) % sources.len(); + let mut rotated = sources[next..].to_vec(); + rotated.extend_from_slice(&sources[..next]); + rotated + } + None => sources, // Last source not in list, don't rotate + } +} +``` + +- [ ] **Step 3: Rewrite `run_generation_inner`** + +Replace the entire function body with the new algorithm. The new flow: + +```rust +async fn run_generation_inner( + job_id: Uuid, + state: &AppState, + user_id: Uuid, + tx: &watch::Sender, +) -> Result<Uuid, AppError> { + // === INITIALIZATION === + emit_progress(tx, "settings", "Chargement des parametres...", 5); + let settings = db::settings::get_or_create_default(&state.pool, user_id).await?; + + // Cleanup + if settings.article_history_days > 0 { + db::article_history::cleanup_old(&state.pool, user_id, settings.article_history_days).await.unwrap_or(0); + db::llm_call_log::truncate_old(&state.pool, user_id, settings.article_history_days).await.ok(); + } + + // Categories: if empty, default to just "Autre" + let user_categories = if settings.categories.is_empty() { + Vec::new() + } else { + settings.categories.clone() + }; + let mut classification_categories = user_categories.clone(); + classification_categories.push("Autre".to_string()); + + // Load sources + emit_progress(tx, "sources", "Chargement des sources...", 10); + let sources = db::sources::list_for_user(&state.pool, user_id).await?; + + // Resolve provider + emit_progress(tx, "provider", "Configuration du fournisseur IA...", 12); + let (provider_name, api_key) = resolve_provider_and_key(state, user_id, &settings).await?; + let provider = create_provider(&provider_name, api_key)?; + let model_research = if !settings.ai_model.is_empty() { settings.ai_model.clone() } else { resolve_model(state, &provider_name).await? 
}; + let model_writing = if !settings.ai_model_writing.is_empty() { settings.ai_model_writing.clone() } else { model_research.clone() }; + let user_rate_limiter = get_user_rate_limiter(state, &settings, user_id); + + // Tracking structures + let mut article_scraped: HashMap<String, Vec<NewsItem>> = HashMap::new(); + let mut source_counts: HashMap<String, usize> = HashMap::new(); + let mut url_source: HashMap<String, String> = HashMap::new(); // url → source_url + let mut filled_counts: HashMap<String, usize> = HashMap::new(); + let mut seen_urls: std::collections::HashSet<String> = std::collections::HashSet::new(); + let max_total = (user_categories.len() + 1) * settings.max_items_per_category as usize; + let classify_schema = build_article_classify_schema(); + + // === PHASE 1: Personalized Sources === + if !sources.is_empty() { + emit_progress(tx, "sources_scrape", "Analyse des sources personnalisees...", 15); + + // 1a. Rotate sources + let last_source = db::article_history::get_last_source_url(&state.pool, user_id).await.unwrap_or(None); + let rotated_sources = rotate_sources(sources.clone(), last_source.as_deref()); + let max_sources = rotated_sources.len().min(10); + let max_links = 10usize; + + let mut candidate_urls: Vec<(String, String)> = Vec::new(); // (article_url, source_url) + + for source in rotated_sources.iter().take(max_sources) { + let links = if settings.use_llm_for_source_links { + source_scraper::extract_article_links_with_llm( + &state.http_client, &source.url, max_links, &provider, &model_research, + ).await + } else { + source_scraper::extract_article_links( + &state.http_client, &source.url, max_links, + ).await + }; + + if let Ok(links) = links { + for link in links { + if seen_urls.insert(link.to_lowercase()) { + candidate_urls.push((link, source.url.clone())); + } + } + } + } + + // Filter against article history + if settings.article_history_days > 0 && !candidate_urls.is_empty() { + let hashes: Vec<_> = candidate_urls.iter().map(|(url, _)| hash_article_url(url)).collect(); + let existing = 
db::article_history::check_urls_exist(&state.pool, user_id, &hashes).await.unwrap_or_default(); + if !existing.is_empty() { + // Trace filtered articles + for (url, source_url) in &candidate_urls { + if existing.contains(&hash_article_url(url)) { + trace_article(&state.pool, user_id, job_id, url, "", "personalized_source", Some(source_url), None, None, "filtered_history", false).await; + } + } + candidate_urls.retain(|(url, _)| !existing.contains(&hash_article_url(url))); + } + } + + // Track url → source + for (url, source_url) in &candidate_urls { + url_source.insert(url.clone(), source_url.clone()); + } + + // 1b. Scrape, classify, summarize each article + emit_progress(tx, "processing", "Traitement des articles...", 25); + let total_candidates = candidate_urls.len(); + + for (idx, (url, source_url)) in candidate_urls.into_iter().enumerate() { + // Progress + let pct = 25 + ((idx as u32 * 40) / total_candidates.max(1) as u32).min(40); + emit_progress(tx, "processing", &format!("Article {}/{}...", idx + 1, total_candidates), pct as u8); + + // Check source limit + let source_domain = extract_domain(&source_url).unwrap_or_default(); + let source_count = source_counts.get(&source_domain).copied().unwrap_or(0); + if source_count >= settings.max_articles_per_source as usize { + trace_article(&state.pool, user_id, job_id, &url, "", "personalized_source", Some(&source_url), None, None, "filtered_diversity", false).await; + continue; + } + + // Scrape + let (body_text, page_title, final_url) = scrape_single_article(&state.http_client, &url, settings.max_age_days as i64).await; + + if body_text.trim().is_empty() { + trace_article(&state.pool, user_id, job_id, &final_url, &page_title, "personalized_source", Some(&source_url), None, None, "filtered_empty", false).await; + continue; + } + + // LLM classify + summarize + check_rate_limit(state, &user_rate_limiter, &provider_name)?; + let body_snippet: String = body_text.chars().take(500).collect(); + let (class_sys, 
class_user) = prompts::build_article_classify_prompt(&page_title, &body_snippet, &classification_categories); + + let llm_start = std::time::Instant::now(); + let class_response = provider.call_llm(&model_research, &class_sys, &class_user, &classify_schema).await?; + let llm_duration = llm_start.elapsed().as_millis() as u64; + log_llm_call(&state.pool, user_id, job_id, "classify_summarize", &model_research, &class_sys, &class_user, &class_response, llm_duration).await; + + // Parse response + let llm_title = class_response.get("title").and_then(|t| t.as_str()).unwrap_or(&page_title).to_string(); + let llm_summary = class_response.get("summary").and_then(|s| s.as_str()).unwrap_or("").to_string(); + let raw_category = class_response.get("category").and_then(|c| c.as_str()).unwrap_or("Autre"); + + // Validate category: match case-insensitively but keep the configured spelling, + // so `filled_counts` keys agree with Phase 2; if not in list, use "Autre" + let llm_category = classification_categories.iter().find(|c| c.to_lowercase() == raw_category.to_lowercase()).cloned().unwrap_or_else(|| "Autre".to_string()); + + // Map category to key + let cat_key = if llm_category == "Autre" { + "category_autre".to_string() + } else { + user_categories.iter().position(|c| c.to_lowercase() == llm_category.to_lowercase()) + .map(|i| format!("category_{}", i)) + .unwrap_or_else(|| "category_autre".to_string()) + }; + + // Check if category is full → overflow to "Autre" + let cat_filled = filled_counts.get(&llm_category).copied().unwrap_or(0); + let (final_cat_key, final_cat_name) = if cat_filled >= settings.max_items_per_category as usize && llm_category != "Autre" { + let autre_filled = filled_counts.get("Autre").copied().unwrap_or(0); + if autre_filled >= settings.max_items_per_category as usize { + // Both full: skip article + continue; + } + ("category_autre".to_string(), "Autre".to_string()) + } else { + (cat_key, llm_category) + }; + + // Add article + article_scraped.entry(final_cat_key).or_default().push(NewsItem { + title: llm_title, + url: final_url.clone(), + summary: llm_summary, 
+ }); + *filled_counts.entry(final_cat_name).or_insert(0) += 1; + *source_counts.entry(source_domain).or_insert(0) += 1; + + // Check if we've reached the maximum + let total: usize = article_scraped.values().map(|v| v.len()).sum(); + if total >= max_total { + break; + } + } + } + + // === PHASE 2: Web Search Fallback === + let category_gaps: Vec<(String, i32)> = user_categories.iter().filter_map(|cat| { + let filled = filled_counts.get(cat).copied().unwrap_or(0); + let needed = (settings.max_items_per_category as usize).saturating_sub(filled); + if needed > 0 { Some((cat.clone(), needed as i32)) } else { None } + }).collect(); + + if !category_gaps.is_empty() { + emit_progress(tx, "search", "Recherche d'actualites complementaires...", 70); + check_rate_limit(state, &user_rate_limiter, &provider_name)?; + + let search_schema = build_category_schema(&user_categories, settings.max_items_per_category); + let current_date = Utc::now().format("%A %d %B %Y").to_string(); + let (sys_prompt, usr_prompt) = prompts::build_search_prompt(&settings, &sources, &current_date, &[], Some(&category_gaps)); + + let llm_start = std::time::Instant::now(); + let raw_results = provider.call_llm(&model_research, &sys_prompt, &usr_prompt, &search_schema).await?; + let llm_duration = llm_start.elapsed().as_millis() as u64; + log_llm_call(&state.pool, user_id, job_id, "search", &model_research, &sys_prompt, &usr_prompt, &raw_results, llm_duration).await; + + // Parse and filter + emit_progress(tx, "parsing", "Analyse des resultats...", 75); + let parsed = parse_llm_output(&raw_results, &user_categories)?; + + // Filter: homepage, cross-phase dedup, url dedup, source limit, history + let mut phase2_articles: Vec<(String, NewsItem)> = Vec::new(); // (cat_key, item) + + for (cat_key, items) in parsed { + for item in items { + let url_lower = item.url.to_lowercase(); + + // Homepage filter + if let Ok(parsed_url) = url::Url::parse(&item.url) { + let path = parsed_url.path(); + if path.is_empty() || 
path == "/" { + trace_article(&state.pool, user_id, job_id, &item.url, &item.title, "web_search", None, None, None, "filtered_homepage", false).await; + continue; + } + } + + // Cross-phase dedup + if seen_urls.contains(&url_lower) { + trace_article(&state.pool, user_id, job_id, &item.url, &item.title, "web_search", None, None, None, "filtered_cross_phase_dedup", false).await; + continue; + } + + // History dedup + if settings.article_history_days > 0 { + let hash = hash_article_url(&item.url); + let exists = db::article_history::check_urls_exist(&state.pool, user_id, &[hash.clone()]).await.unwrap_or_default(); + if exists.contains(&hash) { + trace_article(&state.pool, user_id, job_id, &item.url, &item.title, "web_search", None, None, None, "filtered_history", false).await; + continue; + } + } + + // Source limit + if let Some(domain) = extract_domain(&item.url) { + let count = source_counts.get(&domain).copied().unwrap_or(0); + if count >= settings.max_articles_per_source as usize { + trace_article(&state.pool, user_id, job_id, &item.url, &item.title, "web_search", None, None, None, "filtered_diversity", false).await; + continue; + } + } + + seen_urls.insert(url_lower); + phase2_articles.push((cat_key.clone(), item)); + } + } + + // Scrape Phase 2 articles for validation + emit_progress(tx, "scraping", "Verification des sources web...", 80); + for (cat_key, item) in phase2_articles { + let (body_text, _, final_url) = scrape_single_article(&state.http_client, &item.url, settings.max_age_days as i64).await; + + if body_text.trim().is_empty() { + trace_article(&state.pool, user_id, job_id, &final_url, &item.title, "web_search", None, None, None, "filtered_empty", false).await; + continue; + } + + // Use the LLM-provided title and summary (Phase 2 summaries are final) + article_scraped.entry(cat_key).or_default().push(NewsItem { + title: item.title, + url: final_url, + summary: item.summary, + }); + + if let Some(domain) = extract_domain(&item.url) { + 
*source_counts.entry(domain).or_insert(0) += 1; + } + } + } + + // === SAVE === + if article_scraped.values().all(|items| items.is_empty()) { + return Err(AppError::BadRequest("Aucun article valide trouve. Verifiez vos sources et categories.".into())); + } + + emit_progress(tx, "saving", "Sauvegarde de la synthese...", 90); + + // Build final sections + let mut final_sections: Vec<NewsSection> = Vec::new(); + for (i, cat_name) in user_categories.iter().enumerate() { + let key = format!("category_{}", i); + if let Some(items) = article_scraped.get(&key) { + if !items.is_empty() { + final_sections.push(NewsSection { title: cat_name.clone(), items: items.clone() }); + } + } + } + if let Some(autre_items) = article_scraped.get("category_autre") { + if !autre_items.is_empty() { + final_sections.push(NewsSection { title: "Autre".to_string(), items: autre_items.clone() }); + } + } + + let sections_json = serde_json::to_value(&final_sections).map_err(|e| AppError::Internal(anyhow::anyhow!("Failed to serialize: {}", e)))?; + let sections_json = sanitize_json_null_bytes(sections_json); + + let synthesis = db::syntheses::create(&state.pool, user_id, &get_iso_week_string(Utc::now().date_naive()), &sections_json, job_id).await?; + + // Record used articles + if settings.article_history_days > 0 { + for section in &final_sections { + for item in &section.items { + let source_url = url_source.get(&item.url).map(|s| s.as_str()); + trace_article(&state.pool, user_id, job_id, &item.url, &item.title, + if source_url.is_some() { "personalized_source" } else { "web_search" }, + source_url, Some(&section.title), Some(synthesis.id), "used", true).await; + } + } + } + + Ok(synthesis.id) +} +``` + +- [ ] **Step 4: Add `rotate_sources` unit tests** + +```rust + #[test] + fn rotate_sources_after_last_used() { + // Create mock sources: need Source struct with url field + // Test that rotation works correctly + } +``` + +- [ ] **Step 5: Verify + commit** + +```bash +cd backend && cargo test --lib +git add 
backend/src/services/synthesis.rs +git commit -m "feat: rewrite synthesis pipeline — per-article classify/summarize, no rewrite pass" +``` + +--- + +### Task 6: Frontend — remove deprecated settings + +**Files:** +- Modify: `frontend/src/types.ts` +- Modify: `frontend/src/pages/Settings.tsx` +- Modify: `frontend/src/i18n/fr.ts` + +- [ ] **Step 1: Remove fields from types** + +Remove `source_diversity_window: number` and `use_llm_for_article_extraction: boolean` from `UserSettings` and `DEFAULT_SETTINGS`. + +- [ ] **Step 2: Remove from Settings page** + +Remove the diversity window number input and the LLM extraction checkbox from `Settings.tsx`. + +- [ ] **Step 3: Remove i18n labels** + +Remove `settings.diversityWindow` and `settings.useLlmForArticleExtraction` labels. + +- [ ] **Step 4: Verify + commit** + +```bash +cd frontend && npx tsc --noEmit && npx vitest run +git add frontend/src/types.ts frontend/src/pages/Settings.tsx frontend/src/i18n/fr.ts +git commit -m "feat: remove deprecated settings from frontend" +``` + +--- + +### Task 7: Update E2E test + +**Files:** +- Modify: `e2e/tests/generation-live.spec.ts` + +- [ ] **Step 1: Update settings payload** + +Remove `source_diversity_window` and `use_llm_for_article_extraction` from the PUT settings body. + +- [ ] **Step 2: Commit** + +```bash +git add e2e/tests/generation-live.spec.ts +git commit -m "test: update E2E test for new pipeline (remove deprecated settings)" +```
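The source rotation from Task 5 (Step 2) and its placeholder test (Step 4) can be exercised in isolation. The following is a minimal sketch, not the project's code: `Source` here is a hypothetical stand-in reduced to a `url` field, `make_sources` and `urls` are hypothetical fixture helpers, and the body uses `rotate_left` in place of the plan's slicing, which should produce the same order.

```rust
/// Hypothetical stand-in for the real source model; only `url` is assumed.
#[derive(Clone, Debug, PartialEq)]
struct Source {
    url: String,
}

/// Rotation rule from Task 5: the source after the last-used one comes first.
/// Uses `rotate_left` instead of slicing (a substitution, same resulting order).
fn rotate_sources(mut sources: Vec<Source>, last_source_url: Option<&str>) -> Vec<Source> {
    let Some(last_url) = last_source_url else {
        return sources; // No history yet: keep the configured order
    };
    let pos = sources.iter().position(|s| s.url == last_url);
    if let Some(idx) = pos {
        // `pos` being Some implies a non-empty list, so the modulo is safe
        let next = (idx + 1) % sources.len();
        sources.rotate_left(next);
    }
    sources // Unknown last source: order is left unchanged
}

/// Hypothetical fixture helper: build sources from plain URLs.
fn make_sources(list: &[&str]) -> Vec<Source> {
    list.iter().map(|u| Source { url: u.to_string() }).collect()
}

/// Hypothetical helper: project sources back to their URLs for assertions.
fn urls(sources: &[Source]) -> Vec<&str> {
    sources.iter().map(|s| s.url.as_str()).collect()
}
```

In the real test module the fixtures would build the project's own `Source` type; the cases worth covering are a last-used source in the middle of the list, at the end of the list (wrap-around), absent from the list, and no history at all.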