Algorithm Rewrite — Implementation Plan

For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.

Goal: Rewrite the synthesis generation pipeline: per-article LLM classify/summarize, source rotation, no rewrite pass, remove deprecated settings.

Architecture: Complete rewrite of synthesis.rs with a simpler two-phase pipeline. Phase 1: scrape personalized sources sequentially, classify/summarize each article with one LLM call. Phase 2: LLM search for gaps, scrape for validation. No batch classification, no rewrite pass.

Tech Stack: Rust (sqlx, reqwest, scraper), existing LLM providers

Spec: docs/superpowers/specs/2026-03-25-algorithm-rewrite-design.md

Algorithm: docs/algorithm.md


Task 1: Migration — drop deprecated settings columns

Files:

  • Create: backend/migrations/20260325000018_drop_deprecated_settings.sql

  • Modify: backend/src/models/settings.rs

  • Modify: backend/src/db/settings.rs

  • Modify: backend/src/services/prompts.rs (test fixture)

  • Modify: CLAUDE.md

  • Step 1: Create migration

ALTER TABLE settings DROP COLUMN source_diversity_window;
ALTER TABLE settings DROP COLUMN use_llm_for_article_extraction;

  • Step 2: Remove from settings model

In models/settings.rs, remove source_diversity_window: i32 and use_llm_for_article_extraction: bool from UserSettings, SettingsResponse, UpdateSettingsRequest, From impl, Default impl, and validation.

  • Step 3: Remove from DB queries

In db/settings.rs, remove both fields from SettingsRow, TryFrom, and both SQL queries (column lists, VALUES, RETURNING, ON CONFLICT SET, .bind() calls). Decrement $N placeholders carefully.

  • Step 4: Update test fixtures

Remove both fields from valid_request() in settings tests and test_settings() in prompts tests. Remove any validation tests for these fields.

  • Step 5: Update CLAUDE.md migration count to 18

  • Step 6: Verify + commit

# Run the tests in a subshell so the git paths below still resolve from the repo root
(cd backend && cargo test --lib)
git add backend/migrations/20260325000018_drop_deprecated_settings.sql backend/src/models/settings.rs backend/src/db/settings.rs backend/src/services/prompts.rs CLAUDE.md
git commit -m "feat: drop source_diversity_window and use_llm_for_article_extraction settings"

Task 2: New prompt + schema for per-article classify/summarize

Files:

  • Modify: backend/src/services/prompts.rs

  • Modify: backend/src/services/llm/schema.rs

  • Step 1: Add build_article_classify_prompt to prompts.rs

/// Build a prompt for per-article classification and summarization.
///
/// The LLM classifies the article into a category and generates a title + summary.
pub fn build_article_classify_prompt(
    title: &str,
    body_snippet: &str,
    categories: &[String], // includes "Autre"
) -> (String, String) {
    let system_prompt =
        "Tu es un assistant qui analyse des articles d'actualite. \
         Tu dois classer l'article dans une categorie et generer un titre et un resume. \
         Reponds uniquement au format JSON demande."
            .to_string();

    let categories_list = categories
        .iter()
        .map(|c| format!("- \"{}\"", c))
        .collect::<Vec<_>>()
        .join("\n");

    let user_prompt = format!(
        "Voici un article d'actualite.\n\n\
         Titre : {title}\n\n\
         Contenu (extrait) :\n{body}\n\n\
         Categories disponibles :\n{categories}\n\n\
         Classe cet article dans la categorie la plus appropriee.\n\
         Si aucune categorie ne correspond, utilise \"Autre\".\n\
         Genere un titre clair et un resume de 4 a 5 lignes.\n\
         Si le titre fourni est vide, genere un titre a partir du contenu.",
        title = if title.is_empty() { "(pas de titre)" } else { title },
        body = body_snippet,
        categories = categories_list,
    );

    (system_prompt, user_prompt)
}

  • Step 2: Add build_article_classify_schema to schema.rs

/// Build a JSON Schema for per-article classification and summarization.
pub fn build_article_classify_schema() -> Value {
    serde_json::json!({
        "type": "object",
        "properties": {
            "title": { "type": "string", "description": "Article title" },
            "summary": { "type": "string", "description": "4-5 line summary of the article" },
            "category": { "type": "string", "description": "Category name from the provided list" }
        },
        "required": ["title", "summary", "category"],
        "additionalProperties": false
    })
}

  • Step 3: Add tests

In prompts.rs tests:

    #[test]
    fn article_classify_prompt_includes_content() {
        let (sys, user) = build_article_classify_prompt("GPT-5 Released", "OpenAI released GPT-5", &["AI News".into(), "Autre".into()]);
        assert!(user.contains("GPT-5 Released"));
        assert!(user.contains("AI News"));
        assert!(user.contains("Autre"));
        assert!(sys.contains("classer"));
    }

    #[test]
    fn article_classify_prompt_handles_empty_title() {
        let (_, user) = build_article_classify_prompt("", "Some content", &["Tech".into(), "Autre".into()]);
        assert!(user.contains("(pas de titre)"));
    }

In schema.rs tests:

    #[test]
    fn article_classify_schema_has_all_fields() {
        let schema = build_article_classify_schema();
        let props = schema["properties"].as_object().unwrap();
        assert!(props.contains_key("title"));
        assert!(props.contains_key("summary"));
        assert!(props.contains_key("category"));
        assert_eq!(schema["additionalProperties"], false);
    }

  • Step 4: Verify + commit

(cd backend && cargo test --lib)
git add backend/src/services/prompts.rs backend/src/services/llm/schema.rs
git commit -m "feat: add per-article classify/summarize prompt and schema"

Task 3: Add get_last_source_url to article_history DB + simplify ScrapedContent

Files:

  • Modify: backend/src/db/article_history.rs

  • Modify: backend/src/services/scraper.rs

  • Step 1: Add get_last_source_url

/// Get the source_url from the most recent 'used' entry for source rotation.
pub async fn get_last_source_url(
    pool: &PgPool,
    user_id: Uuid,
) -> Result<Option<String>, AppError> {
    let result = sqlx::query_scalar::<_, String>(
        "SELECT source_url FROM article_history WHERE user_id = $1 AND status = 'used' AND source_url IS NOT NULL ORDER BY created_at DESC LIMIT 1",
    )
    .bind(user_id)
    .fetch_optional(pool)
    .await?;
    Ok(result)
}

  • Step 2: Remove head_html from ScrapedContent

In scraper.rs, remove pub head_html: String from the ScrapedContent struct. Remove the head_html extraction code in scrape_url (the block that finds <head>...</head>). Remove head_html from the return struct construction.

This can cause compilation errors wherever content.head_html is still read. source_scraper.rs should be unaffected, because extract_article_links_with_llm uses its own extract_head_and_body function rather than ScrapedContent.head_html, but check it and fix any remaining references.

Also check scrape_single_article_with_llm in synthesis.rs — it references content.head_html. This function will be removed in Task 5, but it needs to compile now. Temporarily replace content.head_html with String::new() if needed, or remove the function now.


  • Step 3: Verify + commit

(cd backend && cargo test --lib)
git add backend/src/db/article_history.rs backend/src/services/scraper.rs backend/src/services/synthesis.rs
git commit -m "feat: add get_last_source_url + remove head_html from ScrapedContent"

Task 4: Remove old prompts, schemas, and unused code

Files:

  • Modify: backend/src/services/prompts.rs

  • Modify: backend/src/services/llm/schema.rs

  • Step 1: Remove old prompts from prompts.rs

Remove these functions and their tests:

  • build_rewrite_prompt
  • build_classification_prompt
  • build_article_extraction_prompt

Keep build_link_extraction_prompt: source_scraper still uses it for LLM link extraction.

Keep the category_gaps: Option<&[(String, i32)]> parameter of build_search_prompt as well: Phase 2 still runs a gap-aware search.

Remove use crate::models::synthesis::ScrapedNewsItem; if it's no longer needed (check if build_classification_prompt was the only user).

  • Step 2: Remove old schemas from schema.rs

Remove: build_classification_schema, build_article_extraction_schema

Keep: build_category_schema (Phase 2 search), build_link_extraction_schema (source scraper), build_article_classify_schema (new)


  • Step 3: Verify + commit

(cd backend && cargo test --lib)
git add backend/src/services/prompts.rs backend/src/services/llm/schema.rs
git commit -m "refactor: remove old classification, rewrite, and article extraction prompts/schemas"

Task 5: Rewrite synthesis.rs — the core pipeline

Files:

  • Modify: backend/src/services/synthesis.rs

This is the largest task. The entire run_generation_inner function is rewritten. Many helper functions are removed.

  • Step 1: Remove dead helper functions

Delete these functions and their tests from synthesis.rs:

  • scrape_single_article_with_llm
  • scrape_flat_urls
  • scrape_articles
  • filter_empty_scraped_articles
  • build_rewrite_schema
  • build_final_sections
  • restore_scraped_urls
  • parse_classification_response
  • limit_articles_per_source
  • dedup_by_url
  • filter_homepage_urls
  • SYNTHESIS_MIN_FILL_RATIO constant
  • All associated tests for these functions

Keep:

  • scrape_single_article (used for Phase 1 per-article scraping)

  • emit_progress

  • trace_article

  • log_llm_call

  • normalize_article_url / hash_article_url

  • extract_domain

  • resolve_provider_and_key / resolve_model

  • check_rate_limit / get_user_rate_limiter

  • sanitize_json_null_bytes

  • sanitize_error_message

  • get_iso_week_string

  • parse_llm_output (used in Phase 2)

  • Step 2: Add rotate_sources helper

/// Rotate the sources list so that the source after the last-used source comes first.
fn rotate_sources(sources: Vec<Source>, last_source_url: Option<&str>) -> Vec<Source> {
    let Some(last_url) = last_source_url else {
        return sources;
    };

    let pos = sources.iter().position(|s| s.url == last_url);
    match pos {
        Some(idx) => {
            let next = (idx + 1) % sources.len();
            let mut rotated = sources[next..].to_vec();
            rotated.extend_from_slice(&sources[..next]);
            rotated
        }
        None => sources, // Last source not in list, don't rotate
    }
}

  • Step 3: Rewrite run_generation_inner

Replace the entire function body with the new algorithm. The new flow:

async fn run_generation_inner(
    job_id: Uuid,
    state: &AppState,
    user_id: Uuid,
    tx: &watch::Sender<ProgressEvent>,
) -> Result<Uuid, AppError> {
    // === INITIALIZATION ===
    emit_progress(tx, "settings", "Chargement des parametres...", 5);
    let settings = db::settings::get_or_create_default(&state.pool, user_id).await?;

    // Cleanup
    if settings.article_history_days > 0 {
        db::article_history::cleanup_old(&state.pool, user_id, settings.article_history_days).await.unwrap_or(0);
        db::llm_call_log::truncate_old(&state.pool, user_id, settings.article_history_days).await.ok();
    }

    // Categories: "Autre" is always appended below, so an empty list still classifies into "Autre"
    let user_categories = settings.categories.clone();
    let mut classification_categories = user_categories.clone();
    classification_categories.push("Autre".to_string());

    // Load sources
    emit_progress(tx, "sources", "Chargement des sources...", 10);
    let sources = db::sources::list_for_user(&state.pool, user_id).await?;

    // Resolve provider
    emit_progress(tx, "provider", "Configuration du fournisseur IA...", 12);
    let (provider_name, api_key) = resolve_provider_and_key(state, user_id, &settings).await?;
    let provider = create_provider(&provider_name, api_key)?;
    let model_research = if !settings.ai_model.is_empty() { settings.ai_model.clone() } else { resolve_model(state, &provider_name).await? };
    let model_writing = if !settings.ai_model_writing.is_empty() { settings.ai_model_writing.clone() } else { model_research.clone() };
    let user_rate_limiter = get_user_rate_limiter(state, &settings, user_id);

    // Tracking structures
    let mut article_scraped: HashMap<String, Vec<NewsItem>> = HashMap::new();
    let mut source_counts: HashMap<String, usize> = HashMap::new();
    let mut url_source: HashMap<String, String> = HashMap::new(); // url → source_url
    let mut filled_counts: HashMap<String, usize> = HashMap::new();
    let mut seen_urls: std::collections::HashSet<String> = std::collections::HashSet::new();
    let max_total = (user_categories.len() + 1) * settings.max_items_per_category as usize;
    let classify_schema = build_article_classify_schema();

    // === PHASE 1: Personalized Sources ===
    if !sources.is_empty() {
        emit_progress(tx, "sources_scrape", "Analyse des sources personnalisees...", 15);

        // 1a. Rotate sources
        let last_source = db::article_history::get_last_source_url(&state.pool, user_id).await.unwrap_or(None);
        let rotated_sources = rotate_sources(sources.clone(), last_source.as_deref());
        let max_sources = rotated_sources.len().min(10);
        let max_links = 10usize;

        let mut candidate_urls: Vec<(String, String)> = Vec::new(); // (article_url, source_url)

        for source in rotated_sources.iter().take(max_sources) {
            let links = if settings.use_llm_for_source_links {
                source_scraper::extract_article_links_with_llm(
                    &state.http_client, &source.url, max_links, &provider, &model_research,
                ).await
            } else {
                source_scraper::extract_article_links(
                    &state.http_client, &source.url, max_links,
                ).await
            };

            if let Ok(links) = links {
                for link in links {
                    if seen_urls.insert(link.to_lowercase()) {
                        candidate_urls.push((link, source.url.clone()));
                    }
                }
            }
        }

        // Filter against article history
        if settings.article_history_days > 0 && !candidate_urls.is_empty() {
            let hashes: Vec<String> = candidate_urls.iter().map(|(url, _)| hash_article_url(url)).collect();
            let existing = db::article_history::check_urls_exist(&state.pool, user_id, &hashes).await.unwrap_or_default();
            if !existing.is_empty() {
                // Trace filtered articles
                for (url, source_url) in &candidate_urls {
                    if existing.contains(&hash_article_url(url)) {
                        trace_article(&state.pool, user_id, job_id, url, "", "personalized_source", Some(source_url), None, None, "filtered_history", false).await;
                    }
                }
                candidate_urls.retain(|(url, _)| !existing.contains(&hash_article_url(url)));
            }
        }

        // Track url → source
        for (url, source_url) in &candidate_urls {
            url_source.insert(url.clone(), source_url.clone());
        }

        // 1b. Scrape, classify, summarize each article
        emit_progress(tx, "processing", "Traitement des articles...", 25);
        let total_candidates = candidate_urls.len();

        for (idx, (url, source_url)) in candidate_urls.into_iter().enumerate() {
            // Progress
            let pct = 25 + ((idx as u32 * 40) / total_candidates.max(1) as u32).min(40);
            emit_progress(tx, "processing", &format!("Article {}/{}...", idx + 1, total_candidates), pct as u8);

            // Check source limit
            let source_domain = extract_domain(&source_url).unwrap_or_default();
            let source_count = source_counts.get(&source_domain).copied().unwrap_or(0);
            if source_count >= settings.max_articles_per_source as usize {
                trace_article(&state.pool, user_id, job_id, &url, "", "personalized_source", Some(&source_url), None, None, "filtered_diversity", false).await;
                continue;
            }

            // Scrape
            let (body_text, page_title, final_url) = scrape_single_article(&state.http_client, &url, settings.max_age_days as i64).await;

            if body_text.trim().is_empty() {
                trace_article(&state.pool, user_id, job_id, &final_url, &page_title, "personalized_source", Some(&source_url), None, None, "filtered_empty", false).await;
                continue;
            }

            // LLM classify + summarize
            check_rate_limit(state, &user_rate_limiter, &provider_name)?;
            let body_snippet: String = body_text.chars().take(500).collect();
            let (class_sys, class_user) = prompts::build_article_classify_prompt(&page_title, &body_snippet, &classification_categories);

            let llm_start = std::time::Instant::now();
            let class_response = provider.call_llm(&model_research, &class_sys, &class_user, &classify_schema).await?;
            let llm_duration = llm_start.elapsed().as_millis() as u64;
            log_llm_call(&state.pool, user_id, job_id, "classify_summarize", &model_research, &class_sys, &class_user, &class_response, llm_duration).await;

            // Parse response
            let llm_title = class_response.get("title").and_then(|t| t.as_str()).unwrap_or(&page_title).to_string();
            let llm_summary = class_response.get("summary").and_then(|s| s.as_str()).unwrap_or("").to_string();
            let raw_category = class_response.get("category").and_then(|c| c.as_str()).unwrap_or("Autre");

            // Validate category: snap to the configured spelling so that filled_counts keys
            // match user_categories in Phase 2; anything not in the list becomes "Autre"
            let llm_category = classification_categories.iter()
                .find(|c| c.to_lowercase() == raw_category.to_lowercase())
                .cloned()
                .unwrap_or_else(|| "Autre".to_string());

            // Map category to key
            let cat_key = if llm_category == "Autre" {
                "category_autre".to_string()
            } else {
                user_categories.iter().position(|c| *c == llm_category)
                    .map(|i| format!("category_{}", i))
                    .unwrap_or_else(|| "category_autre".to_string())
            };

            // Check if category is full → overflow to "Autre"; skip when "Autre" is full too
            let max_per_category = settings.max_items_per_category as usize;
            let cat_filled = filled_counts.get(&llm_category).copied().unwrap_or(0);
            let (final_cat_key, final_cat_name) = if cat_filled < max_per_category {
                (cat_key, llm_category)
            } else {
                let autre_filled = filled_counts.get("Autre").copied().unwrap_or(0);
                if autre_filled >= max_per_category {
                    // Target category and "Autre" are both full: skip article
                    continue;
                }
                ("category_autre".to_string(), "Autre".to_string())
            };

            // Add article
            article_scraped.entry(final_cat_key).or_default().push(NewsItem {
                title: llm_title,
                url: final_url.clone(),
                summary: llm_summary,
            });
            *filled_counts.entry(final_cat_name).or_insert(0) += 1;
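            *source_counts.entry(source_domain).or_insert(0) += 1;
            // Also key the source map by the stored URL, in case scraping followed a redirect,
            // so the SAVE step can still attribute the article to its personalized source
            url_source.insert(final_url, source_url);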

            // Check if we've reached the maximum
            let total: usize = article_scraped.values().map(|v| v.len()).sum();
            if total >= max_total {
                break;
            }
        }
    }

    // === PHASE 2: Web Search Fallback ===
    let category_gaps: Vec<(String, i32)> = user_categories.iter().filter_map(|cat| {
        let filled = filled_counts.get(cat).copied().unwrap_or(0);
        let needed = (settings.max_items_per_category as usize).saturating_sub(filled);
        if needed > 0 { Some((cat.clone(), needed as i32)) } else { None }
    }).collect();

    if !category_gaps.is_empty() {
        emit_progress(tx, "search", "Recherche d'actualites complementaires...", 70);
        check_rate_limit(state, &user_rate_limiter, &provider_name)?;

        let search_schema = build_category_schema(&user_categories, settings.max_items_per_category);
        let current_date = Utc::now().format("%A %d %B %Y").to_string();
        let (sys_prompt, usr_prompt) = prompts::build_search_prompt(&settings, &sources, &current_date, &[], Some(&category_gaps));

        let llm_start = std::time::Instant::now();
        let raw_results = provider.call_llm(&model_research, &sys_prompt, &usr_prompt, &search_schema).await?;
        let llm_duration = llm_start.elapsed().as_millis() as u64;
        log_llm_call(&state.pool, user_id, job_id, "search", &model_research, &sys_prompt, &usr_prompt, &raw_results, llm_duration).await;

        // Parse and filter
        emit_progress(tx, "parsing", "Analyse des resultats...", 75);
        let parsed = parse_llm_output(&raw_results, &user_categories)?;

        // Filter: homepage, cross-phase dedup, url dedup, source limit, history
        let mut phase2_articles: Vec<(String, NewsItem)> = Vec::new(); // (cat_key, item)

        for (cat_key, items) in parsed {
            for item in items {
                let url_lower = item.url.to_lowercase();

                // Homepage filter
                if let Ok(parsed_url) = url::Url::parse(&item.url) {
                    let path = parsed_url.path();
                    if path.is_empty() || path == "/" {
                        trace_article(&state.pool, user_id, job_id, &item.url, &item.title, "web_search", None, None, None, "filtered_homepage", false).await;
                        continue;
                    }
                }

                // Cross-phase dedup
                if seen_urls.contains(&url_lower) {
                    trace_article(&state.pool, user_id, job_id, &item.url, &item.title, "web_search", None, None, None, "filtered_cross_phase_dedup", false).await;
                    continue;
                }

                // History dedup
                if settings.article_history_days > 0 {
                    let hash = hash_article_url(&item.url);
                    let exists = db::article_history::check_urls_exist(&state.pool, user_id, &[hash.clone()]).await.unwrap_or_default();
                    if exists.contains(&hash) {
                        trace_article(&state.pool, user_id, job_id, &item.url, &item.title, "web_search", None, None, None, "filtered_history", false).await;
                        continue;
                    }
                }

                // Source limit
                if let Some(domain) = extract_domain(&item.url) {
                    let count = source_counts.get(&domain).copied().unwrap_or(0);
                    if count >= settings.max_articles_per_source as usize {
                        trace_article(&state.pool, user_id, job_id, &item.url, &item.title, "web_search", None, None, None, "filtered_diversity", false).await;
                        continue;
                    }
                }

                seen_urls.insert(url_lower);
                phase2_articles.push((cat_key.clone(), item));
            }
        }

        // Scrape Phase 2 articles for validation
        emit_progress(tx, "scraping", "Verification des sources web...", 80);
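        for (cat_key, item) in phase2_articles {
            // The search may return more items than the remaining gap: stop filling a category at its quota
            if article_scraped.get(&cat_key).map_or(0, Vec::len) >= settings.max_items_per_category as usize {
                continue;
            }

            let (body_text, _, final_url) = scrape_single_article(&state.http_client, &item.url, settings.max_age_days as i64).await;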

            if body_text.trim().is_empty() {
                trace_article(&state.pool, user_id, job_id, &final_url, &item.title, "web_search", None, None, None, "filtered_empty", false).await;
                continue;
            }

            // Use the LLM-provided title and summary (Phase 2 summaries are final)
            article_scraped.entry(cat_key).or_default().push(NewsItem {
                title: item.title,
                url: final_url,
                summary: item.summary,
            });

            if let Some(domain) = extract_domain(&item.url) {
                *source_counts.entry(domain).or_insert(0) += 1;
            }
        }
    }

    // === SAVE ===
    if article_scraped.values().all(|items| items.is_empty()) {
        return Err(AppError::BadRequest("Aucun article valide trouve. Verifiez vos sources et categories.".into()));
    }

    emit_progress(tx, "saving", "Sauvegarde de la synthese...", 90);

    // Build final sections
    let mut final_sections: Vec<NewsSection> = Vec::new();
    for (i, cat_name) in user_categories.iter().enumerate() {
        let key = format!("category_{}", i);
        if let Some(items) = article_scraped.get(&key) {
            if !items.is_empty() {
                final_sections.push(NewsSection { title: cat_name.clone(), items: items.clone() });
            }
        }
    }
    if let Some(autre_items) = article_scraped.get("category_autre") {
        if !autre_items.is_empty() {
            final_sections.push(NewsSection { title: "Autre".to_string(), items: autre_items.clone() });
        }
    }

    let sections_json = serde_json::to_value(&final_sections).map_err(|e| AppError::Internal(anyhow::anyhow!("Failed to serialize: {}", e)))?;
    let sections_json = sanitize_json_null_bytes(sections_json);

    let synthesis = db::syntheses::create(&state.pool, user_id, &get_iso_week_string(Utc::now().date_naive()), &sections_json, job_id).await?;

    // Record used articles
    if settings.article_history_days > 0 {
        for section in &final_sections {
            for item in &section.items {
                let source_url = url_source.get(&item.url).map(|s| s.as_str());
                trace_article(&state.pool, user_id, job_id, &item.url, &item.title,
                    if source_url.is_some() { "personalized_source" } else { "web_search" },
                    source_url, Some(&section.title), Some(synthesis.id), "used", true).await;
            }
        }
    }

    Ok(synthesis.id)
}
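
The category handling in Phase 1 (canonical category name, overflow to "Autre", skip when both are full) and the gap computation that feeds Phase 2 can be read in isolation as a small quota tracker. The sketch below is illustrative only: CategoryQuota and its methods are hypothetical names that do not appear in the plan, and it assumes category matching is case-insensitive, as in the validation step above.

```rust
use std::collections::HashMap;

/// Name of the fallback bucket used throughout the plan.
const AUTRE: &str = "Autre";

/// Hypothetical helper (not part of the plan): per-category fill tracking.
struct CategoryQuota {
    user_categories: Vec<String>,
    max_per_category: usize,
    filled: HashMap<String, usize>,
}

impl CategoryQuota {
    fn new(user_categories: Vec<String>, max_per_category: usize) -> Self {
        Self { user_categories, max_per_category, filled: HashMap::new() }
    }

    /// Map the LLM's answer to the configured spelling, or "Autre" when unknown.
    fn canonical(&self, llm_category: &str) -> String {
        let wanted = llm_category.to_lowercase();
        self.user_categories
            .iter()
            .find(|c| c.to_lowercase() == wanted)
            .cloned()
            .unwrap_or_else(|| AUTRE.to_string())
    }

    fn is_full(&self, name: &str) -> bool {
        self.filled.get(name).copied().unwrap_or(0) >= self.max_per_category
    }

    /// Reserve a slot: the requested category first, then "Autre" as overflow.
    /// Returns the category that received the article, or None when both are full.
    fn assign(&mut self, llm_category: &str) -> Option<String> {
        let wanted = self.canonical(llm_category);
        let target = if !self.is_full(&wanted) {
            wanted
        } else if !self.is_full(AUTRE) {
            AUTRE.to_string()
        } else {
            return None;
        };
        *self.filled.entry(target.clone()).or_insert(0) += 1;
        Some(target)
    }

    /// Remaining slots per user category, in configured order (the Phase 2 input).
    fn gaps(&self) -> Vec<(String, usize)> {
        self.user_categories
            .iter()
            .filter_map(|cat| {
                let filled = self.filled.get(cat).copied().unwrap_or(0);
                let needed = self.max_per_category.saturating_sub(filled);
                (needed > 0).then(|| (cat.clone(), needed))
            })
            .collect()
    }
}
```

Keying the counts by the configured spelling is what lets gaps() find the Phase 1 counts again: if the counts were keyed by whatever casing the LLM returned, a filled category could still be reported as a gap.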

  • Step 4: Add rotate_sources unit tests

    #[test]
    fn rotate_sources_after_last_used() {
        // Create mock sources — need Source struct with url field
        // Test that rotation works correctly
    }
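
One way to fill in the test stub is sketched below. It is self-contained for illustration: the Source struct here is a stand-in with only a url field (the real model is assumed to have more fields and to implement Clone), rotate_sources repeats the logic from Step 2 so the block compiles alone, and make_sources and urls_of are hypothetical test helpers.

```rust
/// Stand-in for the project's `Source` model; only `url` matters here (assumption).
#[derive(Clone, Debug, PartialEq)]
struct Source {
    url: String,
}

/// Same logic as the plan's `rotate_sources`, repeated so the sketch compiles alone.
fn rotate_sources(sources: Vec<Source>, last_source_url: Option<&str>) -> Vec<Source> {
    let Some(last_url) = last_source_url else {
        return sources;
    };
    match sources.iter().position(|s| s.url == last_url) {
        Some(idx) => {
            let next = (idx + 1) % sources.len();
            let mut rotated = sources[next..].to_vec();
            rotated.extend_from_slice(&sources[..next]);
            rotated
        }
        None => sources, // Last source not in list, don't rotate
    }
}

/// Hypothetical test helper: build sources from bare URLs.
fn make_sources(urls: &[&str]) -> Vec<Source> {
    urls.iter().map(|u| Source { url: u.to_string() }).collect()
}

/// Hypothetical test helper: project sources back to their URLs.
fn urls_of(sources: &[Source]) -> Vec<&str> {
    sources.iter().map(|s| s.url.as_str()).collect()
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn rotate_sources_after_last_used() {
        let rotated = rotate_sources(make_sources(&["a", "b", "c"]), Some("a"));
        assert_eq!(urls_of(&rotated), vec!["b", "c", "a"]);
    }

    #[test]
    fn rotate_sources_wraps_after_final_source() {
        let rotated = rotate_sources(make_sources(&["a", "b", "c"]), Some("c"));
        assert_eq!(urls_of(&rotated), vec!["a", "b", "c"]);
    }

    #[test]
    fn rotate_sources_keeps_order_without_match() {
        let unknown = rotate_sources(make_sources(&["a", "b"]), Some("zzz"));
        assert_eq!(urls_of(&unknown), vec!["a", "b"]);
        let no_history = rotate_sources(make_sources(&["a", "b"]), None);
        assert_eq!(urls_of(&no_history), vec!["a", "b"]);
    }

    #[test]
    fn rotate_sources_handles_empty_list() {
        assert!(rotate_sources(Vec::new(), Some("a")).is_empty());
    }
}
```

The empty-list case is worth keeping: the modulo by sources.len() is only reached after position() has found a match, so an empty list returns early instead of dividing by zero.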

  • Step 5: Verify + commit

(cd backend && cargo test --lib)
git add backend/src/services/synthesis.rs
git commit -m "feat: rewrite synthesis pipeline — per-article classify/summarize, no rewrite pass"

Task 6: Frontend — remove deprecated settings

Files:

  • Modify: frontend/src/types.ts

  • Modify: frontend/src/pages/Settings.tsx

  • Modify: frontend/src/i18n/fr.ts

  • Step 1: Remove fields from types

Remove source_diversity_window: number and use_llm_for_article_extraction: boolean from UserSettings and DEFAULT_SETTINGS.

  • Step 2: Remove from Settings page

Remove the diversity window number input and the LLM extraction checkbox from Settings.tsx.

  • Step 3: Remove i18n labels

Remove settings.diversityWindow and settings.useLlmForArticleExtraction labels.


  • Step 4: Verify + commit

(cd frontend && npx tsc --noEmit && npx vitest run)
git add frontend/src/types.ts frontend/src/pages/Settings.tsx frontend/src/i18n/fr.ts
git commit -m "feat: remove deprecated settings from frontend"

Task 7: Update E2E test

Files:

  • Modify: e2e/tests/generation-live.spec.ts

  • Step 1: Update settings payload

Remove source_diversity_window and use_llm_for_article_extraction from the PUT settings body.


  • Step 2: Commit

git add e2e/tests/generation-live.spec.ts
git commit -m "test: update E2E test for new pipeline (remove deprecated settings)"