docs: add algorithm rewrite implementation plan (7 tasks)

master
oabrivard 3 months ago
parent 1d5dc0596c
commit d3b63295f6

@ -0,0 +1,688 @@
# Algorithm Rewrite — Implementation Plan
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
**Goal:** Rewrite the synthesis generation pipeline: per-article LLM classify/summarize, source rotation, no rewrite pass, remove deprecated settings.
**Architecture:** Complete rewrite of `synthesis.rs` with a simpler two-phase pipeline. Phase 1: scrape personalized sources sequentially, classify/summarize each article with one LLM call. Phase 2: LLM search for gaps, scrape for validation. No batch classification, no rewrite pass.
**Tech Stack:** Rust (sqlx, reqwest, scraper), existing LLM providers
**Spec:** `docs/superpowers/specs/2026-03-25-algorithm-rewrite-design.md`
**Algorithm:** `docs/algorithm.md`
---
### Task 1: Migration — drop deprecated settings columns
**Files:**
- Create: `backend/migrations/20260325000018_drop_deprecated_settings.sql`
- Modify: `backend/src/models/settings.rs`
- Modify: `backend/src/db/settings.rs`
- Modify: `backend/src/services/prompts.rs` (test fixture)
- Modify: `CLAUDE.md`
- [ ] **Step 1: Create migration**
```sql
ALTER TABLE settings DROP COLUMN source_diversity_window;
ALTER TABLE settings DROP COLUMN use_llm_for_article_extraction;
```
- [ ] **Step 2: Remove from settings model**
In `models/settings.rs`, remove `source_diversity_window: i32` and `use_llm_for_article_extraction: bool` from `UserSettings`, `SettingsResponse`, `UpdateSettingsRequest`, `From` impl, `Default` impl, and validation.
- [ ] **Step 3: Remove from DB queries**
In `db/settings.rs`, remove both fields from `SettingsRow`, `TryFrom`, and both SQL queries (column lists, VALUES, RETURNING, ON CONFLICT SET, .bind() calls). Decrement $N placeholders carefully.
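Renumbering placeholders by hand is easy to get wrong. As a hedged sketch (the helper name `placeholders_are_contiguous` is hypothetical and not part of the codebase), a small std-only check like the one below could be called from a unit test on each query string to confirm that its `$N` placeholders run from `$1` upward without gaps:

```rust
use std::collections::BTreeSet;

/// Hypothetical test helper: returns true when the `$N` placeholders found in
/// `sql` are exactly `$1..=$max` with no gaps (repeated placeholders are fine).
/// This is a plain text scan, so a `$1` inside a string literal is counted too.
fn placeholders_are_contiguous(sql: &str) -> bool {
    let mut seen = BTreeSet::new();
    let mut chars = sql.chars().peekable();
    while let Some(c) = chars.next() {
        if c != '$' {
            continue;
        }
        // Collect the digits that directly follow the `$`.
        let mut digits = String::new();
        while let Some(d) = chars.peek().copied().filter(|d| d.is_ascii_digit()) {
            digits.push(d);
            chars.next();
        }
        if let Ok(n) = digits.parse::<u32>() {
            seen.insert(n);
        }
    }
    // Contiguous means the sorted set is exactly {1, 2, ..., len}.
    seen.iter().copied().eq(1..=seen.len() as u32)
}
```

A query that still contains `$1, $3, $4` after a column was removed would fail this check, which makes a missed decrement visible before the query reaches the database.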
- [ ] **Step 4: Update test fixtures**
Remove both fields from `valid_request()` in settings tests and `test_settings()` in prompts tests. Remove any validation tests for these fields.
- [ ] **Step 5: Update CLAUDE.md migration count to 18**
- [ ] **Step 6: Verify + commit**
```bash
cd backend && cargo test --lib
git add backend/migrations/20260325000018_drop_deprecated_settings.sql backend/src/models/settings.rs backend/src/db/settings.rs backend/src/services/prompts.rs CLAUDE.md
git commit -m "feat: drop source_diversity_window and use_llm_for_article_extraction settings"
```
---
### Task 2: New prompt + schema for per-article classify/summarize
**Files:**
- Modify: `backend/src/services/prompts.rs`
- Modify: `backend/src/services/llm/schema.rs`
- [ ] **Step 1: Add `build_article_classify_prompt` to prompts.rs**
```rust
/// Build a prompt for per-article classification and summarization.
///
/// The LLM classifies the article into a category and generates a title + summary.
pub fn build_article_classify_prompt(
    title: &str,
    body_snippet: &str,
    categories: &[String], // includes "Autre"
) -> (String, String) {
    let system_prompt =
        "Tu es un assistant qui analyse des articles d'actualite. \
         Tu dois classer l'article dans une categorie et generer un titre et un resume. \
         Reponds uniquement au format JSON demande."
            .to_string();

    let categories_list = categories
        .iter()
        .map(|c| format!("- \"{}\"", c))
        .collect::<Vec<_>>()
        .join("\n");

    let user_prompt = format!(
        "Voici un article d'actualite.\n\n\
         Titre : {title}\n\n\
         Contenu (extrait) :\n{body}\n\n\
         Categories disponibles :\n{categories}\n\n\
         Classe cet article dans la categorie la plus appropriee.\n\
         Si aucune categorie ne correspond, utilise \"Autre\".\n\
         Genere un titre clair et un resume de 4 a 5 lignes.\n\
         Si le titre fourni est vide, genere un titre a partir du contenu.",
        title = if title.is_empty() { "(pas de titre)" } else { title },
        body = body_snippet,
        categories = categories_list,
    );

    (system_prompt, user_prompt)
}
```
- [ ] **Step 2: Add `build_article_classify_schema` to schema.rs**
```rust
/// Build a JSON Schema for per-article classification and summarization.
pub fn build_article_classify_schema() -> Value {
    serde_json::json!({
        "type": "object",
        "properties": {
            "title": { "type": "string", "description": "Article title" },
            "summary": { "type": "string", "description": "4-5 line summary of the article" },
            "category": { "type": "string", "description": "Category name from the provided list" }
        },
        "required": ["title", "summary", "category"],
        "additionalProperties": false
    })
}
```
- [ ] **Step 3: Add tests**
In prompts.rs tests:
```rust
#[test]
fn article_classify_prompt_includes_content() {
    let (sys, user) = build_article_classify_prompt(
        "GPT-5 Released",
        "OpenAI released GPT-5",
        &["AI News".into(), "Autre".into()],
    );
    assert!(user.contains("GPT-5 Released"));
    assert!(user.contains("AI News"));
    assert!(user.contains("Autre"));
    assert!(sys.contains("classer"));
}

#[test]
fn article_classify_prompt_handles_empty_title() {
    let (_, user) = build_article_classify_prompt(
        "",
        "Some content",
        &["Tech".into(), "Autre".into()],
    );
    assert!(user.contains("(pas de titre)"));
}
```
In schema.rs tests:
```rust
#[test]
fn article_classify_schema_has_all_fields() {
    let schema = build_article_classify_schema();
    let props = schema["properties"].as_object().unwrap();
    assert!(props.contains_key("title"));
    assert!(props.contains_key("summary"));
    assert!(props.contains_key("category"));
    assert_eq!(schema["additionalProperties"], false);
}
```
- [ ] **Step 4: Verify + commit**
```bash
cd backend && cargo test --lib
git add backend/src/services/prompts.rs backend/src/services/llm/schema.rs
git commit -m "feat: add per-article classify/summarize prompt and schema"
```
---
### Task 3: Add `get_last_source_url` to article_history DB + simplify ScrapedContent
**Files:**
- Modify: `backend/src/db/article_history.rs`
- Modify: `backend/src/services/scraper.rs`
- [ ] **Step 1: Add `get_last_source_url`**
```rust
/// Get the source_url from the most recent 'used' entry for source rotation.
pub async fn get_last_source_url(
    pool: &PgPool,
    user_id: Uuid,
) -> Result<Option<String>, AppError> {
    let result = sqlx::query_scalar::<_, String>(
        "SELECT source_url FROM article_history \
         WHERE user_id = $1 AND status = 'used' AND source_url IS NOT NULL \
         ORDER BY created_at DESC LIMIT 1",
    )
    .bind(user_id)
    .fetch_optional(pool)
    .await?;
    Ok(result)
}
```
- [ ] **Step 2: Remove `head_html` from `ScrapedContent`**
In `scraper.rs`, remove `pub head_html: String` from the `ScrapedContent` struct. Remove the `head_html` extraction code in `scrape_url` (the block that finds `<head>...</head>`). Remove `head_html` from the return struct construction.
Removing the field may cause compilation errors wherever `content.head_html` is read. `extract_article_links_with_llm` in `source_scraper.rs` should be unaffected, because source_scraper uses its own `extract_head_and_body` function rather than `ScrapedContent.head_html`; check for and fix any remaining references.
Also check `scrape_single_article_with_llm` in `synthesis.rs`, which references `content.head_html`. That function is removed in Task 5, but the crate must still compile at the end of this task: either temporarily replace `content.head_html` with `String::new()` or remove the function now.
- [ ] **Step 3: Verify + commit**
```bash
cd backend && cargo test --lib
git add backend/src/db/article_history.rs backend/src/services/scraper.rs backend/src/services/synthesis.rs
git commit -m "feat: add get_last_source_url + remove head_html from ScrapedContent"
```
---
### Task 4: Remove old prompts, schemas, and unused code
**Files:**
- Modify: `backend/src/services/prompts.rs`
- Modify: `backend/src/services/llm/schema.rs`
- [ ] **Step 1: Remove old prompts from prompts.rs**
Remove these functions and their tests:
- `build_rewrite_prompt`
- `build_classification_prompt`
- `build_article_extraction_prompt`
Keep `build_link_extraction_prompt`: it is still used by the source_scraper LLM link extraction.
Keep the `category_gaps: Option<&[(String, i32)]>` parameter of `build_search_prompt`: Phase 2 still uses gap-aware search.
Remove `use crate::models::synthesis::ScrapedNewsItem;` if it's no longer needed (check if `build_classification_prompt` was the only user).
- [ ] **Step 2: Remove old schemas from schema.rs**
Remove: `build_classification_schema`, `build_article_extraction_schema`
Keep: `build_category_schema` (Phase 2 search), `build_link_extraction_schema` (source scraper), `build_article_classify_schema` (new)
- [ ] **Step 3: Verify + commit**
```bash
cd backend && cargo test --lib
git add backend/src/services/prompts.rs backend/src/services/llm/schema.rs
git commit -m "refactor: remove old classification, rewrite, and article extraction prompts/schemas"
```
---
### Task 5: Rewrite `synthesis.rs` — the core pipeline
**Files:**
- Modify: `backend/src/services/synthesis.rs`
This is the largest task. The entire `run_generation_inner` function is rewritten. Many helper functions are removed.
- [ ] **Step 1: Remove dead helper functions**
Delete these functions and their tests from `synthesis.rs`:
- `scrape_single_article_with_llm`
- `scrape_flat_urls`
- `scrape_articles`
- `filter_empty_scraped_articles`
- `build_rewrite_schema`
- `build_final_sections`
- `restore_scraped_urls`
- `parse_classification_response`
- `limit_articles_per_source`
- `dedup_by_url`
- `filter_homepage_urls`
- `SYNTHESIS_MIN_FILL_RATIO` constant
- All associated tests for these functions
Keep:
- `scrape_single_article` (used for Phase 1 per-article scraping)
- `emit_progress`
- `trace_article`
- `log_llm_call`
- `normalize_article_url` / `hash_article_url`
- `extract_domain`
- `resolve_provider_and_key` / `resolve_model`
- `check_rate_limit` / `get_user_rate_limiter`
- `sanitize_json_null_bytes`
- `sanitize_error_message`
- `get_iso_week_string`
- `parse_llm_output` (used in Phase 2)
- [ ] **Step 2: Add `rotate_sources` helper**
```rust
/// Rotate the sources list so that the source after the last-used source comes first.
fn rotate_sources(sources: Vec<Source>, last_source_url: Option<&str>) -> Vec<Source> {
    let Some(last_url) = last_source_url else {
        return sources;
    };
    let pos = sources.iter().position(|s| s.url == last_url);
    match pos {
        Some(idx) => {
            let next = (idx + 1) % sources.len();
            let mut rotated = sources[next..].to_vec();
            rotated.extend_from_slice(&sources[..next]);
            rotated
        }
        None => sources, // Last source not in list, don't rotate
    }
}
```
- [ ] **Step 3: Rewrite `run_generation_inner`**
Replace the entire function body with the new algorithm. The new flow:
```rust
async fn run_generation_inner(
job_id: Uuid,
state: &AppState,
user_id: Uuid,
tx: &watch::Sender<ProgressEvent>,
) -> Result<Uuid, AppError> {
// === INITIALIZATION ===
emit_progress(tx, "settings", "Chargement des parametres...", 5);
let settings = db::settings::get_or_create_default(&state.pool, user_id).await?;
// Cleanup
if settings.article_history_days > 0 {
db::article_history::cleanup_old(&state.pool, user_id, settings.article_history_days).await.unwrap_or(0);
db::llm_call_log::truncate_old(&state.pool, user_id, settings.article_history_days).await.ok();
}
// Categories: "Autre" is always appended below, so an empty user list yields just "Autre"
let user_categories = settings.categories.clone();
let mut classification_categories = user_categories.clone();
classification_categories.push("Autre".to_string());
// Load sources
emit_progress(tx, "sources", "Chargement des sources...", 10);
let sources = db::sources::list_for_user(&state.pool, user_id).await?;
// Resolve provider
emit_progress(tx, "provider", "Configuration du fournisseur IA...", 12);
let (provider_name, api_key) = resolve_provider_and_key(state, user_id, &settings).await?;
let provider = create_provider(&provider_name, api_key)?;
let model_research = if !settings.ai_model.is_empty() { settings.ai_model.clone() } else { resolve_model(state, &provider_name).await? };
let model_writing = if !settings.ai_model_writing.is_empty() { settings.ai_model_writing.clone() } else { model_research.clone() };
let user_rate_limiter = get_user_rate_limiter(state, &settings, user_id);
// Tracking structures
let mut article_scraped: HashMap<String, Vec<NewsItem>> = HashMap::new();
let mut source_counts: HashMap<String, usize> = HashMap::new();
let mut url_source: HashMap<String, String> = HashMap::new(); // url → source_url
let mut filled_counts: HashMap<String, usize> = HashMap::new();
let mut seen_urls: std::collections::HashSet<String> = std::collections::HashSet::new();
let max_total = (user_categories.len() + 1) * settings.max_items_per_category as usize;
let classify_schema = build_article_classify_schema();
// === PHASE 1: Personalized Sources ===
if !sources.is_empty() {
emit_progress(tx, "sources_scrape", "Analyse des sources personnalisees...", 15);
// 1a. Rotate sources
let last_source = db::article_history::get_last_source_url(&state.pool, user_id).await.unwrap_or(None);
let rotated_sources = rotate_sources(sources.clone(), last_source.as_deref());
let max_sources = rotated_sources.len().min(10);
let max_links = 10usize;
let mut candidate_urls: Vec<(String, String)> = Vec::new(); // (article_url, source_url)
for source in rotated_sources.iter().take(max_sources) {
let links = if settings.use_llm_for_source_links {
source_scraper::extract_article_links_with_llm(
&state.http_client, &source.url, max_links, &provider, &model_research,
).await
} else {
source_scraper::extract_article_links(
&state.http_client, &source.url, max_links,
).await
};
if let Ok(links) = links {
for link in links {
if seen_urls.insert(link.to_lowercase()) {
candidate_urls.push((link, source.url.clone()));
}
}
}
}
// Filter against article history
if settings.article_history_days > 0 && !candidate_urls.is_empty() {
let hashes: Vec<String> = candidate_urls.iter().map(|(url, _)| hash_article_url(url)).collect();
let existing = db::article_history::check_urls_exist(&state.pool, user_id, &hashes).await.unwrap_or_default();
if !existing.is_empty() {
// Trace filtered articles
for (url, source_url) in &candidate_urls {
if existing.contains(&hash_article_url(url)) {
trace_article(&state.pool, user_id, job_id, url, "", "personalized_source", Some(source_url), None, None, "filtered_history", false).await;
}
}
candidate_urls.retain(|(url, _)| !existing.contains(&hash_article_url(url)));
}
}
// Track url → source
for (url, source_url) in &candidate_urls {
url_source.insert(url.clone(), source_url.clone());
}
// 1b. Scrape, classify, summarize each article
emit_progress(tx, "processing", "Traitement des articles...", 25);
let total_candidates = candidate_urls.len();
for (idx, (url, source_url)) in candidate_urls.into_iter().enumerate() {
// Progress
let pct = 25 + ((idx as u32 * 40) / total_candidates.max(1) as u32).min(40);
emit_progress(tx, "processing", &format!("Article {}/{}...", idx + 1, total_candidates), pct as u8);
// Check source limit
let source_domain = extract_domain(&source_url).unwrap_or_default();
let source_count = source_counts.get(&source_domain).copied().unwrap_or(0);
if source_count >= settings.max_articles_per_source as usize {
trace_article(&state.pool, user_id, job_id, &url, "", "personalized_source", Some(&source_url), None, None, "filtered_diversity", false).await;
continue;
}
// Scrape
let (body_text, page_title, final_url) = scrape_single_article(&state.http_client, &url, settings.max_age_days as i64).await;
if body_text.trim().is_empty() {
trace_article(&state.pool, user_id, job_id, &final_url, &page_title, "personalized_source", Some(&source_url), None, None, "filtered_empty", false).await;
continue;
}
// LLM classify + summarize
check_rate_limit(state, &user_rate_limiter, &provider_name)?;
let body_snippet: String = body_text.chars().take(500).collect();
let (class_sys, class_user) = prompts::build_article_classify_prompt(&page_title, &body_snippet, &classification_categories);
let llm_start = std::time::Instant::now();
let class_response = provider.call_llm(&model_research, &class_sys, &class_user, &classify_schema).await?;
let llm_duration = llm_start.elapsed().as_millis() as u64;
log_llm_call(&state.pool, user_id, job_id, "classify_summarize", &model_research, &class_sys, &class_user, &class_response, llm_duration).await;
// Parse response
let llm_title = class_response.get("title").and_then(|t| t.as_str()).unwrap_or(&page_title).to_string();
let llm_summary = class_response.get("summary").and_then(|s| s.as_str()).unwrap_or("").to_string();
let raw_category = class_response.get("category").and_then(|c| c.as_str()).unwrap_or("Autre");
// Validate category and canonicalize its casing; if not in list, use "Autre".
// Canonical names keep `filled_counts` keys aligned with the Phase 2 gap computation.
let llm_category = classification_categories.iter()
.find(|c| c.to_lowercase() == raw_category.to_lowercase())
.cloned()
.unwrap_or_else(|| "Autre".to_string());
// Map category to key
let cat_key = if llm_category == "Autre" {
"category_autre".to_string()
} else {
user_categories.iter().position(|c| c.to_lowercase() == llm_category.to_lowercase())
.map(|i| format!("category_{}", i))
.unwrap_or_else(|| "category_autre".to_string())
};
// Check if category is full → overflow to "Autre"; skip when "Autre" is full as well
let max_per_cat = settings.max_items_per_category as usize;
let cat_filled = filled_counts.get(&llm_category).copied().unwrap_or(0);
let autre_filled = filled_counts.get("Autre").copied().unwrap_or(0);
let (final_cat_key, final_cat_name) = if cat_filled < max_per_cat {
(cat_key, llm_category)
} else if llm_category != "Autre" && autre_filled < max_per_cat {
("category_autre".to_string(), "Autre".to_string())
} else {
// Target category and "Autre" are both full: skip article
continue;
};
// Add article
article_scraped.entry(final_cat_key).or_default().push(NewsItem {
title: llm_title,
url: final_url.clone(),
summary: llm_summary,
});
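// Also key the source mapping on the final URL: the "used" trace at save time looks up `item.url`,
// which may differ from the candidate URL if the scraper followed a redirect
url_source.insert(final_url.clone(), source_url.clone());
*filled_counts.entry(final_cat_name).or_insert(0) += 1;
*source_counts.entry(source_domain).or_insert(0) += 1;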
// Check if we've reached the maximum
let total: usize = article_scraped.values().map(|v| v.len()).sum();
if total >= max_total {
break;
}
}
}
// === PHASE 2: Web Search Fallback ===
let category_gaps: Vec<(String, i32)> = user_categories.iter().filter_map(|cat| {
let filled = filled_counts.get(cat).copied().unwrap_or(0);
let needed = (settings.max_items_per_category as usize).saturating_sub(filled);
if needed > 0 { Some((cat.clone(), needed as i32)) } else { None }
}).collect();
if !category_gaps.is_empty() {
emit_progress(tx, "search", "Recherche d'actualites complementaires...", 70);
check_rate_limit(state, &user_rate_limiter, &provider_name)?;
let search_schema = build_category_schema(&user_categories, settings.max_items_per_category);
let current_date = Utc::now().format("%A %d %B %Y").to_string();
let (sys_prompt, usr_prompt) = prompts::build_search_prompt(&settings, &sources, &current_date, &[], Some(&category_gaps));
let llm_start = std::time::Instant::now();
let raw_results = provider.call_llm(&model_research, &sys_prompt, &usr_prompt, &search_schema).await?;
let llm_duration = llm_start.elapsed().as_millis() as u64;
log_llm_call(&state.pool, user_id, job_id, "search", &model_research, &sys_prompt, &usr_prompt, &raw_results, llm_duration).await;
// Parse and filter
emit_progress(tx, "parsing", "Analyse des resultats...", 75);
let parsed = parse_llm_output(&raw_results, &user_categories)?;
// Filter: homepage, cross-phase dedup, url dedup, source limit, history
let mut phase2_articles: Vec<(String, NewsItem)> = Vec::new(); // (cat_key, item)
for (cat_key, items) in parsed {
for item in items {
let url_lower = item.url.to_lowercase();
// Homepage filter
if let Ok(parsed_url) = url::Url::parse(&item.url) {
let path = parsed_url.path();
if path.is_empty() || path == "/" {
trace_article(&state.pool, user_id, job_id, &item.url, &item.title, "web_search", None, None, None, "filtered_homepage", false).await;
continue;
}
}
// Cross-phase dedup
if seen_urls.contains(&url_lower) {
trace_article(&state.pool, user_id, job_id, &item.url, &item.title, "web_search", None, None, None, "filtered_cross_phase_dedup", false).await;
continue;
}
// History dedup
if settings.article_history_days > 0 {
let hash = hash_article_url(&item.url);
let exists = db::article_history::check_urls_exist(&state.pool, user_id, &[hash.clone()]).await.unwrap_or_default();
if exists.contains(&hash) {
trace_article(&state.pool, user_id, job_id, &item.url, &item.title, "web_search", None, None, None, "filtered_history", false).await;
continue;
}
}
// Source limit
if let Some(domain) = extract_domain(&item.url) {
let count = source_counts.get(&domain).copied().unwrap_or(0);
if count >= settings.max_articles_per_source as usize {
trace_article(&state.pool, user_id, job_id, &item.url, &item.title, "web_search", None, None, None, "filtered_diversity", false).await;
continue;
}
}
seen_urls.insert(url_lower);
phase2_articles.push((cat_key.clone(), item));
}
}
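// Scrape Phase 2 articles for validation
emit_progress(tx, "scraping", "Verification des sources web...", 80);
for (cat_key, item) in phase2_articles {
// Re-check limits at insertion time: the filter pass above ran before any Phase 2 article was counted
let cat_len = article_scraped.get(&cat_key).map_or(0, |items| items.len());
if cat_len >= settings.max_items_per_category as usize {
continue;
}
let domain = extract_domain(&item.url);
if let Some(d) = &domain {
if source_counts.get(d).copied().unwrap_or(0) >= settings.max_articles_per_source as usize {
trace_article(&state.pool, user_id, job_id, &item.url, &item.title, "web_search", None, None, None, "filtered_diversity", false).await;
continue;
}
}
let (body_text, _, final_url) = scrape_single_article(&state.http_client, &item.url, settings.max_age_days as i64).await;
if body_text.trim().is_empty() {
trace_article(&state.pool, user_id, job_id, &final_url, &item.title, "web_search", None, None, None, "filtered_empty", false).await;
continue;
}
// Use the LLM-provided title and summary (Phase 2 summaries are final)
article_scraped.entry(cat_key).or_default().push(NewsItem {
title: item.title,
url: final_url,
summary: item.summary,
});
if let Some(d) = domain {
*source_counts.entry(d).or_insert(0) += 1;
}
}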
}
// === SAVE ===
if article_scraped.values().all(|items| items.is_empty()) {
return Err(AppError::BadRequest("Aucun article valide trouve. Verifiez vos sources et categories.".into()));
}
emit_progress(tx, "saving", "Sauvegarde de la synthese...", 90);
// Build final sections
let mut final_sections: Vec<NewsSection> = Vec::new();
for (i, cat_name) in user_categories.iter().enumerate() {
let key = format!("category_{}", i);
if let Some(items) = article_scraped.get(&key) {
if !items.is_empty() {
final_sections.push(NewsSection { title: cat_name.clone(), items: items.clone() });
}
}
}
if let Some(autre_items) = article_scraped.get("category_autre") {
if !autre_items.is_empty() {
final_sections.push(NewsSection { title: "Autre".to_string(), items: autre_items.clone() });
}
}
let sections_json = serde_json::to_value(&final_sections).map_err(|e| AppError::Internal(anyhow::anyhow!("Failed to serialize: {}", e)))?;
let sections_json = sanitize_json_null_bytes(sections_json);
let synthesis = db::syntheses::create(&state.pool, user_id, &get_iso_week_string(Utc::now().date_naive()), &sections_json, job_id).await?;
// Record used articles
if settings.article_history_days > 0 {
for section in &final_sections {
for item in &section.items {
let source_url = url_source.get(&item.url).map(|s| s.as_str());
trace_article(&state.pool, user_id, job_id, &item.url, &item.title,
if source_url.is_some() { "personalized_source" } else { "web_search" },
source_url, Some(&section.title), Some(synthesis.id), "used", true).await;
}
}
}
Ok(synthesis.id)
}
```
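The category bookkeeping in the function above (case-insensitive matching, overflow into "Autre", and the Phase 2 gap list) is easier to unit-test when pulled out of the async pipeline. The following is a minimal std-only sketch of that logic, not the plan's required structure: the helper names `canonical_category`, `assign_category`, and `category_gaps` are hypothetical, and `filled` plays the role of `filled_counts` keyed by category name.

```rust
use std::collections::HashMap;

const AUTRE: &str = "Autre";

/// Map the LLM's answer to the user's own spelling of the category
/// (case-insensitive match), falling back to "Autre" when it is unknown.
fn canonical_category(raw: &str, user_categories: &[String]) -> String {
    user_categories
        .iter()
        .find(|c| c.to_lowercase() == raw.to_lowercase())
        .cloned()
        .unwrap_or_else(|| AUTRE.to_string())
}

/// Decide which section an article lands in: its own category if there is
/// room, otherwise "Autre" if there is room, otherwise `None` (skip it).
fn assign_category(
    raw: &str,
    user_categories: &[String],
    filled: &HashMap<String, usize>,
    max_per_category: usize,
) -> Option<String> {
    let count = |name: &str| filled.get(name).copied().unwrap_or(0);
    let category = canonical_category(raw, user_categories);
    if count(category.as_str()) < max_per_category {
        Some(category)
    } else if category != AUTRE && count(AUTRE) < max_per_category {
        Some(AUTRE.to_string())
    } else {
        None
    }
}

/// List the user categories that still need articles after Phase 1,
/// with the number of missing items for each.
fn category_gaps(
    user_categories: &[String],
    filled: &HashMap<String, usize>,
    max_per_category: usize,
) -> Vec<(String, i32)> {
    user_categories
        .iter()
        .filter_map(|cat| {
            let have = filled.get(cat).copied().unwrap_or(0);
            let needed = max_per_category.saturating_sub(have);
            (needed > 0).then(|| (cat.clone(), needed as i32))
        })
        .collect()
}
```

Keeping these as pure functions means the overflow and gap rules can be covered by plain `#[test]` functions, with no database pool or LLM provider involved.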
- [ ] **Step 4: Add `rotate_sources` unit tests**
```rust
#[test]
fn rotate_sources_after_last_used() {
// Create mock sources — need Source struct with url field
// Test that rotation works correctly
}
```
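The stub above leaves the test body open. One way to fill it in is sketched below, assuming a stand-in `Source` struct that has only a `url` field (the real `Source` model likely has more fields, and its construction is not shown in this plan). The copy of `rotate_sources` here uses `rotate_left` from the standard library, which should produce the same ordering as the slice-based version in Step 2; `make_sources` and `urls` are hypothetical test helpers.

```rust
/// Stand-in for the real `Source` model (assumption: only `url` matters here).
#[derive(Clone, Debug, PartialEq)]
struct Source {
    url: String,
}

/// Rotate so that the source after the last-used one comes first.
fn rotate_sources(mut sources: Vec<Source>, last_source_url: Option<&str>) -> Vec<Source> {
    let Some(last_url) = last_source_url else {
        return sources;
    };
    if let Some(idx) = sources.iter().position(|s| s.url == last_url) {
        // `position` returned Some, so the list is non-empty and `%` is safe.
        let next = (idx + 1) % sources.len();
        sources.rotate_left(next);
    }
    sources
}

/// Hypothetical helper: build sources from bare URLs.
fn make_sources(urls: &[&str]) -> Vec<Source> {
    urls.iter().map(|u| Source { url: u.to_string() }).collect()
}

/// Hypothetical helper: project sources back to their URLs for assertions.
fn urls(sources: &[Source]) -> Vec<String> {
    sources.iter().map(|s| s.url.clone()).collect()
}

#[cfg(test)]
mod rotate_sources_tests {
    use super::*;

    #[test]
    fn rotate_sources_after_last_used() {
        let rotated = rotate_sources(make_sources(&["a", "b", "c"]), Some("b"));
        assert_eq!(urls(&rotated), vec!["c", "a", "b"]);
    }

    #[test]
    fn rotate_sources_keeps_order_when_last_is_unknown_or_missing() {
        let unknown = rotate_sources(make_sources(&["a", "b", "c"]), Some("x"));
        assert_eq!(urls(&unknown), vec!["a", "b", "c"]);
        let none = rotate_sources(make_sources(&["a", "b", "c"]), None);
        assert_eq!(urls(&none), vec!["a", "b", "c"]);
    }
}
```

The cases worth covering are a last-used source in the middle of the list, a last-used source at the end (which wraps back to the original order), an unknown URL, and an empty list.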
- [ ] **Step 5: Verify + commit**
```bash
cd backend && cargo test --lib
git add backend/src/services/synthesis.rs
git commit -m "feat: rewrite synthesis pipeline — per-article classify/summarize, no rewrite pass"
```
---
### Task 6: Frontend — remove deprecated settings
**Files:**
- Modify: `frontend/src/types.ts`
- Modify: `frontend/src/pages/Settings.tsx`
- Modify: `frontend/src/i18n/fr.ts`
- [ ] **Step 1: Remove fields from types**
Remove `source_diversity_window: number` and `use_llm_for_article_extraction: boolean` from `UserSettings` and `DEFAULT_SETTINGS`.
- [ ] **Step 2: Remove from Settings page**
Remove the diversity window number input and the LLM extraction checkbox from `Settings.tsx`.
- [ ] **Step 3: Remove i18n labels**
Remove `settings.diversityWindow` and `settings.useLlmForArticleExtraction` labels.
- [ ] **Step 4: Verify + commit**
```bash
cd frontend && npx tsc --noEmit && npx vitest run
git add frontend/src/types.ts frontend/src/pages/Settings.tsx frontend/src/i18n/fr.ts
git commit -m "feat: remove deprecated settings from frontend"
```
---
### Task 7: Update E2E test
**Files:**
- Modify: `e2e/tests/generation-live.spec.ts`
- [ ] **Step 1: Update settings payload**
Remove `source_diversity_window` and `use_llm_for_article_extraction` from the PUT settings body.
- [ ] **Step 2: Commit**
```bash
git add e2e/tests/generation-live.spec.ts
git commit -m "test: update E2E test for new pipeline (remove deprecated settings)"
```