From 420603d76a8e82cce311668068afd4e0c2099f5e Mon Sep 17 00:00:00 2001
From: oabrivard <olivier@abrivard.fr>
Date: Tue, 24 Mar 2026 09:19:24 +0100
Subject: [PATCH] Updated specifications of source diversity functionality

---
 .../2026-03-23-source-diversity-history.md    | 342 +++++++++++++++
 .../2026-03-23-source-diversity-limit.md      | 406 ++++++++++++++++++
 ...6-03-23-source-diversity-history-design.md |  83 ++++
 ...026-03-23-source-diversity-limit-design.md | 104 +++++
 4 files changed, 935 insertions(+)
 create mode 100644 docs/superpowers/plans/2026-03-23-source-diversity-history.md
 create mode 100644 docs/superpowers/plans/2026-03-23-source-diversity-limit.md
 create mode 100644 docs/superpowers/specs/2026-03-23-source-diversity-history-design.md
 create mode 100644 docs/superpowers/specs/2026-03-23-source-diversity-limit-design.md
diff --git a/docs/superpowers/plans/2026-03-23-source-diversity-history.md b/docs/superpowers/plans/2026-03-23-source-diversity-history.md
new file mode 100644
index 0000000..8e86e3f
--- /dev/null
+++ b/docs/superpowers/plans/2026-03-23-source-diversity-history.md
@@ -0,0 +1,342 @@
+# Source Diversity via Recent History — Implementation Plan
+
+> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
+
+**Goal:** Inject recently-used domains into the LLM search prompt to encourage source diversity across syntheses.
+
+**Architecture:** New `source_diversity_window` setting (default 3, 0=disabled). At generation time, load recent syntheses, extract domains from JSONB sections, pass to prompt builder which appends a soft avoidance instruction.
+
+**Tech Stack:** Rust (sqlx, serde_json, url crate), SolidJS, PostgreSQL
+
+**Spec:** `docs/superpowers/specs/2026-03-23-source-diversity-history-design.md`
+
+---
+
+### Task 1: Migration + backend model
+
+**Files:**
+- Create: `backend/migrations/20260323000013_add_source_diversity_window.sql`
+- Modify: `backend/src/models/settings.rs`
+- Modify: `backend/src/db/settings.rs`
+- Modify: `CLAUDE.md`
+
+- [ ] **Step 1: Create migration**
+
+```sql
+ALTER TABLE settings ADD COLUMN source_diversity_window INTEGER NOT NULL DEFAULT 3;
+```
+
+- [ ] **Step 2: Add field to all structs in `models/settings.rs`**
+
+Add `pub source_diversity_window: i32` to `UserSettings`, `SettingsResponse`, `UpdateSettingsRequest` (after `max_articles_per_source`).
+
+Add to `From<UserSettings> for SettingsResponse`:
+```rust
+source_diversity_window: s.source_diversity_window,
+```
+
+Add validation in `UpdateSettingsRequest::validate()`:
+```rust
+if !(0..=10).contains(&self.source_diversity_window) {
+    return Err("source_diversity_window must be between 0 and 10".into());
+}
+```
+
+Add to `impl Default for UserSettings`:
+```rust
+source_diversity_window: 3,
+```
+
+- [ ] **Step 3: Add column to DB queries in `db/settings.rs`**
+
+Add `source_diversity_window: i32` to `SettingsRow`. Add to `TryFrom<SettingsRow>`:
+```rust
+source_diversity_window: row.source_diversity_window,
+```
+
+Add to both SQL queries (`get_or_create_default` and `upsert`): INSERT column list, VALUES placeholder, RETURNING clause, `.bind()` call, and ON CONFLICT SET (upsert only). The new column goes after `max_articles_per_source`.
+
+- [ ] **Step 4: Update CLAUDE.md migration count to 13**
+
+- [ ] **Step 5: Add validation tests in `models/settings.rs`**
+
+Add `source_diversity_window: 3` to the `valid_request()` test helper. Then add tests:
+
+```rust
+    #[test]
+    fn test_source_diversity_window_zero_is_valid() {
+        let mut req = valid_request();
+        req.source_diversity_window = 0;
+        assert!(req.validate().is_ok());
+    }
+
+    #[test]
+    fn test_source_diversity_window_ten_is_valid() {
+        let mut req = valid_request();
+        req.source_diversity_window = 10;
+        assert!(req.validate().is_ok());
+    }
+
+    #[test]
+    fn test_source_diversity_window_below_range() {
+        let mut req = valid_request();
+        req.source_diversity_window = -1;
+        assert!(req.validate().is_err());
+    }
+
+    #[test]
+    fn test_source_diversity_window_above_range() {
+        let mut req = valid_request();
+        req.source_diversity_window = 11;
+        assert!(req.validate().is_err());
+    }
+```
+
+- [ ] **Step 6: Run tests**
+
+Run: `cd backend && cargo test --lib`
+Expected: all tests pass
+
+- [ ] **Step 7: Commit**
+
+```bash
+git add backend/migrations/20260323000013_add_source_diversity_window.sql backend/src/models/settings.rs backend/src/db/settings.rs CLAUDE.md
+git commit -m "feat: add source_diversity_window setting (migration + model + DB)"
+```
+
+---
+
+### Task 2: Prompt modification + tests
+
+**Files:**
+- Modify: `backend/src/services/prompts.rs`
+
+- [ ] **Step 1: Add `recent_domains` parameter to `build_search_prompt`**
+
+Change signature from:
+```rust
+pub fn build_search_prompt(
+    settings: &UserSettings,
+    sources: &[Source],
+    current_date: &str,
+) -> (String, String) {
+```
+
+To:
+```rust
+pub fn build_search_prompt(
+    settings: &UserSettings,
+    sources: &[Source],
+    current_date: &str,
+    recent_domains: &[String],
+) -> (String, String) {
+```
+
+- [ ] **Step 2: Append avoidance instruction when domains are non-empty**
+
+Leave the existing `format!()` call that builds `user_prompt` unchanged. Immediately after it, shadow `user_prompt` with a conditional that appends the avoidance instruction only when `recent_domains` is non-empty:
+
+```rust
+    let user_prompt = if recent_domains.is_empty() {
+        user_prompt
+    } else {
+        let domains_list = recent_domains.join(", ");
+        format!(
+            "{}\n\nEvite si possible les sources deja utilisees dans les syntheses precedentes : {}.",
+            user_prompt, domains_list
+        )
+    };
+```
+
+- [ ] **Step 3: Update test fixture**
+
+In the `test_settings()` function (~line 137), add:
+```rust
+source_diversity_window: 3,
+```
+
+- [ ] **Step 4: Update existing test calls**
+
+All existing tests that call `build_search_prompt` need the 4th argument. Add `&[]` (empty slice) to each existing call. Search for `build_search_prompt(` in the test module and add `, &[]` before the closing `)`.
+
+- [ ] **Step 5: Add new tests**
+
+```rust
+    #[test]
+    fn search_prompt_includes_recent_domains_avoidance() {
+        let settings = test_settings();
+        let sources = vec![];
+        let date = "lundi 17 mars 2026";
+        let domains = vec!["techcrunch.com".to_string(), "theverge.com".to_string()];
+        let (_, user_prompt) = build_search_prompt(&settings, &sources, date, &domains);
+        assert!(user_prompt.contains("Evite si possible"));
+        assert!(user_prompt.contains("techcrunch.com"));
+        assert!(user_prompt.contains("theverge.com"));
+    }
+
+    #[test]
+    fn search_prompt_no_avoidance_when_domains_empty() {
+        let settings = test_settings();
+        let sources = vec![];
+        let date = "lundi 17 mars 2026";
+        let (_, user_prompt) = build_search_prompt(&settings, &sources, date, &[]);
+        assert!(!user_prompt.contains("Evite si possible"));
+    }
+```
+
+- [ ] **Step 6: Run tests**
+
+Run: `cd backend && cargo test --lib`
+Expected: all tests pass
+
+- [ ] **Step 7: Commit**
+
+```bash
+git add backend/src/services/prompts.rs
+git commit -m "feat: build_search_prompt accepts recent_domains for source diversity"
+```
+
+---
+
+### Task 3: Pipeline integration — extract domains + wire prompt
+
+**Files:**
+- Modify: `backend/src/services/synthesis.rs`
+
+- [ ] **Step 1: Add domain extraction from recent syntheses**
+
+Before the `build_search_prompt` call (~line 303), add a new step that loads recent syntheses and extracts domains. Insert between the rate limit check (step 5) and the search pass (step 6):
+
+```rust
+    // Step 5b: Load recently-used domains for source diversity (best effort: a DB error yields an empty list)
+    let recent_domains = if settings.source_diversity_window > 0 {
+        let recent = db::syntheses::list_for_user(
+            &state.pool,
+            user_id,
+            i64::from(settings.source_diversity_window),
+            0,
+        )
+        .await
+        .unwrap_or_default();
+
+        let mut domains: Vec<String> = recent
+            .iter()
+            .filter_map(|s| {
+                serde_json::from_value::<Vec<crate::models::synthesis::NewsSection>>(
+                    s.sections.clone(),
+                )
+                .ok()
+            })
+            .flat_map(|sections| {
+                sections
+                    .into_iter()
+                    .flat_map(|sec| sec.items.into_iter())
+                    .filter_map(|item| extract_domain(&item.url))
+            })
+            .collect();
+
+        domains.sort();
+        domains.dedup();
+        domains
+    } else {
+        Vec::new()
+    };
+```
+
+- [ ] **Step 2: Update the `build_search_prompt` call**
+
+Change line ~304 from:
+```rust
+    let (system_prompt, user_prompt) =
+        prompts::build_search_prompt(&settings, &sources, &current_date);
+```
+
+To:
+```rust
+    let (system_prompt, user_prompt) =
+        prompts::build_search_prompt(&settings, &sources, &current_date, &recent_domains);
+```
+
+- [ ] **Step 3: Run tests**
+
+Run: `cd backend && cargo test --lib`
+Expected: all tests pass
+
+- [ ] **Step 4: Commit**
+
+```bash
+git add backend/src/services/synthesis.rs
+git commit -m "feat: extract recent domains and pass to search prompt for diversity"
+```
+
+---
+
+### Task 4: Frontend setting
+
+**Files:**
+- Modify: `frontend/src/types.ts`
+- Modify: `frontend/src/i18n/fr.ts`
+- Modify: `frontend/src/pages/Settings.tsx`
+
+- [ ] **Step 1: Add field to frontend types**
+
+In `frontend/src/types.ts`, add to `UserSettings` interface (after `max_articles_per_source`):
+```typescript
+source_diversity_window: number;
+```
+
+Add to `DEFAULT_SETTINGS`:
+```typescript
+source_diversity_window: 3,
+```
+
+- [ ] **Step 2: Add i18n label**
+
+In `frontend/src/i18n/fr.ts`, add after `settings.maxArticlesPerSource`:
+```typescript
+'settings.diversityWindow': 'Syntheses a examiner pour diversite',
+```
+
+- [ ] **Step 3: Add number input to Settings page**
+
+In `frontend/src/pages/Settings.tsx`, inside the generation settings grid (after `maxArticlesPerSource`), add:
+
+```tsx
+            <div>
+              <label
+                for="diversityWindow"
+                class="block text-sm font-medium text-gray-700"
+              >
+                {t('settings.diversityWindow')}
+              </label>
+              <div class="mt-1">
+                <input
+                  type="number"
+                  id="diversityWindow"
+                  min="0"
+                  max="10"
+                  class="shadow-sm focus:ring-indigo-500 focus:border-indigo-500 block w-full sm:text-sm border-gray-300 rounded-md py-2 px-3 border"
+                  value={settings().source_diversity_window}
+                  onInput={(e) =>
+                    setSettings((prev) => ({
+                      ...prev,
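+                      source_diversity_window: // 0 (disabled) is valid, so fall back to 3 only on NaN
+                        Number.isNaN(parseInt(e.currentTarget.value, 10)) ? 3 : parseInt(e.currentTarget.value, 10),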
+                    }))
+                  }
+                />
+              </div>
+            </div>
+```
+
+- [ ] **Step 4: Run frontend tests**
+
+Run: `cd frontend && npx tsc --noEmit && npx vitest run`
+Expected: type check passes, all tests pass
+
+- [ ] **Step 5: Commit**
+
+```bash
+git add frontend/src/types.ts frontend/src/i18n/fr.ts frontend/src/pages/Settings.tsx
+git commit -m "feat: add source_diversity_window setting to frontend"
+```
diff --git a/docs/superpowers/plans/2026-03-23-source-diversity-limit.md b/docs/superpowers/plans/2026-03-23-source-diversity-limit.md
new file mode 100644
index 0000000..eee24f2
--- /dev/null
+++ b/docs/superpowers/plans/2026-03-23-source-diversity-limit.md
@@ -0,0 +1,406 @@
+# Source Diversity Limit — Implementation Plan
+
+> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
+
+**Goal:** Cap the number of articles from the same website across all categories, spreading each source's articles across categories rather than clustering them in one.
+
+**Architecture:** New `i32` field in UserSettings + migration, post-parse filter function in the generation pipeline, frontend number input.
+
+**Tech Stack:** Rust (sqlx, url crate), SolidJS, PostgreSQL
+
+**Spec:** `docs/superpowers/specs/2026-03-23-source-diversity-limit-design.md`
+
+---
+
+### Task 1: Database migration
+
+**Files:**
+- Create: `backend/migrations/20260323000012_add_max_articles_per_source.sql`
+
+- [ ] **Step 1: Create migration**
+
+```sql
+ALTER TABLE settings ADD COLUMN max_articles_per_source INTEGER NOT NULL DEFAULT 3;
+```
+
+- [ ] **Step 2: Update CLAUDE.md migration count**
+
+Change `## Database (11 migrations)` to `## Database (12 migrations)`.
+
+- [ ] **Step 3: Commit**
+
+```bash
+git add backend/migrations/20260323000012_add_max_articles_per_source.sql CLAUDE.md
+git commit -m "feat: add max_articles_per_source column to settings"
+```
+
+---
+
+### Task 2: Backend model + DB queries
+
+**Files:**
+- Modify: `backend/src/models/settings.rs`
+- Modify: `backend/src/db/settings.rs`
+
+- [ ] **Step 1: Add field to all three structs in `models/settings.rs`**
+
+Add `pub max_articles_per_source: i32` to:
+- `UserSettings` (after `max_items_per_category`)
+- `SettingsResponse` (after `max_items_per_category`)
+- `UpdateSettingsRequest` (after `max_items_per_category`)
+
+Add the field to `impl From<UserSettings> for SettingsResponse`:
+```rust
+max_articles_per_source: s.max_articles_per_source,
+```
+
+Add validation in `UpdateSettingsRequest::validate()`:
+```rust
+if !(1..=10).contains(&self.max_articles_per_source) {
+    return Err("max_articles_per_source must be between 1 and 10".into());
+}
+```
+
+Add to `impl Default for UserSettings`:
+```rust
+max_articles_per_source: 3,
+```
+
+- [ ] **Step 2: Add column to all SQL queries in `db/settings.rs`**
+
+Add `max_articles_per_source` to:
+- `SettingsRow` struct field
+- `get_or_create_default` INSERT column list, VALUES, RETURNING, and `.bind()`
+- `upsert` INSERT column list, VALUES, RETURNING, ON CONFLICT SET, and `.bind()`
+- `UserSettings::try_from(SettingsRow)` mapping
+
+This follows the exact same pattern as `max_items_per_category` in every query.
+
+- [ ] **Step 3: Run tests**
+
+Run: `cd backend && cargo test --lib`
+Expected: all tests pass (existing settings tests use `Default` which now includes the new field)
+
+- [ ] **Step 4: Commit**
+
+```bash
+git add backend/src/models/settings.rs backend/src/db/settings.rs
+git commit -m "feat: add max_articles_per_source to settings model and DB queries"
+```
+
+---
+
+### Task 3: Filter function with unit tests
+
+**Files:**
+- Modify: `backend/src/services/synthesis.rs`
+
+- [ ] **Step 1: Add the `limit_articles_per_source` function**
+
+Add after `filter_homepage_urls`:
+
+```rust
+/// Limit the number of articles from the same domain across all categories.
+///
+/// Spreads articles across categories first (at most 1 per domain per category),
+/// then fills remaining slots from dropped articles in encounter order.
+fn limit_articles_per_source(
+    parsed: Vec<(String, Vec<NewsItem>)>,
+    max_per_source: i32,
+) -> Vec<(String, Vec<NewsItem>)> {
+    let max = max_per_source as usize;
+
+    // Pass 1: keep at most 1 article per domain per category
+    let mut kept: Vec<(String, Vec<NewsItem>)> = Vec::new();
+    let mut dropped: Vec<(usize, NewsItem)> = Vec::new(); // (category_index, item)
+    let mut domain_counts: std::collections::HashMap<String, usize> =
+        std::collections::HashMap::new();
+
+    for (cat_idx, (cat_key, items)) in parsed.into_iter().enumerate() {
+        let mut cat_kept = Vec::new();
+        let mut seen_in_cat: std::collections::HashSet<String> = std::collections::HashSet::new();
+
+        for item in items {
+            let domain = extract_domain(&item.url);
+            if let Some(ref d) = domain {
+                if seen_in_cat.contains(d) {
+                    dropped.push((cat_idx, item));
+                    continue;
+                }
+                seen_in_cat.insert(d.clone());
+                *domain_counts.entry(d.clone()).or_insert(0) += 1;
+            }
+            cat_kept.push(item);
+        }
+
+        kept.push((cat_key, cat_kept));
+    }
+
+    // Cap enforcement: if any domain exceeds max after pass 1 (when categories > max),
+    // keep the first max articles in category order, drop the rest.
+    let mut cap_counts: std::collections::HashMap<String, usize> = std::collections::HashMap::new();
+    for (_, items) in &mut kept {
+        items.retain(|item| {
+            let domain = extract_domain(&item.url);
+            match domain {
+                Some(ref d) => {
+                    let count = cap_counts.entry(d.clone()).or_insert(0);
+                    if *count >= max {
+                        false
+                    } else {
+                        *count += 1;
+                        true
+                    }
+                }
+                None => true, // keep unparseable URLs
+            }
+        });
+    }
+
+    // Use cap_counts as the authoritative domain counts going forward
+    let mut domain_counts = cap_counts;
+
+    // Pass 2: fill from dropped articles, back into their original category
+    for (cat_idx, item) in dropped {
+        if let Some(d) = extract_domain(&item.url) {
+            let count = domain_counts.get(&d).copied().unwrap_or(0);
+            if count < max {
+                *domain_counts.entry(d).or_insert(0) += 1;
+                kept[cat_idx].1.push(item);
+            }
+        } else {
+            // Unparseable URL — keep it
+            kept[cat_idx].1.push(item);
+        }
+    }
+
+    kept
+}
+
+/// Extract the domain (host) from a URL, or None if unparseable.
+fn extract_domain(url: &str) -> Option<String> {
+    url::Url::parse(url)
+        .ok()
+        .and_then(|u| u.host_str().map(|h| h.to_lowercase()))
+}
+```
+
+- [ ] **Step 2: Wire it into the pipeline**
+
+In `run_generation_inner`, after `filter_homepage_urls` (line 315) and before the scrape step, add:
+
+```rust
+    // Step 7c: Limit articles per source for diversity
+    let parsed = limit_articles_per_source(parsed, settings.max_articles_per_source);
+```
+
+- [ ] **Step 3: Add unit tests**
+
+Add to the `#[cfg(test)] mod tests` block at the bottom of `synthesis.rs`:
+
+```rust
+    // ── limit_articles_per_source tests ────────────────────────────
+
+    #[test]
+    fn source_limit_spreads_across_categories() {
+        let parsed = vec![
+            ("category_0".into(), vec![
+                NewsItem { title: "A1".into(), url: "https://openai.com/blog/a".into(), summary: "s".into() },
+                NewsItem { title: "A2".into(), url: "https://openai.com/blog/b".into(), summary: "s".into() },
+                NewsItem { title: "A3".into(), url: "https://openai.com/blog/c".into(), summary: "s".into() },
+                NewsItem { title: "A4".into(), url: "https://techcrunch.com/x".into(), summary: "s".into() },
+            ]),
+            ("category_1".into(), vec![
+                NewsItem { title: "B1".into(), url: "https://openai.com/research/d".into(), summary: "s".into() },
+                NewsItem { title: "B2".into(), url: "https://openai.com/research/e".into(), summary: "s".into() },
+                NewsItem { title: "B3".into(), url: "https://theverge.com/y".into(), summary: "s".into() },
+            ]),
+        ];
+
+        let result = limit_articles_per_source(parsed, 3);
+
+        // Count openai.com articles across all categories
+        let openai_count: usize = result.iter()
+            .flat_map(|(_, items)| items)
+            .filter(|i| i.url.contains("openai.com"))
+            .count();
+        assert_eq!(openai_count, 3, "Should keep exactly 3 openai.com articles");
+
+        // Both categories should have at least 1 openai article (spread)
+        let cat0_openai = result[0].1.iter().filter(|i| i.url.contains("openai.com")).count();
+        let cat1_openai = result[1].1.iter().filter(|i| i.url.contains("openai.com")).count();
+        assert!(cat0_openai >= 1, "Category 0 should have at least 1 openai article");
+        assert!(cat1_openai >= 1, "Category 1 should have at least 1 openai article");
+
+        // techcrunch and theverge should be untouched
+        let tc_count: usize = result.iter().flat_map(|(_, items)| items).filter(|i| i.url.contains("techcrunch")).count();
+        assert_eq!(tc_count, 1);
+    }
+
+    #[test]
+    fn source_limit_all_different_domains() {
+        let parsed = vec![
+            ("category_0".into(), vec![
+                NewsItem { title: "A".into(), url: "https://a.com/1".into(), summary: "s".into() },
+                NewsItem { title: "B".into(), url: "https://b.com/2".into(), summary: "s".into() },
+            ]),
+        ];
+
+        let result = limit_articles_per_source(parsed, 3);
+        assert_eq!(result[0].1.len(), 2, "Nothing dropped when all domains are unique");
+    }
+
+    #[test]
+    fn source_limit_max_one() {
+        let parsed = vec![
+            ("category_0".into(), vec![
+                NewsItem { title: "A".into(), url: "https://openai.com/a".into(), summary: "s".into() },
+                NewsItem { title: "B".into(), url: "https://openai.com/b".into(), summary: "s".into() },
+            ]),
+            ("category_1".into(), vec![
+                NewsItem { title: "C".into(), url: "https://openai.com/c".into(), summary: "s".into() },
+            ]),
+        ];
+
+        let result = limit_articles_per_source(parsed, 1);
+        let total: usize = result.iter().flat_map(|(_, items)| items).filter(|i| i.url.contains("openai.com")).count();
+        assert_eq!(total, 1, "max=1 should keep exactly 1 openai article");
+    }
+
+    #[test]
+    fn source_limit_more_categories_than_max() {
+        // 5 categories, each with 1 openai article, max=2
+        let parsed: Vec<(String, Vec<NewsItem>)> = (0..5)
+            .map(|i| (
+                format!("category_{}", i),
+                vec![NewsItem {
+                    title: format!("Art{}", i),
+                    url: format!("https://openai.com/{}", i),
+                    summary: "s".into(),
+                }],
+            ))
+            .collect();
+
+        let result = limit_articles_per_source(parsed, 2);
+        let total: usize = result.iter().flat_map(|(_, items)| items).count();
+        assert_eq!(total, 2, "Should cap at max_per_source even with more categories");
+    }
+
+    #[test]
+    fn source_limit_empty_input() {
+        let result = limit_articles_per_source(vec![], 3);
+        assert!(result.is_empty());
+    }
+
+    #[test]
+    fn source_limit_unparseable_urls_kept() {
+        let parsed = vec![
+            ("category_0".into(), vec![
+                NewsItem { title: "Good".into(), url: "https://openai.com/a".into(), summary: "s".into() },
+                NewsItem { title: "Bad".into(), url: "not-a-url".into(), summary: "s".into() },
+            ]),
+        ];
+
+        let result = limit_articles_per_source(parsed, 3);
+        assert_eq!(result[0].1.len(), 2, "Unparseable URLs should be kept");
+    }
+```
+
+- [ ] **Step 4: Run tests**
+
+Run: `cd backend && cargo test --lib`
+Expected: all tests pass including the 6 new ones
+
+- [ ] **Step 5: Commit**
+
+```bash
+git add backend/src/services/synthesis.rs
+git commit -m "feat: add limit_articles_per_source filter with unit tests"
+```
+
+---
+
+### Task 4: Frontend setting
+
+**Files:**
+- Modify: `frontend/src/types.ts`
+- Modify: `frontend/src/i18n/fr.ts`
+- Modify: `frontend/src/pages/Settings.tsx`
+
+- [ ] **Step 1: Add field to frontend types**
+
+In `frontend/src/types.ts`, add to `UserSettings` interface after `max_items_per_category`:
+```typescript
+max_articles_per_source: number;
+```
+
+- [ ] **Step 2: Add i18n label**
+
+In `frontend/src/i18n/fr.ts`, add after the `settings.maxItems` line:
+```typescript
+'settings.maxArticlesPerSource': 'Articles max par source',
+```
+
+- [ ] **Step 3: Add number input to Settings page**
+
+In `frontend/src/pages/Settings.tsx`, inside the `sm:grid-cols-2` grid (before its closing `</div>` around line 403), add a new `<div>` as a third child of the grid:
+
+```tsx
+            <div>
+              <label
+                for="maxArticlesPerSource"
+                class="block text-sm font-medium text-gray-700"
+              >
+                {t('settings.maxArticlesPerSource')}
+              </label>
+              <div class="mt-1">
+                <input
+                  type="number"
+                  id="maxArticlesPerSource"
+                  min="1"
+                  max="10"
+                  class="shadow-sm focus:ring-indigo-500 focus:border-indigo-500 block w-full sm:text-sm border-gray-300 rounded-md py-2 px-3 border"
+                  value={settings().max_articles_per_source}
+                  onInput={(e) =>
+                    setSettings((prev) => ({
+                      ...prev,
+                      max_articles_per_source:
+                        parseInt(e.currentTarget.value) || 3,
+                    }))
+                  }
+                />
+              </div>
+            </div>
+```
+
+Also add `max_articles_per_source: 3` to `DEFAULT_SETTINGS` in `frontend/src/types.ts` (the default settings initializer), if it exists.
+
+- [ ] **Step 4: Run frontend tests and type check**
+
+Run: `cd frontend && npx tsc --noEmit && npx vitest run`
+Expected: type check passes, all tests pass
+
+- [ ] **Step 5: Commit**
+
+```bash
+git add frontend/src/types.ts frontend/src/i18n/fr.ts frontend/src/pages/Settings.tsx
+git commit -m "feat: add max_articles_per_source setting to frontend"
+```
+
+---
+
+### Task 5: E2E verification
+
+- [ ] **Step 1: Rebuild and run Docker stack**
+
+```bash
+docker compose down && docker compose up --build
+```
+
+- [ ] **Step 2: Verify the setting appears in the Settings page**
+
+Navigate to Settings, verify the "Articles max par source" number input is visible with default value 3.
+
+- [ ] **Step 3: Generate a synthesis and verify source diversity**
+
+Change the setting to 2, generate a synthesis, verify no domain appears more than 2 times across all categories.
diff --git a/docs/superpowers/specs/2026-03-23-source-diversity-history-design.md b/docs/superpowers/specs/2026-03-23-source-diversity-history-design.md
new file mode 100644
index 0000000..c346ef1
--- /dev/null
+++ b/docs/superpowers/specs/2026-03-23-source-diversity-history-design.md
@@ -0,0 +1,83 @@
+# Design: Source Diversity via Recent History
+
+**Date**: 2026-03-23
+**Scope**: Inject recently-used domains into the search prompt to encourage source diversity across syntheses
+
+---
+
+## Context
+
+Users notice that successive syntheses reuse the same sources (TechCrunch, The Verge, etc.). Within a single synthesis, the `limit_articles_per_source` filter already caps the number of articles per domain, but across syntheses the LLM keeps gravitating toward the same popular domains. Telling the LLM which domains were recently used lets it prioritize different sources.
+
+## New User Setting
+
+- **Field:** `source_diversity_window` in `UserSettings`
+- **Type:** `i32` (non-optional, matches existing pattern)
+- **Default:** 3
+- **Validation:** 0-10 (0 = disabled)
+- **Migration:** `ALTER TABLE settings ADD COLUMN source_diversity_window INTEGER NOT NULL DEFAULT 3`
+- **Frontend label:** "Syntheses a examiner pour diversite"
+
+## Mechanism
+
+1. At generation time, if `source_diversity_window > 0`, query the user's last N syntheses from the DB (ordered by `created_at DESC`, limit N).
+2. Parse the `sections` JSONB from each synthesis, extract all article URLs, convert to domains via `host_str()`.
+3. Deduplicate the domain list.
+4. Pass the domain list to `build_search_prompt`, which appends a soft instruction:
+   "Evite si possible les sources deja utilisees dans les syntheses precedentes : domaine1.com, domaine2.com, ..."
+5. The LLM treats this as guidance, not a hard constraint — if no alternative exists, it can still use those domains.
+
+## Files to modify
+
+- **Create:** migration `20260323000013_add_source_diversity_window.sql`
+- **Modify:** `backend/src/models/settings.rs` — add field to `UserSettings`, `SettingsResponse`, `UpdateSettingsRequest` + `Default` impl + validation (0-10)
+- **Modify:** `backend/src/db/settings.rs` — add to `SettingsRow` struct, `TryFrom<SettingsRow>` impl, and both SQL queries (`get_or_create_default` + `upsert`: INSERT columns, VALUES, RETURNING, ON CONFLICT SET, .bind())
+- **Modify:** `backend/src/services/synthesis.rs` — before calling `build_search_prompt`, load recent syntheses via existing `db::syntheses::list_for_user`, extract domains using `extract_domain` (same module, private fn), pass domain list to the prompt builder
+- **Modify:** `backend/src/services/prompts.rs` — add `recent_domains: &[String]` parameter to `build_search_prompt`, append soft avoidance instruction if non-empty. Update the call site in `synthesis.rs` (~line 304) to pass the domain list as the 4th argument.
+- **Modify:** `backend/src/services/prompts.rs` tests — add `source_diversity_window` to test fixture, test with/without recent domains
+- **Modify:** `frontend/src/types.ts` — add field to `UserSettings` + `DEFAULT_SETTINGS`
+- **Modify:** `frontend/src/i18n/fr.ts` — add label
+- **Modify:** `frontend/src/pages/Settings.tsx` — add number input
+
+**Note:** No new DB query function needed — the existing `db::syntheses::list_for_user(pool, user_id, limit, offset)` already returns full `Synthesis` records with `sections` JSONB. For a window of 3-10 syntheses (15-150 KB of JSON), application-level domain extraction is pragmatically fine for a single-tenant deployment.
+
+## Domain extraction from existing syntheses
+
+The `sections` column is JSONB with structure:
+```json
+[
+  {
+    "title": "Category Name",
+    "items": [
+      { "title": "...", "url": "https://example.com/article", "summary": "..." }
+    ]
+  }
+]
+```
+
+Extract domains by parsing each item's `url` with `url::Url::parse` and `host_str()`. Reuse the existing `extract_domain` function in `synthesis.rs` (private fn, same module).
+
+## Unit tests
+
+- `build_search_prompt` with non-empty `recent_domains` → prompt contains avoidance instruction
+- `build_search_prompt` with empty `recent_domains` → prompt unchanged
+- Validation of `source_diversity_window` bounds (0 and 10 pass, -1 and 11 fail)
+
+## Prompt modification
+
+In `build_search_prompt`, add a `recent_domains: &[String]` parameter (an empty slice means there is no history to avoid). If it is non-empty, append to the user prompt:
+
+```
+Evite si possible les sources deja utilisees dans les syntheses precedentes : domaine1.com, domaine2.com, ...
+```
+
+This is a soft instruction — the LLM can still use these domains if no alternatives are available.
+
+## What does NOT change
+
+- JSON schema — no changes
+- Scraper — no changes
+- Rewrite pass — no changes
+- `limit_articles_per_source` — still enforces hard cap within a single synthesis
+- `dedup_by_url` — still deduplicates within a single synthesis
+- No new database table — domains are extracted from existing `syntheses.sections` JSONB
diff --git a/docs/superpowers/specs/2026-03-23-source-diversity-limit-design.md b/docs/superpowers/specs/2026-03-23-source-diversity-limit-design.md
new file mode 100644
index 0000000..c699fe4
--- /dev/null
+++ b/docs/superpowers/specs/2026-03-23-source-diversity-limit-design.md
@@ -0,0 +1,104 @@
+# Design: Source Diversity Limit (max articles per source)
+
+**Date**: 2026-03-23
+**Scope**: Limit the number of articles from the same website across all categories in a synthesis
+
+---
+
+## Context
+
+Generated syntheses can be dominated by a single source (e.g., 8 articles from openai.com across categories). Users want source diversity — at most N articles from the same website, with articles spread across categories rather than clustered in one.
+
+## Approach
+
+Add a post-parse filter function that enforces a per-domain article limit after the LLM search pass and before scraping. A new user setting controls the limit.
+
+## New User Setting
+
+- **Field:** `max_articles_per_source` in `UserSettings`
+- **Type:** `i32` (non-optional, matches `max_items_per_category` pattern)
+- **Validation:** 1-10
+- **Migration:** `ALTER TABLE user_settings ADD COLUMN max_articles_per_source INTEGER NOT NULL DEFAULT 3`
+- **Frontend label:** "Articles max par source"
+- **Note:** 10 effectively means "no practical limit for most use cases"
+
+## Filter Function
+
+**Name:** `limit_articles_per_source`
+
+**Signature:** `fn limit_articles_per_source(parsed: Vec<(String, Vec<NewsItem>)>, max_per_source: i32) -> Vec<(String, Vec<NewsItem>)>`
+
+**Pipeline position:** after `filter_homepage_urls`, before `scrape_articles`
+
+**Domain extraction:** Parse URL with `url::Url`, extract via `host_str()` (e.g., `https://openai.com/blog/post` → `openai.com`). If URL can't be parsed, keep the article (don't drop on parse failure).
+
+**Known limitation:** Subdomains are treated as different sources (`blog.example.com` ≠ `www.example.com`). This is pragmatic for v1; registrable domain extraction (eTLD+1) can be added later if needed.
+
+**Algorithm:**
+1. **Pass 1 (spread):** For each category (in order), keep only the first article from each domain and move that domain's remaining articles in the category to a "dropped" list.
+2. **Cap enforcement:** If a domain still exceeds `max_per_source` after pass 1 (possible when the number of categories is greater than the limit), keep its first `max_per_source` articles in category order and discard the rest.
+3. **Pass 2 (fill):** Iterate over the dropped articles in their original order (categories in order, items in order within each category). Re-add each article to its original category while its domain is still under `max_per_source`.
+4. Return the filtered list (same category keys, fewer items per category).
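+
+The steps above can be sketched as follows. This is one possible reading of the algorithm, not the final implementation: `NewsItem` is reduced to a `url` field, `domain_of` is a simplified std-only stand-in for `url::Url` + `host_str()`, pass 1 and cap enforcement are merged into a single loop (which gives the same result), and re-added articles return to their original position within the category, which the steps above do not specify.
+
+```rust
+use std::collections::{HashMap, HashSet};
+
+/// Hypothetical stand-in for the real `NewsItem`; only `url` is used.
+#[derive(Debug, Clone, PartialEq)]
+struct NewsItem {
+    url: String,
+}
+
+/// Simplified stand-in for `url::Url::parse(..)` + `host_str()` (assumption).
+fn domain_of(url: &str) -> Option<String> {
+    let rest = url.split_once("://")?.1;
+    let host = rest
+        .split(|c: char| matches!(c, '/' | '?' | '#' | ':'))
+        .next()?;
+    if host.is_empty() {
+        None
+    } else {
+        Some(host.to_ascii_lowercase())
+    }
+}
+
+fn limit_articles_per_source(
+    parsed: Vec<(String, Vec<NewsItem>)>,
+    max_per_source: i32,
+) -> Vec<(String, Vec<NewsItem>)> {
+    let max = max_per_source.max(0) as usize;
+    let mut counts: HashMap<String, usize> = HashMap::new();
+    // keep[c][i] records whether item i of category c survives.
+    let mut keep: Vec<Vec<bool>> = parsed
+        .iter()
+        .map(|(_, items)| vec![false; items.len()])
+        .collect();
+    // Same-category duplicates in original order: (category, item, domain).
+    let mut dropped: Vec<(usize, usize, String)> = Vec::new();
+
+    // Pass 1 + cap: first article per domain per category, up to `max` overall.
+    for (c, (_, items)) in parsed.iter().enumerate() {
+        let mut seen_here: HashSet<String> = HashSet::new();
+        for (i, item) in items.iter().enumerate() {
+            let domain = match domain_of(&item.url) {
+                Some(d) => d,
+                None => {
+                    keep[c][i] = true; // unparseable URL: never dropped
+                    continue;
+                }
+            };
+            if seen_here.insert(domain.clone()) {
+                let n = counts.entry(domain).or_insert(0);
+                if *n < max {
+                    *n += 1;
+                    keep[c][i] = true;
+                }
+            } else {
+                dropped.push((c, i, domain));
+            }
+        }
+    }
+
+    // Pass 2: refill from the dropped list while the domain is under the cap.
+    for (c, i, domain) in dropped {
+        let n = counts.entry(domain).or_insert(0);
+        if *n < max {
+            *n += 1;
+            keep[c][i] = true;
+        }
+    }
+
+    // Rebuild the categories, preserving original item order.
+    parsed
+        .into_iter()
+        .zip(keep)
+        .map(|((title, items), flags)| {
+            let kept = items
+                .into_iter()
+                .zip(flags)
+                .filter(|(_, k)| *k)
+                .map(|(item, _)| item)
+                .collect();
+            (title, kept)
+        })
+        .collect()
+}
+```
+
+Tracking a keep flag per item, rather than physically moving articles between lists, keeps the category keys and item order stable without extra bookkeeping.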
+
+**Example** with `max_per_source = 3`:
+
+Before:
+- Category A: openai.com×3, techcrunch.com×1
+- Category B: openai.com×2, theverge.com×2
+
+After pass 1 (1 per domain per category):
+- Category A: openai.com×1, techcrunch.com×1
+- Category B: openai.com×1, theverge.com×1
+- Dropped: openai.com×3 (2 from A, 1 from B), theverge.com×1
+- Global: openai=2, techcrunch=1, theverge=1
+
+Cap enforcement: openai=2 ≤ 3, no trimming needed.
+
+After pass 2 (fill up to max, dropped articles re-added to original category):
+- openai has 1 slot left → add 1 openai article back to Category A
+- theverge has 2 slots left → add 1 theverge article back to Category B
+- Final: 3 openai total, 1 techcrunch, 2 theverge
+
+**Edge case** with `max_per_source = 2`, 5 categories all with 1 openai.com article:
+
+After pass 1: 5 openai articles (1 per category) → exceeds limit of 2.
+Cap enforcement: trim to 2 openai articles, keeping categories A and B (first two in order), dropping C/D/E.
+Pass 2: no dropped openai articles to re-add (already at limit).
+
+## Integration
+
+```
+parse_llm_output → filter_homepage_urls → limit_articles_per_source → scrape_articles
+```
+
+Call site in `run_generation_inner`:
+```rust
+let parsed = limit_articles_per_source(parsed, settings.max_articles_per_source);
+```
+
+## Files to modify
+
+- **Create:** migration `20260323000012_add_max_articles_per_source.sql`
+- **Modify:** `backend/src/models/settings.rs` — add field to `UserSettings`, `SettingsResponse`, `UpdateSettingsRequest` + validation
+- **Modify:** `backend/src/db/settings.rs` — add column to all SQL queries + `SettingsRow`
+- **Modify:** `backend/src/services/synthesis.rs` — add filter function + call it
+- **Modify:** `frontend/src/pages/Settings.tsx` — add number input in the generation settings grid
+- **Modify:** `frontend/src/i18n/fr.ts` — add label translation
+- **Modify:** `frontend/src/types.ts` — add field to Settings type
+
+## Unit tests
+
+In `synthesis.rs` tests:
+- 5 openai.com articles across 2 categories, max=3 → keeps 3, spread across categories
+- All articles from different domains → nothing dropped
+- `max_per_source = 1` → at most 1 per domain total
+- More categories than max (5 categories, 1 openai each, max=2) → caps at 2
+- Empty input → empty output
+- Articles with unparseable URLs → kept
+
+## What does NOT change
+
+- LLM prompts — no instruction about source diversity
+- JSON schema — no changes
+- Scraper — no changes
+- Rewrite pass — operates on already-filtered articles