diff --git a/docs/superpowers/specs/2026-03-25-pipeline-improvements-design.md b/docs/superpowers/specs/2026-03-25-pipeline-improvements-design.md
new file mode 100644
index 0000000..ab6d35a
--- /dev/null
+++ b/docs/superpowers/specs/2026-03-25-pipeline-improvements-design.md
@@ -0,0 +1,87 @@
+# Design: Pipeline Improvements — Web Search, LLM Logs, Link Extraction
+
+**Date**: 2026-03-25
+**Scope**: Three independent improvements to the synthesis pipeline
+
+---
+
+## 1. Remove personalized sources from web search prompt
+
+### Context
+
+`build_search_prompt` receives `&[Source]` and injects personalized source URLs into the Phase 2 web search prompt. Phase 1 already handles personalized sources via scraping, so including them again in Phase 2 biases the web search away from discovering new content.
+
+### Change
+
+In `synthesis.rs`, pass `&[]` instead of `&sources` when calling `build_search_prompt` for Phase 2. The function signature is unchanged — Phase 2 will do a pure Google search based on theme, categories, and gap counts only.
+
+### Files to modify
+
+- `backend/src/services/synthesis.rs` — pass `&[]` for sources in the Phase 2 `build_search_prompt` call
+
+---
+
+## 2. Add `article_url` to LLM call logs
+
+### Context
+
+The `llm_call_log` table records every LLM call during synthesis generation but has no field linking a `classify_summarize` call to the specific article URL being classified. To see which article a classify call relates to, you must cross-reference `article_history` — cumbersome for debugging.
+
+### Changes
+
+**Migration:** Add nullable `article_url TEXT` column to `llm_call_log`.
+
+**Backend:**
+- `llm_call_log::insert` — add `article_url: Option<&str>` parameter, bind it in the INSERT
+- `LlmCallLogRow` — add `article_url: Option<String>` field, update SELECT in `list_by_job_id` to include `article_url`
+- `log_llm_call` helper in `synthesis.rs` — add `article_url: Option<&str>` parameter, pass through to `insert`
+- The `classify_summarize` call in `synthesis.rs` calls `insert` directly (not via `log_llm_call`) — update it to pass the article URL
+- The `link_extraction` call in `source_scraper.rs` also calls `insert` directly — update it to pass `None`
+- All other call sites, which go through `log_llm_call` (`search`), pass `None`
+
+**Frontend:**
+- `LlmCallLogEntry` type — add `article_url: string | null`
+- `LlmLogs.tsx` — display the URL as a clickable link when present
+- `fr.ts` — add `'llmLogs.articleUrl': 'Article'`
+
+### Files to modify
+
+- **Create:** `backend/migrations/20260325000021_add_article_url_to_llm_log.sql`
+- **Modify:** `backend/src/db/llm_call_log.rs` — insert signature, row struct, SELECT queries
+- **Modify:** `backend/src/services/synthesis.rs` — pass article URL in classify `insert` call, update `log_llm_call` helper
+- **Modify:** `backend/src/services/source_scraper.rs` — update `insert` call to pass `None`
+- **Modify:** `frontend/src/types.ts` — add field to `LlmCallLogEntry`
+- **Modify:** `frontend/src/pages/LlmLogs.tsx` — display article URL
+- **Modify:** `frontend/src/i18n/fr.ts` — add label
+- **Modify:** `CLAUDE.md` — migration count
+
+---
+
+## 3. Send structured link pairs to LLM instead of raw HTML body
+
+### Context
+
+The LLM link extraction path (`extract_article_links_with_llm`) sends the first 12000 chars of the HTML `<body>` to the LLM. This is noisy — the LLM must parse raw HTML with scripts, styles, and irrelevant markup, wasting tokens and reducing accuracy.
+
+### Changes
+
+**New function:** `extract_links_as_pairs(html: &str, base_url: &Url) -> Vec<(String, String)>` in `source_scraper.rs`. Parses all `<a href>` tags and returns `(resolved_href, anchor_text)` pairs. Filtering: http/https only, same-domain, non-empty path. No deduplication or article-pattern filtering (the LLM decides which links are articles). Same-domain filtering is kept so that irrelevant cross-domain links do not waste tokens.
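+
+The filtering rules can be sketched as a standalone predicate. This is a minimal illustration using only the standard library, not the planned implementation, which would presumably rely on the `Url` type from the signature. `split_url` and `keep_link` are hypothetical names, and treating a bare `/` as an empty path is an assumption. Ports and userinfo in the host are not handled.
+
+```rust
+/// Hypothetical helper: splits an absolute URL into (scheme, host, path)
+/// with plain string handling. Returns None when there is no "://".
+fn split_url(url: &str) -> Option<(&str, &str, &str)> {
+    let (scheme, rest) = url.split_once("://")?;
+    // Drop the query string and fragment before looking at the path.
+    let rest = rest
+        .split(|c: char| c == '?' || c == '#')
+        .next()
+        .unwrap_or("");
+    let (host, path) = match rest.find('/') {
+        Some(i) => (&rest[..i], &rest[i..]),
+        None => (rest, ""),
+    };
+    Some((scheme, host, path))
+}
+
+/// Hypothetical predicate for the three filters: http/https only,
+/// same host as the base URL, and a non-empty path.
+/// Assumption: a path of "/" counts as empty.
+fn keep_link(resolved_href: &str, base_host: &str) -> bool {
+    match split_url(resolved_href) {
+        Some((scheme, host, path)) => {
+            (scheme == "http" || scheme == "https")
+                && host.eq_ignore_ascii_case(base_host)
+                && !path.is_empty()
+                && path != "/"
+        }
+        None => false,
+    }
+}
+
+fn main() {
+    for href in [
+        "https://example.com/blog/a?x=1",
+        "https://example.com/",
+        "https://other.com/blog",
+        "mailto:a@example.com",
+    ] {
+        println!("{} -> {}", href, keep_link(href, "example.com"));
+    }
+}
+```
+
+Links rejected here never reach the prompt, so the predicate only needs to be cheap and conservative. Deciding which of the surviving links are articles stays with the LLM.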
+
+**Updated flow in `extract_article_links_with_llm`:**
+1. Fetch the page HTML (unchanged)
+2. Call `extract_links_as_pairs` instead of `extract_body_html`
+3. Format pairs as a text list: `- /blog/article-1 | "OpenAI launches GPT-6"` (capped at 200 links)
+4. Pass the formatted list to `build_link_extraction_prompt`
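+
+Step 3 can be sketched as follows. `format_link_pairs` is a hypothetical name, and trimming the anchor text is an assumption. The href is printed exactly as stored in the pair, so whether it appears as an absolute URL or as a path depends on what `extract_links_as_pairs` returns.
+
+```rust
+/// Hypothetical helper: renders (href, anchor_text) pairs one per line,
+/// keeping at most `max_links` entries.
+fn format_link_pairs(pairs: &[(String, String)], max_links: usize) -> String {
+    pairs
+        .iter()
+        .take(max_links)
+        // Assumption: surrounding whitespace in the anchor text is trimmed.
+        .map(|(href, text)| format!("- {} | \"{}\"", href, text.trim()))
+        .collect::<Vec<_>>()
+        .join("\n")
+}
+
+fn main() {
+    let pairs = vec![
+        ("/blog/article-1".to_string(), "OpenAI launches GPT-6".to_string()),
+        ("/about".to_string(), "  About us ".to_string()),
+    ];
+    // 200 is the cap named in the spec.
+    println!("{}", format_link_pairs(&pairs, 200));
+}
+```
+
+Applying the cap with `take` before formatting avoids building strings for links that are then discarded.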
+
+**Updated prompt:** `build_link_extraction_prompt` parameter renamed from `body_html` to `links_text`. Remove the internal 12000-char truncation (the input is now a pre-formatted list, not raw HTML; the 200-link cap controls size). Update prompt wording to ask the LLM to select article links from the list rather than extract URLs from HTML.
+
+**Schema:** `build_link_extraction_schema` returns `{ "urls": [...] }` — unchanged. The LLM now selects URLs from the provided list rather than extracting from HTML, but the output format stays the same.
+
+**Cleanup:** Remove `extract_body_html` and its tests if no longer used elsewhere.
+
+### Files to modify
+
+- **Modify:** `backend/src/services/source_scraper.rs` — add `extract_links_as_pairs`, update `extract_article_links_with_llm`, remove `extract_body_html`
+- **Modify:** `backend/src/services/prompts.rs` — update `build_link_extraction_prompt` (rename parameter, remove truncation, update wording)
+- **Modify:** `backend/src/services/source_scraper.rs` tests — add tests for `extract_links_as_pairs`, remove `extract_body_html` tests
+- **Modify:** `backend/src/services/prompts.rs` tests — update link extraction prompt tests