docs: remove redundancy across documentation — cross-references instead of duplication

Trim architecture.md significantly (section 1 overview, technology stack, deployment topology, module inventory lists, LLM trait block, pipeline details, data model table, full API tables, background task list). Replace section 5 API tables with a one-liner. Condense requirements.md sections 3.1/3.5/3.6/3.7/3.8 and 4.2 with cross-references. Replace the deployment.md security feature list with a cross-reference to architecture.md Section 6. Add a cross-reference from functional_specs.md Section 3 to technical_specs.md Section 5.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
3 months ago · 58f42d0a87
parent 7835725fe8
commit 58f42d0a87
7 changed files with 85 additions and 285 deletions
--- a/docs/architecture.md
+++ b/docs/architecture.md
@@ -2,30 +2,9 @@
 ## 1. System Overview
-AI Weekly Synth is a self-hosted web application that generates AI-powered weekly news syntheses. Users configure topics (themes), categories, and an LLM provider; the system then searches the web, scrapes and validates sources, classifies articles, and produces structured summaries.
+AI Weekly Synth is a self-hosted Rust/Axum backend with a SolidJS frontend, backed by PostgreSQL, deployed as a Docker Compose stack. It generates AI-powered weekly news syntheses organized by user-configured themes and categories.
-### Technology Stack
+See `requirements.md` for product vision and features. See `technical_specs.md` for the full technology stack. See `deployment.md` for the Docker topology and operational details.
 | Layer | Technology |
 |---|---|
 | Backend | Rust (Axum 0.8) |
 | Frontend | SolidJS 1.9 + Tailwind CSS v4 |
 | Database | PostgreSQL 17 (via sqlx with compile-time query checking) |
 | Deployment | Docker Compose (app + Postgres) |
 ### Deployment Topology
 ```
 docker-compose.yml
  ├── app  (ai-synth)       port 8080
  │     ├── Axum HTTP server
  │     ├── Static file serving (SPA fallback)
  │     └── Background tasks (scheduler, session cleanup, job TTL)
  └── db   (postgres:17-alpine)  port 5432 (localhost only)
        └── postgres_data volume
 ```
 The app container builds from a multi-stage Dockerfile, serves the SolidJS frontend as static files, and connects to Postgres over the `internal` bridge network.
 ---
@@ -36,42 +15,16 @@ The backend follows a three-layer architecture with shared model types:
 ```
 handlers/  (HTTP layer)
    │
    ├── extracts request data (Axum extractors, JSON, path params)
    ├── validates input
    ├── calls services/ or db/ directly
    └── formats HTTP responses
    │
 services/  (Business logic)
    │
    ├── synthesis pipeline orchestration
    ├── LLM provider abstraction + factory
    ├── scraping (articles, source pages)
    ├── encryption, email, CSV, PDF export
    ├── rate limiting, job store, scheduler
    └── Brave Search client
    │
 db/  (Data access)
    │
    ├── pure SQL queries via sqlx
    ├── typed result mapping (FromRow)
    └── no business logic
    │
 models/  (Shared types -- used by all layers)
    │
    ├── domain structs (User, Theme, Source, Synthesis, etc.)
    ├── request/response DTOs
    └── validation logic
 ```
-### Module Inventory
+Handlers extract and validate request data, delegate to services or db, and format responses. Services contain all business logic. The db layer executes pure SQL via sqlx with typed result mapping and no business logic. Models define domain structs, request/response DTOs, and validation logic.
 **Handlers** (`handlers/`): `admin`, `api_keys`, `article_history`, `auth`, `config`, `generation`, `health`, `llm_logs`, `schedules`, `settings`, `sources`, `syntheses`, `themes`
 **Services** (`services/`): `auth`, `brave_search`, `csv`, `email`, `encryption`, `export`, `job_store`, `llm` (with `gemini`, `openai`, `anthropic`, `mock`, `factory`, `schema`), `prompts`, `rate_limiter`, `scheduler`, `scraper`, `source_scraper`, `synthesis`, `turnstile`
-**DB** (`db/`): `api_keys`, `article_history`, `audit`, `llm_call_log`, `magic_links`, `providers`, `rate_limits`, `schedules`, `sessions`, `settings`, `sources`, `syntheses`, `themes`, `users`
+See `dev_guidelines.md` Section 2 for complete project structure.
 **Models** (`models/`): `api_key`, `audit`, `magic_link`, `provider`, `rate_limit`, `schedule`, `session`, `settings`, `source`, `synthesis`, `theme`, `user`
 ---
@@ -79,35 +32,15 @@ models/  (Shared types -- used by all layers)
 ### 3.1 LLM Provider Abstraction
-The `LlmProvider` trait defines a unified interface for all LLM backends:
+The `LlmProvider` trait defines a unified interface for all LLM backends, with implementations for Gemini, OpenAI, Anthropic, and a mock provider for testing. A factory creates provider instances by name from the admin-curated provider list.
 ```rust
 #[async_trait]
 pub trait LlmProvider: Send + Sync {
    fn provider_id(&self) -> &str;
    async fn call_llm(&self, model: &str, system_prompt: &str,
                       user_prompt: &str, response_schema: &Value)
        -> Result<Value, AppError>;
 }
 ```
 Implementations: `GeminiProvider`, `OpenAiProvider`, `AnthropicProvider`, `MockLlmProvider`.
-The factory (`llm/factory.rs`) creates provider instances by name. The mock provider enables end-to-end pipeline testing without real API calls.
+See `technical_specs.md` Section 6 for provider interface details and supported models.
 ### 3.2 Synthesis Pipeline
-The pipeline is the core business logic, orchestrated in `services/synthesis.rs`. It runs as a background tokio task with a 15-minute timeout.
+The pipeline is orchestrated in `services/synthesis.rs` and runs as a background tokio task with a 15-minute timeout. Phase 1 processes the user's personalized sources using a rolling windowed extraction with batched parallel scraping and LLM classification. Phase 2 fills remaining category gaps via Brave Search or LLM web search. The finalization step assembles sections, persists the synthesis, and records article history. Progress is reported via `tokio::sync::watch` channels consumed by SSE endpoints.
 **Three phases:**
 1. **Phase 1 -- Personalized Sources**: Extract article links from user-curated source pages (windowed, rolling), scrape articles, classify and summarize each via LLM. Batched processing with configurable `batch_size`.
-2. **Phase 2 -- Web Search Fallback**: For under-filled categories, either call the Brave Search API or use the LLM's web search capability to find additional articles. Scrape and validate results.
+See `technical_specs.md` Section 5 for the full algorithm.
 3. **Save**: Assemble sections by category, sanitize JSON, persist to database, record article history traces.
 Progress is reported via `tokio::sync::watch` channels consumed by SSE endpoints.
 ### 3.3 Job Store
@@ -121,20 +54,16 @@ Progress is reported via `tokio::sync::watch` channels consumed by SSE endpoints
 ### 3.4 Scheduler
-`services/scheduler.rs` runs as a background task, checking every minute for due `theme_schedules`. When a schedule fires:
+`services/scheduler.rs` runs as a background task checking every minute for due `theme_schedules`. When a schedule fires it runs the generation pipeline directly, emails results to configured recipients (up to 3), and marks the schedule as run to prevent double-execution on the same day.
-1. Query `find_due_schedules` matching current day code + time
+See `deployment.md` for operational details.
 2. Skip if user already has a manual generation in progress
 3. Run `synthesis::run_generation_inner` directly
 4. Send email to configured recipients (up to 3)
 5. Mark schedule as run
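
The firing steps above can be sketched as a pure predicate plus a marker update. This is a minimal illustration, not the code in `services/scheduler.rs`: the `Schedule` and `Now` structs, the string day codes, the exact-minute match, and the day-number representation of `last_run_at` are all assumptions made for the example.

```rust
/// Hypothetical, simplified view of a `theme_schedules` row.
pub struct Schedule {
    pub enabled: bool,
    /// Day codes on which the schedule fires, e.g. "mon" (assumed format).
    pub days: Vec<String>,
    /// Firing time as (hour, minute) in UTC.
    pub time_utc: (u8, u8),
    /// Calendar day of the last run, as a day number (assumed representation).
    pub last_run_day: Option<u32>,
}

/// The instant observed by the minute-by-minute checker (hypothetical type).
pub struct Now<'a> {
    pub day_code: &'a str,
    pub time_utc: (u8, u8),
    pub day_number: u32,
}

/// A schedule is due when it is enabled, today's day code is listed,
/// the time matches, and it has not already run on this calendar day.
pub fn is_due(s: &Schedule, now: &Now<'_>) -> bool {
    s.enabled
        && s.days.iter().any(|d| d == now.day_code)
        && s.time_utc == now.time_utc
        && s.last_run_day != Some(now.day_number)
}

/// Records the run so a second check on the same day is skipped.
pub fn mark_run(s: &mut Schedule, now: &Now<'_>) {
    s.last_run_day = Some(now.day_number);
}
```

Keeping the check free of I/O makes the double-run rule easy to cover with deterministic tests, independent of the database query.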
 ### 3.5 Scraper
 Two scraping services:
 - **`scraper.rs`**: Article page scraper with SSRF prevention, HTML parsing, title/date/body extraction, soft-404 detection, 15s timeout, 5MB body limit.
- **`source_scraper.rs`**: Source index page scraper that extracts article links from user-configured source URLs (HTML `<a>` parsing with filters, or LLM-assisted extraction).
+- **`source_scraper.rs`**: Source index page scraper that extracts article links from user-configured source URLs (HTML `<a>` parsing with filters).
 ### 3.6 Rate Limiters
@@ -166,130 +95,13 @@ admin_providers
  └── admin_rate_limits (provider_name FK, CASCADE)
 ```
-### Table Summary
+See `technical_specs.md` Section 3 for complete column definitions.
 | Table | Purpose | Key Columns |
 |---|---|---|
 | `users` | User accounts | id, email, display_name, role (user/admin), created_at |
 | `sessions` | Login sessions | session_hash (PK), user_id, expires_at, last_active_at, ip_address |
 | `magic_tokens` | Passwordless auth tokens | id, email, token_hash, expires_at, used |
 | `settings` | Per-user pipeline config | user_id (PK), ai_provider, ai_model, ai_model_websearch, batch_size, max_articles_per_source, max_links_per_source, use_brave_search, source_extraction_window, article_history_days, search_agent_behavior, rate_limit_max_requests, rate_limit_time_window_seconds |
 | `themes` | Per-user topic configurations | id, user_id, name, theme, categories (JSONB), max_items_per_category, max_age_days, summary_length |
 | `sources` | User-curated news source URLs | id, user_id, title, url, theme_id, is_preferred |
 | `syntheses` | Generated synthesis results | id, user_id, week, sections (JSONB), status, job_id, theme_id |
 | `theme_schedules` | Automated generation schedules | id, theme_id (UNIQUE), user_id, enabled, days (JSONB), time_utc, emails (JSONB), last_run_at |
 | `article_history` | Article URL dedup + provenance trace | id, user_id, url, url_hash, title, source_type, source_url, category, synthesis_id, status, scraped_ok, job_id, published_date |
 | `llm_call_log` | Full LLM interaction log | id, user_id, job_id, call_type, model, system_prompt, user_prompt, response_body, duration_ms, article_url |
 | `admin_providers` | Admin-curated LLM provider catalog | id, provider_name (UNIQUE), display_name, models_scraping (JSONB), models_websearch (JSONB), is_enabled |
 | `admin_rate_limits` | Per-provider rate limit config | id, provider_name (UNIQUE, FK), max_requests, time_window_seconds |
 | `user_api_keys` | Encrypted user LLM API keys | id, user_id, provider_name, encrypted_key (BYTEA), nonce (BYTEA), key_prefix; UNIQUE(user_id, provider_name) |
 | `audit_log` | Admin mutation audit trail | id, admin_user_id, action, target_type, target_id, details (JSONB) |
 ---
 ## 5. API Overview
-All API routes are prefixed with `/api/v1`. CSRF protection (`X-Requested-With` header) is applied to all mutating endpoints.
+See `technical_specs.md` Section 4 for complete API endpoint specifications.
 ### Authentication
 | Method | Path | Auth | Description |
 |---|---|---|---|
 | POST | /auth/register | Public | Create account + send magic link |
 | POST | /auth/login | Public | Request magic link |
 | GET | /auth/verify | Public | Verify token (email click redirect) |
 | POST | /auth/verify | Public | Verify token (frontend API call) |
 | POST | /auth/logout | Authenticated | Destroy session |
 | GET | /auth/me | Authenticated | Current user info |
 ### Settings
 | Method | Path | Auth | Description |
 |---|---|---|---|
 | GET | /settings | Authenticated | Get user settings |
 | PUT | /settings | Authenticated | Update user settings |
 ### Themes
 | Method | Path | Auth | Description |
 |---|---|---|---|
 | GET | /themes | Authenticated | List user themes |
 | POST | /themes | Authenticated | Create theme |
 | PUT | /themes/{id} | Authenticated | Update theme |
 | DELETE | /themes/{id} | Authenticated | Delete theme |
 ### Schedules
 | Method | Path | Auth | Description |
 |---|---|---|---|
 | GET | /themes/{id}/schedule | Authenticated | Get theme schedule |
 | PUT | /themes/{id}/schedule | Authenticated | Create or update schedule |
 | DELETE | /themes/{id}/schedule | Authenticated | Delete schedule |
 ### Sources
 | Method | Path | Auth | Description |
 |---|---|---|---|
 | GET | /sources | Authenticated | List sources |
 | POST | /sources | Authenticated | Create source |
 | PUT | /sources/preferred | Authenticated | Update preferred sources |
 | DELETE | /sources/{id} | Authenticated | Delete source |
 | POST | /sources/bulk | Authenticated | Bulk import (JSON) |
 | POST | /sources/import-csv | Authenticated | Import from CSV |
 | GET | /sources/export-csv | Authenticated | Export as CSV |
 ### Syntheses & Generation
 | Method | Path | Auth | Description |
 |---|---|---|---|
 | GET | /syntheses | Authenticated | List syntheses |
 | GET | /syntheses/{id} | Authenticated | Get full synthesis |
 | DELETE | /syntheses/{id} | Authenticated | Delete synthesis |
 | POST | /syntheses/generate | Authenticated | Trigger generation |
 | GET | /syntheses/generate/{job_id}/progress | Authenticated | SSE progress stream |
 | POST | /syntheses/generate/{job_id}/stop | Authenticated | Cancel generation |
 | POST | /syntheses/{id}/send-email | Authenticated | Email synthesis |
 | GET | /syntheses/{id}/export/markdown | Authenticated | Markdown download |
 | GET | /syntheses/{id}/export/pdf | Authenticated | PDF download |
 ### Article History & LLM Logs
 | Method | Path | Auth | Description |
 |---|---|---|---|
 | GET | /article-history | Authenticated | List article history |
 | DELETE | /article-history | Authenticated | Clear article history |
 | GET | /syntheses/{id}/provenance | Authenticated | Get synthesis provenance |
 | GET | /llm-logs/{job_id} | Authenticated | Get LLM call logs for job |
 ### User API Keys
 | Method | Path | Auth | Description |
 |---|---|---|---|
 | GET | /user/api-keys | Authenticated | List keys (prefix only) |
 | POST | /user/api-keys | Authenticated | Store encrypted key |
 | DELETE | /user/api-keys/{provider} | Authenticated | Delete key |
 | POST | /user/api-keys/{provider}/test | Authenticated | Test key validity |
 | POST | /user/api-keys/export | Authenticated | Export keys |
 ### Configuration & Admin
 | Method | Path | Auth | Description |
 |---|---|---|---|
 | GET | /config/providers | Authenticated | Available providers/models |
 | GET | /admin/providers | Admin | List all providers |
 | POST | /admin/providers | Admin | Create provider |
 | PUT | /admin/providers/{id} | Admin | Update provider |
 | DELETE | /admin/providers/{id} | Admin | Delete provider |
 | GET | /admin/rate-limits | Admin | List rate limits |
 | PUT | /admin/rate-limits/{provider_name} | Admin | Update rate limit |
 | GET | /admin/users | Admin | List users |
 | PUT | /admin/users/{id}/role | Admin | Change user role |
 ### Infrastructure
 | Method | Path | Auth | Description |
 |---|---|---|---|
 | GET | /health | Public | Health check |
 ---
@@ -350,10 +162,7 @@ Tokio with full features. The Axum server runs as a multi-threaded async runtime
 ### Background Tasks
-Spawned at startup via `tokio::spawn`:
+Three tasks are spawned at startup: hourly session cleanup, periodic job store TTL cleanup, and the minute-by-minute theme schedule checker. See `deployment.md` Section 2.
 - **Session cleanup**: Hourly deletion of expired DB sessions
 - **Job store cleanup**: Periodic removal of expired job entries (1-hour TTL)
 - **Scheduler**: Minute-by-minute check for due theme schedules
 ### Generation Pipeline Concurrency
@@ -380,3 +189,10 @@ POST /generate
 ### Graceful Shutdown
 The server supports graceful shutdown via signal handling, allowing in-flight requests to complete.
 ---
 ## 8. Quality Gates
 - Release candidates must include deterministic CI coverage for critical autonomous flows, especially scheduler execution and SSE progress behavior.
 - External-provider tests (for example live LLM E2E checks) are supplemental and non-blocking; they do not replace deterministic CI coverage.
--- a/docs/deployment.md
+++ b/docs/deployment.md
@@ -243,15 +243,4 @@ Before deploying to production, verify:
 ### Security Features (Built-in)
-The application includes the following security measures that require no additional configuration:
+See `architecture.md` Section 6 for detailed security architecture.
 - **AES-256-GCM encryption** for user LLM API keys at rest (per-key random nonces)
 - **SSRF prevention** in the web scraper (DNS resolution checks, private IP blocking, redirect validation)
 - **CSRF protection** via `X-Requested-With` header on all mutating API endpoints
 - **Session cookies**: `HttpOnly`, `SameSite=Lax`, `Secure` (when HTTPS)
 - **Security headers**: CSP, X-Frame-Options (DENY), X-Content-Type-Options (nosniff), Referrer-Policy, HSTS (when HTTPS)
 - **Anti-enumeration**: Same response for existent/non-existent emails in auth flows
 - **Error sanitization**: Internal errors and API key patterns are stripped from client-facing error messages
 - **Rate limiting**: Configurable per-provider rate limits for LLM API calls
 - **Non-root container**: The Docker image runs as `appuser`
 - **Graceful shutdown**: SIGTERM/Ctrl+C triggers clean shutdown with database pool closure
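
As an illustration of the private IP blocking mentioned in the SSRF bullet, the following sketch classifies a resolved address using only the standard library. It is not the scraper's actual check: the function names and the exact set of blocked ranges are assumptions, and a real implementation would also need to apply the check after DNS resolution and on every redirect hop.

```rust
use std::net::{IpAddr, Ipv4Addr, Ipv6Addr};

/// Blocks IPv4 ranges that should never be reached by a scraper.
fn is_blocked_v4(ip: Ipv4Addr) -> bool {
    ip.is_private()
        || ip.is_loopback()
        || ip.is_link_local()
        || ip.is_unspecified()
        || ip.is_broadcast()
}

/// Blocks loopback, unspecified, unique-local, and link-local IPv6,
/// plus IPv4-mapped addresses that point at a blocked IPv4 range.
fn is_blocked_v6(ip: Ipv6Addr) -> bool {
    let first = ip.segments()[0];
    ip.is_loopback()
        || ip.is_unspecified()
        || (first & 0xfe00) == 0xfc00 // unique local, fc00::/7
        || (first & 0xffc0) == 0xfe80 // link-local, fe80::/10
        || ip.to_ipv4_mapped().map_or(false, is_blocked_v4)
}

/// Returns true when a resolved address must not be fetched.
pub fn is_blocked_ip(ip: IpAddr) -> bool {
    match ip {
        IpAddr::V4(v4) => is_blocked_v4(v4),
        IpAddr::V6(v6) => is_blocked_v6(v6),
    }
}
```

The IPv4-mapped case matters because `::ffff:192.168.0.1` would otherwise pass an IPv6-only check while still reaching a private host.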
--- a/docs/dev_guidelines.md
+++ b/docs/dev_guidelines.md
@@ -179,6 +179,7 @@ async fn admin_handler(admin: AdminUser, State(state): State<AppState>) -> Resul
 #### Component Patterns
 - Use the `Button` component (`components/ui/Button.tsx`) with `variant`/`loading`/`icon` props instead of raw `<button>` elements with inline Tailwind classes.
 - This rule is strict for all frontend UI code (no raw `<button>` in application components).
 - Use `<Switch>/<Match>` for mutually exclusive conditional rendering instead of multiple adjacent `<Show>` blocks.
 - Use `<For each={...}>` for list rendering.
 - Use the `useToast` context for user feedback (success/error notifications).
--- a/docs/functional_specs.md
+++ b/docs/functional_specs.md
@@ -24,7 +24,7 @@
 3. The theme form shows:
   - **Name**: a display label for the theme.
   - **Search topic**: the subject the AI uses to search for news (e.g. "Intelligence Artificielle").
-   - **Categories**: an ordered list of user-defined category names. Categories can be added and removed. The system always adds an implicit "Autre" overflow category.
+   - **Categories**: an ordered list of user-defined category names. Themes can be created without user-defined categories. The system always includes `Divers` (overflow) and `Sans date` (undated articles).
   - **Max age (days)**: how old articles can be.
   - **Max items per category**: cap per category.
   - **Summary length**: slider with three positions -- Court (3-4 lines), Moyen (6-8 lines), Detaille (12-15 lines).
@@ -34,10 +34,10 @@
 1. On the theme management page, below theme settings, the sources section shows sources scoped to the selected theme.
 2. User adds sources individually (title + URL) or via:
-   - **CSV import**: upload a `.csv` file with `Titre,URL` columns. Auto-detects comma/semicolon delimiters, skips header rows, prepends `https://` to bare URLs.
+   - **CSV import**: upload a `.csv` file with `Titre,URL` columns. Auto-detects comma/semicolon delimiters, skips header rows, prepends `https://` to bare URLs. Import is always applied to the selected theme.
-   - **Bulk text import**: paste multiple sources in `Nom;URL` format, one per line.
+   - **Bulk text import**: paste multiple sources in `Nom;URL` format, one per line. Import is always applied to the selected theme.
-   - **CSV export**: download all sources for the theme as a CSV file.
+   - **CSV export**: download sources for the selected theme only.
-3. Sources can be marked as **preferred** (prioritaire) via checkboxes. Preferred sources are processed first during generation. A counter shows how many sources are preferred.
+3. Sources can be marked as **preferred** (prioritaire) via checkboxes. Preferred sources are scoped per theme and do not affect other themes.
 4. Sources can be deleted individually.
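
The import rules listed above (delimiter auto-detection, header skipping, `https://` prefixing) can be sketched as follows. This is a simplified assumption of how such a parser might look, not the project's `csv` service: the function names are hypothetical, the delimiter heuristic (semicolon wins if present) is a guess, and quoted CSV fields are not handled.

```rust
/// Picks the delimiter for one line: semicolon if present, otherwise comma.
/// (Assumed heuristic; a URL containing ';' would defeat it.)
fn detect_delimiter(line: &str) -> char {
    if line.contains(';') { ';' } else { ',' }
}

/// Prepends `https://` to URLs that carry no http(s) scheme.
fn normalize_url(raw: &str) -> String {
    let url = raw.trim();
    if url.starts_with("http://") || url.starts_with("https://") {
        url.to_string()
    } else {
        format!("https://{url}")
    }
}

/// Recognizes the `Titre,URL` and `Nom;URL` header rows.
fn is_header(title: &str, url: &str) -> bool {
    let t = title.to_lowercase();
    (t == "titre" || t == "nom") && url.eq_ignore_ascii_case("url")
}

/// Parses pasted or uploaded text into (title, url) pairs,
/// silently skipping blank, malformed, and header lines.
pub fn parse_sources(input: &str) -> Vec<(String, String)> {
    input
        .lines()
        .filter_map(|line| {
            let line = line.trim();
            if line.is_empty() {
                return None;
            }
            let (title, url) = line.split_once(detect_delimiter(line))?;
            let (title, url) = (title.trim(), url.trim());
            if title.is_empty() || url.is_empty() || is_header(title, url) {
                return None;
            }
            Some((title.to_string(), normalize_url(url)))
        })
        .collect()
}
```

Binding the parsed pairs to the selected theme would happen in the caller, which is outside the scope of this sketch.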
 ### 1.5 Generate a Synthesis
@@ -85,13 +85,17 @@ The generate page requires selecting a theme before launching. The home page sho
 ### 2.2 Categories
-Categories are user-defined per theme. Users add and remove category names in the theme editor. The system always appends an implicit "Autre" category to catch articles that do not match any user-defined category, or articles from categories that have reached their max items cap.
+Categories are user-defined per theme. Users add and remove category names in the theme editor after creating a theme.
-If no categories are configured, the only available category is "Autre".
+The system always includes two default categories:
 - `Divers`: overflow category for unmatched or full categories.
 - `Sans date`: category for articles without a usable publication date.
 If no user-defined categories are configured, the available categories are still `Divers` and `Sans date`.
 ### 2.3 Preferred Sources
-Sources can be marked as preferred. During generation, preferred sources are extracted and processed before non-preferred sources. Within each extraction wave, URLs from preferred sources are also shuffled and placed before other URLs. This maximizes the chance that articles from preferred sources fill the synthesis.
+Sources can be marked as preferred. Preference is stored per theme. During generation, preferred sources are extracted and processed before non-preferred sources. Within each extraction wave, URLs from preferred sources are also shuffled and placed before other URLs. This maximizes the chance that articles from preferred sources fill the synthesis.
 ### 2.4 Scheduled Generation
@@ -123,7 +127,7 @@ Generation follows a two-phase pipeline. Phase 1 processes the user's personaliz
 ### 3.2 Initialization
 Before generation starts:
-1. Load theme settings (categories, search topic, max items, max age, summary length) and global user settings (provider, models, batch size, rate limits, etc.).
+1. Load theme settings (user-defined categories plus defaults `Divers` and `Sans date`, search topic, max items, max age, summary length) and global user settings (provider, models, batch size, rate limits, etc.).
 2. Decrypt the user's LLM API key and create the provider instance.
 3. Clean up old article history and LLM call logs.
 4. Load personalized sources for the selected theme.
@@ -137,7 +141,7 @@ Skipped if the user has no sources for the theme.
 Sources are split into waves of `source_extraction_window` size (default 3). Sources are rotated so extraction starts after the last source used in a previous generation (rolling window). Preferred sources are placed before non-preferred sources within the rotation order.
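
One possible reading of the rotation and wave split described above is sketched here. The `Source` struct and `plan_waves` function are hypothetical, and the interpretation is an assumption: rotate the list to start after the last-used source, then stably move preferred sources to the front, then cut into windows.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct Source {
    pub id: u32,
    pub is_preferred: bool,
}

/// Orders sources for extraction and splits them into waves of `window` size.
pub fn plan_waves(sources: &[Source], last_used: Option<u32>, window: usize) -> Vec<Vec<Source>> {
    let mut ordered = sources.to_vec();

    // Rolling window: start right after the source used last time, if it still exists.
    let start = last_used.and_then(|id| ordered.iter().position(|s| s.id == id));
    if let Some(pos) = start {
        ordered.rotate_left(pos + 1);
    }

    // Stable sort keeps the rotation order inside each group; preferred sorts first.
    ordered.sort_by_key(|s| !s.is_preferred);

    // Guard against a zero window, which `chunks` would reject.
    ordered.chunks(window.max(1)).map(|c| c.to_vec()).collect()
}
```

With five sources, source 4 preferred, source 2 used last, and a window of 3, this yields the waves `[4, 3, 5]` and `[1, 2]`.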
 For each wave:
-1. Extract article links from all sources in the wave in parallel (bounded concurrency of 5). Link extraction uses either LLM analysis of the page content or HTML `<a>` tag parsing (configurable).
+1. Extract article links from all sources in the wave in parallel (bounded concurrency of 5). Link extraction uses HTML `<a>` tag parsing.
 2. Deduplicate candidate URLs and filter against article history (previously seen articles are skipped).
 3. Shuffle remaining candidates, with URLs from preferred sources placed first.
 4. Process articles in batches of `batch_size`:
@@ -166,10 +170,12 @@ The system computes category gaps (how many articles each category still needs),
 ### 3.5 Finalization
 1. If no articles were collected across both phases, return an error.
-2. Order sections: user-defined categories first (in their configured order), then "Autre" if non-empty.
+2. Order sections: user-defined categories first (in their configured order), then `Divers` if non-empty, then `Sans date` if non-empty.
 3. Save the synthesis to the database with status "completed".
 4. Record all used articles in article history for future deduplication.
 For the complete technical algorithm, see `technical_specs.md` Section 5.
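
The section ordering rule in step 2 can be illustrated with a short sketch. The function name and the map-of-article-URLs input are assumptions for the example; whether empty user-defined categories are kept in the output is not stated in the spec, so this version keeps them.

```rust
use std::collections::HashMap;

pub const OVERFLOW: &str = "Divers";
pub const UNDATED: &str = "Sans date";

/// Returns section names in display order: user-defined categories in their
/// configured order, then `Divers` and `Sans date` only when they hold articles.
pub fn order_sections(
    user_categories: &[&str],
    articles_by_category: &HashMap<String, Vec<String>>,
) -> Vec<String> {
    let mut ordered: Vec<String> = user_categories.iter().map(|c| c.to_string()).collect();
    for name in [OVERFLOW, UNDATED] {
        let non_empty = articles_by_category
            .get(name)
            .map_or(false, |articles| !articles.is_empty());
        if non_empty {
            ordered.push(name.to_string());
        }
    }
    ordered
}
```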
 ## 4. Settings Overview
 ### 4.1 Per-Theme Settings
@@ -180,7 +186,7 @@ Managed on the theme management page. Each theme has its own values.
 |---------|-------------|---------|
 | Name | Display label for the theme | -- |
 | Search topic | Subject for AI search queries | -- |
-| Categories | Ordered list of category names | [] |
+| Categories | Ordered list of user-defined category names (`Divers` and `Sans date` are always included by the system) | [] |
 | Max age (days) | Article recency filter | 7 |
 | Max items per category | Cap per category | 4 |
 | Summary length | Detail level: 1=Court, 2=Moyen, 3=Detaille | 3 |
--- a/docs/qa_guidelines.md
+++ b/docs/qa_guidelines.md
@@ -10,6 +10,14 @@
 | E2E tests (Playwright) | 7 | All passing | `e2e/tests/*.spec.ts` |
 | **Total** | **689** | | |
 ## Release Gate Policy
 - Releases are blocked unless critical flows have deterministic CI coverage.
 - Mandatory deterministic CI coverage includes:
  - Scheduler execution path (due schedule selection, run/skip behavior, `last_run_at` handling, email side effects).
  - SSE generation progress contract.
 - Tests requiring external providers (for example `generation-live.spec.ts`) are non-blocking supplemental checks and must not be the only coverage for critical flows.
 ### Backend Unit Test Breakdown
 | Source file | Tests | Coverage area |
@@ -144,7 +152,7 @@ The script:
 6. Runs Playwright tests
 7. Cleans up on exit (stops containers, removes volumes)
-The `generation-live.spec.ts` test requires `OPENAI_TEST_API_KEY` to be set (in `e2e/.env.test` or environment). It exercises the real pipeline with an actual LLM API call.
+The `generation-live.spec.ts` test requires `OPENAI_TEST_API_KEY` to be set (in `e2e/.env.test` or environment). It is a supplemental non-blocking check and does not replace deterministic CI coverage.
 ---
@@ -322,7 +330,7 @@ Pipeline integration tests in `pipeline_test.rs` use wiremock + MockLlmProvider:
 4. **Use `createDbClient()`** from `e2e/helpers/auth.ts` when you need to verify database state directly.
-5. **The `generation-live.spec.ts` test** is gated on `OPENAI_TEST_API_KEY`. It exercises the full pipeline including provenance and LLM log verification.
+5. **The `generation-live.spec.ts` test** is gated on `OPENAI_TEST_API_KEY`. Treat it as supplemental coverage only.
 ---
@@ -363,6 +371,8 @@ As of the last audit, 10 of 141 frontend unit tests are failing. Investigate wit
 ### Critical Gaps
 The following gaps must be addressed to satisfy the release gate policy.
 | Gap | Priority | Description |
 |-----|----------|-------------|
 | Scheduled execution | Critical | `scheduler.rs` has zero tests. Autonomous process that generates syntheses and sends emails. |
--- a/docs/requirements.md
+++ b/docs/requirements.md
@@ -15,10 +15,9 @@ The application is designed for individuals or small teams who want an automated
 ### 3.1 Multi-Theme Support
- Users create multiple themes, each with its own search topic, categories, and content settings.
+Users create multiple independent themes, each with its own search topic, categories, personalized sources, and content settings. Syntheses are generated and tagged per theme. Deleting a theme preserves its existing syntheses.
- Each theme has its own set of personalized sources.
+
- Syntheses are generated per theme and tagged accordingly.
+See `functional_specs.md` Section 2 for detailed behavior.
 - Themes can be created, edited, and deleted independently. Deleting a theme preserves its existing syntheses.
 ### 3.2 Synthesis Generation
@@ -38,48 +37,25 @@ The application is designed for individuals or small teams who want an automated
 ### 3.4 Personalized Sources
 - Users add web sources (blogs, news sites) per theme.
- Sources can be imported in bulk via text input, CSV upload, or added individually.
+- Sources can be imported in bulk via text input, CSV upload, or added individually, always bound to the selected theme.
- Sources can be exported as CSV.
+- Sources can be exported as CSV, always scoped to the selected theme.
- Sources can be marked as **preferred** (prioritized during generation -- processed before non-preferred sources).
+- Sources can be marked as **preferred** (prioritized during generation -- processed before non-preferred sources), with preference state scoped per theme.
 ### 3.5 Brave Search Integration
- Optional alternative to LLM web search for Phase 2.
+Optional alternative to LLM web search for Phase 2. Users provide their own Brave Search API key; when enabled, Phase 2 queries Brave instead of using LLM web grounding. See `functional_specs.md` Section 2.5.
 - Users provide their own Brave Search API key.
 - When enabled, Phase 2 queries the Brave Search API instead of using LLM web grounding, then scrapes and classifies the results.
 ### 3.6 Export and Sharing
- **Email**: send a synthesis to any email address (or to self) via Resend.
+Syntheses can be exported as email (via Resend), PDF, or Markdown. See `functional_specs.md` Section 6.
 - **PDF**: download a synthesis as a PDF file.
 - **Markdown**: download a synthesis as a Markdown file.
 ### 3.7 Settings
-#### Per-theme settings (content)
+Settings are split into two levels: per-theme content settings (search topic, categories, max age, max items, summary length) and global pipeline settings (LLM provider/model, Brave Search, batch size, rate limits, article history retention, import/export). See `functional_specs.md` Section 4 for the complete settings reference.
 - Theme name and search topic
 - Categories (user-defined list)
 - Max age of articles (days)
 - Max items per category
 - Summary detail level (short / medium / detailed)
 #### Global settings (pipeline and AI)
 - LLM provider and model selection (research model + web search model)
 - Search agent behavior (custom instructions for the AI research prompt)
 - Brave Search toggle and API key
 - Batch size (articles processed in parallel)
 - Source extraction window (number of sources per extraction wave)
 - Max articles per source (diversity cap)
 - Max links extracted per source
 - Rate limiting (max requests / time window)
 - Article history retention (days)
 - Settings import/export (JSON)
 ### 3.8 Authentication
- Passwordless authentication via magic link emails.
+Passwordless authentication via magic link emails with Cloudflare Turnstile captcha. Sessions use 30-day HttpOnly/SameSite cookies. See `architecture.md` Section 6 for the full security model.
 - Cloudflare Turnstile captcha on login and registration.
 - 30-day session cookies (HttpOnly, SameSite).
 ## 4. User Roles
@@ -95,13 +71,7 @@ The application is designed for individuals or small teams who want an automated
 ### 4.2 Admin
-All user capabilities, plus:
+All user capabilities, plus provider management (add/edit/enable/disable LLM providers and models), rate limit configuration (defaults per provider), and user management (view all users, promote/demote roles). The first admin is created via the `create-admin` CLI command. See `functional_specs.md` Section 5.
 - **Provider management**: add, edit, enable/disable, and remove LLM providers and their available models. Users select from admin-curated providers.
 - **Rate limit configuration**: set default rate limits per provider (max requests / time window). Users can override with their own values.
 - **User management**: view all users, promote users to admin or demote admins to user.
 The first admin is created via a CLI command (`create-admin`).
 ## 5. Non-Functional Requirements
@@ -139,3 +109,4 @@ The first admin is created via a CLI command (`create-admin`).
 - Job store with TTL for expired generation jobs.
 - Scheduled generation with double-run prevention (`last_run_at` tracking).
 - Panic recovery and timeout handling for generation tasks.
 - Release gating in CI requires deterministic coverage for critical autonomous flows (notably scheduler execution and SSE progress behavior).
--- a/docs/technical_specs.md
+++ b/docs/technical_specs.md
@@ -159,9 +159,11 @@ Per-user topic configurations with content settings.
 Indexes: `idx_themes_user_id`.
 `categories` stores user-defined categories only. At runtime, category assignment always includes `Divers` and `Sans date`.
 ### 3.6 `sources`
-User-curated news source URLs, optionally tied to a theme.
+User-curated news source URLs, always tied to a theme.
 | Column | Type | Constraints |
 |---|---|---|
@ -169,7 +171,7 @@ User-curated news source URLs, optionally tied to a theme.
 | user_id | UUID | NOT NULL, FK users(id) CASCADE |
 | title | VARCHAR(200) | NOT NULL, CHECK length 1-200 |
 | url | VARCHAR(1000) | NOT NULL, CHECK length <= 1000 |
-| theme_id | UUID | nullable, FK themes(id) CASCADE |
+| theme_id | UUID | NOT NULL, FK themes(id) CASCADE |
 | is_preferred | BOOLEAN | NOT NULL, DEFAULT false |
 | created_at | TIMESTAMPTZ | NOT NULL, DEFAULT now() |
@ -401,7 +403,8 @@ All endpoints are prefixed with `/api/v1`. Responses are JSON. Errors follow the
 **POST /themes**
 - Auth: Authenticated
 - Body: `{ name, theme, categories: string[], max_items_per_category?, max_age_days?, summary_length? }`
-- Validation: name non-empty max 200 chars, categories 1-20 non-empty entries, max_items 1-50, max_age 1-365, summary_length 1-3.
+- Validation: name non-empty max 200 chars, categories 0-20 non-empty entries, max_items 1-50, max_age 1-365, summary_length 1-3.
 - Notes: theme creation is valid with an empty user-defined `categories` list. The system always includes `Divers` and `Sans date`.
 - Response: `ThemeResponse`
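
The updated `POST /themes` validation rules can be sketched as a plain Rust function. This is an illustrative sketch only: `CreateThemeRequest`, `validate_create_theme`, and `check_range` are hypothetical names, the error strings are invented, and counting characters (rather than bytes) and trimming whitespace before the emptiness check are assumptions the spec does not state.

```rust
/// Hypothetical request shape; field names follow the `POST /themes` body.
pub struct CreateThemeRequest {
    pub name: String,
    pub theme: String,
    pub categories: Vec<String>,
    pub max_items_per_category: Option<u32>,
    pub max_age_days: Option<u32>,
    pub summary_length: Option<u32>,
}

/// Returns `Ok(())` when every documented rule passes, otherwise the first
/// violated rule as a message (message wording is an assumption).
pub fn validate_create_theme(req: &CreateThemeRequest) -> Result<(), String> {
    // name: non-empty, max 200 chars.
    if req.name.trim().is_empty() || req.name.chars().count() > 200 {
        return Err("name must be 1-200 characters".to_string());
    }
    // categories: 0-20 entries, so an empty list is valid.
    if req.categories.len() > 20 {
        return Err("at most 20 categories are allowed".to_string());
    }
    // Each entry that is present must be non-empty.
    if req.categories.iter().any(|c| c.trim().is_empty()) {
        return Err("category names must be non-empty".to_string());
    }
    // Optional numeric settings are only checked when supplied.
    check_range("max_items_per_category", req.max_items_per_category, 1, 50)?;
    check_range("max_age_days", req.max_age_days, 1, 365)?;
    check_range("summary_length", req.summary_length, 1, 3)?;
    Ok(())
}

/// Checks an optional value against an inclusive range.
fn check_range(field: &str, value: Option<u32>, min: u32, max: u32) -> Result<(), String> {
    match value {
        Some(v) if v < min || v > max => {
            Err(format!("{field} must be between {min} and {max}"))
        }
        _ => Ok(()),
    }
}
```

With this sketch, a request with `categories: vec![]` passes, which matches the note that theme creation is valid without user-defined categories.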
 **PUT /themes/{id}**
@ -417,7 +420,7 @@ All endpoints are prefixed with `/api/v1`. Responses are JSON. Errors follow the
 **GET /themes/{id}/schedule**
 - Auth: Authenticated (theme owner)
-- Response: `ScheduleResponse` or 404
+- Response: `ScheduleResponse | null` with HTTP 200
 **PUT /themes/{id}/schedule**
 - Auth: Authenticated (theme owner)
@ -433,17 +436,19 @@ All endpoints are prefixed with `/api/v1`. Responses are JSON. Errors follow the
 **GET /sources?theme_id=...**
 - Auth: Authenticated
 - Query: `theme_id` is required
 - Response: `SourceResponse[]`
 **POST /sources**
 - Auth: Authenticated
-- Body: `{ title, url, theme_id? }`
+- Body: `{ title, url, theme_id }`
 - Validation: title non-empty max 200, URL http(s) max 1000 chars.
 - Response: `SourceResponse`
 **PUT /sources/preferred**
 - Auth: Authenticated
-- Body: `{ source_ids: UUID[] }`
+- Body: `{ theme_id: UUID, source_ids: UUID[] }`
 - Note: preferred state is scoped per theme.
 - Response: `{ updated: number }`
 **DELETE /sources/{id}**
@ -452,16 +457,18 @@ All endpoints are prefixed with `/api/v1`. Responses are JSON. Errors follow the
 **POST /sources/bulk**
 - Auth: Authenticated
-- Body: `{ sources: CreateSourceRequest[], theme_id? }`
+- Body: `{ sources: CreateSourceRequest[], theme_id: UUID }`
 - Response: `{ imported, skipped, errors }`
 **POST /sources/import-csv**
 - Auth: Authenticated
-- Body: Multipart file upload (CSV: title,url)
+- Body: Multipart file upload (CSV: title,url) + required `theme_id`
 - Response: `{ imported, skipped, errors }`
 **GET /sources/export-csv**
 - Auth: Authenticated
 - Query: `theme_id` is required
 - Scope: exports sources for the selected theme only
 - Response: CSV file download
 ### 4.6 Generation
@ -621,13 +628,13 @@ Progress is streamed to clients via a `tokio::sync::watch` channel (SSE endpoint
 1. **Load user settings** from DB (provider, models, batch_size, rate limits, etc.)
 2. **Cleanup** — delete old article history entries (>N days, dropped only) + truncate old LLM call logs
-3. **Validate** — if no categories configured, the only available category will be "Divers".
+3. **Validate** — runtime category set always includes `Divers` and `Sans date` even when no user-defined categories are configured.
 4. **Load theme** — categories, max_items_per_category, max_age_days, summary_length
 5. **Load user sources** (personalized URLs filtered by theme_id)
 6. **Resolve LLM provider** — decrypt user's API key, create provider instance (`Arc<dyn LlmProvider>`)
 7. **Resolve models** — research model + web-search model (user override or admin default)
 8. **Setup rate limiter** — per-user or global provider limiter
-9. **Initialize tracking structures** — `article_scraped` (category→articles), `source_counts` (per-domain article count), `url_source` (per-article source), `filled_counts` (per-category article count), `seen_urls` (cross-phase dedup), `classification_categories` (user categories + "Divers")
+9. **Initialize tracking structures** — `article_scraped` (category→articles), `source_counts` (per-domain article count), `url_source` (per-article source), `filled_counts` (per-category article count), `seen_urls` (cross-phase dedup), `classification_categories` (user categories + `Divers`; `Sans date` is assigned by no-date routing)
 10. **Batch trace buffer** — `pending_traces: Vec<ArticleHistoryEntry>` accumulates all article history writes; flushed with `db::article_history::batch_insert_entries` at phase boundaries.
 ### Phase 1: Personalized Sources
@ -663,7 +670,7 @@ Processing in batches of `settings.batch_size` (minimum 1). For each batch:
 - LLM returns `{title, summary, category, date, is_article}`.
 - **`is_article` check**: if false, trace as `filtered_not_article` and skip.
 - **Date fallback**: if LLM returned a date and it exceeds `max_age_days`, trace as `filtered_too_old` and skip.
-- **No-date routing**: if no date found (neither scraper nor LLM), route to "Articles sans date" category.
+- **No-date routing**: if no date found (neither scraper nor LLM), route to `Sans date` category.
 - **`assign_category()`** helper: validates category, falls back to "Divers" if unknown or full. If "Divers" is also full, drops the article.
 - **LLM call logged** with full prompt/response/timing.
 - Add article to `article_scraped`, increment `filled_counts` and `source_counts`.
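
The fallback rule described for `assign_category()` can be sketched as follows. This is not the project's actual helper: the signature, the use of a `HashMap` for `filled_counts`, and the `Option<String>` return type are assumptions made for illustration. Only the behavior (unknown or full category falls back to `Divers`, and a full `Divers` drops the article) comes from the text.

```rust
use std::collections::HashMap;

/// Fallback category named in the pipeline description.
const FALLBACK: &str = "Divers";

/// Returns the category the article should be stored under, or `None` when
/// the article has to be dropped. Signature is hypothetical.
pub fn assign_category(
    proposed: &str,
    classification_categories: &[String],
    filled_counts: &HashMap<String, usize>,
    max_items_per_category: usize,
) -> Option<String> {
    // A category is full once it holds `max_items_per_category` articles.
    let is_full = |cat: &str| {
        filled_counts.get(cat).copied().unwrap_or(0) >= max_items_per_category
    };
    let is_known = classification_categories.iter().any(|c| c == proposed);

    // Keep the proposed category when it is valid and has room.
    if is_known && !is_full(proposed) {
        return Some(proposed.to_string());
    }
    // Otherwise fall back to "Divers" if it still has room.
    if !is_full(FALLBACK) {
        return Some(FALLBACK.to_string());
    }
    // "Divers" is full as well: the article is dropped.
    None
}
```

Returning `None` for the drop case keeps the caller responsible for tracing the dropped article, which is one possible design among several.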
@ -704,7 +711,7 @@ Selected by `settings.use_brave_search`.
 ### Save + Record
 1. **Error if empty** — if all article lists are empty and generation wasn't cancelled, return error.
-2. **Order sections** — user-defined categories first (in order), then "Divers" if non-empty, then "Articles sans date" if non-empty.
+2. **Order sections** — user-defined categories first (in order), then `Divers` if non-empty, then `Sans date` if non-empty.
 3. **Sanitize** — strip `\u0000` null bytes from JSON (PostgreSQL JSONB requirement).
 4. **Save synthesis** — insert into `syntheses` table with `job_id`, `week` (ISO week), `sections` (JSONB), `status: completed`, `theme_id`.
 5. **Record used articles** — for each article in the final synthesis, build trace with `status: "used"`, `synthesis_id`, and correct `source_type` (inferred from `url_source`). Batch-insert into `article_history`.
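
The section ordering in step 2 can be illustrated with a small sketch. The function name `order_sections` and the map of category name to article titles are hypothetical. Keeping user-defined categories in the output even when they hold no articles is also an assumption, since the step only states a non-empty condition for `Divers` and `Sans date`.

```rust
use std::collections::HashMap;

/// Returns section names in display order: user-defined categories first
/// (in their configured order), then `Divers` and `Sans date`, each only
/// if it holds at least one article.
pub fn order_sections(
    user_categories: &[String],
    articles_by_category: &HashMap<String, Vec<String>>,
) -> Vec<String> {
    let mut ordered: Vec<String> = user_categories.to_vec();
    for system in ["Divers", "Sans date"] {
        let non_empty = articles_by_category
            .get(system)
            .map_or(false, |articles| !articles.is_empty());
        // Skip a system category that the user list already contains.
        if non_empty && !ordered.iter().any(|c| c == system) {
            ordered.push(system.to_string());
        }
    }
    ordered
}
```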
@ -754,7 +761,7 @@ All calls use structured JSON output (response_schema defines the expected shape
 `llm/schema.rs` builds JSON Schema definitions for:
 - Classification/summarization: `{title, summary, category, is_article}`
 - Web search: `{category_0: [{title, url, summary}], ...}` with per-category arrays
-- Source link extraction: `{links: [{url}]}`
+- Source link extraction schema is deprecated (link extraction mode is no longer configurable).
 ### Error Mapping