From f07e91ba116849b1f69b0950513c707fac3d0326 Mon Sep 17 00:00:00 2001
From: oabrivard <olivier@abrivard.fr>
Date: Fri, 27 Mar 2026 15:00:49 +0100
Subject: [PATCH] docs: add consolidated architecture.md and technical_specs.md

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---
 docs/architecture.md    | 382 +++++++++++++++++++
 docs/technical_specs.md | 793 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 1175 insertions(+)
 create mode 100644 docs/architecture.md
 create mode 100644 docs/technical_specs.md

diff --git a/docs/architecture.md b/docs/architecture.md
new file mode 100644
index 0000000..ae2a428
--- /dev/null
+++ b/docs/architecture.md
@@ -0,0 +1,382 @@
+# AI Weekly Synth -- Architecture Document
+
+## 1. System Overview
+
+AI Weekly Synth is a self-hosted web application that generates AI-powered weekly news syntheses. Users configure topics (themes), categories, and an LLM provider; the system then searches the web, scrapes and validates sources, classifies articles, and produces structured summaries.
+
+### Technology Stack
+
+| Layer | Technology |
+|---|---|
+| Backend | Rust (Axum 0.8) |
+| Frontend | SolidJS 1.9 + Tailwind CSS v4 |
+| Database | PostgreSQL 17 (via sqlx with compile-time query checking) |
+| Deployment | Docker Compose (app + Postgres) |
+
+### Deployment Topology
+
+```
+docker-compose.yml
+  ├── app  (ai-synth)       port 8080
+  │     ├── Axum HTTP server
+  │     ├── Static file serving (SPA fallback)
+  │     └── Background tasks (scheduler, session cleanup, job TTL)
+  └── db   (postgres:17-alpine)  port 5432 (localhost only)
+        └── postgres_data volume
+```
+
+The app container builds from a multi-stage Dockerfile, serves the SolidJS frontend as static files, and connects to Postgres over the `internal` bridge network.
+
+---
+
+## 2. Layer Architecture
+
+The backend follows a three-layer architecture with shared model types:
+
+```
+handlers/  (HTTP layer)
+    │
+    ├── extracts request data (Axum extractors, JSON, path params)
+    ├── validates input
+    ├── calls services/ or db/ directly
+    └── formats HTTP responses
+    │
+services/  (Business logic)
+    │
+    ├── synthesis pipeline orchestration
+    ├── LLM provider abstraction + factory
+    ├── scraping (articles, source pages)
+    ├── encryption, email, CSV, PDF export
+    ├── rate limiting, job store, scheduler
+    └── Brave Search client
+    │
+db/  (Data access)
+    │
+    ├── pure SQL queries via sqlx
+    ├── typed result mapping (FromRow)
+    └── no business logic
+    │
+models/  (Shared types -- used by all layers)
+    │
+    ├── domain structs (User, Theme, Source, Synthesis, etc.)
+    ├── request/response DTOs
+    └── validation logic
+```
+
+### Module Inventory
+
+**Handlers** (`handlers/`): `admin`, `api_keys`, `article_history`, `auth`, `config`, `generation`, `health`, `llm_logs`, `schedules`, `settings`, `sources`, `syntheses`, `themes`
+
+**Services** (`services/`): `auth`, `brave_search`, `csv`, `email`, `encryption`, `export`, `job_store`, `llm` (with `gemini`, `openai`, `anthropic`, `mock`, `factory`, `schema`), `prompts`, `rate_limiter`, `scheduler`, `scraper`, `source_scraper`, `synthesis`, `turnstile`
+
+**DB** (`db/`): `api_keys`, `article_history`, `audit`, `llm_call_log`, `magic_links`, `providers`, `rate_limits`, `schedules`, `sessions`, `settings`, `sources`, `syntheses`, `themes`, `users`
+
+**Models** (`models/`): `api_key`, `audit`, `magic_link`, `provider`, `rate_limit`, `schedule`, `session`, `settings`, `source`, `synthesis`, `theme`, `user`
+
+---
+
+## 3. Key Components
+
+### 3.1 LLM Provider Abstraction
+
+The `LlmProvider` trait defines a unified interface for all LLM backends:
+
+```rust
+#[async_trait]
+pub trait LlmProvider: Send + Sync {
+    fn provider_id(&self) -> &str;
+    async fn call_llm(&self, model: &str, system_prompt: &str,
+                       user_prompt: &str, response_schema: &Value)
+        -> Result<Value, AppError>;
+}
+```
+
+Implementations: `GeminiProvider`, `OpenAiProvider`, `AnthropicProvider`, `MockLlmProvider`.
+
+The factory (`llm/factory.rs`) creates provider instances by name. The mock provider enables end-to-end pipeline testing without real API calls.
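+
+The trait-object-plus-factory pattern can be shown in a minimal, synchronous sketch that runs on the standard library alone. It simplifies on purpose. The real trait is `async` (via `async_trait`), takes a `serde_json::Value` schema, and returns `AppError`. Here, responses and errors are plain `String`s, and the hypothetical `create_provider` function wires up only the mock backend.
+
+```rust
+/// Synchronous stand-in for the async `LlmProvider` trait (simplified types).
+pub trait LlmProvider: Send + Sync {
+    fn provider_id(&self) -> &str;
+    fn call_llm(&self, model: &str, system_prompt: &str, user_prompt: &str)
+        -> Result<String, String>;
+}
+
+/// Mock backend: returns a canned JSON string so the pipeline can run offline.
+pub struct MockLlmProvider {
+    canned_response: String,
+}
+
+impl LlmProvider for MockLlmProvider {
+    fn provider_id(&self) -> &str {
+        "mock"
+    }
+
+    fn call_llm(&self, _model: &str, _system: &str, _user: &str) -> Result<String, String> {
+        Ok(self.canned_response.clone())
+    }
+}
+
+/// Hypothetical factory: maps a provider name to a boxed trait object.
+/// Real backends (gemini, openai, anthropic) would also need an API key.
+pub fn create_provider(name: &str) -> Result<Box<dyn LlmProvider>, String> {
+    match name {
+        "mock" => Ok(Box::new(MockLlmProvider {
+            canned_response: r#"{"items":[]}"#.to_string(),
+        })),
+        other => Err(format!("unsupported provider: {other}")),
+    }
+}
+```
+
+Callers only hold a `Box<dyn LlmProvider>`, so substituting the mock in tests requires no change to pipeline code.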
+
+### 3.2 Synthesis Pipeline
+
+The pipeline is the core business logic, orchestrated in `services/synthesis.rs`. It runs as a background Tokio task with a 15-minute timeout.
+
+**Three phases:**
+
+1. **Phase 1 -- Personalized Sources**: Extract article links from user-curated source pages (windowed, rolling), scrape articles, classify and summarize each via LLM. Batched processing with configurable `batch_size`.
+
+2. **Phase 2 -- Web Search Fallback**: For under-filled categories, either call the Brave Search API or use the LLM's web search capability to find additional articles. Scrape and validate results.
+
+3. **Save**: Assemble sections by category, sanitize JSON, persist to database, record article history traces.
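+
+The Phase 1 batching and the deficit check that triggers Phase 2 can be sketched with plain collections. This is an illustration under stated assumptions. The helper names `batches` and `underfilled_categories` are hypothetical, and the real pipeline operates on scraped articles rather than on bare values.
+
+```rust
+use std::collections::HashMap;
+
+/// Splits work into `batch_size` chunks, as Phase 1 does before classifying.
+pub fn batches<T>(items: &[T], batch_size: usize) -> Vec<&[T]> {
+    // chunks(0) panics, so clamp to at least one item per batch.
+    items.chunks(batch_size.max(1)).collect()
+}
+
+/// Returns (category, missing_count) for each category below its target,
+/// in the theme's category order. These gaps are what Phase 2 tries to fill.
+pub fn underfilled_categories(
+    categories: &[&str],
+    filled: &HashMap<String, usize>,
+    max_items_per_category: usize,
+) -> Vec<(String, usize)> {
+    categories
+        .iter()
+        .filter_map(|cat| {
+            let have = filled.get(*cat).copied().unwrap_or(0);
+            (have < max_items_per_category)
+                .then(|| (cat.to_string(), max_items_per_category - have))
+        })
+        .collect()
+}
+```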
+
+Progress is reported via `tokio::sync::watch` channels consumed by SSE endpoints.
+
+### 3.3 Job Store
+
+`JobStore` (`services/job_store.rs`) is an in-memory concurrent store for active generation jobs:
+
+- Backed by `DashMap<Uuid, JobEntry>`, a sharded map with per-shard locks, for low-contention concurrent access
+- `DashSet<Uuid>` for per-user deduplication (one active job per user)
+- Each job holds a `watch::Sender<ProgressEvent>` for real-time SSE streaming
+- `AtomicBool` for cooperative cancellation
+- 1-hour TTL with automatic cleanup
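+
+The job store's bookkeeping can be approximated with standard-library types. The substitutions are deliberate: `Mutex`-wrapped `HashMap`/`HashSet` stand in for `DashMap`/`DashSet`, job IDs are `u64` counters rather than `Uuid`s, the `watch` progress channel is omitted, and all method names are hypothetical.
+
+```rust
+use std::collections::{HashMap, HashSet};
+use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};
+use std::sync::{Arc, Mutex};
+use std::time::{Duration, Instant};
+
+pub struct JobEntry {
+    pub user_id: u64,
+    pub cancelled: Arc<AtomicBool>, // cooperative cancellation flag
+    pub created_at: Instant,
+}
+
+#[derive(Default)]
+pub struct JobStore {
+    next_id: AtomicU64,
+    jobs: Mutex<HashMap<u64, JobEntry>>,
+    active_users: Mutex<HashSet<u64>>, // one active job per user
+}
+
+impl JobStore {
+    /// Registers a job; returns None if the user already has one (HTTP 409).
+    pub fn try_create(&self, user_id: u64) -> Option<(u64, Arc<AtomicBool>)> {
+        if !self.active_users.lock().unwrap().insert(user_id) {
+            return None;
+        }
+        let job_id = self.next_id.fetch_add(1, Ordering::Relaxed) + 1;
+        let cancelled = Arc::new(AtomicBool::new(false));
+        let entry = JobEntry { user_id, cancelled: Arc::clone(&cancelled), created_at: Instant::now() };
+        self.jobs.lock().unwrap().insert(job_id, entry);
+        Some((job_id, cancelled))
+    }
+
+    /// Sets the cancel flag; the pipeline checks it between stages.
+    pub fn cancel(&self, job_id: u64) -> bool {
+        match self.jobs.lock().unwrap().get(&job_id) {
+            Some(entry) => {
+                entry.cancelled.store(true, Ordering::SeqCst);
+                true
+            }
+            None => false,
+        }
+    }
+
+    /// Frees the user's slot; the entry itself stays until TTL cleanup.
+    pub fn finish(&self, job_id: u64) {
+        if let Some(entry) = self.jobs.lock().unwrap().get(&job_id) {
+            self.active_users.lock().unwrap().remove(&entry.user_id);
+        }
+    }
+
+    /// Drops entries older than `ttl` (1 hour in the real store); returns how many.
+    pub fn cleanup_expired(&self, ttl: Duration) -> usize {
+        let mut jobs = self.jobs.lock().unwrap();
+        let mut users = self.active_users.lock().unwrap();
+        let before = jobs.len();
+        jobs.retain(|_, entry| {
+            let keep = entry.created_at.elapsed() < ttl;
+            if !keep {
+                users.remove(&entry.user_id);
+            }
+            keep
+        });
+        before - jobs.len()
+    }
+}
+```
+
+Keeping the per-user set separate from the job map makes the duplicate check a single `insert`, whose `false` return maps directly to the 409 response.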
+
+### 3.4 Scheduler
+
+`services/scheduler.rs` runs as a background task, checking every minute for due `theme_schedules`. When a schedule fires:
+
+1. Query `find_due_schedules` matching current day code + time
+2. Skip if user already has a manual generation in progress
+3. Run `synthesis::run_generation_inner` directly
+4. Send email to configured recipients (up to 3)
+5. Mark schedule as run
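+
+A hypothetical due check for a single schedule row, written over Unix timestamps to avoid a `chrono` dependency. The real selection happens in the `find_due_schedules` query. The rule assumed here (fire once per listed UTC day, at or after `time_utc`, guarded by `last_run_at`) is an illustration, not a description of that query.
+
+```rust
+const DAY_CODES: [&str; 7] = ["mon", "tue", "wed", "thu", "fri", "sat", "sun"];
+
+/// Parses "HH:MM" into minutes since midnight; rejects anything else.
+pub fn parse_hhmm(s: &str) -> Option<u32> {
+    let (h, m) = s.split_once(':')?;
+    let two_digits = |p: &str| p.len() == 2 && p.bytes().all(|b| b.is_ascii_digit());
+    if !two_digits(h) || !two_digits(m) {
+        return None;
+    }
+    let (h, m): (u32, u32) = (h.parse().ok()?, m.parse().ok()?);
+    (h < 24 && m < 60).then_some(h * 60 + m)
+}
+
+/// Hypothetical rule: due on a listed UTC day once `time_utc` has passed,
+/// unless the schedule already ran since that moment.
+pub fn is_due(now_unix: u64, days: &[&str], time_utc: &str, last_run_unix: Option<u64>) -> bool {
+    let Some(target_minute) = parse_hhmm(time_utc) else {
+        return false;
+    };
+    let day_start = now_unix - now_unix % 86_400;
+    // 1970-01-01 was a Thursday, which is index 3 in DAY_CODES.
+    let today = DAY_CODES[((now_unix / 86_400 + 3) % 7) as usize];
+    let fire_at = day_start + u64::from(target_minute) * 60;
+    days.iter().any(|d| *d == today)
+        && now_unix >= fire_at
+        && last_run_unix.map_or(true, |last| last < fire_at)
+}
+```
+
+Comparing `now >= fire_at` instead of requiring an exact minute match lets a late tick still fire once, while the `last_run_unix` guard blocks a second run on the same day.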
+
+### 3.5 Scraper
+
+Two scraping services:
+
+- **`scraper.rs`**: Article page scraper with SSRF prevention, HTML parsing, title/date/body extraction, soft-404 detection, 15s timeout, 5MB body limit.
+- **`source_scraper.rs`**: Source index page scraper that extracts article links from user-configured source URLs (HTML `<a>` parsing with filters, or LLM-assisted extraction).
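+
+One common way to enforce a body cap like the 5MB limit is to read at most `limit + 1` bytes and reject the body if that extra byte arrives. The sketch below applies `std::io::Read::take` to a blocking reader. This is an assumption about technique only, since the real scraper reads an async `reqwest` response. The name `read_capped` is hypothetical, and 5MB is taken here as 5 MiB.
+
+```rust
+use std::io::{self, Read};
+
+pub const MAX_BODY_BYTES: u64 = 5 * 1024 * 1024; // assumed: 5 MiB
+
+/// Reads the whole body, failing if it exceeds `limit` bytes.
+pub fn read_capped<R: Read>(reader: R, limit: u64) -> io::Result<Vec<u8>> {
+    let mut buf = Vec::new();
+    // Allow one byte past the limit so an oversized body is detectable.
+    reader.take(limit + 1).read_to_end(&mut buf)?;
+    if buf.len() as u64 > limit {
+        return Err(io::Error::new(io::ErrorKind::InvalidData, "response body too large"));
+    }
+    Ok(buf)
+}
+```
+
+Reading through `take` bounds memory even when a server omits or falsifies `Content-Length`.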
+
+### 3.6 Rate Limiters
+
+- **Auth rate limiter**: 10 requests/60s per key (email or IP) for magic link endpoints.
+- **Provider rate limiter**: Per-LLM-provider sliding window, admin-configured, hot-reloaded from DB.
+- **User rate limiters**: Per-user generation rate limits cached in `DashMap`, recreated on settings change.
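+
+The sliding-window idea behind these limiters can be sketched as a sliding log of hit timestamps. This is a single-threaded sketch under stated assumptions: the real limiters live in `DashMap`s and the provider limits are hot-reloaded from the database, neither of which is modeled here. `SlidingWindow` and `KeyedLimiter` are hypothetical names.
+
+```rust
+use std::collections::{HashMap, VecDeque};
+use std::time::{Duration, Instant};
+
+/// Sliding-log limiter: at most `max_requests` hits in any `window`.
+pub struct SlidingWindow {
+    max_requests: usize,
+    window: Duration,
+    hits: VecDeque<Instant>,
+}
+
+impl SlidingWindow {
+    pub fn new(max_requests: usize, window: Duration) -> Self {
+        Self { max_requests, window, hits: VecDeque::new() }
+    }
+
+    /// Records a hit at `now` if allowed; returns false when over the limit.
+    pub fn try_acquire_at(&mut self, now: Instant) -> bool {
+        // Evict hits that have slid out of the window.
+        while let Some(&oldest) = self.hits.front() {
+            if now.duration_since(oldest) >= self.window {
+                self.hits.pop_front();
+            } else {
+                break;
+            }
+        }
+        if self.hits.len() < self.max_requests {
+            self.hits.push_back(now);
+            true
+        } else {
+            false
+        }
+    }
+}
+
+/// One limiter per key (email or IP), e.g. 10 requests per 60s for magic links.
+pub struct KeyedLimiter {
+    max_requests: usize,
+    window: Duration,
+    per_key: HashMap<String, SlidingWindow>,
+}
+
+impl KeyedLimiter {
+    pub fn new(max_requests: usize, window: Duration) -> Self {
+        Self { max_requests, window, per_key: HashMap::new() }
+    }
+
+    pub fn check(&mut self, key: &str, now: Instant) -> bool {
+        let (max, window) = (self.max_requests, self.window);
+        self.per_key
+            .entry(key.to_string())
+            .or_insert_with(|| SlidingWindow::new(max, window))
+            .try_acquire_at(now)
+    }
+}
+```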
+
+---
+
+## 4. Data Model
+
+### Tables and Relationships
+
+```
+users
+  ├── sessions          (user_id FK, CASCADE)
+  ├── magic_tokens      (email reference, no FK)
+  ├── settings          (user_id PK/FK, CASCADE)
+  ├── themes            (user_id FK, CASCADE)
+  │     ├── sources           (theme_id FK, CASCADE)
+  │     ├── syntheses         (theme_id FK, SET NULL)
+  │     └── theme_schedules   (theme_id FK, CASCADE, UNIQUE)
+  ├── user_api_keys     (user_id FK, CASCADE; UNIQUE per provider)
+  ├── article_history   (user_id FK, CASCADE)
+  ├── llm_call_log      (user_id FK, CASCADE)
+  └── audit_log         (admin_user_id FK, SET NULL)
+
+admin_providers
+  └── admin_rate_limits (provider_name FK, CASCADE)
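+```
+
+The tree shows each table's primary owner. `sources`, `syntheses`, and `theme_schedules` also carry a direct `user_id` FK to `users` (CASCADE), and `article_history.synthesis_id` references `syntheses` (SET NULL).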
+
+### Table Summary
+
+| Table | Purpose | Key Columns |
+|---|---|---|
+| `users` | User accounts | id, email, display_name, role (user/admin), created_at |
+| `sessions` | Login sessions | session_hash (PK), user_id, expires_at, last_active_at, ip_address |
+| `magic_tokens` | Passwordless auth tokens | id, email, token_hash, expires_at, used |
+| `settings` | Per-user pipeline config | user_id (PK), ai_provider, ai_model, ai_model_websearch, batch_size, max_articles_per_source, max_links_per_source, use_brave_search, source_extraction_window, article_history_days, search_agent_behavior, rate_limit_max_requests, rate_limit_time_window_seconds |
+| `themes` | Per-user topic configurations | id, user_id, name, theme, categories (JSONB), max_items_per_category, max_age_days, summary_length |
+| `sources` | User-curated news source URLs | id, user_id, title, url, theme_id, is_preferred |
+| `syntheses` | Generated synthesis results | id, user_id, week, sections (JSONB), status, job_id, theme_id |
+| `theme_schedules` | Automated generation schedules | id, theme_id (UNIQUE), user_id, enabled, days (JSONB), time_utc, emails (JSONB), last_run_at |
+| `article_history` | Article URL dedup + provenance trace | id, user_id, url, url_hash, title, source_type, source_url, category, synthesis_id, status, scraped_ok, job_id, published_date |
+| `llm_call_log` | Full LLM interaction log | id, user_id, job_id, call_type, model, system_prompt, user_prompt, response_body, duration_ms, article_url |
+| `admin_providers` | Admin-curated LLM provider catalog | id, provider_name (UNIQUE), display_name, models_scraping (JSONB), models_websearch (JSONB), is_enabled |
+| `admin_rate_limits` | Per-provider rate limit config | id, provider_name (UNIQUE, FK), max_requests, time_window_seconds |
+| `user_api_keys` | Encrypted user LLM API keys | id, user_id, provider_name, encrypted_key (BYTEA), nonce (BYTEA), key_prefix; UNIQUE(user_id, provider_name) |
+| `audit_log` | Admin mutation audit trail | id, admin_user_id, action, target_type, target_id, details (JSONB) |
+
+---
+
+## 5. API Overview
+
+All API routes are prefixed with `/api/v1`. CSRF protection (`X-Requested-With` header) is applied to all mutating endpoints.
+
+### Authentication
+
+| Method | Path | Auth | Description |
+|---|---|---|---|
+| POST | /auth/register | Public | Create account + send magic link |
+| POST | /auth/login | Public | Request magic link |
+| GET | /auth/verify | Public | Verify token (email click redirect) |
+| POST | /auth/verify | Public | Verify token (frontend API call) |
+| POST | /auth/logout | Authenticated | Destroy session |
+| GET | /auth/me | Authenticated | Current user info |
+
+### Settings
+
+| Method | Path | Auth | Description |
+|---|---|---|---|
+| GET | /settings | Authenticated | Get user settings |
+| PUT | /settings | Authenticated | Update user settings |
+
+### Themes
+
+| Method | Path | Auth | Description |
+|---|---|---|---|
+| GET | /themes | Authenticated | List user themes |
+| POST | /themes | Authenticated | Create theme |
+| PUT | /themes/{id} | Authenticated | Update theme |
+| DELETE | /themes/{id} | Authenticated | Delete theme |
+
+### Schedules
+
+| Method | Path | Auth | Description |
+|---|---|---|---|
+| GET | /themes/{id}/schedule | Authenticated | Get theme schedule |
+| PUT | /themes/{id}/schedule | Authenticated | Create or update schedule |
+| DELETE | /themes/{id}/schedule | Authenticated | Delete schedule |
+
+### Sources
+
+| Method | Path | Auth | Description |
+|---|---|---|---|
+| GET | /sources | Authenticated | List sources |
+| POST | /sources | Authenticated | Create source |
+| PUT | /sources/preferred | Authenticated | Update preferred sources |
+| DELETE | /sources/{id} | Authenticated | Delete source |
+| POST | /sources/bulk | Authenticated | Bulk import (JSON) |
+| POST | /sources/import-csv | Authenticated | Import from CSV |
+| GET | /sources/export-csv | Authenticated | Export as CSV |
+
+### Syntheses & Generation
+
+| Method | Path | Auth | Description |
+|---|---|---|---|
+| GET | /syntheses | Authenticated | List syntheses |
+| GET | /syntheses/{id} | Authenticated | Get full synthesis |
+| DELETE | /syntheses/{id} | Authenticated | Delete synthesis |
+| POST | /syntheses/generate | Authenticated | Trigger generation |
+| GET | /syntheses/generate/{job_id}/progress | Authenticated | SSE progress stream |
+| POST | /syntheses/generate/{job_id}/stop | Authenticated | Cancel generation |
+| POST | /syntheses/{id}/send-email | Authenticated | Email synthesis |
+| GET | /syntheses/{id}/export/markdown | Authenticated | Markdown download |
+| GET | /syntheses/{id}/export/pdf | Authenticated | PDF download |
+
+### Article History & LLM Logs
+
+| Method | Path | Auth | Description |
+|---|---|---|---|
+| GET | /article-history | Authenticated | List article history |
+| DELETE | /article-history | Authenticated | Clear article history |
+| GET | /syntheses/{id}/provenance | Authenticated | Get synthesis provenance |
+| GET | /llm-logs/{job_id} | Authenticated | Get LLM call logs for job |
+
+### User API Keys
+
+| Method | Path | Auth | Description |
+|---|---|---|---|
+| GET | /user/api-keys | Authenticated | List keys (prefix only) |
+| POST | /user/api-keys | Authenticated | Store encrypted key |
+| DELETE | /user/api-keys/{provider} | Authenticated | Delete key |
+| POST | /user/api-keys/{provider}/test | Authenticated | Test key validity |
+| POST | /user/api-keys/export | Authenticated | Export keys |
+
+### Configuration & Admin
+
+| Method | Path | Auth | Description |
+|---|---|---|---|
+| GET | /config/providers | Authenticated | Available providers/models |
+| GET | /admin/providers | Admin | List all providers |
+| POST | /admin/providers | Admin | Create provider |
+| PUT | /admin/providers/{id} | Admin | Update provider |
+| DELETE | /admin/providers/{id} | Admin | Delete provider |
+| GET | /admin/rate-limits | Admin | List rate limits |
+| PUT | /admin/rate-limits/{provider_name} | Admin | Update rate limit |
+| GET | /admin/users | Admin | List users |
+| PUT | /admin/users/{id}/role | Admin | Change user role |
+
+### Infrastructure
+
+| Method | Path | Auth | Description |
+|---|---|---|---|
+| GET | /health | Public | Health check |
+
+---
+
+## 6. Security Architecture
+
+### Authentication & Session Management
+
+- **Passwordless**: Magic link tokens sent via email (Resend API), single-use, time-limited
+- **Captcha**: Cloudflare Turnstile on registration and login
+- **Sessions**: SHA-256 hashed tokens stored in DB, 30-day expiry, `HttpOnly` + `SameSite=Lax` cookies, optionally `Secure`
+- **Anti-enumeration**: The same response is returned whether or not the email is registered, with timing-attack mitigation
+- **Authorization**: `AuthUser` and `AdminUser` Axum extractors enforce auth levels per handler
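+
+The cookie attributes above can be made concrete with a small `Set-Cookie` builder. The cookie name `session` and the 30-day lifetime come from this document. `Path=/`, the logout form, and both function names are assumptions for illustration.
+
+```rust
+/// 30 days, the session lifetime stated above.
+pub const SESSION_MAX_AGE_SECS: u64 = 30 * 24 * 60 * 60;
+
+/// Builds a `Set-Cookie` value for the raw session token.
+/// Only the SHA-256 hash of `raw_token` is stored server-side.
+pub fn session_cookie(raw_token: &str, secure: bool) -> String {
+    let mut cookie = format!(
+        "session={}; Path=/; Max-Age={}; HttpOnly; SameSite=Lax",
+        raw_token, SESSION_MAX_AGE_SECS
+    );
+    if secure {
+        cookie.push_str("; Secure"); // only when served over HTTPS
+    }
+    cookie
+}
+
+/// Logout: same name and path, expired immediately.
+pub fn clear_session_cookie() -> String {
+    "session=; Path=/; Max-Age=0; HttpOnly; SameSite=Lax".to_string()
+}
+```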
+
+### CSRF Protection
+
+All mutating API endpoints require the `X-Requested-With` header (checked by `csrf::csrf_check` middleware layer). Non-mutating GET/HEAD/OPTIONS requests are exempt.
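+
+The rule reduces to a small predicate. It is a sketch, and it assumes the middleware checks only that the header is present, not its value. `csrf_allows` is a hypothetical name.
+
+```rust
+/// Safe methods pass; mutating methods need the `X-Requested-With` header.
+/// Assumption: only the header's presence is checked, not its value.
+pub fn csrf_allows(method: &str, x_requested_with: Option<&str>) -> bool {
+    match method.to_ascii_uppercase().as_str() {
+        "GET" | "HEAD" | "OPTIONS" => true,
+        _ => x_requested_with.is_some(),
+    }
+}
+```
+
+A custom header works as a CSRF guard because a cross-site HTML form cannot set one, and a cross-origin `fetch` that adds it triggers a CORS preflight, which the `APP_URL`-only CORS policy rejects.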
+
+### Encryption at Rest
+
+User LLM API keys are encrypted with AES-256-GCM before storage:
+- 32-byte master key from `MASTER_ENCRYPTION_KEY` env var (64 hex chars)
+- Random 12-byte nonce per encryption (stored alongside ciphertext)
+- Key bytes are zeroized on drop (`zeroize` crate)
+- Only a key prefix (first 8 chars + "...") is ever returned via the API
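+
+The key-format rules above can be sketched without the `aes-gcm` crate: decoding the 64-hex-character master key into 32 bytes, and producing the masked prefix. The function names are hypothetical. In the real service the key bytes are zeroized on drop, while this sketch returns a plain array.
+
+```rust
+/// Decodes `MASTER_ENCRYPTION_KEY` (64 hex chars) into a 32-byte AES-256 key.
+/// Sketch only: the real key should live in a type that zeroizes on drop.
+pub fn parse_master_key(hex: &str) -> Result<[u8; 32], String> {
+    let hex = hex.trim();
+    if hex.len() != 64 || !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
+        return Err("MASTER_ENCRYPTION_KEY must be 64 hex characters".into());
+    }
+    let mut key = [0u8; 32];
+    for (i, byte) in key.iter_mut().enumerate() {
+        // Slicing is safe: every character was checked to be an ASCII hex digit.
+        *byte = u8::from_str_radix(&hex[2 * i..2 * i + 2], 16).map_err(|e| e.to_string())?;
+    }
+    Ok(key)
+}
+
+/// The only form of a stored key ever returned by the API.
+pub fn key_prefix(api_key: &str) -> String {
+    let head: String = api_key.chars().take(8).collect();
+    format!("{head}...")
+}
+```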
+
+### SSRF Prevention
+
+Both `scraper.rs` and `source_scraper.rs` validate URLs before fetching:
+- DNS resolution check against private/loopback IP ranges
+- Redirect chain validation (no redirects to private IPs)
+- Only HTTP/HTTPS schemes allowed
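+
+The address check can be sketched with `std::net`. It assumes the scrapers resolve the hostname first and then apply a predicate like `is_blocked_ip` (a hypothetical name) to every resolved address and to each redirect hop. The scheme gate is a plain prefix check standing in for real URL parsing.
+
+```rust
+use std::net::{IpAddr, Ipv4Addr};
+
+/// True for addresses the scrapers must never connect to.
+pub fn is_blocked_ip(ip: IpAddr) -> bool {
+    match ip {
+        IpAddr::V4(v4) => is_blocked_v4(v4),
+        IpAddr::V6(v6) => match v6.to_ipv4_mapped() {
+            // ::ffff:a.b.c.d is really an IPv4 destination.
+            Some(v4) => is_blocked_v4(v4),
+            None => {
+                let first = v6.segments()[0];
+                v6.is_loopback()                  // ::1
+                    || v6.is_unspecified()        // ::
+                    || (first & 0xfe00) == 0xfc00 // fc00::/7 unique local
+                    || (first & 0xffc0) == 0xfe80 // fe80::/10 link local
+            }
+        },
+    }
+}
+
+fn is_blocked_v4(v4: Ipv4Addr) -> bool {
+    let [a, b, _, _] = v4.octets();
+    v4.is_private()            // 10/8, 172.16/12, 192.168/16
+        || v4.is_loopback()    // 127/8
+        || v4.is_link_local()  // 169.254/16, includes cloud metadata endpoints
+        || v4.is_unspecified() // 0.0.0.0
+        || v4.is_broadcast()   // 255.255.255.255
+        || (a == 100 && (b & 0xc0) == 64) // 100.64/10 shared address space
+}
+
+/// Scheme gate: only http and https URLs are fetched.
+pub fn scheme_allowed(url: &str) -> bool {
+    let lower = url.to_ascii_lowercase();
+    lower.starts_with("http://") || lower.starts_with("https://")
+}
+```
+
+Checking resolved addresses rather than the hostname string defeats DNS names that simply point at internal IPs.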
+
+### Security Headers
+
+Applied as global middleware layers:
+- `Content-Security-Policy` (self + Cloudflare Turnstile)
+- `X-Content-Type-Options: nosniff`
+- `X-Frame-Options: DENY`
+- `Referrer-Policy: strict-origin-when-cross-origin`
+- `X-XSS-Protection: 1; mode=block`
+- `Strict-Transport-Security` (HTTPS only)
+
+### Error Sanitization
+
+The `sanitize_error_message` function strips API keys and internal details from error messages before they reach SSE clients. Internal errors log full details server-side but return generic messages to users.
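+
+A redaction pass of this kind can be illustrated with a sketch. The patterns are assumptions, not the documented behavior: it masks `key=` query values (some provider URLs carry the API key that way) and whole words starting with `sk-`. The function shares the documented name but is a hypothetical reimplementation.
+
+```rust
+/// Hypothetical redaction: masks `key=...` query values and `sk-...` tokens.
+pub fn sanitize_error_message(msg: &str) -> String {
+    msg.split(' ')
+        .map(|word| {
+            if let Some(pos) = word.find("key=") {
+                let value_start = pos + "key=".len();
+                // The value ends at the next query separator, or at the end of the word.
+                let value_end = word[value_start..]
+                    .find('&')
+                    .map_or(word.len(), |i| value_start + i);
+                format!("{}[REDACTED]{}", &word[..value_start], &word[value_end..])
+            } else if word.starts_with("sk-") {
+                "[REDACTED]".to_string()
+            } else {
+                word.to_string()
+            }
+        })
+        .collect::<Vec<_>>()
+        .join(" ")
+}
+```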
+
+### CORS
+
+Configured to allow only the `APP_URL` origin, with credentials (cookies), limited to GET/POST/PUT/DELETE methods.
+
+---
+
+## 7. Concurrency Model
+
+### Async Runtime
+
+Tokio with the `full` feature set. The Axum server runs on Tokio's multi-threaded runtime.
+
+### Background Tasks
+
+Spawned at startup via `tokio::spawn`:
+- **Session cleanup**: Hourly deletion of expired DB sessions
+- **Job store cleanup**: Periodic removal of expired job entries (1-hour TTL)
+- **Scheduler**: Minute-by-minute check for due theme schedules
+
+### Generation Pipeline Concurrency
+
+- **`tokio::task::JoinSet`**: Used for parallel scraping (bounded concurrency of 5 for source extraction) and parallel LLM classification calls within each batch
+- **`tokio::sync::watch`**: Fan-out progress notifications to SSE clients; late subscribers immediately receive the latest state
+- **`AtomicBool`**: Cooperative cancellation flag checked between pipeline stages; avoids mutex overhead
+- **`DashMap` / `DashSet`**: Sharded concurrent collections (per-shard read-write locks instead of one global mutex, so not strictly lock-free) used for the job store (job entries), generating-users set, per-user rate limiter cache, and provider rate limiter state
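+
+The `watch` semantics (every send overwrites, late subscribers see the latest state) can be reproduced with a `Mutex` and `Condvar`. This is a blocking, std-only analog for illustration, not `tokio::sync::watch` itself, and `Watch` is a hypothetical name.
+
+```rust
+use std::sync::{Condvar, Mutex};
+
+/// Latest-value cell: each send overwrites, readers always get the newest state.
+pub struct Watch<T> {
+    state: Mutex<(u64, T)>, // (version, latest value)
+    changed: Condvar,
+}
+
+impl<T: Clone> Watch<T> {
+    pub fn new(initial: T) -> Self {
+        Self { state: Mutex::new((0, initial)), changed: Condvar::new() }
+    }
+
+    /// Overwrites the value; slow readers may skip intermediate values.
+    pub fn send(&self, value: T) {
+        let mut guard = self.state.lock().unwrap();
+        guard.0 += 1;
+        guard.1 = value;
+        self.changed.notify_all();
+    }
+
+    /// Returns the current (version, value) immediately, as a late subscriber would see it.
+    pub fn borrow(&self) -> (u64, T) {
+        let guard = self.state.lock().unwrap();
+        (guard.0, guard.1.clone())
+    }
+
+    /// Blocks until the version moves past `seen`, then returns the latest state.
+    pub fn wait_changed(&self, seen: u64) -> (u64, T) {
+        let guard = self
+            .changed
+            .wait_while(self.state.lock().unwrap(), |s| s.0 <= seen)
+            .unwrap();
+        (guard.0, guard.1.clone())
+    }
+}
+```
+
+Dropping intermediate values is acceptable for progress reporting, since a client only needs the most recent percentage.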
+
+### Task Lifecycle
+
+```
+POST /generate
+  └── handler creates job in JobStore
+        └── spawns outer task (panic monitor)
+              └── spawns inner task (15-min timeout)
+                    └── run_generation_inner()
+                          ├── Phase 1 (JoinSet scrape, JoinSet classify)
+                          ├── Phase 2 (JoinSet scrape, JoinSet classify)
+                          └── Save to DB
+              └── on complete/error: send final ProgressEvent
+                    └── delayed cleanup (5 min) then remove from JobStore
+```
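+
+The outer-monitor / inner-timeout split can be mirrored with threads and a channel. This is a std-only analog under stated assumptions: `run_monitored` and `Outcome` are hypothetical. Unlike `tokio::time::timeout`, which drops (and thereby cancels) the future, a timed-out thread keeps running, which is one reason cooperative cancellation flags matter.
+
+```rust
+use std::sync::mpsc::{self, RecvTimeoutError};
+use std::thread;
+use std::time::Duration;
+
+#[derive(Debug, PartialEq)]
+pub enum Outcome<T> {
+    Completed(T),
+    TimedOut,
+    Panicked,
+}
+
+/// Runs `job` on a worker thread and waits at most `timeout` for its result.
+/// A panic drops the sender, which the waiting side observes as `Disconnected`.
+pub fn run_monitored<T, F>(job: F, timeout: Duration) -> Outcome<T>
+where
+    T: Send + 'static,
+    F: FnOnce() -> T + Send + 'static,
+{
+    let (tx, rx) = mpsc::channel();
+    thread::spawn(move || {
+        let result = job();
+        let _ = tx.send(result); // the receiver may be gone after a timeout
+    });
+    match rx.recv_timeout(timeout) {
+        Ok(value) => Outcome::Completed(value),
+        Err(RecvTimeoutError::Timeout) => Outcome::TimedOut,
+        Err(RecvTimeoutError::Disconnected) => Outcome::Panicked,
+    }
+}
+```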
+
+### Graceful Shutdown
+
+The server supports graceful shutdown via signal handling, allowing in-flight requests to complete.
diff --git a/docs/technical_specs.md b/docs/technical_specs.md
new file mode 100644
index 0000000..c4ee605
--- /dev/null
+++ b/docs/technical_specs.md
@@ -0,0 +1,793 @@
+# AI Weekly Synth -- Technical Specifications
+
+## 1. Backend Tech Stack
+
+| Dependency | Version | Purpose |
+|---|---|---|
+| axum | 0.8 | Web framework (macros, multipart) |
+| tokio | 1 | Async runtime (full features) |
+| tower | 0.5 | Middleware composition |
+| tower-http | 0.6 | CORS, static files, tracing, headers |
+| sqlx | 0.8 | Async Postgres driver (runtime-tokio, tls-rustls, uuid, chrono, json, migrate) |
+| reqwest | 0.12 | HTTP client (JSON) |
+| serde / serde_json | 1 | Serialization/deserialization |
+| chrono | 0.4 | Date/time handling (serde feature) |
+| aes-gcm | 0.10 | AES-256-GCM encryption |
+| zeroize | 1 | Secure memory zeroing |
+| sha2 | 0.10 | SHA-256 hashing |
+| rand | 0.8 | Random number generation |
+| base64 | 0.22 | Base64 encoding |
+| hex | 0.4 | Hex encoding/decoding |
+| async-trait | 0.1 | Async trait objects |
+| tracing / tracing-subscriber | 0.1 / 0.3 | Structured logging (env-filter, json) |
+| dotenvy | 0.15 | .env file loading |
+| clap | 4 | CLI argument parsing |
+| scraper | 0.22 | HTML parsing (CSS selectors) |
+| ego-tree | 0.10 | Tree data structure (used by scraper) |
+| url | 2 | URL parsing and validation |
+| email_address | 0.2 | Email validation |
+| anyhow | 1 | Error context |
+| thiserror | 2 | Error type derivation |
+| uuid | 1 | UUID v4 generation (serde feature) |
+| dashmap | 6 | Concurrent hash maps |
+| tokio-stream | 0.1 | Stream utilities for SSE |
+| futures | 0.3 | Async stream combinators |
+| printpdf | 0.7 | PDF generation |
+
+**Dev dependencies**: tower (util), http-body-util, wiremock 0.6.
+
+**Rust edition**: 2021.
+
+---
+
+## 2. Frontend Tech Stack
+
+| Dependency | Version | Purpose |
+|---|---|---|
+| solid-js | ^1.9.0 | Reactive UI framework |
+| @solidjs/router | ^0.15.0 | Client-side routing |
+| lucide-solid | ^0.475.0 | Icon library |
+| date-fns | ^4.1.0 | Date formatting |
+| tailwindcss | ^4.1.0 | Utility-first CSS (v4) |
+| @tailwindcss/vite | ^4.1.0 | Tailwind Vite plugin |
+| vite | ^6.2.0 | Build tool and dev server |
+| vite-plugin-solid | ^2.11.0 | SolidJS Vite integration |
+| typescript | ~5.8.0 | Type checking |
+| vitest | ^3.0.0 | Unit testing |
+| @solidjs/testing-library | ^0.8.0 | Component testing |
+| jsdom | ^25.0.0 | DOM environment for tests |
+
+### Frontend Routes
+
+| Path | Component | Auth | Description |
+|---|---|---|---|
+| /login | Login | Public | Login page |
+| /register | Register | Public | Registration page |
+| /auth/verify | AuthVerify | Public | Magic link verification |
+| / | Home | Protected | Dashboard / synthesis list |
+| /settings | Settings | Protected | User settings |
+| /themes | ThemeManager | Protected | Theme CRUD + source management |
+| /generate | GenerateSynthesis | Protected | Generation trigger + progress |
+| /synthesis/:id | SynthesisDetail | Protected | Full synthesis view |
+| /article-history | ArticleHistory | Protected | Article history browser |
+| /llm-logs/:jobId | LlmLogs | Protected | LLM call log viewer |
+| /admin/providers | AdminProviders | Admin | Provider configuration |
+| /admin/rate-limits | AdminRateLimits | Admin | Rate limit configuration |
+| /admin/users | AdminUsers | Admin | User management |
+
+---
+
+## 3. Database Schema
+
+### 3.1 `users`
+
+| Column | Type | Constraints |
+|---|---|---|
+| id | UUID | PK, DEFAULT gen_random_uuid() |
+| email | TEXT | NOT NULL, UNIQUE |
+| display_name | TEXT | nullable |
+| role | TEXT | NOT NULL, DEFAULT 'user', CHECK (user/admin) |
+| created_at | TIMESTAMPTZ | NOT NULL, DEFAULT now() |
+| updated_at | TIMESTAMPTZ | NOT NULL, DEFAULT now() |
+
+Indexes: `idx_users_email` on (email).
+
+### 3.2 `sessions`
+
+| Column | Type | Constraints |
+|---|---|---|
+| session_hash | TEXT | PK (SHA-256 of raw token) |
+| user_id | UUID | NOT NULL, FK users(id) CASCADE |
+| created_at | TIMESTAMPTZ | NOT NULL, DEFAULT now() |
+| expires_at | TIMESTAMPTZ | NOT NULL |
+| last_active_at | TIMESTAMPTZ | NOT NULL, DEFAULT now() |
+| ip_address | TEXT | nullable |
+| user_agent | TEXT | nullable |
+
+Indexes: `idx_sessions_user_id`, `idx_sessions_expires_at`.
+
+### 3.3 `magic_tokens`
+
+| Column | Type | Constraints |
+|---|---|---|
+| id | UUID | PK, DEFAULT gen_random_uuid() |
+| email | TEXT | NOT NULL |
+| token_hash | TEXT | NOT NULL, UNIQUE |
+| created_at | TIMESTAMPTZ | NOT NULL, DEFAULT now() |
+| expires_at | TIMESTAMPTZ | NOT NULL |
+| used | BOOLEAN | NOT NULL, DEFAULT false |
+
+Indexes: `idx_magic_tokens_email`, `idx_magic_tokens_expires`.
+
+### 3.4 `settings`
+
+Per-user pipeline configuration. One row per user (user_id is the PK).
+
+| Column | Type | Constraints |
+|---|---|---|
+| user_id | UUID | PK, FK users(id) CASCADE |
+| max_articles_per_source | INTEGER | NOT NULL, DEFAULT 3 |
+| max_links_per_source | INTEGER | NOT NULL, DEFAULT 8 |
+| use_brave_search | BOOLEAN | NOT NULL, DEFAULT false |
+| article_history_days | INTEGER | NOT NULL, DEFAULT 90 |
+| batch_size | INTEGER | NOT NULL, DEFAULT 5 |
+| source_extraction_window | INTEGER | NOT NULL, DEFAULT 3 |
+| search_agent_behavior | TEXT | NOT NULL, DEFAULT '' |
+| ai_provider | TEXT | NOT NULL, DEFAULT '' |
+| ai_model | TEXT | NOT NULL, DEFAULT '' |
+| ai_model_websearch | TEXT | NOT NULL, DEFAULT '' |
+| rate_limit_max_requests | INTEGER | nullable |
+| rate_limit_time_window_seconds | INTEGER | nullable |
+| updated_at | TIMESTAMPTZ | NOT NULL, DEFAULT now() |
+
+### 3.5 `themes`
+
+Per-user topic configurations with content settings.
+
+| Column | Type | Constraints |
+|---|---|---|
+| id | UUID | PK, DEFAULT gen_random_uuid() |
+| user_id | UUID | NOT NULL, FK users(id) CASCADE |
+| name | TEXT | NOT NULL |
+| theme | TEXT | NOT NULL (search topic) |
+| categories | JSONB | NOT NULL, DEFAULT '[]' |
+| max_items_per_category | INTEGER | NOT NULL, DEFAULT 4 |
+| max_age_days | INTEGER | NOT NULL, DEFAULT 7 |
+| summary_length | INTEGER | NOT NULL, DEFAULT 3 |
+| created_at | TIMESTAMPTZ | NOT NULL, DEFAULT now() |
+| updated_at | TIMESTAMPTZ | NOT NULL, DEFAULT now() |
+
+Indexes: `idx_themes_user_id`.
+
+### 3.6 `sources`
+
+User-curated news source URLs, optionally tied to a theme.
+
+| Column | Type | Constraints |
+|---|---|---|
+| id | UUID | PK, DEFAULT gen_random_uuid() |
+| user_id | UUID | NOT NULL, FK users(id) CASCADE |
+| title | VARCHAR(200) | NOT NULL, CHECK length 1-200 |
+| url | VARCHAR(1000) | NOT NULL, CHECK length <= 1000 |
+| theme_id | UUID | nullable, FK themes(id) CASCADE |
+| is_preferred | BOOLEAN | NOT NULL, DEFAULT false |
+| created_at | TIMESTAMPTZ | NOT NULL, DEFAULT now() |
+
+Indexes: `idx_sources_user_id`, UNIQUE `idx_sources_user_id_url` on (user_id, url).
+
+### 3.7 `syntheses`
+
+Generated synthesis results with JSONB section data.
+
+| Column | Type | Constraints |
+|---|---|---|
+| id | UUID | PK, DEFAULT gen_random_uuid() |
+| user_id | UUID | NOT NULL, FK users(id) CASCADE |
+| week | VARCHAR(10) | NOT NULL (ISO week string) |
+| sections | JSONB | NOT NULL, DEFAULT '[]' |
+| status | VARCHAR(20) | NOT NULL, DEFAULT 'completed' |
+| job_id | UUID | nullable |
+| theme_id | UUID | nullable, FK themes(id) SET NULL |
+| created_at | TIMESTAMPTZ | NOT NULL, DEFAULT now() |
+
+Indexes: `idx_syntheses_user_id_created_at` on (user_id, created_at DESC).
+
+JSONB structure for `sections`:
+```json
+[
+  {
+    "title": "Category Name",
+    "items": [
+      { "title": "Article Title", "url": "https://...", "summary": "...", "date": "2026-03-25" }
+    ]
+  }
+]
+```
+
+### 3.8 `theme_schedules`
+
+Automated generation schedules, one per theme.
+
+| Column | Type | Constraints |
+|---|---|---|
+| id | UUID | PK, DEFAULT gen_random_uuid() |
+| theme_id | UUID | NOT NULL, UNIQUE, FK themes(id) CASCADE |
+| user_id | UUID | NOT NULL, FK users(id) CASCADE |
+| enabled | BOOLEAN | NOT NULL, DEFAULT true |
+| days | JSONB | NOT NULL, DEFAULT '[]' (e.g. ["mon","fri"]) |
+| time_utc | TEXT | NOT NULL, DEFAULT '08:00' (HH:MM) |
+| emails | JSONB | NOT NULL, DEFAULT '[]' (up to 3 addresses) |
+| last_run_at | TIMESTAMPTZ | nullable |
+| created_at | TIMESTAMPTZ | NOT NULL, DEFAULT now() |
+| updated_at | TIMESTAMPTZ | NOT NULL, DEFAULT now() |
+
+Indexes: `idx_theme_schedules_enabled` (partial, WHERE enabled = true).
+
+### 3.9 `article_history`
+
+Article URL deduplication and full provenance tracing.
+
+| Column | Type | Constraints |
+|---|---|---|
+| id | UUID | PK, DEFAULT gen_random_uuid() |
+| user_id | UUID | NOT NULL, FK users(id) CASCADE |
+| url_hash | TEXT | NOT NULL (SHA-256 of normalized URL) |
+| url | TEXT | NOT NULL |
+| title | TEXT | NOT NULL, DEFAULT '' |
+| source_type | TEXT | NOT NULL, DEFAULT 'unknown' |
+| source_url | TEXT | nullable |
+| category | TEXT | nullable |
+| synthesis_id | UUID | nullable, FK syntheses(id) SET NULL |
+| status | TEXT | NOT NULL, DEFAULT 'used' |
+| scraped_ok | BOOLEAN | NOT NULL, DEFAULT true |
+| job_id | UUID | NOT NULL |
+| published_date | TEXT | nullable |
+| created_at | TIMESTAMPTZ | NOT NULL, DEFAULT now() |
+
+Indexes: `idx_article_history_user_url` on (user_id, url_hash), `idx_article_history_job_id`.
+
+Status values: `used`, `filtered_history`, `filtered_diversity`, `filtered_not_article`, `filtered_too_old`, `filtered_empty`, `filtered_homepage`, `filtered_cross_phase_dedup`.
+
+Source type values: `personalized_source`, `brave_search`, `web_search`.
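+
+One way to keep the `status` TEXT values in sync with application code is a Rust enum with an explicit string mapping. The type name is hypothetical, and `is_filtered` reads the `filtered_*` values as reasons a candidate URL was dropped, which their names suggest.
+
+```rust
+#[derive(Debug, Clone, Copy, PartialEq, Eq)]
+pub enum ArticleStatus {
+    Used,
+    FilteredHistory,
+    FilteredDiversity,
+    FilteredNotArticle,
+    FilteredTooOld,
+    FilteredEmpty,
+    FilteredHomepage,
+    FilteredCrossPhaseDedup,
+}
+
+impl ArticleStatus {
+    pub const ALL: [ArticleStatus; 8] = [
+        ArticleStatus::Used,
+        ArticleStatus::FilteredHistory,
+        ArticleStatus::FilteredDiversity,
+        ArticleStatus::FilteredNotArticle,
+        ArticleStatus::FilteredTooOld,
+        ArticleStatus::FilteredEmpty,
+        ArticleStatus::FilteredHomepage,
+        ArticleStatus::FilteredCrossPhaseDedup,
+    ];
+
+    /// The exact string stored in the TEXT column.
+    pub fn as_str(self) -> &'static str {
+        match self {
+            ArticleStatus::Used => "used",
+            ArticleStatus::FilteredHistory => "filtered_history",
+            ArticleStatus::FilteredDiversity => "filtered_diversity",
+            ArticleStatus::FilteredNotArticle => "filtered_not_article",
+            ArticleStatus::FilteredTooOld => "filtered_too_old",
+            ArticleStatus::FilteredEmpty => "filtered_empty",
+            ArticleStatus::FilteredHomepage => "filtered_homepage",
+            ArticleStatus::FilteredCrossPhaseDedup => "filtered_cross_phase_dedup",
+        }
+    }
+
+    /// Parses a stored value; unknown strings yield None instead of panicking.
+    pub fn parse(s: &str) -> Option<Self> {
+        Self::ALL.into_iter().find(|status| status.as_str() == s)
+    }
+
+    /// True for every `filtered_*` status.
+    pub fn is_filtered(self) -> bool {
+        self != ArticleStatus::Used
+    }
+}
+```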
+
+### 3.10 `llm_call_log`
+
+Full LLM interaction logging for debugging and analysis.
+
+| Column | Type | Constraints |
+|---|---|---|
+| id | UUID | PK, DEFAULT gen_random_uuid() |
+| user_id | UUID | NOT NULL, FK users(id) CASCADE |
+| job_id | UUID | NOT NULL |
+| call_type | TEXT | NOT NULL |
+| model | TEXT | NOT NULL |
+| system_prompt | TEXT | NOT NULL, DEFAULT '' |
+| user_prompt | TEXT | NOT NULL, DEFAULT '' |
+| response_body | TEXT | NOT NULL, DEFAULT '' |
+| duration_ms | INTEGER | NOT NULL, DEFAULT 0 |
+| article_url | TEXT | nullable |
+| created_at | TIMESTAMPTZ | NOT NULL, DEFAULT now() |
+
+Indexes: `idx_llm_call_log_job_id`, `idx_llm_call_log_user_id` on (user_id, created_at).
+
+### 3.11 `admin_providers`
+
+Admin-curated catalog of LLM providers and their models.
+
+| Column | Type | Constraints |
+|---|---|---|
+| id | UUID | PK, DEFAULT gen_random_uuid() |
+| provider_name | VARCHAR(50) | NOT NULL, UNIQUE |
+| display_name | VARCHAR(100) | NOT NULL |
+| models_scraping | JSONB | NOT NULL, DEFAULT '[]' |
+| models_websearch | JSONB | NOT NULL, DEFAULT '[]' |
+| is_enabled | BOOLEAN | NOT NULL, DEFAULT true |
+| created_at | TIMESTAMPTZ | NOT NULL, DEFAULT now() |
+| updated_at | TIMESTAMPTZ | NOT NULL, DEFAULT now() |
+
+Indexes: `idx_admin_providers_enabled` (partial, WHERE is_enabled = true).
+
+Seeded with: gemini, openai, anthropic.
+
+JSONB model structure:
+```json
+[{"model_id": "gemini-2.5-pro", "display_name": "Gemini 2.5 Pro", "is_default": true}]
+```
+
+### 3.12 `admin_rate_limits`
+
+Per-provider rate limit configuration.
+
+| Column | Type | Constraints |
+|---|---|---|
+| id | UUID | PK, DEFAULT gen_random_uuid() |
+| provider_name | VARCHAR(50) | NOT NULL, UNIQUE, FK admin_providers(provider_name) CASCADE |
+| max_requests | INTEGER | NOT NULL, DEFAULT 30 |
+| time_window_seconds | INTEGER | NOT NULL, DEFAULT 60 |
+| updated_at | TIMESTAMPTZ | NOT NULL, DEFAULT now() |
+
+Seeded defaults: gemini 29/60s, openai 50/60s, anthropic 40/60s.
+
+### 3.13 `user_api_keys`
+
+Encrypted per-user API keys for the LLM providers and Brave Search.
+
+| Column | Type | Constraints |
+|---|---|---|
+| id | UUID | PK, DEFAULT gen_random_uuid() |
+| user_id | UUID | NOT NULL, FK users(id) CASCADE |
+| provider_name | VARCHAR(50) | NOT NULL |
+| encrypted_key | BYTEA | NOT NULL |
+| nonce | BYTEA | NOT NULL |
+| key_prefix | VARCHAR(20) | NOT NULL |
+| created_at | TIMESTAMPTZ | NOT NULL, DEFAULT now() |
+| updated_at | TIMESTAMPTZ | NOT NULL, DEFAULT now() |
+
+Constraint: UNIQUE(user_id, provider_name). Valid providers: gemini, openai, anthropic, brave_search.
+
+### 3.14 `audit_log`
+
+Admin mutation audit trail.
+
+| Column | Type | Constraints |
+|---|---|---|
+| id | UUID | PK, DEFAULT gen_random_uuid() |
+| admin_user_id | UUID | nullable, FK users(id) SET NULL |
+| action | VARCHAR(100) | NOT NULL |
+| target_type | VARCHAR(50) | nullable |
+| target_id | VARCHAR(255) | nullable |
+| details | JSONB | nullable |
+| created_at | TIMESTAMPTZ | NOT NULL, DEFAULT now() |
+
+Indexes: `idx_audit_log_created_at` (DESC), `idx_audit_log_admin_user`.
+
+---
+
+## 4. API Endpoints
+
+All endpoints are prefixed with `/api/v1`. Responses are JSON. Errors follow the shape `{ "error": "message" }`.
+
+### 4.1 Authentication
+
+**POST /auth/register**
+- Auth: Public
+- Body: `{ email: string, display_name?: string, turnstile_token: string }`
+- Response: `{ message: string }`
+- Sends a magic link email. Rate limited (10 requests per 60s per email or IP).
+
+**POST /auth/login**
+- Auth: Public
+- Body: `{ email: string, turnstile_token: string }`
+- Response: `{ message: string }`
+- Sends a magic link email. Rate limited (10 requests per 60s per email or IP).
+
+**GET /auth/verify?token=...&email=...**
+- Auth: Public
+- Response: Redirect to frontend with session cookie set.
+
+**POST /auth/verify**
+- Auth: Public
+- Body: `{ token: string, email: string }`
+- Response: `{ message: string, user: User }`
+- Sets `session` HttpOnly cookie (30-day expiry).
+
+**POST /auth/logout**
+- Auth: Authenticated
+- Response: `{ message: string }`
+- Clears session cookie and deletes DB session.
+
+**GET /auth/me**
+- Auth: Authenticated
+- Response: `{ id, email, display_name, role, created_at }`
+
+### 4.2 Settings
+
+**GET /settings**
+- Auth: Authenticated
+- Response: `UserSettings` (a row with default values is created if none exists)
+
+**PUT /settings**
+- Auth: Authenticated
+- Body: `UpdateSettingsRequest` (all fields required)
+- Validation: max_articles_per_source 1-10, max_links_per_source 1-30, batch_size 1-20, source_extraction_window 1-10, article_history_days 0-365, search_agent_behavior max 2000 chars, ai_provider/ai_model/ai_model_websearch max 100 chars.
+- Response: Updated `UserSettings`
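+
+The defaults from the `settings` table and the limits above fit in one sketch. The struct shape and helper names are hypothetical. The nullable rate-limit overrides are omitted because no limits are listed for them, and string lengths are counted in characters, which is an assumption.
+
+```rust
+/// Field names and defaults follow the `settings` table; the struct is a sketch.
+#[derive(Debug, Clone)]
+pub struct UserSettings {
+    pub max_articles_per_source: i32,
+    pub max_links_per_source: i32,
+    pub use_brave_search: bool,
+    pub article_history_days: i32,
+    pub batch_size: i32,
+    pub source_extraction_window: i32,
+    pub search_agent_behavior: String,
+    pub ai_provider: String,
+    pub ai_model: String,
+    pub ai_model_websearch: String,
+}
+
+impl Default for UserSettings {
+    fn default() -> Self {
+        Self {
+            max_articles_per_source: 3,
+            max_links_per_source: 8,
+            use_brave_search: false,
+            article_history_days: 90,
+            batch_size: 5,
+            source_extraction_window: 3,
+            search_agent_behavior: String::new(),
+            ai_provider: String::new(),
+            ai_model: String::new(),
+            ai_model_websearch: String::new(),
+        }
+    }
+}
+
+fn check_range(name: &str, value: i32, min: i32, max: i32) -> Result<(), String> {
+    if (min..=max).contains(&value) {
+        Ok(())
+    } else {
+        Err(format!("{name} must be between {min} and {max}"))
+    }
+}
+
+fn check_len(name: &str, value: &str, max_chars: usize) -> Result<(), String> {
+    if value.chars().count() <= max_chars {
+        Ok(())
+    } else {
+        Err(format!("{name} must be at most {max_chars} characters"))
+    }
+}
+
+/// Applies the PUT /settings limits listed above; stops at the first violation.
+pub fn validate_settings(s: &UserSettings) -> Result<(), String> {
+    check_range("max_articles_per_source", s.max_articles_per_source, 1, 10)?;
+    check_range("max_links_per_source", s.max_links_per_source, 1, 30)?;
+    check_range("batch_size", s.batch_size, 1, 20)?;
+    check_range("source_extraction_window", s.source_extraction_window, 1, 10)?;
+    check_range("article_history_days", s.article_history_days, 0, 365)?;
+    check_len("search_agent_behavior", &s.search_agent_behavior, 2000)?;
+    for (name, value) in [
+        ("ai_provider", &s.ai_provider),
+        ("ai_model", &s.ai_model),
+        ("ai_model_websearch", &s.ai_model_websearch),
+    ] {
+        check_len(name, value, 100)?;
+    }
+    Ok(())
+}
+```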
+
+### 4.3 Themes
+
+**GET /themes**
+- Auth: Authenticated
+- Response: `ThemeResponse[]`
+
+**POST /themes**
+- Auth: Authenticated
+- Body: `{ name, theme, categories: string[], max_items_per_category?, max_age_days?, summary_length? }`
+- Validation: name non-empty, max 200 chars; categories 1-20 non-empty entries; max_items_per_category 1-50; max_age_days 1-365; summary_length 1-3.
+- Response: `ThemeResponse`
+
+**PUT /themes/{id}**
+- Auth: Authenticated (owner only)
+- Body: `UpdateThemeRequest` (all fields optional)
+- Response: `ThemeResponse`
+
+**DELETE /themes/{id}**
+- Auth: Authenticated (owner only)
+- Response: 204 No Content
+
+### 4.4 Schedules
+
+**GET /themes/{id}/schedule**
+- Auth: Authenticated (theme owner)
+- Response: `ScheduleResponse` or 404
+
+**PUT /themes/{id}/schedule**
+- Auth: Authenticated (theme owner)
+- Body: `{ enabled, days: string[], time_utc: "HH:MM", emails: string[] }`
+- Validation: days must be day codes from mon to sun (mon, tue, wed, thu, fri, sat, sun), time in HH:MM format, max 3 emails.
+- Response: `ScheduleResponse`
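+
+These schedule rules can be sketched as a small validator. It assumes that `days` and `emails` arrive as plain string slices, and it enforces a strict two-digit 24-hour `HH:MM`. It does not check email syntax, because the source does not describe that check. `validate_schedule` is a hypothetical name.
+
+```rust
+const DAYS: [&str; 7] = ["mon", "tue", "wed", "thu", "fri", "sat", "sun"];
+
+/// True if `s` is exactly two ASCII digits whose value is <= `max`.
+fn two_digits(s: &str, max: u8) -> bool {
+    s.len() == 2
+        && s.bytes().all(|b| b.is_ascii_digit())
+        && s.parse::<u8>().map_or(false, |v| v <= max)
+}
+
+pub fn validate_schedule(days: &[&str], time_utc: &str, emails: &[&str]) -> Result<(), String> {
+    // Every day must be one of the seven three-letter codes.
+    if let Some(bad) = days.iter().find(|d| !DAYS.contains(*d)) {
+        return Err(format!("invalid day: {bad}"));
+    }
+    // "HH:MM" with hours 00-23 and minutes 00-59.
+    let valid_time = time_utc
+        .split_once(':')
+        .map_or(false, |(h, m)| two_digits(h, 23) && two_digits(m, 59));
+    if !valid_time {
+        return Err(format!("time_utc must be HH:MM, got {time_utc:?}"));
+    }
+    if emails.len() > 3 {
+        return Err("at most 3 email recipients are allowed".to_string());
+    }
+    Ok(())
+}
+```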
+
+**DELETE /themes/{id}/schedule**
+- Auth: Authenticated (theme owner)
+- Response: 204 No Content
+
+### 4.5 Sources
+
+**GET /sources?theme_id=...**
+- Auth: Authenticated
+- Response: `SourceResponse[]`
+
+**POST /sources**
+- Auth: Authenticated
+- Body: `{ title, url, theme_id? }`
+- Validation: title non-empty, max 200 chars; URL must be http(s), max 1000 chars.
+- Response: `SourceResponse`
+
+**PUT /sources/preferred**
+- Auth: Authenticated
+- Body: `{ source_ids: UUID[] }`
+- Response: `{ updated: number }`
+
+**DELETE /sources/{id}**
+- Auth: Authenticated (owner only)
+- Response: 204 No Content
+
+**POST /sources/bulk**
+- Auth: Authenticated
+- Body: `{ sources: CreateSourceRequest[], theme_id? }`
+- Response: `{ imported, skipped, errors }`
+
+**POST /sources/import-csv**
+- Auth: Authenticated
+- Body: Multipart file upload (CSV: title,url)
+- Response: `{ imported, skipped, errors }`
+
+**GET /sources/export-csv**
+- Auth: Authenticated
+- Response: CSV file download
+
+### 4.6 Generation
+
+**POST /syntheses/generate**
+- Auth: Authenticated
+- Body: `{ theme_id: UUID }`
+- Response: `{ job_id: UUID }`
+- Creates job in JobStore, spawns background generation task. Returns 409 if user already has active job.
+
+**GET /syntheses/generate/{job_id}/progress**
+- Auth: Authenticated (job owner)
+- Response: SSE stream of `ProgressEvent`
+- Events: `progress` (step, message, percent), `complete` (synthesis_id), `error` (message).
+
+**POST /syntheses/generate/{job_id}/stop**
+- Auth: Authenticated (job owner)
+- Response: `{ message: string }`
+- Sets the cooperative cancellation flag; the running pipeline checks it and stops early.
+
+### 4.7 Syntheses
+
+**GET /syntheses**
+- Auth: Authenticated
+- Response: `SynthesisListItem[]` (with section summaries, theme info)
+
+**GET /syntheses/{id}**
+- Auth: Authenticated (owner only)
+- Response: `SynthesisResponse` (full sections data)
+
+**DELETE /syntheses/{id}**
+- Auth: Authenticated (owner only)
+- Response: 204 No Content
+
+**POST /syntheses/{id}/send-email**
+- Auth: Authenticated
+- Body: `{ email: string }`
+- Response: `{ message: string }`
+
+**GET /syntheses/{id}/export/markdown**
+- Auth: Authenticated
+- Response: Markdown file download
+
+**GET /syntheses/{id}/export/pdf**
+- Auth: Authenticated
+- Response: PDF file download
+
+### 4.8 Article History & Provenance
+
+**GET /article-history?limit=&offset=&job_id=&status=**
+- Auth: Authenticated
+- Response: `{ items: ArticleHistoryEntry[], total: number }`
+
+**DELETE /article-history**
+- Auth: Authenticated
+- Response: `{ deleted: number }`
+
+**GET /syntheses/{id}/provenance**
+- Auth: Authenticated
+- Response: `ArticleHistoryEntry[]` (articles with status "used" for this synthesis's job_id)
+
+### 4.9 LLM Call Logs
+
+**GET /llm-logs/{job_id}**
+- Auth: Authenticated
+- Response: `LlmCallLogEntry[]`
+
+### 4.10 User API Keys
+
+**GET /user/api-keys**
+- Auth: Authenticated
+- Response: `ApiKeyResponse[]` (id, provider_name, key_prefix, timestamps; never the full key)
+
+**POST /user/api-keys**
+- Auth: Authenticated
+- Body: `{ provider_name, api_key }`
+- Validation: provider in (gemini, openai, anthropic, brave_search), key 8-500 chars.
+- Response: `ApiKeyResponse`
+- Encrypts key with AES-256-GCM before storage; upserts (one key per user per provider).
+
+**DELETE /user/api-keys/{provider}**
+- Auth: Authenticated
+- Response: 204 No Content
+
+**POST /user/api-keys/{provider}/test**
+- Auth: Authenticated
+- Response: `{ success: boolean, message: string }`
+- Decrypts key, calls provider test endpoint.
+
+**POST /user/api-keys/export**
+- Auth: Authenticated
+- Response: `{ keys: [{ provider_name, api_key }] }`
+- Decrypts and returns all keys (used for backup/migration).
+
+### 4.11 Public Configuration
+
+**GET /config/providers**
+- Auth: Authenticated
+- Response: `ProviderConfigResponse[]` (enabled providers with model lists for scraping and websearch)
+
+### 4.12 Admin Endpoints
+
+All admin endpoints require `AdminUser` extractor (role = admin).
+
+**GET /admin/providers**
+- Response: `AdminProviderResponse[]`
+
+**POST /admin/providers**
+- Body: `CreateProviderRequest`
+- Validation: provider_name in (gemini, openai, anthropic), at least one model per list, at most one default per list.
+- Response: `AdminProviderResponse`
+
+**PUT /admin/providers/{id}**
+- Body: `UpdateProviderRequest` (all fields optional)
+- Response: `AdminProviderResponse`
+
+**DELETE /admin/providers/{id}**
+- Response: 204 No Content
+
+**GET /admin/rate-limits**
+- Response: `RateLimitResponse[]`
+
+**PUT /admin/rate-limits/{provider_name}**
+- Body: `{ max_requests: 1-1000, time_window_seconds: 1-3600 }`
+- Response: `RateLimitResponse`
+- Hot-reloads the in-memory provider rate limiter.
+
+**GET /admin/users**
+- Response: `AdminUserResponse[]`
+
+**PUT /admin/users/{id}/role**
+- Body: `{ role: "user" | "admin" }`
+- Response: `{ message: string }`
+
+### 4.13 Health
+
+**GET /health**
+- Auth: Public
+- Response: `{ status: "ok" }`
+
+---
+
+## 5. Generation Pipeline Technical Flow
+
+### Overview
+
+The pipeline runs as a background tokio task spawned by `POST /syntheses/generate`. It has a 15-minute global timeout and supports cooperative cancellation via `AtomicBool`.
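+
+The following is a minimal, std-only sketch of the two stop conditions. The first is a global deadline, which is 15 minutes in the pipeline. The second is a cooperative `AtomicBool` flag checked between units of work. The real pipeline is an async tokio task, and `run_steps` and `StopReason` are hypothetical names used only for illustration.
+
+```rust
+use std::sync::atomic::{AtomicBool, Ordering};
+use std::time::{Duration, Instant};
+
+/// Why a run stopped early (hypothetical type for this sketch).
+#[derive(Debug, PartialEq)]
+pub enum StopReason {
+    Cancelled,
+    TimedOut,
+}
+
+/// Runs `total_steps` units of work, checking the cancellation flag and the
+/// global deadline before each one. Returns the number of completed steps.
+pub fn run_steps<F: FnMut(usize)>(
+    total_steps: usize,
+    cancel: &AtomicBool,
+    timeout: Duration,
+    mut step: F,
+) -> Result<usize, StopReason> {
+    let deadline = Instant::now() + timeout;
+    for i in 0..total_steps {
+        // The stop endpoint only sets the flag; the worker notices it here.
+        if cancel.load(Ordering::Relaxed) {
+            return Err(StopReason::Cancelled);
+        }
+        if Instant::now() >= deadline {
+            return Err(StopReason::TimedOut);
+        }
+        step(i);
+    }
+    Ok(total_steps)
+}
+```
+
+In the app, the flag would be an `Arc<AtomicBool>` shared with the stop handler and passed here as `&*flag`. A cooperative flag cannot interrupt a step that is already running, such as an in-flight LLM call. Cancellation therefore takes effect at the next check.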
+
+### Initialization
+
+1. Load `UserSettings` from DB (or create defaults)
+2. Clean up old article history (entries with a dropped status older than `article_history_days`) and truncate old LLM call logs
+3. Load the target `Theme` (categories, max_items, max_age_days, summary_length)
+4. Load user `Sources` for the theme
+5. Decrypt user's LLM API key, create `Arc<dyn LlmProvider>` via factory
+6. Resolve models: `ai_model` (for scraping/classification) and `ai_model_websearch` (for web search); user override or admin default fallback
+7. Initialize per-user rate limiter (from settings or admin defaults)
+8. Initialize tracking structures: `article_scraped` (category -> Vec<NewsItem>), `source_counts`, `url_source`, `filled_counts`, `seen_urls`, `pending_traces`
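+
+Steps 6 and 7 both follow a "user value, else admin default" rule. The sketch below assumes that an empty `ai_model` or `ai_model_websearch` string means "no override", since empty is the default in the user settings table. It also assumes that both rate-limit fields must be set for the user's limit to apply. `resolve_model` and `resolve_rate_limit` are hypothetical helpers.
+
+```rust
+/// Model for one role (scraping/classification or web search).
+/// Assumption: an empty or whitespace-only user value means "no override".
+pub fn resolve_model(user_choice: &str, admin_default: Option<&str>) -> Option<String> {
+    let trimmed = user_choice.trim();
+    if !trimmed.is_empty() {
+        return Some(trimmed.to_string());
+    }
+    admin_default.map(str::to_string)
+}
+
+/// Rate limit as (max_requests, time_window_seconds).
+/// Assumption: both user fields must be set to override the admin default.
+pub fn resolve_rate_limit(
+    user_max_requests: Option<u32>,
+    user_window_secs: Option<u64>,
+    admin_default: (u32, u64),
+) -> (u32, u64) {
+    match (user_max_requests, user_window_secs) {
+        (Some(max), Some(window)) => (max, window),
+        _ => admin_default,
+    }
+}
+```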
+
+### Phase 1: Personalized Sources
+
+Skipped if user has 0 sources for the theme.
+
+**1a. Windowed source extraction**
+
+- Query article_history for the last source used; reorder sources in a rolling window starting after that source
+- Select up to `source_extraction_window` sources per generation
+- For each source (bounded concurrency of 5): fetch page HTML, extract up to `max_links_per_source` article URLs via HTML parsing (same-domain, non-homepage, no static assets)
+- Deduplicate URLs cross-source via `seen_urls`
+- Batch-check `article_history` for already-seen URL hashes; filter matches (traced as `filtered_history`)
+- Shuffle remaining candidates to interleave sources
+- Track url -> source in `url_source`
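+
+This std-only sketch covers two helpers from step 1a: the rolling source window, and the link filter (same-domain, non-homepage, no static assets). It makes three assumptions:
+
+- Sources are identified by plain strings.
+- Links have already been resolved to absolute http(s) URLs.
+- The asset-extension list is illustrative only.
+
+`rolling_window`, `is_candidate_link` and `STATIC_ASSET_EXTS` are hypothetical names.
+
+```rust
+/// Up to `window` sources, starting just after `last_used` and wrapping around.
+/// If `last_used` is None or not in the list, starts from the first source.
+pub fn rolling_window<'a>(sources: &[&'a str], last_used: Option<&str>, window: usize) -> Vec<&'a str> {
+    let start = last_used
+        .and_then(|last| sources.iter().position(|s| *s == last))
+        .map_or(0, |i| i + 1);
+    sources
+        .iter()
+        .cycle()
+        .skip(start)
+        .take(window.min(sources.len()))
+        .copied()
+        .collect()
+}
+
+/// Illustrative list of static-asset extensions to reject (assumption).
+const STATIC_ASSET_EXTS: [&str; 8] = [".jpg", ".jpeg", ".png", ".gif", ".svg", ".css", ".js", ".pdf"];
+
+/// Splits an http(s) URL into (host, path), dropping any query or fragment.
+fn host_and_path(url: &str) -> Option<(&str, &str)> {
+    let rest = url
+        .strip_prefix("https://")
+        .or_else(|| url.strip_prefix("http://"))?;
+    let (host, path) = match rest.find('/') {
+        Some(i) => rest.split_at(i),
+        None => (rest, "/"),
+    };
+    let path = path.split(|c: char| c == '?' || c == '#').next().unwrap_or("/");
+    Some((host, path))
+}
+
+/// Same domain (ignoring a leading "www."), not the homepage, not a static asset.
+pub fn is_candidate_link(source_url: &str, link: &str) -> bool {
+    let (Some((src_host, _)), Some((host, path))) = (host_and_path(source_url), host_and_path(link)) else {
+        return false; // relative or non-http(s) links are ignored in this sketch
+    };
+    let norm = |h: &str| h.to_ascii_lowercase().trim_start_matches("www.").to_string();
+    let same_domain = norm(src_host) == norm(host);
+    let not_homepage = !path.trim_matches('/').is_empty();
+    let lower_path = path.to_ascii_lowercase();
+    let not_asset = !STATIC_ASSET_EXTS.iter().any(|ext| lower_path.ends_with(*ext));
+    same_domain && not_homepage && not_asset
+}
+```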
+
+**1b. Batch scrape + classify**
+
+Processing in batches of `settings.batch_size`:
+
+- **Batch assembly**: Pull up to batch_size candidates, skip if `source_counts[domain] >= max_articles_per_source` (traced as `filtered_diversity`)
+- **Scrape** (JoinSet, parallel): SSRF check, 15s timeout, 5MB limit, HTML parsing, title/date/body extraction, soft-404 detection. Skip empty/too-old articles.
+- **Classify** (JoinSet, parallel): Rate limit check (60s wait), send title + first 500 chars to LLM with categories list. LLM returns `{title, summary, category}`. Validate category via `assign_category()` (fallback to "Autre", French for "Other"; drop if full).
+- **LLM call logging**: Every LLM call is logged with full prompt, response, timing, and article URL.
+- **Early exit**: Stop when total articles >= `(num_categories + 1) * max_items_per_category`.
+- Batch-flush pending traces to `article_history`.
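+
+The category check and the early-exit rule can be sketched as follows. The source does not spell out the signature or matching rules of `assign_category()`, so this version makes three assumptions:
+
+- Matching against the theme's categories is case-insensitive.
+- Unknown categories fall back to `"Autre"`.
+- The article is dropped (the function returns `None`) when the chosen bucket already holds `max_items_per_category` items.
+
+`should_stop` is a hypothetical name for the early-exit test.
+
+```rust
+use std::collections::HashMap;
+
+pub const FALLBACK_CATEGORY: &str = "Autre";
+
+/// Returns the category to file the article under, or None to drop it.
+pub fn assign_category(
+    llm_category: &str,
+    categories: &[&str],
+    filled: &HashMap<String, usize>,
+    max_items: usize,
+) -> Option<String> {
+    let wanted = categories
+        .iter()
+        .find(|c| c.eq_ignore_ascii_case(llm_category.trim()))
+        .copied()
+        .unwrap_or(FALLBACK_CATEGORY);
+    let count = filled.get(wanted).copied().unwrap_or(0);
+    (count < max_items).then(|| wanted.to_string())
+}
+
+/// Early exit: enough articles to fill every user category plus "Autre".
+pub fn should_stop(total_articles: usize, num_categories: usize, max_items: usize) -> bool {
+    total_articles >= (num_categories + 1) * max_items
+}
+```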
+
+### Phase 2: Web Search Fallback
+
+Skipped if all categories are filled to `max_items_per_category`.
+
+**2a. Compute gaps**: For each category, `needed = max_items - filled`.
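+
+Here is a minimal sketch of the gap computation. It assumes that per-category counts are kept in a `HashMap` keyed by category name; the real type of the `filled_counts` structure is not specified. Full categories are omitted, so an empty result corresponds to the "all categories filled" case in which Phase 2 is skipped. `compute_gaps` is a hypothetical name.
+
+```rust
+use std::collections::HashMap;
+
+/// (category, needed) for every category still below `max_items`, in the
+/// theme's category order; needed = max_items - filled, never negative.
+pub fn compute_gaps(
+    categories: &[&str],
+    filled: &HashMap<String, usize>,
+    max_items: usize,
+) -> Vec<(String, usize)> {
+    categories
+        .iter()
+        .filter_map(|c| {
+            let have = filled.get(*c).copied().unwrap_or(0);
+            let needed = max_items.saturating_sub(have);
+            (needed > 0).then(|| (c.to_string(), needed))
+        })
+        .collect()
+}
+```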
+
+**2b. Path selection** based on `settings.use_brave_search`:
+
+**Path A -- Brave Search** (`use_brave_search = true`):
+- Decrypt user's Brave Search API key
+- Query: `"{theme} actualites"`, up to 20 results, freshness mapped from `max_age_days` (pd/pw/pm/py)
+- Filter results through `filter_phase2_url()`: homepage filter, cross-phase dedup, article history check, source diversity check
+- Batch scrape + classify (same logic as Phase 1b, source_type = "brave_search")
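+
+Brave Search's `freshness` parameter accepts four buckets: `pd` (past day), `pw` (past week), `pm` (past month) and `py` (past year). The cut-offs used to map `max_age_days` onto these buckets are not given above. The sketch below assumes the smallest bucket that still covers the theme's age limit. `brave_freshness` and `brave_query` are hypothetical helpers.
+
+```rust
+/// Smallest Brave freshness bucket covering `max_age_days` (cut-offs assumed).
+pub fn brave_freshness(max_age_days: u32) -> &'static str {
+    match max_age_days {
+        0..=1 => "pd",
+        2..=7 => "pw",
+        8..=31 => "pm",
+        _ => "py",
+    }
+}
+
+/// Query text for Path A ("actualites" is French for "news").
+pub fn brave_query(theme: &str) -> String {
+    format!("{theme} actualites")
+}
+```
+
+Rounding up to a wider bucket means Brave may return somewhat older results. The Phase 1b scrape logic, which skips too-old articles, can still discard them.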
+
+**Path B -- LLM Web Search** (`use_brave_search = false`):
+- Build search prompt with theme, categories, and gap counts
+- Call LLM with `ai_model_websearch` model; returns structured JSON: `{category_0: [{title, url, summary}], ...}`
+- Filter URLs through `filter_phase2_url()`
+- Scrape each result sequentially to validate; keep LLM-provided title/summary (no re-classification)
+- source_type = "web_search"
+
+### Save & Record
+
+1. Return an error if all article lists are empty
+2. Order sections: user-defined categories first (in order), then "Autre" if non-empty
+3. Sanitize: strip `\u0000` null bytes from JSON (PostgreSQL JSONB requirement)
+4. Insert synthesis row: job_id, week (ISO week string), sections (JSONB), status "completed", theme_id
+5. Record used articles: batch-insert `article_history` entries with status "used", synthesis_id, and correct source_type
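+
+Steps 2 and 3 can be sketched on plain strings. The sketch makes three assumptions:
+
+- Sections are held as a map from category name to article titles.
+- Empty user categories are kept, since the source only says `"Autre"` is dropped when empty.
+- NUL characters are removed from each string before JSON serialization, not from the serialized text.
+
+`order_sections` and `strip_nul` are hypothetical names.
+
+```rust
+use std::collections::HashMap;
+
+/// User categories first (configured order), then "Autre" only if non-empty.
+pub fn order_sections(
+    categories: &[&str],
+    mut articles: HashMap<String, Vec<String>>,
+) -> Vec<(String, Vec<String>)> {
+    let mut sections: Vec<(String, Vec<String>)> = categories
+        .iter()
+        .map(|c| (c.to_string(), articles.remove(*c).unwrap_or_default()))
+        .collect();
+    if let Some(other) = articles.remove("Autre") {
+        if !other.is_empty() {
+            sections.push(("Autre".to_string(), other));
+        }
+    }
+    sections
+}
+
+/// PostgreSQL text and JSONB values cannot hold U+0000, so remove it.
+pub fn strip_nul(s: &str) -> String {
+    s.replace('\u{0000}', "")
+}
+```
+
+Stripping raw U+0000 characters from the values avoids pattern-matching the six-character `\u0000` escape inside serialized JSON. A textual replace on the serialized form could misfire when that sequence follows an escaped backslash.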
+
+---
+
+## 6. LLM Provider Abstraction
+
+### Trait Definition
+
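+```rust
+use async_trait::async_trait;
+use serde_json::Value;
+// `AppError` is the application's own error type (import omitted).
+#[async_trait]
+pub trait LlmProvider: Send + Sync {
+    /// Identifier of the backing provider.
+    fn provider_id(&self) -> &str;
+    /// One structured-output call; `response_schema` is the JSON Schema the reply must match.
+    async fn call_llm(
+        &self,
+        model: &str,
+        system_prompt: &str,
+        user_prompt: &str,
+        response_schema: &Value,
+    ) -> Result<Value, AppError>;
+}
+```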
+
+All calls use structured JSON output (response_schema defines the expected shape).
+
+### Implementations
+
+| Provider | Module | API Endpoint | Auth Method |
+|---|---|---|---|
+| Google Gemini | `llm/gemini.rs` | `generativelanguage.googleapis.com` | Query param `?key=` |
+| OpenAI | `llm/openai.rs` | `api.openai.com/v1/chat/completions` | Bearer token |
+| Anthropic | `llm/anthropic.rs` | `api.anthropic.com/v1/messages` | `x-api-key` header |
+| Mock | `llm/mock.rs` | N/A (in-memory) | N/A |
+
+### Factory
+
+`llm/factory.rs` provides `create_provider(provider_name, api_key, http_client) -> Arc<dyn LlmProvider>`. Matches on provider name string.
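+
+The factory pattern can be sketched without the async machinery. This version makes three assumptions, since the source only says that the factory matches on the provider name string:
+
+- A simplified synchronous `Provider` trait stands in for `LlmProvider`.
+- The HTTP client argument is omitted.
+- Unknown names return an error.
+
+```rust
+use std::sync::Arc;
+
+/// Simplified, synchronous stand-in for `LlmProvider`.
+pub trait Provider: Send + Sync {
+    fn provider_id(&self) -> &str;
+}
+
+// Hypothetical concrete providers; real ones would also hold an HTTP client.
+pub struct Gemini { pub api_key: String }
+pub struct OpenAi { pub api_key: String }
+pub struct Anthropic { pub api_key: String }
+
+impl Provider for Gemini { fn provider_id(&self) -> &str { "gemini" } }
+impl Provider for OpenAi { fn provider_id(&self) -> &str { "openai" } }
+impl Provider for Anthropic { fn provider_id(&self) -> &str { "anthropic" } }
+
+/// Matches on the provider name and returns a shared trait object.
+pub fn create_provider(name: &str, api_key: &str) -> Result<Arc<dyn Provider>, String> {
+    let key = api_key.to_string();
+    match name {
+        "gemini" => Ok(Arc::new(Gemini { api_key: key })),
+        "openai" => Ok(Arc::new(OpenAi { api_key: key })),
+        "anthropic" => Ok(Arc::new(Anthropic { api_key: key })),
+        other => Err(format!("unknown provider: {other}")),
+    }
+}
+```
+
+Returning `Arc<dyn ...>` lets the single provider instance be cloned cheaply into each concurrently spawned classify task.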
+
+### Response Schema
+
+`llm/schema.rs` builds JSON Schema definitions for:
+- Classification/summarization: `{title, summary, category, is_article}`
+- Web search: `{category_0: [{title, url, summary}], ...}` with per-category arrays
+- Source link extraction: `{links: [{url}]}`
+
+### Error Mapping
+
+`map_provider_http_error()` translates HTTP status codes to `AppError` variants:
+- 400 -> BadRequest
+- 401/403 -> BadRequest (invalid key)
+- 404 -> BadRequest (model not found)
+- 429/529 -> RateLimited
+- Other -> Internal
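+
+The mapping can be sketched with a cut-down `AppError` enum standing in for the real one. The signature and error messages are assumptions. Status 529 is grouped with 429 because Anthropic's API uses 529 for an overloaded service.
+
+```rust
+/// Simplified stand-in for the relevant `AppError` variants.
+#[derive(Debug, PartialEq)]
+pub enum AppError {
+    BadRequest(String),
+    RateLimited(String),
+    Internal(String),
+}
+
+/// Status-code mapping as listed above; messages are illustrative.
+pub fn map_provider_http_error(status: u16, provider: &str) -> AppError {
+    match status {
+        400 => AppError::BadRequest(format!("{provider}: bad request")),
+        401 | 403 => AppError::BadRequest(format!("{provider}: invalid API key")),
+        404 => AppError::BadRequest(format!("{provider}: model not found")),
+        429 | 529 => AppError::RateLimited(format!("{provider}: rate limited or overloaded")),
+        other => AppError::Internal(format!("{provider}: unexpected HTTP status {other}")),
+    }
+}
+```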
+
+---
+
+## 7. Background Tasks
+
+### Session Cleanup
+
+Runs hourly via `tokio::spawn`. Calls `db::sessions::delete_expired` to remove sessions past their `expires_at` timestamp.
+
+### Job Store Cleanup
+
+`JobStore::cleanup_expired` removes job entries older than 1 hour (the TTL constant). Called periodically. Releases user locks for expired jobs.
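+
+This simplified, synchronous sketch covers the job-store bookkeeping implied here and in the generate endpoint. Each user has at most one active job (the "user lock"), jobs have a one-hour TTL, and `has_active_job` returns `Some(job_id)` while a job is active. The field layout, the `u64` ids (instead of UUIDs) and `create_job` are hypothetical; the real store is shared across async tasks.
+
+```rust
+use std::collections::HashMap;
+use std::time::{Duration, Instant};
+
+pub const JOB_TTL: Duration = Duration::from_secs(60 * 60);
+
+#[derive(Default)]
+pub struct JobStore {
+    jobs: HashMap<u64, (u64, Instant)>, // job_id -> (user_id, created_at)
+    active_by_user: HashMap<u64, u64>,  // user_id -> job_id (the user lock)
+    next_id: u64,
+}
+
+impl JobStore {
+    pub fn has_active_job(&self, user_id: u64) -> Option<u64> {
+        self.active_by_user.get(&user_id).copied()
+    }
+
+    /// Err(existing_job) corresponds to the endpoint's 409 response.
+    pub fn create_job(&mut self, user_id: u64, now: Instant) -> Result<u64, u64> {
+        if let Some(existing) = self.has_active_job(user_id) {
+            return Err(existing);
+        }
+        self.next_id += 1;
+        let id = self.next_id;
+        self.jobs.insert(id, (user_id, now));
+        self.active_by_user.insert(user_id, id);
+        Ok(id)
+    }
+
+    /// Removes jobs older than the TTL and releases their user locks.
+    pub fn cleanup_expired(&mut self, now: Instant) -> usize {
+        let expired: Vec<u64> = self
+            .jobs
+            .iter()
+            .filter(|(_, (_, created))| now.duration_since(*created) >= JOB_TTL)
+            .map(|(id, _)| *id)
+            .collect();
+        for id in &expired {
+            if let Some((user_id, _)) = self.jobs.remove(id) {
+                if self.active_by_user.get(&user_id) == Some(id) {
+                    self.active_by_user.remove(&user_id);
+                }
+            }
+        }
+        expired.len()
+    }
+}
+```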
+
+### Scheduler
+
+Runs in a `tokio::spawn`ed loop on a 60-second interval. On each tick:
+
+1. `current_day_code()` -> "mon" through "sun"
+2. `find_due_schedules(pool, day, time)` -> queries enabled schedules matching current day and time (HH:MM)
+3. For each due schedule:
+   - Skip if `job_store.has_active_job(user_id)` returns Some (manual generation in progress)
+   - Create a temporary `watch::channel` and `AtomicBool`
+   - Call `synthesis::run_generation_inner` directly (bypasses job store)
+   - On success: send emails to configured recipients (up to 3), mark schedule as run
+   - On failure: log error, do not mark as run
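+
+The scheduler matches schedules on a three-letter day code and an `HH:MM` string. The following is a std-only sketch of deriving both in UTC from the system clock, assuming no date/time crate is used. `day_code_and_time` is a hypothetical counterpart of `current_day_code()`.
+
+```rust
+use std::time::{SystemTime, UNIX_EPOCH};
+
+pub const DAY_CODES: [&str; 7] = ["mon", "tue", "wed", "thu", "fri", "sat", "sun"];
+
+/// Day code and "HH:MM" (UTC) for a Unix timestamp in seconds.
+/// 1970-01-01 was a Thursday, hence the +3 offset into a Monday-first array.
+pub fn day_code_and_time(unix_secs: u64) -> (&'static str, String) {
+    let days = unix_secs / 86_400;
+    let secs_of_day = unix_secs % 86_400;
+    let day = DAY_CODES[((days + 3) % 7) as usize];
+    let hhmm = format!("{:02}:{:02}", secs_of_day / 3600, (secs_of_day % 3600) / 60);
+    (day, hhmm)
+}
+
+/// Values a tick would pass to a query such as `find_due_schedules`.
+pub fn now_day_code_and_time() -> (&'static str, String) {
+    let secs = SystemTime::now()
+        .duration_since(UNIX_EPOCH)
+        .map(|d| d.as_secs())
+        .unwrap_or(0);
+    day_code_and_time(secs)
+}
+```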
+
+---
+
+## 8. Configuration
+
+### Environment Variables
+
+| Variable | Required | Default | Description |
+|---|---|---|---|
+| DATABASE_URL | Yes | - | PostgreSQL connection string |
+| MASTER_ENCRYPTION_KEY | Yes | - | 64 hex chars (32 bytes) for AES-256-GCM |
+| APP_URL | Yes | - | Public URL (CORS, magic links, cookies). No trailing slash. |
+| PORT | No | 8080 | HTTP server port |
+| RUST_LOG | No | - | Logging filter (e.g., "info,ai_synth_backend=debug") |
+| STATIC_DIR | No | ../frontend/dist | Path to built SolidJS files |
+| RESEND_API_KEY | Yes | - | Resend email service API key |
+| EMAIL_FROM | Yes | - | Sender address for emails |
+| TURNSTILE_SECRET_KEY | Yes | - | Cloudflare Turnstile server secret |
+| TURNSTILE_SITE_KEY | Yes | - | Cloudflare Turnstile client key |
+| POSTGRES_PASSWORD | Yes | - | Used by docker-compose for DB container |
+
+### Startup Validation
+
+`AppConfig::validate()` checks at startup:
+- `MASTER_ENCRYPTION_KEY` is exactly 64 hex characters
+- `APP_URL` starts with http:// or https:// and has no trailing slash
+
+The application refuses to start with invalid configuration.
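+
+Here is a minimal sketch of the two startup checks, written as a free function. The real `AppConfig::validate()` operates on the loaded config struct, and its error type is not shown here.
+
+```rust
+/// Returns the first configuration problem found, if any.
+pub fn validate_config(master_key: &str, app_url: &str) -> Result<(), String> {
+    // 64 hex characters encode the 32-byte AES-256-GCM key.
+    if master_key.len() != 64 || !master_key.chars().all(|c| c.is_ascii_hexdigit()) {
+        return Err("MASTER_ENCRYPTION_KEY must be exactly 64 hex characters".to_string());
+    }
+    if !(app_url.starts_with("http://") || app_url.starts_with("https://")) {
+        return Err("APP_URL must start with http:// or https://".to_string());
+    }
+    if app_url.ends_with('/') {
+        return Err("APP_URL must not have a trailing slash".to_string());
+    }
+    Ok(())
+}
+```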
+
+### User Settings Model
+
+Default values applied when a user has no saved settings:
+
+| Setting | Default | Range |
+|---|---|---|
+| max_articles_per_source | 3 | 1-10 |
+| max_links_per_source | 8 | 1-30 |
+| use_brave_search | false | boolean |
+| article_history_days | 90 | 0-365 |
+| batch_size | 5 | 1-20 |
+| source_extraction_window | 3 | 1-10 |
+| search_agent_behavior | "" | max 2000 chars |
+| ai_provider | "" | max 100 chars |
+| ai_model | "" | max 100 chars |
+| ai_model_websearch | "" | max 100 chars |
+| rate_limit_max_requests | null | >= 1 if set |
+| rate_limit_time_window_seconds | null | >= 1 if set |