# AI Weekly Synth -- Technical Specifications

## 1. Backend Tech Stack

| Dependency | Version | Purpose |
|---|---|---|
| axum | 0.8 | Web framework (macros, multipart) |
| tokio | 1 | Async runtime (full features) |
| tower | 0.5 | Middleware composition |
| tower-http | 0.6 | CORS, static files, tracing, headers |
| sqlx | 0.8 | Async Postgres driver (runtime-tokio, tls-rustls, uuid, chrono, json, migrate) |
| reqwest | 0.12 | HTTP client (JSON) |
| serde / serde_json | 1 | Serialization/deserialization |
| chrono | 0.4 | Date/time handling (serde feature) |
| aes-gcm | 0.10 | AES-256-GCM encryption |
| zeroize | 1 | Secure memory zeroing |
| sha2 | 0.10 | SHA-256 hashing |
| rand | 0.8 | Random number generation |
| base64 | 0.22 | Base64 encoding |
| hex | 0.4 | Hex encoding/decoding |
| async-trait | 0.1 | Async trait objects |
| tracing / tracing-subscriber | 0.1 / 0.3 | Structured logging (env-filter, json) |
| dotenvy | 0.15 | .env file loading |
| clap | 4 | CLI argument parsing |
| scraper | 0.22 | HTML parsing (CSS selectors) |
| ego-tree | 0.10 | Tree data structure (used by scraper) |
| url | 2 | URL parsing and validation |
| email_address | 0.2 | Email validation |
| anyhow | 1 | Error context |
| thiserror | 2 | Error type derivation |
| uuid | 1 | UUID v4 generation (serde feature) |
| dashmap | 6 | Concurrent hash maps |
| tokio-stream | 0.1 | Stream utilities for SSE |
| futures | 0.3 | Async stream combinators |
| printpdf | 0.7 | PDF generation |

**Dev dependencies**: tower (util), http-body-util, wiremock 0.6.

**Rust edition**: 2021.

---

## 2. Frontend Tech Stack

| Dependency | Version | Purpose |
|---|---|---|
| solid-js | ^1.9.0 | Reactive UI framework |
| @solidjs/router | ^0.15.0 | Client-side routing |
| lucide-solid | ^0.475.0 | Icon library |
| date-fns | ^4.1.0 | Date formatting |
| tailwindcss | ^4.1.0 | Utility-first CSS (v4) |
| @tailwindcss/vite | ^4.1.0 | Tailwind Vite plugin |
| vite | ^6.2.0 | Build tool and dev server |
| vite-plugin-solid | ^2.11.0 | SolidJS Vite integration |
| typescript | ~5.8.0 | Type checking |
| vitest | ^3.0.0 | Unit testing |
| @solidjs/testing-library | ^0.8.0 | Component testing |
| jsdom | ^25.0.0 | DOM environment for tests |

### Frontend Routes

| Path | Component | Auth | Description |
|---|---|---|---|
| /login | Login | Public | Login page |
| /register | Register | Public | Registration page |
| /auth/verify | AuthVerify | Public | Magic link verification |
| / | Home | Protected | Dashboard / synthesis list |
| /settings | Settings | Protected | User settings |
| /themes | ThemeManager | Protected | Theme CRUD + source management |
| /generate | GenerateSynthesis | Protected | Generation trigger + progress |
| /synthesis/:id | SynthesisDetail | Protected | Full synthesis view |
| /article-history | ArticleHistory | Protected | Article history browser |
| /llm-logs/:jobId | LlmLogs | Protected | LLM call log viewer |
| /admin/providers | AdminProviders | Admin | Provider configuration |
| /admin/rate-limits | AdminRateLimits | Admin | Rate limit configuration |
| /admin/users | AdminUsers | Admin | User management |

---

## 3. Database Schema

### 3.1 `users`

| Column | Type | Constraints |
|---|---|---|
| id | UUID | PK, DEFAULT gen_random_uuid() |
| email | TEXT | NOT NULL, UNIQUE |
| display_name | TEXT | nullable |
| role | TEXT | NOT NULL, DEFAULT 'user', CHECK (user/admin) |
| created_at | TIMESTAMPTZ | NOT NULL, DEFAULT now() |
| updated_at | TIMESTAMPTZ | NOT NULL, DEFAULT now() |

Indexes: `idx_users_email` on (email).
### 3.2 `sessions`

| Column | Type | Constraints |
|---|---|---|
| session_hash | TEXT | PK (SHA-256 of raw token) |
| user_id | UUID | NOT NULL, FK users(id) CASCADE |
| created_at | TIMESTAMPTZ | NOT NULL, DEFAULT now() |
| expires_at | TIMESTAMPTZ | NOT NULL |
| last_active_at | TIMESTAMPTZ | NOT NULL, DEFAULT now() |
| ip_address | TEXT | nullable |
| user_agent | TEXT | nullable |

Indexes: `idx_sessions_user_id`, `idx_sessions_expires_at`.

### 3.3 `magic_tokens`

| Column | Type | Constraints |
|---|---|---|
| id | UUID | PK, DEFAULT gen_random_uuid() |
| email | TEXT | NOT NULL |
| token_hash | TEXT | NOT NULL, UNIQUE |
| created_at | TIMESTAMPTZ | NOT NULL, DEFAULT now() |
| expires_at | TIMESTAMPTZ | NOT NULL |
| used | BOOLEAN | NOT NULL, DEFAULT false |

Indexes: `idx_magic_tokens_email`, `idx_magic_tokens_expires`.

### 3.4 `settings`

Per-user pipeline configuration. One row per user (user_id is the PK).

| Column | Type | Constraints |
|---|---|---|
| user_id | UUID | PK, FK users(id) CASCADE |
| max_articles_per_source | INTEGER | NOT NULL, DEFAULT 3 |
| max_links_per_source | INTEGER | NOT NULL, DEFAULT 8 |
| use_brave_search | BOOLEAN | NOT NULL, DEFAULT false |
| article_history_days | INTEGER | NOT NULL, DEFAULT 90 |
| batch_size | INTEGER | NOT NULL, DEFAULT 5 |
| source_extraction_window | INTEGER | NOT NULL, DEFAULT 3 |
| search_agent_behavior | TEXT | NOT NULL, DEFAULT '' |
| ai_provider | TEXT | NOT NULL, DEFAULT '' |
| ai_model | TEXT | NOT NULL, DEFAULT '' |
| ai_model_websearch | TEXT | NOT NULL, DEFAULT '' |
| rate_limit_max_requests | INTEGER | nullable |
| rate_limit_time_window_seconds | INTEGER | nullable |
| updated_at | TIMESTAMPTZ | NOT NULL, DEFAULT now() |

### 3.5 `themes`

Per-user topic configurations with content settings.

| Column | Type | Constraints |
|---|---|---|
| id | UUID | PK, DEFAULT gen_random_uuid() |
| user_id | UUID | NOT NULL, FK users(id) CASCADE |
| name | TEXT | NOT NULL |
| theme | TEXT | NOT NULL (search topic) |
| categories | JSONB | NOT NULL, DEFAULT '[]' |
| max_items_per_category | INTEGER | NOT NULL, DEFAULT 4 |
| max_age_days | INTEGER | NOT NULL, DEFAULT 7 |
| summary_length | INTEGER | NOT NULL, DEFAULT 3 |
| created_at | TIMESTAMPTZ | NOT NULL, DEFAULT now() |
| updated_at | TIMESTAMPTZ | NOT NULL, DEFAULT now() |

Indexes: `idx_themes_user_id`.

### 3.6 `sources`

User-curated news source URLs, optionally tied to a theme.

| Column | Type | Constraints |
|---|---|---|
| id | UUID | PK, DEFAULT gen_random_uuid() |
| user_id | UUID | NOT NULL, FK users(id) CASCADE |
| title | VARCHAR(200) | NOT NULL, CHECK length 1-200 |
| url | VARCHAR(1000) | NOT NULL, CHECK length <= 1000 |
| theme_id | UUID | nullable, FK themes(id) CASCADE |
| is_preferred | BOOLEAN | NOT NULL, DEFAULT false |
| created_at | TIMESTAMPTZ | NOT NULL, DEFAULT now() |

Indexes: `idx_sources_user_id`, UNIQUE `idx_sources_user_id_url` on (user_id, url).

### 3.7 `syntheses`

Generated synthesis results with JSONB section data.

| Column | Type | Constraints |
|---|---|---|
| id | UUID | PK, DEFAULT gen_random_uuid() |
| user_id | UUID | NOT NULL, FK users(id) CASCADE |
| week | VARCHAR(10) | NOT NULL (ISO week string) |
| sections | JSONB | NOT NULL, DEFAULT '[]' |
| status | VARCHAR(20) | NOT NULL, DEFAULT 'completed' |
| job_id | UUID | nullable |
| theme_id | UUID | nullable, FK themes(id) SET NULL |
| created_at | TIMESTAMPTZ | NOT NULL, DEFAULT now() |

Indexes: `idx_syntheses_user_id_created_at` on (user_id, created_at DESC).

JSONB structure for `sections`:

```json
[
  {
    "title": "Category Name",
    "items": [
      {
        "title": "Article Title",
        "url": "https://...",
        "summary": "...",
        "date": "2026-03-25"
      }
    ]
  }
]
```

### 3.8 `theme_schedules`

Automated generation schedules, one per theme.

| Column | Type | Constraints |
|---|---|---|
| id | UUID | PK, DEFAULT gen_random_uuid() |
| theme_id | UUID | NOT NULL, UNIQUE, FK themes(id) CASCADE |
| user_id | UUID | NOT NULL, FK users(id) CASCADE |
| enabled | BOOLEAN | NOT NULL, DEFAULT true |
| days | JSONB | NOT NULL, DEFAULT '[]' (e.g. ["mon","fri"]) |
| time_utc | TEXT | NOT NULL, DEFAULT '08:00' (HH:MM) |
| emails | JSONB | NOT NULL, DEFAULT '[]' (up to 3 addresses) |
| last_run_at | TIMESTAMPTZ | nullable |
| created_at | TIMESTAMPTZ | NOT NULL, DEFAULT now() |
| updated_at | TIMESTAMPTZ | NOT NULL, DEFAULT now() |

Indexes: `idx_theme_schedules_enabled` (partial, WHERE enabled = true).

### 3.9 `article_history`

Article URL deduplication and full provenance tracing.

| Column | Type | Constraints |
|---|---|---|
| id | UUID | PK, DEFAULT gen_random_uuid() |
| user_id | UUID | NOT NULL, FK users(id) CASCADE |
| url_hash | TEXT | NOT NULL (SHA-256 of normalized URL) |
| url | TEXT | NOT NULL |
| title | TEXT | NOT NULL, DEFAULT '' |
| source_type | TEXT | NOT NULL, DEFAULT 'unknown' |
| source_url | TEXT | nullable |
| category | TEXT | nullable |
| synthesis_id | UUID | nullable, FK syntheses(id) SET NULL |
| status | TEXT | NOT NULL, DEFAULT 'used' |
| scraped_ok | BOOLEAN | NOT NULL, DEFAULT true |
| job_id | UUID | NOT NULL |
| published_date | TEXT | nullable |
| created_at | TIMESTAMPTZ | NOT NULL, DEFAULT now() |

Indexes: `idx_article_history_user_url` on (user_id, url_hash), `idx_article_history_job_id`.

Status values: `used`, `filtered_history`, `filtered_diversity`, `filtered_not_article`, `filtered_too_old`, `filtered_empty`, `filtered_homepage`, `filtered_cross_phase_dedup`.

Source type values: `personalized_source`, `brave_search`, `web_search`.

### 3.10 `llm_call_log`

Full LLM interaction logging for debugging and analysis.

| Column | Type | Constraints |
|---|---|---|
| id | UUID | PK, DEFAULT gen_random_uuid() |
| user_id | UUID | NOT NULL, FK users(id) CASCADE |
| job_id | UUID | NOT NULL |
| call_type | TEXT | NOT NULL |
| model | TEXT | NOT NULL |
| system_prompt | TEXT | NOT NULL, DEFAULT '' |
| user_prompt | TEXT | NOT NULL, DEFAULT '' |
| response_body | TEXT | NOT NULL, DEFAULT '' |
| duration_ms | INTEGER | NOT NULL, DEFAULT 0 |
| article_url | TEXT | nullable |
| created_at | TIMESTAMPTZ | NOT NULL, DEFAULT now() |

Indexes: `idx_llm_call_log_job_id`, `idx_llm_call_log_user_id` on (user_id, created_at).

### 3.11 `admin_providers`

Admin-curated catalog of LLM providers and their models.

| Column | Type | Constraints |
|---|---|---|
| id | UUID | PK, DEFAULT gen_random_uuid() |
| provider_name | VARCHAR(50) | NOT NULL, UNIQUE |
| display_name | VARCHAR(100) | NOT NULL |
| models_scraping | JSONB | NOT NULL, DEFAULT '[]' |
| models_websearch | JSONB | NOT NULL, DEFAULT '[]' |
| is_enabled | BOOLEAN | NOT NULL, DEFAULT true |
| created_at | TIMESTAMPTZ | NOT NULL, DEFAULT now() |
| updated_at | TIMESTAMPTZ | NOT NULL, DEFAULT now() |

Indexes: `idx_admin_providers_enabled` (partial, WHERE is_enabled = true).

Seeded with: gemini, openai, anthropic.

JSONB model structure:

```json
[
  {
    "model_id": "gemini-2.5-pro",
    "display_name": "Gemini 2.5 Pro",
    "is_default": true
  }
]
```

### 3.12 `admin_rate_limits`

Per-provider rate limit configuration.

| Column | Type | Constraints |
|---|---|---|
| id | UUID | PK, DEFAULT gen_random_uuid() |
| provider_name | VARCHAR(50) | NOT NULL, UNIQUE, FK admin_providers(provider_name) CASCADE |
| max_requests | INTEGER | NOT NULL, DEFAULT 30 |
| time_window_seconds | INTEGER | NOT NULL, DEFAULT 60 |
| updated_at | TIMESTAMPTZ | NOT NULL, DEFAULT now() |

Seeded defaults: gemini 29/60s, openai 50/60s, anthropic 40/60s.

### 3.13 `user_api_keys`

Encrypted user LLM API keys.

| Column | Type | Constraints |
|---|---|---|
| id | UUID | PK, DEFAULT gen_random_uuid() |
| user_id | UUID | NOT NULL, FK users(id) CASCADE |
| provider_name | VARCHAR(50) | NOT NULL |
| encrypted_key | BYTEA | NOT NULL |
| nonce | BYTEA | NOT NULL |
| key_prefix | VARCHAR(20) | NOT NULL |
| created_at | TIMESTAMPTZ | NOT NULL, DEFAULT now() |
| updated_at | TIMESTAMPTZ | NOT NULL, DEFAULT now() |

Constraint: UNIQUE(user_id, provider_name).

Valid providers: gemini, openai, anthropic, brave_search.

### 3.14 `audit_log`

Admin mutation audit trail.

| Column | Type | Constraints |
|---|---|---|
| id | UUID | PK, DEFAULT gen_random_uuid() |
| admin_user_id | UUID | nullable, FK users(id) SET NULL |
| action | VARCHAR(100) | NOT NULL |
| target_type | VARCHAR(50) | nullable |
| target_id | VARCHAR(255) | nullable |
| details | JSONB | nullable |
| created_at | TIMESTAMPTZ | NOT NULL, DEFAULT now() |

Indexes: `idx_audit_log_created_at` (DESC), `idx_audit_log_admin_user`.

---

## 4. API Endpoints

All endpoints are prefixed with `/api/v1`. Responses are JSON. Errors follow the shape `{ "error": "message" }`.

### 4.1 Authentication

**POST /auth/register**
- Auth: Public
- Body: `{ email: string, display_name?: string, turnstile_token: string }`
- Response: `{ message: string }`
- Sends magic link email. Rate limited.

**POST /auth/login**
- Auth: Public
- Body: `{ email: string, turnstile_token: string }`
- Response: `{ message: string }`
- Sends magic link email. Rate limited.

**GET /auth/verify?token=...&email=...**
- Auth: Public
- Response: Redirect to frontend with session cookie set.

**POST /auth/verify**
- Auth: Public
- Body: `{ token: string, email: string }`
- Response: `{ message: string, user: User }`
- Sets `session` HttpOnly cookie (30-day expiry).

**POST /auth/logout**
- Auth: Authenticated
- Response: `{ message: string }`
- Clears session cookie and deletes DB session.
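
The cookie handling described for `POST /auth/verify` and `POST /auth/logout` can be illustrated with a small sketch. Only the cookie name `session`, the `HttpOnly` flag, and the 30-day expiry come from this document; the `Secure`, `SameSite=Lax`, and `Path=/` attributes and both function names are assumptions made for illustration, not the project's actual code.

```rust
/// 30-day session lifetime, expressed in seconds for the `Max-Age` attribute.
const SESSION_TTL_SECS: u64 = 30 * 24 * 60 * 60;

/// Build a `Set-Cookie` header value carrying the raw session token.
/// `Secure`, `SameSite=Lax` and `Path=/` are assumed attributes.
fn session_cookie(raw_token: &str) -> String {
    format!(
        "session={}; HttpOnly; Secure; SameSite=Lax; Path=/; Max-Age={}",
        raw_token, SESSION_TTL_SECS
    )
}

/// Build a `Set-Cookie` header value that expires the cookie immediately,
/// as a logout handler might do before deleting the DB session row.
fn clear_session_cookie() -> String {
    "session=; HttpOnly; Secure; SameSite=Lax; Path=/; Max-Age=0".to_string()
}

fn main() {
    println!("{}", session_cookie("example-token"));
    println!("{}", clear_session_cookie());
}
```

Because the `sessions` table stores only `session_hash`, the raw token would live solely in the cookie; the server would hash it on each request to look up the session.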

**GET /auth/me**
- Auth: Authenticated
- Response: `{ id, email, display_name, role, created_at }`

### 4.2 Settings

**GET /settings**
- Auth: Authenticated
- Response: `UserSettings` (creates defaults if not exists)

**PUT /settings**
- Auth: Authenticated
- Body: `UpdateSettingsRequest` (all fields required)
- Validation: max_articles_per_source 1-10, max_links_per_source 1-30, batch_size 1-20, source_extraction_window 1-10, article_history_days 0-365, search_agent_behavior max 2000 chars, ai_provider/ai_model/ai_model_websearch max 100 chars.
- Response: Updated `UserSettings`

### 4.3 Themes

**GET /themes**
- Auth: Authenticated
- Response: `ThemeResponse[]`

**POST /themes**
- Auth: Authenticated
- Body: `{ name, theme, categories: string[], max_items_per_category?, max_age_days?, summary_length? }`
- Validation: name non-empty max 200 chars, categories 1-20 non-empty entries, max_items 1-50, max_age 1-365, summary_length 1-3.
- Response: `ThemeResponse`

**PUT /themes/{id}**
- Auth: Authenticated (owner only)
- Body: `UpdateThemeRequest` (all fields optional)
- Response: `ThemeResponse`

**DELETE /themes/{id}**
- Auth: Authenticated (owner only)
- Response: 204 No Content

### 4.4 Schedules

**GET /themes/{id}/schedule**
- Auth: Authenticated (theme owner)
- Response: `ScheduleResponse` or 404

**PUT /themes/{id}/schedule**
- Auth: Authenticated (theme owner)
- Body: `{ enabled, days: string[], time_utc: "HH:MM", emails: string[] }`
- Validation: days from mon-sun, time HH:MM format, max 3 emails.
- Response: `ScheduleResponse`

**DELETE /themes/{id}/schedule**
- Auth: Authenticated (theme owner)
- Response: 204 No Content

### 4.5 Sources

**GET /sources?theme_id=...**
- Auth: Authenticated
- Response: `SourceResponse[]`

**POST /sources**
- Auth: Authenticated
- Body: `{ title, url, theme_id? }`
- Validation: title non-empty max 200, URL http(s) max 1000 chars.
- Response: `SourceResponse`

**PUT /sources/preferred**
- Auth: Authenticated
- Body: `{ source_ids: UUID[] }`
- Response: `{ updated: number }`

**DELETE /sources/{id}**
- Auth: Authenticated (owner only)
- Response: 204 No Content

**POST /sources/bulk**
- Auth: Authenticated
- Body: `{ sources: CreateSourceRequest[], theme_id? }`
- Response: `{ imported, skipped, errors }`

**POST /sources/import-csv**
- Auth: Authenticated
- Body: Multipart file upload (CSV: title,url)
- Response: `{ imported, skipped, errors }`

**GET /sources/export-csv**
- Auth: Authenticated
- Response: CSV file download

### 4.6 Generation

**POST /syntheses/generate**
- Auth: Authenticated
- Body: `{ theme_id: UUID }`
- Response: `{ job_id: UUID }`
- Creates job in JobStore, spawns background generation task. Returns 409 if user already has active job.

**GET /syntheses/generate/{job_id}/progress**
- Auth: Authenticated (job owner)
- Response: SSE stream of `ProgressEvent`
- Events: `progress` (step, message, percent), `complete` (synthesis_id), `error` (message).

**POST /syntheses/generate/{job_id}/stop**
- Auth: Authenticated (job owner)
- Response: `{ message: string }`
- Sets cooperative cancellation flag.
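
The three event kinds of the progress stream map naturally onto an enum. The sketch below shows how each variant could be rendered as a Server-Sent Events frame (an `event:` line, a `data:` line, then a blank line). The enum layout, the JSON shape of the `data:` payload, and the hand-written `json_escape` helper are assumptions for illustration; the real backend presumably relies on serde and axum's SSE support instead.

```rust
/// Hypothetical mirror of the three event kinds listed for the progress endpoint.
enum ProgressEvent {
    Progress { step: String, message: String, percent: u8 },
    Complete { synthesis_id: String },
    Error { message: String },
}

/// Escape the characters that would break a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

impl ProgressEvent {
    /// Render one SSE frame for this event.
    fn to_sse_frame(&self) -> String {
        let (name, data) = match self {
            ProgressEvent::Progress { step, message, percent } => (
                "progress",
                format!(
                    "{{\"step\":\"{}\",\"message\":\"{}\",\"percent\":{}}}",
                    json_escape(step),
                    json_escape(message),
                    percent
                ),
            ),
            ProgressEvent::Complete { synthesis_id } => (
                "complete",
                format!("{{\"synthesis_id\":\"{}\"}}", json_escape(synthesis_id)),
            ),
            ProgressEvent::Error { message } => (
                "error",
                format!("{{\"message\":\"{}\"}}", json_escape(message)),
            ),
        };
        format!("event: {}\ndata: {}\n\n", name, data)
    }
}

fn main() {
    let events = [
        ProgressEvent::Progress {
            step: "scrape".to_string(),
            message: "Scraping batch 1".to_string(),
            percent: 40,
        },
        ProgressEvent::Complete { synthesis_id: "example-id".to_string() },
        ProgressEvent::Error { message: "timeout".to_string() },
    ];
    for ev in &events {
        print!("{}", ev.to_sse_frame());
    }
}
```

Naming the SSE event after the variant lets a browser client attach one `EventSource` listener per kind rather than parsing a discriminator out of every payload.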
### 4.7 Syntheses

**GET /syntheses**
- Auth: Authenticated
- Response: `SynthesisListItem[]` (with section summaries, theme info)

**GET /syntheses/{id}**
- Auth: Authenticated (owner only)
- Response: `SynthesisResponse` (full sections data)

**DELETE /syntheses/{id}**
- Auth: Authenticated (owner only)
- Response: 204 No Content

**POST /syntheses/{id}/send-email**
- Auth: Authenticated
- Body: `{ email: string }`
- Response: `{ message: string }`

**GET /syntheses/{id}/export/markdown**
- Auth: Authenticated
- Response: Markdown file download

**GET /syntheses/{id}/export/pdf**
- Auth: Authenticated
- Response: PDF file download

### 4.8 Article History & Provenance

**GET /article-history?limit=&offset=&job_id=&status=**
- Auth: Authenticated
- Response: `{ items: ArticleHistoryEntry[], total: number }`

**DELETE /article-history**
- Auth: Authenticated
- Response: `{ deleted: number }`

**GET /syntheses/{id}/provenance**
- Auth: Authenticated
- Response: `ArticleHistoryEntry[]` (articles with status "used" for this synthesis's job_id)

### 4.9 LLM Call Logs

**GET /llm-logs/{job_id}**
- Auth: Authenticated
- Response: `LlmCallLogEntry[]`

### 4.10 User API Keys

**GET /user/api-keys**
- Auth: Authenticated
- Response: `ApiKeyResponse[]` (id, provider_name, key_prefix, timestamps; never the full key)

**POST /user/api-keys**
- Auth: Authenticated
- Body: `{ provider_name, api_key }`
- Validation: provider in (gemini, openai, anthropic, brave_search), key 8-500 chars.
- Response: `ApiKeyResponse`
- Encrypts key with AES-256-GCM before storage; upserts (one key per user per provider).

**DELETE /user/api-keys/{provider}**
- Auth: Authenticated
- Response: 204 No Content

**POST /user/api-keys/{provider}/test**
- Auth: Authenticated
- Response: `{ success: boolean, message: string }`
- Decrypts key, calls provider test endpoint.
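
The validation rules listed for `POST /user/api-keys` (known provider, key length 8-500 characters) can be sketched as a plain function. The error enum, both function names, and the six-character prefix length are hypothetical; this document only states that a `key_prefix` of at most 20 characters is stored next to the encrypted key.

```rust
/// Providers accepted for user API keys, as listed in the schema section.
const VALID_PROVIDERS: [&str; 4] = ["gemini", "openai", "anthropic", "brave_search"];

/// Hypothetical error type for the two documented validation rules.
#[derive(Debug, PartialEq)]
enum ApiKeyError {
    UnknownProvider,
    BadLength,
}

/// Check the provider name and the 8-500 character key length.
fn validate_api_key(provider: &str, api_key: &str) -> Result<(), ApiKeyError> {
    if !VALID_PROVIDERS.contains(&provider) {
        return Err(ApiKeyError::UnknownProvider);
    }
    let len = api_key.chars().count();
    if !(8..=500).contains(&len) {
        return Err(ApiKeyError::BadLength);
    }
    Ok(())
}

/// Derive a short display prefix so the UI can identify a key without
/// ever receiving it in full. The length of 6 is an assumption.
fn key_prefix(api_key: &str) -> String {
    let head: String = api_key.chars().take(6).collect();
    format!("{}...", head)
}

fn main() {
    let key = "sk-abcdef123456";
    match validate_api_key("openai", key) {
        Ok(()) => println!("accepted, prefix = {}", key_prefix(key)),
        Err(e) => println!("rejected: {:?}", e),
    }
}
```

Validation would run before encryption, so a rejected key never reaches the AES-256-GCM step or the database.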

**POST /user/api-keys/export**
- Auth: Authenticated
- Response: `{ keys: [{ provider_name, api_key }] }`
- Decrypts and returns all keys (used for backup/migration).

### 4.11 Public Configuration

**GET /config/providers**
- Auth: Authenticated
- Response: `ProviderConfigResponse[]` (enabled providers with model lists for scraping and websearch)

### 4.12 Admin Endpoints

All admin endpoints require `AdminUser` extractor (role = admin).

**GET /admin/providers**
- Response: `AdminProviderResponse[]`

**POST /admin/providers**
- Body: `CreateProviderRequest`
- Validation: provider_name in (gemini, openai, anthropic), at least one model per list, at most one default per list.
- Response: `AdminProviderResponse`

**PUT /admin/providers/{id}**
- Body: `UpdateProviderRequest` (all fields optional)
- Response: `AdminProviderResponse`

**DELETE /admin/providers/{id}**
- Response: 204 No Content

**GET /admin/rate-limits**
- Response: `RateLimitResponse[]`

**PUT /admin/rate-limits/{provider_name}**
- Body: `{ max_requests: 1-1000, time_window_seconds: 1-3600 }`
- Response: `RateLimitResponse`
- Hot-reloads the in-memory provider rate limiter.

**GET /admin/users**
- Response: `AdminUserResponse[]`

**PUT /admin/users/{id}/role**
- Body: `{ role: "user" | "admin" }`
- Response: `{ message: string }`

**GET /health**
- Auth: Public
- Response: `{ status: "ok" }`

---

## 5. Generation Pipeline Technical Flow

### Overview

The pipeline runs as a background tokio task spawned by `POST /syntheses/generate`. It has a 15-minute global timeout and supports cooperative cancellation via `AtomicBool`.

### Initialization

1. Load `UserSettings` from DB (or create defaults)
2. Clean up old article history (entries older than `article_history_days` with dropped status) and truncate old LLM call logs
3. Load the target `Theme` (categories, max_items, max_age_days, summary_length)
4. Load user `Sources` for the theme
5. Decrypt user's LLM API key, create `Arc<dyn LlmProvider>` via factory
6. Resolve models: `ai_model` (for scraping/classification) and `ai_model_websearch` (for web search); user override or admin default fallback
7. Initialize per-user rate limiter (from settings or admin defaults)
8. Initialize tracking structures: `article_scraped` (category -> Vec of articles), `source_counts`, `url_source`, `filled_counts`, `seen_urls`, `pending_traces`

### Phase 1: Personalized Sources

Skipped if user has 0 sources for the theme.

**1a. Windowed source extraction**

- Query article_history for the last source used; reorder sources in a rolling window starting after that source
- Select up to `source_extraction_window` sources per generation
- For each source (bounded concurrency of 5): fetch page HTML, extract up to `max_links_per_source` article URLs via HTML parsing (same-domain, non-homepage, no static assets)
- Deduplicate URLs cross-source via `seen_urls`
- Batch-check `article_history` for already-seen URL hashes; filter matches (traced as `filtered_history`)
- Shuffle remaining candidates to interleave sources
- Track url -> source in `url_source`

**1b. Batch scrape + classify**

Processing in batches of `settings.batch_size`:

- **Batch assembly**: Pull up to batch_size candidates, skip if `source_counts[domain] >= max_articles_per_source` (traced as `filtered_diversity`)
- **Scrape** (JoinSet, parallel): SSRF check, 15s timeout, 5MB limit, HTML parsing, title/date/body extraction, soft-404 detection. Skip empty/too-old articles.
- **Classify** (JoinSet, parallel): Rate limit check (60s wait), send title + first 500 chars to LLM with categories list. LLM returns `{title, summary, category}`. Validate category via `assign_category()` (fallback to "Autre", drop if full).
- **LLM call logging**: Every LLM call is logged with full prompt, response, timing, and article URL.
- **Early exit**: Stop when total articles >= `(num_categories + 1) * max_items_per_category`.
- Batch-flush pending traces to `article_history`.
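
Three pieces of Phase 1 are pure bookkeeping and can be sketched without any I/O: the rolling source window from 1a, the diversity-capped batch assembly from 1b, and the early-exit threshold. All function names and the `(url, domain)` tuple representation below are assumptions; the sketch also reads `source_counts` without updating it, whereas the real pipeline presumably increments the counts as articles are accepted.

```rust
use std::collections::{HashMap, VecDeque};

/// Rotate `sources` so selection starts right after `last_used`,
/// then take at most `window` entries, wrapping around the end.
fn select_window<'a>(sources: &[&'a str], last_used: Option<&str>, window: usize) -> Vec<&'a str> {
    let start = last_used
        .and_then(|last| sources.iter().position(|s| *s == last))
        .map(|i| (i + 1) % sources.len())
        .unwrap_or(0);
    sources
        .iter()
        .cycle()
        .skip(start)
        .take(window.min(sources.len()))
        .copied()
        .collect()
}

/// Pull up to `batch_size` (url, domain) candidates off the queue.
/// URLs whose domain already reached `max_per_source` are returned
/// separately so the caller can trace them as `filtered_diversity`.
fn assemble_batch(
    queue: &mut VecDeque<(String, String)>,
    source_counts: &HashMap<String, usize>,
    batch_size: usize,
    max_per_source: usize,
) -> (Vec<(String, String)>, Vec<String>) {
    let mut batch = Vec::new();
    let mut filtered = Vec::new();
    while batch.len() < batch_size {
        let (url, domain) = match queue.pop_front() {
            Some(candidate) => candidate,
            None => break,
        };
        if source_counts.get(&domain).copied().unwrap_or(0) >= max_per_source {
            filtered.push(url);
        } else {
            batch.push((url, domain));
        }
    }
    (batch, filtered)
}

/// Early-exit threshold; the `+ 1` presumably leaves room for the "Autre" fallback category.
fn early_exit_threshold(num_categories: usize, max_items_per_category: usize) -> usize {
    (num_categories + 1) * max_items_per_category
}

fn main() {
    let sources = ["a.example", "b.example", "c.example", "d.example"];
    println!("{:?}", select_window(&sources, Some("b.example"), 3));

    let mut queue: VecDeque<(String, String)> = VecDeque::new();
    queue.push_back(("https://a.example/1".to_string(), "a.example".to_string()));
    queue.push_back(("https://b.example/1".to_string(), "b.example".to_string()));
    let mut counts = HashMap::new();
    counts.insert("a.example".to_string(), 3);
    let (batch, filtered) = assemble_batch(&mut queue, &counts, 5, 3);
    println!("batch = {:?}, filtered = {:?}", batch, filtered);
    println!("stop at {} articles", early_exit_threshold(4, 4));
}
```

Starting the window after the last-used source means successive generations rotate through the full source list instead of always scraping the first few entries.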
### Phase 2: Web Search Fallback

Skipped if all categories are filled to `max_items_per_category`.

**2a. Compute gaps**: For each category, `needed = max_items - filled`.

**2b. Path selection** based on `settings.use_brave_search`:

**Path A -- Brave Search** (`use_brave_search = true`):

- Decrypt user's Brave Search API key
- Query: `"{theme} actualites"`, up to 20 results, freshness mapped from `max_age_days` (pd/pw/pm/py)
- Filter results through `filter_phase2_url()`: homepage filter, cross-phase dedup, article history check, source diversity check
- Batch scrape + classify (same logic as Phase 1b, source_type = "brave_search")

**Path B -- LLM Web Search** (`use_brave_search = false`):

- Build search prompt with theme, categories, and gap counts
- Call LLM with `ai_model_websearch` model; returns structured JSON: `{category_0: [{title, url, summary}], ...}`
- Filter URLs through `filter_phase2_url()`
- Scrape each result sequentially to validate; keep LLM-provided title/summary (no re-classification)
- source_type = "web_search"

### Save & Record

1. Fail with an error if all article lists are empty
2. Order sections: user-defined categories first (in order), then "Autre" if non-empty
3. Sanitize: strip `\u0000` null bytes from JSON (PostgreSQL JSONB requirement)
4. Insert synthesis row: job_id, week (ISO week string), sections (JSONB), status "completed", theme_id
5. Record used articles: batch-insert `article_history` entries with status "used", synthesis_id, and correct source_type

---

## 6. LLM Provider Abstraction

### Trait Definition

```rust
use async_trait::async_trait;
use serde_json::Value;

#[async_trait]
pub trait LlmProvider: Send + Sync {
    /// Short identifier of the provider.
    fn provider_id(&self) -> &str;

    /// Send one prompt pair to `model` and return the structured response.
    // Return type parameters are assumed: structured JSON on success,
    // `AppError` (see Error Mapping) on failure.
    async fn call_llm(
        &self,
        model: &str,
        system_prompt: &str,
        user_prompt: &str,
        response_schema: &Value,
    ) -> Result<Value, AppError>;
}
```

All calls use structured JSON output (`response_schema` defines the expected shape).
### Implementations

| Provider | Module | API Endpoint | Auth Method |
|---|---|---|---|
| Google Gemini | `llm/gemini.rs` | `generativelanguage.googleapis.com` | Query param `?key=` |
| OpenAI | `llm/openai.rs` | `api.openai.com/v1/chat/completions` | Bearer token |
| Anthropic | `llm/anthropic.rs` | `api.anthropic.com/v1/messages` | `x-api-key` header |
| Mock | `llm/mock.rs` | N/A (in-memory) | N/A |

### Factory

`llm/factory.rs` provides `create_provider(provider_name, api_key, http_client) -> Arc<dyn LlmProvider>`. Matches on provider name string.

### Response Schema

`llm/schema.rs` builds JSON Schema definitions for:

- Classification/summarization: `{title, summary, category, is_article}`
- Web search: `{category_0: [{title, url, summary}], ...}` with per-category arrays
- Source link extraction: `{links: [{url}]}`

### Error Mapping

`map_provider_http_error()` translates HTTP status codes to `AppError` variants:

- 400 -> BadRequest
- 401/403 -> BadRequest (invalid key)
- 404 -> BadRequest (model not found)
- 429/529 -> RateLimited
- Other -> Internal

---

## 7. Background Tasks

### Session Cleanup

Runs hourly via `tokio::spawn`. Calls `db::sessions::delete_expired` to remove sessions past their `expires_at` timestamp.

### Job Store Cleanup

`JobStore::cleanup_expired` removes job entries older than 1 hour (the TTL constant). Called periodically. Releases user locks for expired jobs.

### Scheduler

Runs every minute via `tokio::spawn` with a 60-second interval. For each tick:

1. `current_day_code()` -> "mon" through "sun"
2. `find_due_schedules(pool, day, time)` -> queries enabled schedules matching current day and time (HH:MM)
3. For each due schedule:
   - Skip if `job_store.has_active_job(user_id)` returns Some (manual generation in progress)
   - Create a temporary `watch::channel` and `AtomicBool`
   - Call `synthesis::run_generation_inner` directly (bypasses job store)
   - On success: send emails to configured recipients (up to 3), mark schedule as run
   - On failure: log error, do not mark as run

---

## 8. Configuration

### Environment Variables

| Variable | Required | Default | Description |
|---|---|---|---|
| DATABASE_URL | Yes | - | PostgreSQL connection string |
| MASTER_ENCRYPTION_KEY | Yes | - | 64 hex chars (32 bytes) for AES-256-GCM |
| APP_URL | Yes | - | Public URL (CORS, magic links, cookies). No trailing slash. |
| PORT | No | 8080 | HTTP server port |
| RUST_LOG | No | - | Logging filter (e.g., "info,ai_synth_backend=debug") |
| STATIC_DIR | No | ../frontend/dist | Path to built SolidJS files |
| RESEND_API_KEY | Yes | - | Resend email service API key |
| EMAIL_FROM | Yes | - | Sender address for emails |
| TURNSTILE_SECRET_KEY | Yes | - | Cloudflare Turnstile server secret |
| TURNSTILE_SITE_KEY | Yes | - | Cloudflare Turnstile client key |
| POSTGRES_PASSWORD | Yes | - | Used by docker-compose for DB container |

### Startup Validation

`AppConfig::validate()` checks at startup:

- `MASTER_ENCRYPTION_KEY` is exactly 64 hex characters
- `APP_URL` starts with http:// or https:// and has no trailing slash

The application refuses to start with invalid configuration.
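
The two startup checks are simple enough to sketch with the standard library alone. The free function `validate_config` and the `ConfigError` enum below are illustrative stand-ins, not the actual `AppConfig::validate()` implementation, whose signature and error type are not given here.

```rust
/// Hypothetical error type for the two documented startup checks.
#[derive(Debug, PartialEq)]
enum ConfigError {
    BadEncryptionKey,
    BadAppUrl,
}

/// Check the master key format and the public URL shape.
fn validate_config(master_key: &str, app_url: &str) -> Result<(), ConfigError> {
    // Exactly 64 hex characters, i.e. 32 bytes of AES-256 key material.
    let key_ok = master_key.len() == 64 && master_key.chars().all(|c| c.is_ascii_hexdigit());
    if !key_ok {
        return Err(ConfigError::BadEncryptionKey);
    }
    // Must be an http(s) URL without a trailing slash.
    let scheme_ok = app_url.starts_with("http://") || app_url.starts_with("https://");
    if !scheme_ok || app_url.ends_with('/') {
        return Err(ConfigError::BadAppUrl);
    }
    Ok(())
}

fn main() {
    let key = "0123456789abcdef".repeat(4);
    match validate_config(&key, "https://synth.example.com") {
        Ok(()) => println!("configuration accepted"),
        Err(e) => {
            eprintln!("invalid configuration: {:?}", e);
            std::process::exit(1);
        }
    }
}
```

Rejecting a trailing slash up front keeps later string concatenation for magic links and CORS origins from producing double slashes.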
### User Settings Model

Default values applied when a user has no saved settings:

| Setting | Default | Range |
|---|---|---|
| max_articles_per_source | 3 | 1-10 |
| max_links_per_source | 8 | 1-30 |
| use_brave_search | false | boolean |
| article_history_days | 90 | 0-365 |
| batch_size | 5 | 1-20 |
| source_extraction_window | 3 | 1-10 |
| search_agent_behavior | "" | max 2000 chars |
| ai_provider | "" | max 100 chars |
| ai_model | "" | max 100 chars |
| ai_model_websearch | "" | max 100 chars |
| rate_limit_max_requests | null | >= 1 if set |
| rate_limit_time_window_seconds | null | >= 1 if set |
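
The defaults and ranges in the table above translate directly into a `Default` implementation plus a validation method. The struct below is a sketch under assumptions: the field names follow the `settings` table, but the integer width, the `String` error type, and the helpers `check_range` and `check_len` are hypothetical.

```rust
/// Illustrative settings struct; field names follow the `settings` table.
#[derive(Debug, Clone, PartialEq)]
struct UserSettings {
    max_articles_per_source: i32,
    max_links_per_source: i32,
    use_brave_search: bool,
    article_history_days: i32,
    batch_size: i32,
    source_extraction_window: i32,
    search_agent_behavior: String,
    ai_provider: String,
    ai_model: String,
    ai_model_websearch: String,
    rate_limit_max_requests: Option<i32>,
    rate_limit_time_window_seconds: Option<i32>,
}

impl Default for UserSettings {
    fn default() -> Self {
        UserSettings {
            max_articles_per_source: 3,
            max_links_per_source: 8,
            use_brave_search: false,
            article_history_days: 90,
            batch_size: 5,
            source_extraction_window: 3,
            search_agent_behavior: String::new(),
            ai_provider: String::new(),
            ai_model: String::new(),
            ai_model_websearch: String::new(),
            rate_limit_max_requests: None,
            rate_limit_time_window_seconds: None,
        }
    }
}

/// Reject integers outside the inclusive range `min..=max`.
fn check_range(name: &str, value: i32, min: i32, max: i32) -> Result<(), String> {
    if value < min || value > max {
        return Err(format!("{} must be between {} and {}", name, min, max));
    }
    Ok(())
}

/// Reject strings longer than `max_chars` characters.
fn check_len(name: &str, value: &str, max_chars: usize) -> Result<(), String> {
    if value.chars().count() > max_chars {
        return Err(format!("{} must be at most {} chars", name, max_chars));
    }
    Ok(())
}

impl UserSettings {
    /// Apply every range from the table; the first violation is returned.
    fn validate(&self) -> Result<(), String> {
        check_range("max_articles_per_source", self.max_articles_per_source, 1, 10)?;
        check_range("max_links_per_source", self.max_links_per_source, 1, 30)?;
        check_range("article_history_days", self.article_history_days, 0, 365)?;
        check_range("batch_size", self.batch_size, 1, 20)?;
        check_range("source_extraction_window", self.source_extraction_window, 1, 10)?;
        check_len("search_agent_behavior", &self.search_agent_behavior, 2000)?;
        check_len("ai_provider", &self.ai_provider, 100)?;
        check_len("ai_model", &self.ai_model, 100)?;
        check_len("ai_model_websearch", &self.ai_model_websearch, 100)?;
        let optional = [
            ("rate_limit_max_requests", self.rate_limit_max_requests),
            ("rate_limit_time_window_seconds", self.rate_limit_time_window_seconds),
        ];
        for (name, value) in optional {
            if let Some(v) = value {
                if v < 1 {
                    return Err(format!("{} must be >= 1 if set", name));
                }
            }
        }
        Ok(())
    }
}

fn main() {
    let settings = UserSettings::default();
    println!("defaults valid: {}", settings.validate().is_ok());
}
```

Modelling the two rate-limit overrides as `Option<i32>` mirrors the nullable columns: `None` means "fall back to the admin defaults" rather than "zero requests allowed".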