AI Weekly Synth -- Architecture Document
1. System Overview
AI Weekly Synth is a self-hosted web application that generates AI-powered weekly news syntheses. Users configure topics (themes), categories, and an LLM provider; the system then searches the web, scrapes and validates sources, classifies articles, and produces structured summaries.
Technology Stack
| Layer | Technology |
|---|---|
| Backend | Rust (Axum 0.8) |
| Frontend | SolidJS 1.9 + Tailwind CSS v4 |
| Database | PostgreSQL 17 (via sqlx with compile-time query checking) |
| Deployment | Docker Compose (app + Postgres) |
Deployment Topology
docker-compose.yml
├── app (ai-synth) port 8080
│   ├── Axum HTTP server
│   ├── Static file serving (SPA fallback)
│   └── Background tasks (scheduler, session cleanup, job TTL)
└── db (postgres:17-alpine) port 5432 (localhost only)
    └── postgres_data volume
The app container builds from a multi-stage Dockerfile, serves the SolidJS frontend as static files, and connects to Postgres over the internal bridge network.
2. Layer Architecture
The backend follows a three-layer architecture with shared model types:
handlers/ (HTTP layer)
│
├── extracts request data (Axum extractors, JSON, path params)
├── validates input
├── calls services/ or db/ directly
└── formats HTTP responses
│
services/ (Business logic)
│
├── synthesis pipeline orchestration
├── LLM provider abstraction + factory
├── scraping (articles, source pages)
├── encryption, email, CSV, PDF export
├── rate limiting, job store, scheduler
└── Brave Search client
│
db/ (Data access)
│
├── pure SQL queries via sqlx
├── typed result mapping (FromRow)
└── no business logic
│
models/ (Shared types -- used by all layers)
│
├── domain structs (User, Theme, Source, Synthesis, etc.)
├── request/response DTOs
└── validation logic
Module Inventory
Handlers (handlers/): admin, api_keys, article_history, auth, config, generation, health, llm_logs, schedules, settings, sources, syntheses, themes
Services (services/): auth, brave_search, csv, email, encryption, export, job_store, llm (with gemini, openai, anthropic, mock, factory, schema), prompts, rate_limiter, scheduler, scraper, source_scraper, synthesis, turnstile
DB (db/): api_keys, article_history, audit, llm_call_log, magic_links, providers, rate_limits, schedules, sessions, settings, sources, syntheses, themes, users
Models (models/): api_key, audit, magic_link, provider, rate_limit, schedule, session, settings, source, synthesis, theme, user
3. Key Components
3.1 LLM Provider Abstraction
The LlmProvider trait defines a unified interface for all LLM backends:
#[async_trait]
pub trait LlmProvider: Send + Sync {
    fn provider_id(&self) -> &str;
    async fn call_llm(&self, model: &str, system_prompt: &str,
                      user_prompt: &str, response_schema: &Value)
        -> Result<Value, AppError>;
}
Implementations: GeminiProvider, OpenAiProvider, AnthropicProvider, MockLlmProvider.
The factory (llm/factory.rs) creates provider instances by name. The mock provider enables end-to-end pipeline testing without real API calls.
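As a rough illustration of name-based construction, here is a minimal std-only sketch. It assumes a simplified synchronous `Provider` trait in place of the async `LlmProvider`, and the provider name strings and `create_provider` function are hypothetical, not taken from `llm/factory.rs`.

```rust
// Simplified, synchronous stand-in for the async LlmProvider trait.
// It only carries provider_id so the sketch stays std-only.
pub trait Provider {
    fn provider_id(&self) -> &str;
}

pub struct Gemini;
pub struct OpenAi;
pub struct Anthropic;
pub struct Mock;

impl Provider for Gemini {
    fn provider_id(&self) -> &str { "gemini" }
}
impl Provider for OpenAi {
    fn provider_id(&self) -> &str { "openai" }
}
impl Provider for Anthropic {
    fn provider_id(&self) -> &str { "anthropic" }
}
impl Provider for Mock {
    fn provider_id(&self) -> &str { "mock" }
}

/// Hypothetical factory: maps a provider name to a boxed implementation.
/// The name strings are assumptions made for this example.
pub fn create_provider(name: &str) -> Result<Box<dyn Provider>, String> {
    match name {
        "gemini" => Ok(Box::new(Gemini)),
        "openai" => Ok(Box::new(OpenAi)),
        "anthropic" => Ok(Box::new(Anthropic)),
        "mock" => Ok(Box::new(Mock)),
        other => Err(format!("unknown provider: {other}")),
    }
}
```

Returning a boxed trait object lets the pipeline stay agnostic of the concrete backend, which is also what makes swapping in the mock provider for tests cheap.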
3.2 Synthesis Pipeline
The pipeline is the core business logic, orchestrated in services/synthesis.rs. It runs as a background tokio task with a 15-minute timeout.
Three phases:
- Phase 1 -- Personalized Sources: Extract article links from user-curated source pages (windowed, rolling), scrape articles, classify and summarize each via LLM. Batched processing with configurable batch_size.
- Phase 2 -- Web Search Fallback: For under-filled categories, either call the Brave Search API or use the LLM's web search capability to find additional articles. Scrape and validate results.
- Save: Assemble sections by category, sanitize JSON, persist to database, record article history traces.
Progress is reported via tokio::sync::watch channels consumed by SSE endpoints.
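The Phase 2 trigger ("under-filled categories") can be pictured as a simple gap computation. This is a sketch under assumptions: the `under_filled` function is hypothetical, and it treats the theme's max_items_per_category as the fill target for every category.

```rust
use std::collections::HashMap;

/// Returns the categories that still have room after Phase 1, paired with
/// the number of missing items. `counts` holds how many articles Phase 1
/// placed in each category; absent categories count as zero.
/// Hypothetical helper, not taken from services/synthesis.rs.
pub fn under_filled(
    categories: &[&str],
    counts: &HashMap<String, usize>,
    max_items: usize,
) -> Vec<(String, usize)> {
    categories
        .iter()
        .filter_map(|cat| {
            let have = counts.get(*cat).copied().unwrap_or(0);
            // Keep only categories below the target, with their gap.
            (have < max_items).then(|| (cat.to_string(), max_items - have))
        })
        .collect()
}
```

Only the categories returned here would need a web search, so a theme whose personalized sources already fill every category would skip Phase 2 entirely.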
3.3 Job Store
JobStore (services/job_store.rs) is an in-memory concurrent store for active generation jobs:
- Backed by DashMap<Uuid, JobEntry> for concurrent access without a global lock (DashMap shards its locks internally)
- DashSet<Uuid> for per-user deduplication (one active job per user)
- Each job holds a watch::Sender<ProgressEvent> for real-time SSE streaming
- AtomicBool for cooperative cancellation
- 1-hour TTL with automatic cleanup
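The one-active-job-per-user rule comes down to an atomic "insert if absent" on a set. The sketch below uses a std `Mutex<HashSet>` as a stand-in for the DashSet, with `u64` in place of Uuid; the `ActiveUsers` type and its method names are hypothetical.

```rust
use std::collections::HashSet;
use std::sync::Mutex;

/// Std-only stand-in for the set of users with a generation in progress.
pub struct ActiveUsers {
    inner: Mutex<HashSet<u64>>, // u64 stands in for Uuid
}

impl ActiveUsers {
    pub fn new() -> Self {
        Self { inner: Mutex::new(HashSet::new()) }
    }

    /// Registers the user and returns true if they had no active job.
    /// HashSet::insert returns false when the value was already present,
    /// so check and insert happen in a single step under the lock.
    pub fn try_start(&self, user_id: u64) -> bool {
        self.inner.lock().unwrap().insert(user_id)
    }

    /// Releases the slot when the job finishes or is cancelled.
    pub fn finish(&self, user_id: u64) {
        self.inner.lock().unwrap().remove(&user_id);
    }
}
```

Doing the check and the insert as one operation matters: a separate "contains, then insert" sequence would let two near-simultaneous requests both start a job.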
3.4 Scheduler
services/scheduler.rs runs as a background task, checking every minute for due theme_schedules. When a schedule fires:
- Query find_due_schedules matching current day code + time
- Skip if user already has a manual generation in progress
- Run synthesis::run_generation_inner directly
- Send email to configured recipients (up to 3)
- Mark schedule as run
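The matching step above can be sketched as a pure predicate. The day code values ("mon", "tue", ...) and the "HH:MM" time format are assumptions; the document does not specify how days and time_utc are encoded, and the `Schedule` struct here is a hypothetical in-memory view of a theme_schedules row.

```rust
/// Hypothetical in-memory view of a theme_schedules row.
pub struct Schedule {
    pub enabled: bool,
    pub days: Vec<String>, // assumed day codes such as "mon", "tue"
    pub time_utc: String,  // assumed "HH:MM" format
}

/// True when the schedule should fire for the given day code and minute.
pub fn is_due(s: &Schedule, day_code: &str, hhmm_utc: &str) -> bool {
    s.enabled && s.days.iter().any(|d| d == day_code) && s.time_utc == hhmm_utc
}

/// Caps the recipient list at three, matching the limit described above.
pub fn recipients(emails: &[String]) -> &[String] {
    &emails[..emails.len().min(3)]
}
```

Because the check runs once per minute, an exact minute match is enough in this sketch; a real implementation would presumably also consult last_run_at so a schedule cannot fire twice in the same minute.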
3.5 Scraper
Two scraping services:
- scraper.rs: Article page scraper with SSRF prevention, HTML parsing, title/date/body extraction, soft-404 detection, 15s timeout, 5MB body limit.
- source_scraper.rs: Source index page scraper that extracts article links from user-configured source URLs (HTML <a> parsing with filters, or LLM-assisted extraction).
3.6 Rate Limiters
- Auth rate limiter: 10 requests/60s per key (email or IP) for magic link endpoints.
- Provider rate limiter: Per-LLM-provider sliding window, admin-configured, hot-reloaded from DB.
- User rate limiters: Per-user generation rate limits cached in DashMap, recreated on settings change.
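A sliding-window limiter of the kind described above can be sketched with a queue of timestamps. This is a minimal single-key version under assumptions: the `SlidingWindow` type is hypothetical, and the real limiters would keep one such window per key (email, IP, provider, or user).

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

/// Allows at most `max_requests` within any trailing `window`.
pub struct SlidingWindow {
    max_requests: usize,
    window: Duration,
    hits: VecDeque<Instant>,
}

impl SlidingWindow {
    pub fn new(max_requests: usize, window: Duration) -> Self {
        Self { max_requests, window, hits: VecDeque::new() }
    }

    /// Records a request at `now` and returns whether it is allowed.
    /// Taking `now` as a parameter keeps the logic easy to test.
    pub fn check(&mut self, now: Instant) -> bool {
        // Drop timestamps that have fallen out of the window.
        while let Some(&oldest) = self.hits.front() {
            if now.duration_since(oldest) >= self.window {
                self.hits.pop_front();
            } else {
                break;
            }
        }
        if self.hits.len() < self.max_requests {
            self.hits.push_back(now);
            true
        } else {
            false
        }
    }
}
```

With the auth limiter's settings this would be constructed as `SlidingWindow::new(10, Duration::from_secs(60))`; rejected requests are not recorded, so a blocked client does not extend its own lockout.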
4. Data Model
Tables and Relationships
users
├── sessions (user_id FK, CASCADE)
├── magic_tokens (email reference, no FK)
├── settings (user_id PK/FK, CASCADE)
├── themes (user_id FK, CASCADE)
│ ├── sources (theme_id FK, CASCADE)
│ ├── syntheses (theme_id FK, SET NULL)
│ └── theme_schedules (theme_id FK, CASCADE, UNIQUE)
├── user_api_keys (user_id FK, CASCADE; UNIQUE per provider)
├── article_history (user_id FK, CASCADE)
├── llm_call_log (user_id FK, CASCADE)
└── audit_log (admin_user_id FK, SET NULL)
admin_providers
└── admin_rate_limits (provider_name FK, CASCADE)
Table Summary
| Table | Purpose | Key Columns |
|---|---|---|
| users | User accounts | id, email, display_name, role (user/admin), created_at |
| sessions | Login sessions | session_hash (PK), user_id, expires_at, last_active_at, ip_address |
| magic_tokens | Passwordless auth tokens | id, email, token_hash, expires_at, used |
| settings | Per-user pipeline config | user_id (PK), ai_provider, ai_model, ai_model_websearch, batch_size, max_articles_per_source, max_links_per_source, use_brave_search, source_extraction_window, article_history_days, search_agent_behavior, rate_limit_max_requests, rate_limit_time_window_seconds |
| themes | Per-user topic configurations | id, user_id, name, theme, categories (JSONB), max_items_per_category, max_age_days, summary_length |
| sources | User-curated news source URLs | id, user_id, title, url, theme_id, is_preferred |
| syntheses | Generated synthesis results | id, user_id, week, sections (JSONB), status, job_id, theme_id |
| theme_schedules | Automated generation schedules | id, theme_id (UNIQUE), user_id, enabled, days (JSONB), time_utc, emails (JSONB), last_run_at |
| article_history | Article URL dedup + provenance trace | id, user_id, url, url_hash, title, source_type, source_url, category, synthesis_id, status, scraped_ok, job_id, published_date |
| llm_call_log | Full LLM interaction log | id, user_id, job_id, call_type, model, system_prompt, user_prompt, response_body, duration_ms, article_url |
| admin_providers | Admin-curated LLM provider catalog | id, provider_name (UNIQUE), display_name, models_scraping (JSONB), models_websearch (JSONB), is_enabled |
| admin_rate_limits | Per-provider rate limit config | id, provider_name (UNIQUE, FK), max_requests, time_window_seconds |
| user_api_keys | Encrypted user LLM API keys | id, user_id, provider_name, encrypted_key (BYTEA), nonce (BYTEA), key_prefix; UNIQUE(user_id, provider_name) |
| audit_log | Admin mutation audit trail | id, admin_user_id, action, target_type, target_id, details (JSONB) |
5. API Overview
All API routes are prefixed with /api/v1. CSRF protection (X-Requested-With header) is applied to all mutating endpoints.
Authentication
| Method | Path | Auth | Description |
|---|---|---|---|
| POST | /auth/register | Public | Create account + send magic link |
| POST | /auth/login | Public | Request magic link |
| GET | /auth/verify | Public | Verify token (email click redirect) |
| POST | /auth/verify | Public | Verify token (frontend API call) |
| POST | /auth/logout | Authenticated | Destroy session |
| GET | /auth/me | Authenticated | Current user info |
Settings
| Method | Path | Auth | Description |
|---|---|---|---|
| GET | /settings | Authenticated | Get user settings |
| PUT | /settings | Authenticated | Update user settings |
Themes
| Method | Path | Auth | Description |
|---|---|---|---|
| GET | /themes | Authenticated | List user themes |
| POST | /themes | Authenticated | Create theme |
| PUT | /themes/{id} | Authenticated | Update theme |
| DELETE | /themes/{id} | Authenticated | Delete theme |
Schedules
| Method | Path | Auth | Description |
|---|---|---|---|
| GET | /themes/{id}/schedule | Authenticated | Get theme schedule |
| PUT | /themes/{id}/schedule | Authenticated | Create or update schedule |
| DELETE | /themes/{id}/schedule | Authenticated | Delete schedule |
Sources
| Method | Path | Auth | Description |
|---|---|---|---|
| GET | /sources | Authenticated | List sources |
| POST | /sources | Authenticated | Create source |
| PUT | /sources/preferred | Authenticated | Update preferred sources |
| DELETE | /sources/{id} | Authenticated | Delete source |
| POST | /sources/bulk | Authenticated | Bulk import (JSON) |
| POST | /sources/import-csv | Authenticated | Import from CSV |
| GET | /sources/export-csv | Authenticated | Export as CSV |
Syntheses & Generation
| Method | Path | Auth | Description |
|---|---|---|---|
| GET | /syntheses | Authenticated | List syntheses |
| GET | /syntheses/{id} | Authenticated | Get full synthesis |
| DELETE | /syntheses/{id} | Authenticated | Delete synthesis |
| POST | /syntheses/generate | Authenticated | Trigger generation |
| GET | /syntheses/generate/{job_id}/progress | Authenticated | SSE progress stream |
| POST | /syntheses/generate/{job_id}/stop | Authenticated | Cancel generation |
| POST | /syntheses/{id}/send-email | Authenticated | Email synthesis |
| GET | /syntheses/{id}/export/markdown | Authenticated | Markdown download |
| GET | /syntheses/{id}/export/pdf | Authenticated | PDF download |
Article History & LLM Logs
| Method | Path | Auth | Description |
|---|---|---|---|
| GET | /article-history | Authenticated | List article history |
| DELETE | /article-history | Authenticated | Clear article history |
| GET | /syntheses/{id}/provenance | Authenticated | Get synthesis provenance |
| GET | /llm-logs/{job_id} | Authenticated | Get LLM call logs for job |
User API Keys
| Method | Path | Auth | Description |
|---|---|---|---|
| GET | /user/api-keys | Authenticated | List keys (prefix only) |
| POST | /user/api-keys | Authenticated | Store encrypted key |
| DELETE | /user/api-keys/{provider} | Authenticated | Delete key |
| POST | /user/api-keys/{provider}/test | Authenticated | Test key validity |
| POST | /user/api-keys/export | Authenticated | Export keys |
Configuration & Admin
| Method | Path | Auth | Description |
|---|---|---|---|
| GET | /config/providers | Authenticated | Available providers/models |
| GET | /admin/providers | Admin | List all providers |
| POST | /admin/providers | Admin | Create provider |
| PUT | /admin/providers/{id} | Admin | Update provider |
| DELETE | /admin/providers/{id} | Admin | Delete provider |
| GET | /admin/rate-limits | Admin | List rate limits |
| PUT | /admin/rate-limits/{provider_name} | Admin | Update rate limit |
| GET | /admin/users | Admin | List users |
| PUT | /admin/users/{id}/role | Admin | Change user role |
Infrastructure
| Method | Path | Auth | Description |
|---|---|---|---|
| GET | /health | Public | Health check |
6. Security Architecture
Authentication & Session Management
- Passwordless: Magic link tokens sent via email (Resend API), single-use, time-limited
- Captcha: Cloudflare Turnstile on registration and login
- Sessions: SHA-256 hashed tokens stored in DB, 30-day expiry, HttpOnly + SameSite=Lax cookies, optionally Secure
- Anti-enumeration: Same response for existent/non-existent emails, timing attack mitigation
- Authorization: AuthUser and AdminUser Axum extractors enforce auth levels per handler
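To make the session cookie attributes concrete, here is a small sketch that builds a Set-Cookie header value. The cookie name "session", the `Path=/` attribute, and the `session_cookie` function are assumptions for illustration; only the 30-day lifetime and the HttpOnly, SameSite=Lax, and optional Secure attributes come from the description above.

```rust
/// Builds a Set-Cookie header value for a session token.
/// The cookie name and Path attribute are assumed, not taken from the app.
pub fn session_cookie(token: &str, secure: bool) -> String {
    // 30 days expressed in seconds for Max-Age.
    const THIRTY_DAYS_SECS: u64 = 30 * 24 * 60 * 60;
    let mut cookie = format!(
        "session={}; Path=/; Max-Age={}; HttpOnly; SameSite=Lax",
        token, THIRTY_DAYS_SECS
    );
    if secure {
        // Secure is optional so the app can still run over plain HTTP locally.
        cookie.push_str("; Secure");
    }
    cookie
}
```

The cookie carries the raw token while the database stores only its SHA-256 hash, so a leaked sessions table does not directly yield usable session cookies.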
CSRF Protection
All mutating API endpoints require the X-Requested-With header (checked by csrf::csrf_check middleware layer). Non-mutating GET/HEAD/OPTIONS requests are exempt.
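The rule reduces to a small decision function. This is a sketch of the logic only, with a hypothetical `csrf_ok` helper; the actual csrf::csrf_check middleware operates on Axum requests rather than plain strings.

```rust
/// Returns true when a request passes the CSRF rule described above.
/// `method` is the upper-case HTTP method; `has_x_requested_with` says
/// whether the X-Requested-With header is present on the request.
pub fn csrf_ok(method: &str, has_x_requested_with: bool) -> bool {
    match method {
        // Non-mutating methods are exempt.
        "GET" | "HEAD" | "OPTIONS" => true,
        // Everything else must carry the custom header.
        _ => has_x_requested_with,
    }
}
```

The check works because browsers do not attach custom headers to cross-site form submissions, and a cross-origin script that tries to add one triggers a CORS preflight, which the restricted CORS policy rejects.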
Encryption at Rest
User LLM API keys are encrypted with AES-256-GCM before storage:
- 32-byte master key from MASTER_ENCRYPTION_KEY env var (64 hex chars)
- Random 12-byte nonce per encryption (stored alongside ciphertext)
- Key bytes are zeroized on drop (zeroize crate)
- Only a key prefix (first 8 chars + "...") is ever returned via the API
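Two of these points lend themselves to a std-only sketch: decoding the 64-character hex master key into 32 bytes, and deriving the display prefix. Both `parse_master_key` and `key_prefix` are hypothetical helpers; the AES-256-GCM step itself is omitted because it depends on a crypto crate.

```rust
/// Decodes a 64-character hex string into a 32-byte key.
pub fn parse_master_key(hex: &str) -> Result<[u8; 32], String> {
    // Reject wrong lengths and any non-hex character up front
    // (this also rules out signs, which from_str_radix would accept).
    if hex.len() != 64 || !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
        return Err("expected exactly 64 hex characters".to_string());
    }
    let mut key = [0u8; 32];
    for (i, byte) in key.iter_mut().enumerate() {
        *byte = u8::from_str_radix(&hex[2 * i..2 * i + 2], 16)
            .map_err(|e| format!("invalid hex at byte {i}: {e}"))?;
    }
    Ok(key)
}

/// Returns the first 8 characters of an API key followed by "...".
pub fn key_prefix(api_key: &str) -> String {
    let head: String = api_key.chars().take(8).collect();
    format!("{head}...")
}
```

Validating the key at startup, rather than on first use, means a misconfigured MASTER_ENCRYPTION_KEY fails fast instead of surfacing when a user first stores a key.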
SSRF Prevention
Both scraper.rs and source_scraper.rs validate URLs before fetching:
- DNS resolution check against private/loopback IP ranges
- Redirect chain validation (no redirects to private IPs)
- Only HTTP/HTTPS schemes allowed
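The address and scheme checks can be sketched with the standard library alone. The `is_blocked_ip` and `is_allowed_scheme` helpers are hypothetical, and the exact set of blocked ranges is an assumption; the scrapers may block more or fewer ranges than shown here.

```rust
use std::net::IpAddr;

/// Returns true for addresses a scraper should refuse to contact.
pub fn is_blocked_ip(ip: IpAddr) -> bool {
    match ip {
        IpAddr::V4(v4) => {
            v4.is_private()
                || v4.is_loopback()
                || v4.is_link_local() // includes 169.254.169.254 metadata IPs
                || v4.is_unspecified()
                || v4.is_broadcast()
        }
        IpAddr::V6(v6) => {
            // IPv4-mapped addresses (::ffff:a.b.c.d) are checked as IPv4.
            if let Some(v4) = v6.to_ipv4_mapped() {
                return is_blocked_ip(IpAddr::V4(v4));
            }
            let first = v6.segments()[0];
            v6.is_loopback()
                || v6.is_unspecified()
                || (first & 0xfe00) == 0xfc00 // unique local, fc00::/7
                || (first & 0xffc0) == 0xfe80 // link-local, fe80::/10
        }
    }
}

/// Accepts only http and https URLs (scheme compared case-insensitively).
pub fn is_allowed_scheme(url: &str) -> bool {
    match url.split_once("://") {
        Some((scheme, _)) => {
            scheme.eq_ignore_ascii_case("http") || scheme.eq_ignore_ascii_case("https")
        }
        None => false,
    }
}
```

The IP check has to run on the resolved addresses, and again for every redirect hop, since a public hostname can resolve or redirect to an internal address.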
Security Headers
Applied as global middleware layers:
- Content-Security-Policy (self + Cloudflare Turnstile)
- X-Content-Type-Options: nosniff
- X-Frame-Options: DENY
- Referrer-Policy: strict-origin-when-cross-origin
- X-XSS-Protection: 1; mode=block
- Strict-Transport-Security (HTTPS only)
Error Sanitization
The sanitize_error_message function strips API keys and internal details from error messages before they reach SSE clients. Internal errors log full details server-side but return generic messages to users.
CORS
Configured to allow only the APP_URL origin, with credentials (cookies), limited to GET/POST/PUT/DELETE methods.
7. Concurrency Model
Async Runtime
Tokio with full features. The Axum server runs on Tokio's multi-threaded runtime.
Background Tasks
Spawned at startup via tokio::spawn:
- Session cleanup: Hourly deletion of expired DB sessions
- Job store cleanup: Periodic removal of expired job entries (1-hour TTL)
- Scheduler: Minute-by-minute check for due theme schedules
Generation Pipeline Concurrency
- tokio::task::JoinSet: Used for parallel scraping (bounded concurrency of 5 for source extraction) and parallel LLM classification calls within each batch
- tokio::sync::watch: Fan-out progress notifications to SSE clients; late subscribers immediately receive the latest state
- AtomicBool: Cooperative cancellation flag checked between pipeline stages; avoids mutex overhead
- DashMap/DashSet: Concurrent access without a global lock (sharded internally) for the job store (job entries), generating-users set, per-user rate limiter cache, and provider rate limiter state
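Cooperative cancellation with an AtomicBool can be sketched without any async machinery. The `run_stages` function is hypothetical and uses plain function pointers as stand-ins for pipeline stages; the point is only that the flag is polled between stages rather than interrupting one mid-flight.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Runs stages in order, checking the cancellation flag before each one.
/// Returns the number of stages that actually ran.
pub fn run_stages(cancel: &AtomicBool, stages: &[fn()]) -> usize {
    let mut done = 0;
    for stage in stages {
        // Relaxed is sufficient here: the flag guards no other data.
        if cancel.load(Ordering::Relaxed) {
            break;
        }
        stage();
        done += 1;
    }
    done
}
```

A stop request would simply call `cancel.store(true, Ordering::Relaxed)` from another task; the stage in progress finishes normally, and the pipeline exits at the next check.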
Task Lifecycle
POST /generate
└── handler creates job in JobStore
    └── spawns outer task (panic monitor)
        ├── spawns inner task (15-min timeout)
        │   └── run_generation_inner()
        │       ├── Phase 1 (JoinSet scrape, JoinSet classify)
        │       ├── Phase 2 (JoinSet scrape, JoinSet classify)
        │       └── Save to DB
        ├── on complete/error: send final ProgressEvent
        └── delayed cleanup (5 min) then remove from JobStore
Graceful Shutdown
The server supports graceful shutdown via signal handling, allowing in-flight requests to complete.