docs: add consolidated architecture.md and technical_specs.md
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
branch master
parent d3a4d2c577
commit f07e91ba11
@ -0,0 +1,382 @@
# AI Weekly Synth -- Architecture Document

## 1. System Overview

AI Weekly Synth is a self-hosted web application that generates AI-powered weekly news syntheses. Users configure topics (themes), categories, and an LLM provider; the system then searches the web, scrapes and validates sources, classifies articles, and produces structured summaries.

### Technology Stack

| Layer | Technology |
|---|---|
| Backend | Rust (Axum 0.8) |
| Frontend | SolidJS 1.9 + Tailwind CSS v4 |
| Database | PostgreSQL 17 (via sqlx with compile-time query checking) |
| Deployment | Docker Compose (app + Postgres) |

### Deployment Topology

```
docker-compose.yml
├── app (ai-synth)            port 8080
│   ├── Axum HTTP server
│   ├── Static file serving (SPA fallback)
│   └── Background tasks (scheduler, session cleanup, job TTL)
└── db (postgres:17-alpine)   port 5432 (localhost only)
    └── postgres_data volume
```

The app container builds from a multi-stage Dockerfile, serves the SolidJS frontend as static files, and connects to Postgres over the `internal` bridge network.

---
## 2. Layer Architecture

The backend follows a three-layer architecture with shared model types:

```
handlers/          (HTTP layer)
    │
    ├── extracts request data (Axum extractors, JSON, path params)
    ├── validates input
    ├── calls services/ or db/ directly
    └── formats HTTP responses
    │
services/          (Business logic)
    │
    ├── synthesis pipeline orchestration
    ├── LLM provider abstraction + factory
    ├── scraping (articles, source pages)
    ├── encryption, email, CSV, PDF export
    ├── rate limiting, job store, scheduler
    └── Brave Search client
    │
db/                (Data access)
    │
    ├── pure SQL queries via sqlx
    ├── typed result mapping (FromRow)
    └── no business logic
    │
models/            (Shared types -- used by all layers)
    │
    ├── domain structs (User, Theme, Source, Synthesis, etc.)
    ├── request/response DTOs
    └── validation logic
```

### Module Inventory

**Handlers** (`handlers/`): `admin`, `api_keys`, `article_history`, `auth`, `config`, `generation`, `health`, `llm_logs`, `schedules`, `settings`, `sources`, `syntheses`, `themes`

**Services** (`services/`): `auth`, `brave_search`, `csv`, `email`, `encryption`, `export`, `job_store`, `llm` (with `gemini`, `openai`, `anthropic`, `mock`, `factory`, `schema`), `prompts`, `rate_limiter`, `scheduler`, `scraper`, `source_scraper`, `synthesis`, `turnstile`

**DB** (`db/`): `api_keys`, `article_history`, `audit`, `llm_call_log`, `magic_links`, `providers`, `rate_limits`, `schedules`, `sessions`, `settings`, `sources`, `syntheses`, `themes`, `users`

**Models** (`models/`): `api_key`, `audit`, `magic_link`, `provider`, `rate_limit`, `schedule`, `session`, `settings`, `source`, `synthesis`, `theme`, `user`

---
## 3. Key Components

### 3.1 LLM Provider Abstraction

The `LlmProvider` trait defines a unified interface for all LLM backends:

```rust
#[async_trait]
pub trait LlmProvider: Send + Sync {
    fn provider_id(&self) -> &str;
    async fn call_llm(&self, model: &str, system_prompt: &str,
                      user_prompt: &str, response_schema: &Value)
        -> Result<Value, AppError>;
}
```

Implementations: `GeminiProvider`, `OpenAiProvider`, `AnthropicProvider`, `MockLlmProvider`.

The factory (`llm/factory.rs`) creates provider instances by name. The mock provider enables end-to-end pipeline testing without real API calls.
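
As a rough illustration of the factory idea, the sketch below maps a provider name to a boxed trait object. It uses a simplified, synchronous stand-in trait (`Provider`) instead of the real async `LlmProvider`; the unit structs, the `create_provider` function, and the `"mock"` name are assumptions made for this example and are not taken from `llm/factory.rs`.

```rust
/// Simplified, synchronous stand-in for the async `LlmProvider` trait (assumption).
pub trait Provider: Send + Sync {
    fn provider_id(&self) -> &str;
}

// Hypothetical unit structs standing in for the real provider types.
struct Gemini;
struct OpenAi;
struct Anthropic;
struct Mock;

impl Provider for Gemini {
    fn provider_id(&self) -> &str { "gemini" }
}
impl Provider for OpenAi {
    fn provider_id(&self) -> &str { "openai" }
}
impl Provider for Anthropic {
    fn provider_id(&self) -> &str { "anthropic" }
}
impl Provider for Mock {
    fn provider_id(&self) -> &str { "mock" }
}

/// Hypothetical factory: returns `None` for names it does not recognize.
pub fn create_provider(name: &str) -> Option<Box<dyn Provider>> {
    match name {
        "gemini" => Some(Box::new(Gemini)),
        "openai" => Some(Box::new(OpenAi)),
        "anthropic" => Some(Box::new(Anthropic)),
        "mock" => Some(Box::new(Mock)),
        _ => None,
    }
}
```

Returning a trait object keeps callers independent of the concrete backend, which is what lets a mock provider be swapped in for pipeline tests.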

### 3.2 Synthesis Pipeline

The pipeline is the core business logic, orchestrated in `services/synthesis.rs`. It runs as a background tokio task with a 15-minute timeout.

**Three phases:**

1. **Phase 1 -- Personalized Sources**: Extract article links from user-curated source pages (windowed, rolling), scrape articles, classify and summarize each via LLM. Batched processing with configurable `batch_size`.

2. **Phase 2 -- Web Search Fallback**: For under-filled categories, either call the Brave Search API or use the LLM's web search capability to find additional articles. Scrape and validate results.

3. **Save**: Assemble sections by category, sanitize JSON, persist to database, record article history traces.

Progress is reported via `tokio::sync::watch` channels consumed by SSE endpoints.
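
Phase 2 only runs for categories that Phase 1 left under-filled. A minimal sketch of that selection step follows, assuming a per-category item count and the theme's `max_items_per_category`; the function name `under_filled` and the shape of `counts` are hypothetical and not taken from `services/synthesis.rs`.

```rust
use std::collections::HashMap;

/// Returns the categories that still have room after Phase 1,
/// paired with the number of items each one is missing.
/// `counts` maps category name -> items collected so far (hypothetical shape).
pub fn under_filled(
    categories: &[&str],
    counts: &HashMap<String, usize>,
    max_items_per_category: usize,
) -> Vec<(String, usize)> {
    categories
        .iter()
        .filter_map(|c| {
            // A category with no entry has collected nothing yet.
            let have = counts.get(*c).copied().unwrap_or(0);
            (have < max_items_per_category)
                .then(|| (c.to_string(), max_items_per_category - have))
        })
        .collect()
}
```

With `max_items_per_category = 4`, a category holding 1 article would be reported as missing 3, and a full category would be skipped entirely.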

### 3.3 Job Store

`JobStore` (`services/job_store.rs`) is an in-memory concurrent store for active generation jobs:

- Backed by `DashMap<Uuid, JobEntry>` for concurrent access without a single global lock (DashMap shards its locks internally)
- `DashSet<Uuid>` for per-user deduplication (one active job per user)
- Each job holds a `watch::Sender<ProgressEvent>` for real-time SSE streaming
- `AtomicBool` for cooperative cancellation
- 1-hour TTL with automatic cleanup
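
The one-active-job-per-user rule and the cancellation flag can be sketched with the standard library alone. In the sketch below, `Mutex<HashSet<u64>>` stands in for `DashSet<Uuid>` and `u64` for `Uuid`; `ActiveUsers`, `JobHandle`, `try_start`, `finish`, and `is_cancelled` are illustrative names, not the real `JobStore` API.

```rust
use std::collections::HashSet;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, Mutex};

/// Std-only stand-in for the generating-users set (assumption:
/// the real store uses `DashSet<Uuid>`).
#[derive(Default)]
pub struct ActiveUsers {
    users: Mutex<HashSet<u64>>,
}

/// Handle for a started job; `cancel` is the cooperative stop flag.
pub struct JobHandle {
    pub cancel: Arc<AtomicBool>,
}

impl ActiveUsers {
    /// Registers a job for `user_id`; returns `None` if that user
    /// already has a job in progress.
    pub fn try_start(&self, user_id: u64) -> Option<JobHandle> {
        let mut users = self.users.lock().unwrap();
        // `insert` returns false when the value was already present.
        users
            .insert(user_id)
            .then(|| JobHandle { cancel: Arc::new(AtomicBool::new(false)) })
    }

    /// Frees the user's slot once the job has finished or been cancelled.
    pub fn finish(&self, user_id: u64) {
        self.users.lock().unwrap().remove(&user_id);
    }
}

/// A pipeline would call this between stages and stop early on `true`.
pub fn is_cancelled(handle: &JobHandle) -> bool {
    handle.cancel.load(Ordering::Relaxed)
}
```

Cancellation is cooperative: setting the flag does not interrupt a running stage, it only makes the next check return `true`.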

### 3.4 Scheduler

`services/scheduler.rs` runs as a background task, checking every minute for due `theme_schedules`. When a schedule fires:

1. Query `find_due_schedules` matching current day code + time
2. Skip if user already has a manual generation in progress
3. Run `synthesis::run_generation_inner` directly
4. Send email to configured recipients (up to 3)
5. Mark schedule as run
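
The day-code and time match in step 1 can be pictured as a simple predicate. The sketch below is an in-memory approximation under stated assumptions: the real check is a SQL query (`find_due_schedules`) whose exact conditions, including any use of `last_run_at`, are not shown in this document, and the `Schedule` struct and `is_due` function are hypothetical.

```rust
/// Minimal view of a `theme_schedules` row (field names follow the table summary).
pub struct Schedule {
    pub enabled: bool,
    /// Day codes, e.g. ["mon", "fri"].
    pub days: Vec<String>,
    /// Time of day in UTC, formatted "HH:MM".
    pub time_utc: String,
}

/// Returns true when the schedule should fire for the given UTC day code
/// and "HH:MM" time. Exact string equality on the minute is an assumption.
pub fn is_due(s: &Schedule, day_code: &str, now_hhmm: &str) -> bool {
    s.enabled && s.time_utc == now_hhmm && s.days.iter().any(|d| d == day_code)
}
```

Because the scheduler wakes once per minute, a minute-granularity match like this fires at most once per tick for a given schedule.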

### 3.5 Scraper

Two scraping services:

- **`scraper.rs`**: Article page scraper with SSRF prevention, HTML parsing, title/date/body extraction, soft-404 detection, 15s timeout, 5MB body limit.
- **`source_scraper.rs`**: Source index page scraper that extracts article links from user-configured source URLs (HTML `<a>` parsing with filters, or LLM-assisted extraction).

### 3.6 Rate Limiters

- **Auth rate limiter**: 10 requests/60s per key (email or IP) for magic link endpoints.
- **Provider rate limiter**: Per-LLM-provider sliding window, admin-configured, hot-reloaded from DB.
- **User rate limiters**: Per-user generation rate limits cached in `DashMap`, recreated on settings change.
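
A sliding-window limiter of the kind described above can be sketched as a queue of request timestamps. This is a generic illustration, not the code in `services/rate_limiter.rs`; the `SlidingWindow` type and `try_acquire` method are assumed names, and the current time is passed in explicitly so the behaviour is easy to test.

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

/// Sliding-window limiter: at most `max_requests` within any `window`.
pub struct SlidingWindow {
    max_requests: usize,
    window: Duration,
    hits: VecDeque<Instant>,
}

impl SlidingWindow {
    pub fn new(max_requests: usize, window: Duration) -> Self {
        Self { max_requests, window, hits: VecDeque::new() }
    }

    /// Records a request at `now` and returns true if it is allowed.
    pub fn try_acquire(&mut self, now: Instant) -> bool {
        // Drop timestamps that have left the window.
        while let Some(&oldest) = self.hits.front() {
            if now.duration_since(oldest) >= self.window {
                self.hits.pop_front();
            } else {
                break;
            }
        }
        if self.hits.len() < self.max_requests {
            self.hits.push_back(now);
            true
        } else {
            false
        }
    }
}
```

With the auth limiter's figures (10 requests per 60 seconds), an eleventh request inside the window is rejected, and capacity returns as the oldest timestamps age out.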

---

## 4. Data Model

### Tables and Relationships

```
users
├── sessions (user_id FK, CASCADE)
├── magic_tokens (email reference, no FK)
├── settings (user_id PK/FK, CASCADE)
├── themes (user_id FK, CASCADE)
│   ├── sources (theme_id FK, CASCADE)
│   ├── syntheses (theme_id FK, SET NULL)
│   └── theme_schedules (theme_id FK, CASCADE, UNIQUE)
├── user_api_keys (user_id FK, CASCADE; UNIQUE per provider)
├── article_history (user_id FK, CASCADE)
├── llm_call_log (user_id FK, CASCADE)
└── audit_log (admin_user_id FK, SET NULL)

admin_providers
└── admin_rate_limits (provider_name FK, CASCADE)
```

### Table Summary

| Table | Purpose | Key Columns |
|---|---|---|
| `users` | User accounts | id, email, display_name, role (user/admin), created_at |
| `sessions` | Login sessions | session_hash (PK), user_id, expires_at, last_active_at, ip_address |
| `magic_tokens` | Passwordless auth tokens | id, email, token_hash, expires_at, used |
| `settings` | Per-user pipeline config | user_id (PK), ai_provider, ai_model, ai_model_websearch, batch_size, max_articles_per_source, max_links_per_source, use_brave_search, source_extraction_window, article_history_days, search_agent_behavior, rate_limit_max_requests, rate_limit_time_window_seconds |
| `themes` | Per-user topic configurations | id, user_id, name, theme, categories (JSONB), max_items_per_category, max_age_days, summary_length |
| `sources` | User-curated news source URLs | id, user_id, title, url, theme_id, is_preferred |
| `syntheses` | Generated synthesis results | id, user_id, week, sections (JSONB), status, job_id, theme_id |
| `theme_schedules` | Automated generation schedules | id, theme_id (UNIQUE), user_id, enabled, days (JSONB), time_utc, emails (JSONB), last_run_at |
| `article_history` | Article URL dedup + provenance trace | id, user_id, url, url_hash, title, source_type, source_url, category, synthesis_id, status, scraped_ok, job_id, published_date |
| `llm_call_log` | Full LLM interaction log | id, user_id, job_id, call_type, model, system_prompt, user_prompt, response_body, duration_ms, article_url |
| `admin_providers` | Admin-curated LLM provider catalog | id, provider_name (UNIQUE), display_name, models_scraping (JSONB), models_websearch (JSONB), is_enabled |
| `admin_rate_limits` | Per-provider rate limit config | id, provider_name (UNIQUE, FK), max_requests, time_window_seconds |
| `user_api_keys` | Encrypted user LLM API keys | id, user_id, provider_name, encrypted_key (BYTEA), nonce (BYTEA), key_prefix; UNIQUE(user_id, provider_name) |
| `audit_log` | Admin mutation audit trail | id, admin_user_id, action, target_type, target_id, details (JSONB) |

---
## 5. API Overview

All API routes are prefixed with `/api/v1`. CSRF protection (`X-Requested-With` header) is applied to all mutating endpoints.

### Authentication

| Method | Path | Auth | Description |
|---|---|---|---|
| POST | /auth/register | Public | Create account + send magic link |
| POST | /auth/login | Public | Request magic link |
| GET | /auth/verify | Public | Verify token (email click redirect) |
| POST | /auth/verify | Public | Verify token (frontend API call) |
| POST | /auth/logout | Authenticated | Destroy session |
| GET | /auth/me | Authenticated | Current user info |

### Settings

| Method | Path | Auth | Description |
|---|---|---|---|
| GET | /settings | Authenticated | Get user settings |
| PUT | /settings | Authenticated | Update user settings |

### Themes

| Method | Path | Auth | Description |
|---|---|---|---|
| GET | /themes | Authenticated | List user themes |
| POST | /themes | Authenticated | Create theme |
| PUT | /themes/{id} | Authenticated | Update theme |
| DELETE | /themes/{id} | Authenticated | Delete theme |

### Schedules

| Method | Path | Auth | Description |
|---|---|---|---|
| GET | /themes/{id}/schedule | Authenticated | Get theme schedule |
| PUT | /themes/{id}/schedule | Authenticated | Create or update schedule |
| DELETE | /themes/{id}/schedule | Authenticated | Delete schedule |

### Sources

| Method | Path | Auth | Description |
|---|---|---|---|
| GET | /sources | Authenticated | List sources |
| POST | /sources | Authenticated | Create source |
| PUT | /sources/preferred | Authenticated | Update preferred sources |
| DELETE | /sources/{id} | Authenticated | Delete source |
| POST | /sources/bulk | Authenticated | Bulk import (JSON) |
| POST | /sources/import-csv | Authenticated | Import from CSV |
| GET | /sources/export-csv | Authenticated | Export as CSV |

### Syntheses & Generation

| Method | Path | Auth | Description |
|---|---|---|---|
| GET | /syntheses | Authenticated | List syntheses |
| GET | /syntheses/{id} | Authenticated | Get full synthesis |
| DELETE | /syntheses/{id} | Authenticated | Delete synthesis |
| POST | /syntheses/generate | Authenticated | Trigger generation |
| GET | /syntheses/generate/{job_id}/progress | Authenticated | SSE progress stream |
| POST | /syntheses/generate/{job_id}/stop | Authenticated | Cancel generation |
| POST | /syntheses/{id}/send-email | Authenticated | Email synthesis |
| GET | /syntheses/{id}/export/markdown | Authenticated | Markdown download |
| GET | /syntheses/{id}/export/pdf | Authenticated | PDF download |

### Article History & LLM Logs

| Method | Path | Auth | Description |
|---|---|---|---|
| GET | /article-history | Authenticated | List article history |
| DELETE | /article-history | Authenticated | Clear article history |
| GET | /syntheses/{id}/provenance | Authenticated | Get synthesis provenance |
| GET | /llm-logs/{job_id} | Authenticated | Get LLM call logs for job |

### User API Keys

| Method | Path | Auth | Description |
|---|---|---|---|
| GET | /user/api-keys | Authenticated | List keys (prefix only) |
| POST | /user/api-keys | Authenticated | Store encrypted key |
| DELETE | /user/api-keys/{provider} | Authenticated | Delete key |
| POST | /user/api-keys/{provider}/test | Authenticated | Test key validity |
| POST | /user/api-keys/export | Authenticated | Export keys |

### Configuration & Admin

| Method | Path | Auth | Description |
|---|---|---|---|
| GET | /config/providers | Authenticated | Available providers/models |
| GET | /admin/providers | Admin | List all providers |
| POST | /admin/providers | Admin | Create provider |
| PUT | /admin/providers/{id} | Admin | Update provider |
| DELETE | /admin/providers/{id} | Admin | Delete provider |
| GET | /admin/rate-limits | Admin | List rate limits |
| PUT | /admin/rate-limits/{provider_name} | Admin | Update rate limit |
| GET | /admin/users | Admin | List users |
| PUT | /admin/users/{id}/role | Admin | Change user role |

### Infrastructure

| Method | Path | Auth | Description |
|---|---|---|---|
| GET | /health | Public | Health check |

---
## 6. Security Architecture

### Authentication & Session Management

- **Passwordless**: Magic link tokens sent via email (Resend API), single-use, time-limited
- **Captcha**: Cloudflare Turnstile on registration and login
- **Sessions**: SHA-256 hashed tokens stored in DB, 30-day expiry, `HttpOnly` + `SameSite=Lax` cookies, optionally `Secure`
- **Anti-enumeration**: Same response for existent/non-existent emails, timing attack mitigation
- **Authorization**: `AuthUser` and `AdminUser` Axum extractors enforce auth levels per handler

### CSRF Protection

All mutating API endpoints require the `X-Requested-With` header (checked by `csrf::csrf_check` middleware layer). Non-mutating GET/HEAD/OPTIONS requests are exempt.
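
The rule above reduces to a small decision function. The sketch below is framework-free and purely illustrative: `needs_csrf_header` and `csrf_ok` are hypothetical names, headers are modelled as plain `(name, value)` pairs instead of Axum types, and checking only for the header's presence (not its value) is an assumption about what `csrf::csrf_check` does.

```rust
/// Returns true when a request with this method must carry `X-Requested-With`.
pub fn needs_csrf_header(method: &str) -> bool {
    // GET, HEAD and OPTIONS are treated as non-mutating and are exempt.
    !matches!(
        method.to_ascii_uppercase().as_str(),
        "GET" | "HEAD" | "OPTIONS"
    )
}

/// Framework-free version of the check: `headers` is a list of (name, value) pairs.
/// Header names are compared case-insensitively, as HTTP requires.
pub fn csrf_ok(method: &str, headers: &[(&str, &str)]) -> bool {
    if !needs_csrf_header(method) {
        return true;
    }
    headers
        .iter()
        .any(|(name, _)| name.eq_ignore_ascii_case("x-requested-with"))
}
```

The protection works because browsers do not let a cross-origin HTML form attach custom headers, so a forged form submission fails the check.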

### Encryption at Rest

User LLM API keys are encrypted with AES-256-GCM before storage:

- 32-byte master key from `MASTER_ENCRYPTION_KEY` env var (64 hex chars)
- Random 12-byte nonce per encryption (stored alongside ciphertext)
- Key bytes are zeroized on drop (`zeroize` crate)
- Only a key prefix (first 8 chars + "...") is ever returned via the API
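
Two of the points above, the 64-hex-character master key and the masked key prefix, can be illustrated without any crypto dependency. The sketch below is std-only and hypothetical: `key_prefix` and `parse_master_key` are assumed names, the project itself lists the `hex` crate for decoding, and the AES-256-GCM step is deliberately left out.

```rust
/// Masks an API key for display: first 8 characters followed by "...".
pub fn key_prefix(api_key: &str) -> String {
    let head: String = api_key.chars().take(8).collect();
    format!("{head}...")
}

/// Parses a 64-character hex string into a 32-byte key.
/// Returns `None` on wrong length or non-hex characters.
pub fn parse_master_key(hex: &str) -> Option<[u8; 32]> {
    if hex.len() != 64 || !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    let mut key = [0u8; 32];
    for (i, byte) in key.iter_mut().enumerate() {
        // Each byte is encoded as two hex digits.
        *byte = u8::from_str_radix(&hex[2 * i..2 * i + 2], 16).ok()?;
    }
    Some(key)
}
```

Validating the key length at startup is a cheap way to fail fast on a misconfigured `MASTER_ENCRYPTION_KEY` before any key is encrypted with it.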

### SSRF Prevention

Both `scraper.rs` and `source_scraper.rs` validate URLs before fetching:

- DNS resolution check against private/loopback IP ranges
- Redirect chain validation (no redirects to private IPs)
- Only HTTP/HTTPS schemes allowed
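
The address and scheme checks listed above might look roughly like the following. This is a sketch using only `std::net`; the function names `is_blocked_ip` and `is_allowed_scheme` are hypothetical, and the exact set of ranges the real scrapers reject is not stated in this document, so the list here is an assumption.

```rust
use std::net::IpAddr;

/// Returns true for addresses a scraper should refuse to connect to.
pub fn is_blocked_ip(ip: IpAddr) -> bool {
    match ip {
        IpAddr::V4(v4) => {
            v4.is_loopback()
                || v4.is_private()
                || v4.is_link_local()
                || v4.is_unspecified()
                || v4.is_broadcast()
        }
        IpAddr::V6(v6) => {
            // IPv4-mapped addresses (::ffff:a.b.c.d) are checked as IPv4.
            if let Some(v4) = v6.to_ipv4_mapped() {
                return is_blocked_ip(IpAddr::V4(v4));
            }
            let first = v6.segments()[0];
            v6.is_loopback()
                || v6.is_unspecified()
                || (first & 0xfe00) == 0xfc00 // unique local, fc00::/7
                || (first & 0xffc0) == 0xfe80 // link-local, fe80::/10
        }
    }
}

/// Only plain web schemes are fetched.
pub fn is_allowed_scheme(scheme: &str) -> bool {
    matches!(scheme, "http" | "https")
}
```

For the redirect-chain rule, a check like this would need to run on the resolved address of every hop, not just the first URL.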

### Security Headers

Applied as global middleware layers:

- `Content-Security-Policy` (self + Cloudflare Turnstile)
- `X-Content-Type-Options: nosniff`
- `X-Frame-Options: DENY`
- `Referrer-Policy: strict-origin-when-cross-origin`
- `X-XSS-Protection: 1; mode=block`
- `Strict-Transport-Security` (HTTPS only)

### Error Sanitization

The `sanitize_error_message` function strips API keys and internal details from error messages before they reach SSE clients. Internal errors log full details server-side but return generic messages to users.

### CORS

Configured to allow only the `APP_URL` origin, with credentials (cookies), limited to GET/POST/PUT/DELETE methods.

---
## 7. Concurrency Model

### Async Runtime

Tokio with full features. The Axum server runs on Tokio's multi-threaded runtime.

### Background Tasks

Spawned at startup via `tokio::spawn`:

- **Session cleanup**: Hourly deletion of expired DB sessions
- **Job store cleanup**: Periodic removal of expired job entries (1-hour TTL)
- **Scheduler**: Minute-by-minute check for due theme schedules

### Generation Pipeline Concurrency

- **`tokio::task::JoinSet`**: Used for parallel scraping (bounded concurrency of 5 for source extraction) and parallel LLM classification calls within each batch
- **`tokio::sync::watch`**: Fan-out progress notifications to SSE clients; late subscribers immediately receive the latest state
- **`AtomicBool`**: Cooperative cancellation flag checked between pipeline stages; avoids mutex overhead
- **`DashMap` / `DashSet`**: Concurrent access through internally sharded locks (no single global mutex) for the job store (job entries), generating-users set, per-user rate limiter cache, and provider rate limiter state

### Task Lifecycle

```
POST /generate
  └── handler creates job in JobStore
        └── spawns outer task (panic monitor)
              └── spawns inner task (15-min timeout)
                    └── run_generation_inner()
                          ├── Phase 1 (JoinSet scrape, JoinSet classify)
                          ├── Phase 2 (JoinSet scrape, JoinSet classify)
                          └── Save to DB
              └── on complete/error: send final ProgressEvent
              └── delayed cleanup (5 min) then remove from JobStore
```

### Graceful Shutdown

The server supports graceful shutdown via signal handling, allowing in-flight requests to complete.
@ -0,0 +1,793 @@
# AI Weekly Synth -- Technical Specifications

## 1. Backend Tech Stack

| Dependency | Version | Purpose |
|---|---|---|
| axum | 0.8 | Web framework (macros, multipart) |
| tokio | 1 | Async runtime (full features) |
| tower | 0.5 | Middleware composition |
| tower-http | 0.6 | CORS, static files, tracing, headers |
| sqlx | 0.8 | Async Postgres driver (runtime-tokio, tls-rustls, uuid, chrono, json, migrate) |
| reqwest | 0.12 | HTTP client (JSON) |
| serde / serde_json | 1 | Serialization/deserialization |
| chrono | 0.4 | Date/time handling (serde feature) |
| aes-gcm | 0.10 | AES-256-GCM encryption |
| zeroize | 1 | Secure memory zeroing |
| sha2 | 0.10 | SHA-256 hashing |
| rand | 0.8 | Random number generation |
| base64 | 0.22 | Base64 encoding |
| hex | 0.4 | Hex encoding/decoding |
| async-trait | 0.1 | Async trait objects |
| tracing / tracing-subscriber | 0.1 / 0.3 | Structured logging (env-filter, json) |
| dotenvy | 0.15 | .env file loading |
| clap | 4 | CLI argument parsing |
| scraper | 0.22 | HTML parsing (CSS selectors) |
| ego-tree | 0.10 | Tree data structure (used by scraper) |
| url | 2 | URL parsing and validation |
| email_address | 0.2 | Email validation |
| anyhow | 1 | Error context |
| thiserror | 2 | Error type derivation |
| uuid | 1 | UUID v4 generation (serde feature) |
| dashmap | 6 | Concurrent hash maps |
| tokio-stream | 0.1 | Stream utilities for SSE |
| futures | 0.3 | Async stream combinators |
| printpdf | 0.7 | PDF generation |

**Dev dependencies**: tower (util), http-body-util, wiremock 0.6.

**Rust edition**: 2021.

---
## 2. Frontend Tech Stack

| Dependency | Version | Purpose |
|---|---|---|
| solid-js | ^1.9.0 | Reactive UI framework |
| @solidjs/router | ^0.15.0 | Client-side routing |
| lucide-solid | ^0.475.0 | Icon library |
| date-fns | ^4.1.0 | Date formatting |
| tailwindcss | ^4.1.0 | Utility-first CSS (v4) |
| @tailwindcss/vite | ^4.1.0 | Tailwind Vite plugin |
| vite | ^6.2.0 | Build tool and dev server |
| vite-plugin-solid | ^2.11.0 | SolidJS Vite integration |
| typescript | ~5.8.0 | Type checking |
| vitest | ^3.0.0 | Unit testing |
| @solidjs/testing-library | ^0.8.0 | Component testing |
| jsdom | ^25.0.0 | DOM environment for tests |

### Frontend Routes

| Path | Component | Auth | Description |
|---|---|---|---|
| /login | Login | Public | Login page |
| /register | Register | Public | Registration page |
| /auth/verify | AuthVerify | Public | Magic link verification |
| / | Home | Protected | Dashboard / synthesis list |
| /settings | Settings | Protected | User settings |
| /themes | ThemeManager | Protected | Theme CRUD + source management |
| /generate | GenerateSynthesis | Protected | Generation trigger + progress |
| /synthesis/:id | SynthesisDetail | Protected | Full synthesis view |
| /article-history | ArticleHistory | Protected | Article history browser |
| /llm-logs/:jobId | LlmLogs | Protected | LLM call log viewer |
| /admin/providers | AdminProviders | Admin | Provider configuration |
| /admin/rate-limits | AdminRateLimits | Admin | Rate limit configuration |
| /admin/users | AdminUsers | Admin | User management |

---
## 3. Database Schema

### 3.1 `users`

| Column | Type | Constraints |
|---|---|---|
| id | UUID | PK, DEFAULT gen_random_uuid() |
| email | TEXT | NOT NULL, UNIQUE |
| display_name | TEXT | nullable |
| role | TEXT | NOT NULL, DEFAULT 'user', CHECK (user/admin) |
| created_at | TIMESTAMPTZ | NOT NULL, DEFAULT now() |
| updated_at | TIMESTAMPTZ | NOT NULL, DEFAULT now() |

Indexes: `idx_users_email` on (email).

### 3.2 `sessions`

| Column | Type | Constraints |
|---|---|---|
| session_hash | TEXT | PK (SHA-256 of raw token) |
| user_id | UUID | NOT NULL, FK users(id) CASCADE |
| created_at | TIMESTAMPTZ | NOT NULL, DEFAULT now() |
| expires_at | TIMESTAMPTZ | NOT NULL |
| last_active_at | TIMESTAMPTZ | NOT NULL, DEFAULT now() |
| ip_address | TEXT | nullable |
| user_agent | TEXT | nullable |

Indexes: `idx_sessions_user_id`, `idx_sessions_expires_at`.

### 3.3 `magic_tokens`

| Column | Type | Constraints |
|---|---|---|
| id | UUID | PK, DEFAULT gen_random_uuid() |
| email | TEXT | NOT NULL |
| token_hash | TEXT | NOT NULL, UNIQUE |
| created_at | TIMESTAMPTZ | NOT NULL, DEFAULT now() |
| expires_at | TIMESTAMPTZ | NOT NULL |
| used | BOOLEAN | NOT NULL, DEFAULT false |

Indexes: `idx_magic_tokens_email`, `idx_magic_tokens_expires`.

### 3.4 `settings`

Per-user pipeline configuration. One row per user (user_id is the PK).

| Column | Type | Constraints |
|---|---|---|
| user_id | UUID | PK, FK users(id) CASCADE |
| max_articles_per_source | INTEGER | NOT NULL, DEFAULT 3 |
| max_links_per_source | INTEGER | NOT NULL, DEFAULT 8 |
| use_brave_search | BOOLEAN | NOT NULL, DEFAULT false |
| article_history_days | INTEGER | NOT NULL, DEFAULT 90 |
| batch_size | INTEGER | NOT NULL, DEFAULT 5 |
| source_extraction_window | INTEGER | NOT NULL, DEFAULT 3 |
| search_agent_behavior | TEXT | NOT NULL, DEFAULT '' |
| ai_provider | TEXT | NOT NULL, DEFAULT '' |
| ai_model | TEXT | NOT NULL, DEFAULT '' |
| ai_model_websearch | TEXT | NOT NULL, DEFAULT '' |
| rate_limit_max_requests | INTEGER | nullable |
| rate_limit_time_window_seconds | INTEGER | nullable |
| updated_at | TIMESTAMPTZ | NOT NULL, DEFAULT now() |

### 3.5 `themes`

Per-user topic configurations with content settings.

| Column | Type | Constraints |
|---|---|---|
| id | UUID | PK, DEFAULT gen_random_uuid() |
| user_id | UUID | NOT NULL, FK users(id) CASCADE |
| name | TEXT | NOT NULL |
| theme | TEXT | NOT NULL (search topic) |
| categories | JSONB | NOT NULL, DEFAULT '[]' |
| max_items_per_category | INTEGER | NOT NULL, DEFAULT 4 |
| max_age_days | INTEGER | NOT NULL, DEFAULT 7 |
| summary_length | INTEGER | NOT NULL, DEFAULT 3 |
| created_at | TIMESTAMPTZ | NOT NULL, DEFAULT now() |
| updated_at | TIMESTAMPTZ | NOT NULL, DEFAULT now() |

Indexes: `idx_themes_user_id`.

### 3.6 `sources`

User-curated news source URLs, optionally tied to a theme.

| Column | Type | Constraints |
|---|---|---|
| id | UUID | PK, DEFAULT gen_random_uuid() |
| user_id | UUID | NOT NULL, FK users(id) CASCADE |
| title | VARCHAR(200) | NOT NULL, CHECK length 1-200 |
| url | VARCHAR(1000) | NOT NULL, CHECK length <= 1000 |
| theme_id | UUID | nullable, FK themes(id) CASCADE |
| is_preferred | BOOLEAN | NOT NULL, DEFAULT false |
| created_at | TIMESTAMPTZ | NOT NULL, DEFAULT now() |

Indexes: `idx_sources_user_id`, UNIQUE `idx_sources_user_id_url` on (user_id, url).

### 3.7 `syntheses`

Generated synthesis results with JSONB section data.

| Column | Type | Constraints |
|---|---|---|
| id | UUID | PK, DEFAULT gen_random_uuid() |
| user_id | UUID | NOT NULL, FK users(id) CASCADE |
| week | VARCHAR(10) | NOT NULL (ISO week string) |
| sections | JSONB | NOT NULL, DEFAULT '[]' |
| status | VARCHAR(20) | NOT NULL, DEFAULT 'completed' |
| job_id | UUID | nullable |
| theme_id | UUID | nullable, FK themes(id) SET NULL |
| created_at | TIMESTAMPTZ | NOT NULL, DEFAULT now() |

Indexes: `idx_syntheses_user_id_created_at` on (user_id, created_at DESC).

JSONB structure for `sections`:

```json
[
  {
    "title": "Category Name",
    "items": [
      { "title": "Article Title", "url": "https://...", "summary": "...", "date": "2026-03-25" }
    ]
  }
]
```

### 3.8 `theme_schedules`

Automated generation schedules, one per theme.

| Column | Type | Constraints |
|---|---|---|
| id | UUID | PK, DEFAULT gen_random_uuid() |
| theme_id | UUID | NOT NULL, UNIQUE, FK themes(id) CASCADE |
| user_id | UUID | NOT NULL, FK users(id) CASCADE |
| enabled | BOOLEAN | NOT NULL, DEFAULT true |
| days | JSONB | NOT NULL, DEFAULT '[]' (e.g. ["mon","fri"]) |
| time_utc | TEXT | NOT NULL, DEFAULT '08:00' (HH:MM) |
| emails | JSONB | NOT NULL, DEFAULT '[]' (up to 3 addresses) |
| last_run_at | TIMESTAMPTZ | nullable |
| created_at | TIMESTAMPTZ | NOT NULL, DEFAULT now() |
| updated_at | TIMESTAMPTZ | NOT NULL, DEFAULT now() |

Indexes: `idx_theme_schedules_enabled` (partial, WHERE enabled = true).

### 3.9 `article_history`

Article URL deduplication and full provenance tracing.

| Column | Type | Constraints |
|---|---|---|
| id | UUID | PK, DEFAULT gen_random_uuid() |
| user_id | UUID | NOT NULL, FK users(id) CASCADE |
| url_hash | TEXT | NOT NULL (SHA-256 of normalized URL) |
| url | TEXT | NOT NULL |
| title | TEXT | NOT NULL, DEFAULT '' |
| source_type | TEXT | NOT NULL, DEFAULT 'unknown' |
| source_url | TEXT | nullable |
| category | TEXT | nullable |
| synthesis_id | UUID | nullable, FK syntheses(id) SET NULL |
| status | TEXT | NOT NULL, DEFAULT 'used' |
| scraped_ok | BOOLEAN | NOT NULL, DEFAULT true |
| job_id | UUID | NOT NULL |
| published_date | TEXT | nullable |
| created_at | TIMESTAMPTZ | NOT NULL, DEFAULT now() |

Indexes: `idx_article_history_user_url` on (user_id, url_hash), `idx_article_history_job_id`.

Status values: `used`, `filtered_history`, `filtered_diversity`, `filtered_not_article`, `filtered_too_old`, `filtered_empty`, `filtered_homepage`, `filtered_cross_phase_dedup`.

Source type values: `personalized_source`, `brave_search`, `web_search`.
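
Because `status` is stored as free-form `TEXT`, application code may want a closed set of values on the Rust side. The sketch below shows one way to mirror the status values listed above as an enum with string round-tripping; the `ArticleStatus` type and its methods are hypothetical and may not match how the project models this column.

```rust
/// The `article_history.status` values, as a Rust enum (type name is an assumption).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ArticleStatus {
    Used,
    FilteredHistory,
    FilteredDiversity,
    FilteredNotArticle,
    FilteredTooOld,
    FilteredEmpty,
    FilteredHomepage,
    FilteredCrossPhaseDedup,
}

impl ArticleStatus {
    /// Every variant, in the order the document lists them.
    pub const ALL: [ArticleStatus; 8] = [
        Self::Used,
        Self::FilteredHistory,
        Self::FilteredDiversity,
        Self::FilteredNotArticle,
        Self::FilteredTooOld,
        Self::FilteredEmpty,
        Self::FilteredHomepage,
        Self::FilteredCrossPhaseDedup,
    ];

    /// The string stored in the `status` column.
    pub fn as_str(self) -> &'static str {
        match self {
            Self::Used => "used",
            Self::FilteredHistory => "filtered_history",
            Self::FilteredDiversity => "filtered_diversity",
            Self::FilteredNotArticle => "filtered_not_article",
            Self::FilteredTooOld => "filtered_too_old",
            Self::FilteredEmpty => "filtered_empty",
            Self::FilteredHomepage => "filtered_homepage",
            Self::FilteredCrossPhaseDedup => "filtered_cross_phase_dedup",
        }
    }

    /// Parses a stored value; unknown strings yield `None`.
    pub fn parse(s: &str) -> Option<Self> {
        Self::ALL.iter().copied().find(|v| v.as_str() == s)
    }
}
```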
|

### 3.10 `llm_call_log`

Full LLM interaction logging for debugging and analysis.

| Column | Type | Constraints |
|---|---|---|
| id | UUID | PK, DEFAULT gen_random_uuid() |
| user_id | UUID | NOT NULL, FK users(id) CASCADE |
| job_id | UUID | NOT NULL |
| call_type | TEXT | NOT NULL |
| model | TEXT | NOT NULL |
| system_prompt | TEXT | NOT NULL, DEFAULT '' |
| user_prompt | TEXT | NOT NULL, DEFAULT '' |
| response_body | TEXT | NOT NULL, DEFAULT '' |
| duration_ms | INTEGER | NOT NULL, DEFAULT 0 |
| article_url | TEXT | nullable |
| created_at | TIMESTAMPTZ | NOT NULL, DEFAULT now() |

Indexes: `idx_llm_call_log_job_id`, `idx_llm_call_log_user_id` on (user_id, created_at).

### 3.11 `admin_providers`

Admin-curated catalog of LLM providers and their models.

| Column | Type | Constraints |
|---|---|---|
| id | UUID | PK, DEFAULT gen_random_uuid() |
| provider_name | VARCHAR(50) | NOT NULL, UNIQUE |
| display_name | VARCHAR(100) | NOT NULL |
| models_scraping | JSONB | NOT NULL, DEFAULT '[]' |
| models_websearch | JSONB | NOT NULL, DEFAULT '[]' |
| is_enabled | BOOLEAN | NOT NULL, DEFAULT true |
| created_at | TIMESTAMPTZ | NOT NULL, DEFAULT now() |
| updated_at | TIMESTAMPTZ | NOT NULL, DEFAULT now() |

Indexes: `idx_admin_providers_enabled` (partial, WHERE is_enabled = true).

Seeded with: gemini, openai, anthropic.

JSONB model structure:
```json
[{"model_id": "gemini-2.5-pro", "display_name": "Gemini 2.5 Pro", "is_default": true}]
```

### 3.12 `admin_rate_limits`

Per-provider rate limit configuration.

| Column | Type | Constraints |
|---|---|---|
| id | UUID | PK, DEFAULT gen_random_uuid() |
| provider_name | VARCHAR(50) | NOT NULL, UNIQUE, FK admin_providers(provider_name) CASCADE |
| max_requests | INTEGER | NOT NULL, DEFAULT 30 |
| time_window_seconds | INTEGER | NOT NULL, DEFAULT 60 |
| updated_at | TIMESTAMPTZ | NOT NULL, DEFAULT now() |

Seeded defaults: gemini 29/60s, openai 50/60s, anthropic 40/60s.
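
To illustrate how a `max_requests` / `time_window_seconds` pair might be enforced in memory, here is a minimal sliding-window sketch. The `SlidingWindow` type and its methods are hypothetical names introduced for this example; the limiter in the codebase may use a different algorithm. The clock is passed in as a parameter so the behavior is easy to test.

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

/// Hypothetical in-memory limiter for a single provider.
pub struct SlidingWindow {
    max_requests: usize,
    window: Duration,
    /// Timestamps of accepted requests, oldest first.
    hits: VecDeque<Instant>,
}

impl SlidingWindow {
    pub fn new(max_requests: usize, time_window_seconds: u64) -> Self {
        Self {
            max_requests,
            window: Duration::from_secs(time_window_seconds),
            hits: VecDeque::new(),
        }
    }

    /// Records the request and returns true if it fits in the window ending at `now`.
    pub fn try_acquire(&mut self, now: Instant) -> bool {
        // Drop timestamps that have aged out of the window.
        while let Some(&oldest) = self.hits.front() {
            if now.duration_since(oldest) >= self.window {
                self.hits.pop_front();
            } else {
                break;
            }
        }
        if self.hits.len() < self.max_requests {
            self.hits.push_back(now);
            true
        } else {
            false
        }
    }
}
```

With the seeded gemini row, this would be constructed as `SlidingWindow::new(29, 60)`.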

### 3.13 `user_api_keys`

Encrypted user LLM API keys.

| Column | Type | Constraints |
|---|---|---|
| id | UUID | PK, DEFAULT gen_random_uuid() |
| user_id | UUID | NOT NULL, FK users(id) CASCADE |
| provider_name | VARCHAR(50) | NOT NULL |
| encrypted_key | BYTEA | NOT NULL |
| nonce | BYTEA | NOT NULL |
| key_prefix | VARCHAR(20) | NOT NULL |
| created_at | TIMESTAMPTZ | NOT NULL, DEFAULT now() |
| updated_at | TIMESTAMPTZ | NOT NULL, DEFAULT now() |

Constraint: UNIQUE(user_id, provider_name). Valid providers: gemini, openai, anthropic, brave_search.

### 3.14 `audit_log`

Admin mutation audit trail.

| Column | Type | Constraints |
|---|---|---|
| id | UUID | PK, DEFAULT gen_random_uuid() |
| admin_user_id | UUID | nullable, FK users(id) SET NULL |
| action | VARCHAR(100) | NOT NULL |
| target_type | VARCHAR(50) | nullable |
| target_id | VARCHAR(255) | nullable |
| details | JSONB | nullable |
| created_at | TIMESTAMPTZ | NOT NULL, DEFAULT now() |

Indexes: `idx_audit_log_created_at` (DESC), `idx_audit_log_admin_user`.

---

## 4. API Endpoints

All endpoints are prefixed with `/api/v1`. Responses are JSON. Errors follow the shape `{ "error": "message" }`.

### 4.1 Authentication

**POST /auth/register**
- Auth: Public
- Body: `{ email: string, display_name?: string, turnstile_token: string }`
- Response: `{ message: string }`
- Sends magic link email. Rate limited.

**POST /auth/login**
- Auth: Public
- Body: `{ email: string, turnstile_token: string }`
- Response: `{ message: string }`
- Sends magic link email. Rate limited.

**GET /auth/verify?token=...&email=...**
- Auth: Public
- Response: Redirect to frontend with session cookie set.

**POST /auth/verify**
- Auth: Public
- Body: `{ token: string, email: string }`
- Response: `{ message: string, user: User }`
- Sets `session` HttpOnly cookie (30-day expiry).

**POST /auth/logout**
- Auth: Authenticated
- Response: `{ message: string }`
- Clears session cookie and deletes DB session.

**GET /auth/me**
- Auth: Authenticated
- Response: `{ id, email, display_name, role, created_at }`

### 4.2 Settings

**GET /settings**
- Auth: Authenticated
- Response: `UserSettings` (creates defaults if not exists)

**PUT /settings**
- Auth: Authenticated
- Body: `UpdateSettingsRequest` (all fields required)
- Validation: max_articles_per_source 1-10, max_links_per_source 1-30, batch_size 1-20, source_extraction_window 1-10, article_history_days 0-365, search_agent_behavior max 2000 chars, ai_provider/ai_model/ai_model_websearch max 100 chars.
- Response: Updated `UserSettings`

### 4.3 Themes

**GET /themes**
- Auth: Authenticated
- Response: `ThemeResponse[]`

**POST /themes**
- Auth: Authenticated
- Body: `{ name, theme, categories: string[], max_items_per_category?, max_age_days?, summary_length? }`
- Validation: name non-empty max 200 chars, categories 1-20 non-empty entries, max_items 1-50, max_age 1-365, summary_length 1-3.
- Response: `ThemeResponse`

**PUT /themes/{id}**
- Auth: Authenticated (owner only)
- Body: `UpdateThemeRequest` (all fields optional)
- Response: `ThemeResponse`

**DELETE /themes/{id}**
- Auth: Authenticated (owner only)
- Response: 204 No Content

### 4.4 Schedules

**GET /themes/{id}/schedule**
- Auth: Authenticated (theme owner)
- Response: `ScheduleResponse` or 404

**PUT /themes/{id}/schedule**
- Auth: Authenticated (theme owner)
- Body: `{ enabled, days: string[], time_utc: "HH:MM", emails: string[] }`
- Validation: days from mon-sun, time HH:MM format, max 3 emails.
- Response: `ScheduleResponse`

**DELETE /themes/{id}/schedule**
- Auth: Authenticated (theme owner)
- Response: 204 No Content
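
The validation rules listed for `PUT /themes/{id}/schedule` could be checked along these lines. This is a sketch, not the handler's actual code: `validate_schedule` and `is_valid_time_utc` are hypothetical names, and the intermediate day codes (`tue`, `wed`, `thu`, `sat`, `sun`) are assumed from the "mon-sun" wording.

```rust
/// Day codes accepted in `days` (assumed three-letter lowercase codes).
const DAY_CODES: [&str; 7] = ["mon", "tue", "wed", "thu", "fri", "sat", "sun"];

/// Returns true for a zero-padded 24-hour "HH:MM" string.
pub fn is_valid_time_utc(s: &str) -> bool {
    let b = s.as_bytes();
    if b.len() != 5 || b[2] != b':' {
        return false;
    }
    let digits = [b[0], b[1], b[3], b[4]];
    if !digits.iter().all(|c| c.is_ascii_digit()) {
        return false;
    }
    let hour = (b[0] - b'0') * 10 + (b[1] - b'0');
    let minute = (b[3] - b'0') * 10 + (b[4] - b'0');
    hour < 24 && minute < 60
}

/// Applies the three documented rules: known day codes, HH:MM time, at most 3 emails.
pub fn validate_schedule(days: &[&str], time_utc: &str, emails: &[&str]) -> Result<(), String> {
    for day in days {
        if !DAY_CODES.contains(day) {
            return Err(format!("invalid day code: {}", day));
        }
    }
    if !is_valid_time_utc(time_utc) {
        return Err(format!("time_utc must be HH:MM, got: {}", time_utc));
    }
    if emails.len() > 3 {
        return Err("at most 3 email addresses are allowed".to_string());
    }
    Ok(())
}
```

Whether the handler also checks the syntax of each email address is not stated above, so the sketch only enforces the count.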

### 4.5 Sources

**GET /sources?theme_id=...**
- Auth: Authenticated
- Response: `SourceResponse[]`

**POST /sources**
- Auth: Authenticated
- Body: `{ title, url, theme_id? }`
- Validation: title non-empty max 200, URL http(s) max 1000 chars.
- Response: `SourceResponse`

**PUT /sources/preferred**
- Auth: Authenticated
- Body: `{ source_ids: UUID[] }`
- Response: `{ updated: number }`

**DELETE /sources/{id}**
- Auth: Authenticated (owner only)
- Response: 204 No Content

**POST /sources/bulk**
- Auth: Authenticated
- Body: `{ sources: CreateSourceRequest[], theme_id? }`
- Response: `{ imported, skipped, errors }`

**POST /sources/import-csv**
- Auth: Authenticated
- Body: Multipart file upload (CSV: title,url)
- Response: `{ imported, skipped, errors }`

**GET /sources/export-csv**
- Auth: Authenticated
- Response: CSV file download

### 4.6 Generation

**POST /syntheses/generate**
- Auth: Authenticated
- Body: `{ theme_id: UUID }`
- Response: `{ job_id: UUID }`
- Creates job in JobStore, spawns background generation task. Returns 409 if user already has active job.

**GET /syntheses/generate/{job_id}/progress**
- Auth: Authenticated (job owner)
- Response: SSE stream of `ProgressEvent`
- Events: `progress` (step, message, percent), `complete` (synthesis_id), `error` (message).

**POST /syntheses/generate/{job_id}/stop**
- Auth: Authenticated (job owner)
- Response: `{ message: string }`
- Sets cooperative cancellation flag.

### 4.7 Syntheses

**GET /syntheses**
- Auth: Authenticated
- Response: `SynthesisListItem[]` (with section summaries, theme info)

**GET /syntheses/{id}**
- Auth: Authenticated (owner only)
- Response: `SynthesisResponse` (full sections data)

**DELETE /syntheses/{id}**
- Auth: Authenticated (owner only)
- Response: 204 No Content

**POST /syntheses/{id}/send-email**
- Auth: Authenticated
- Body: `{ email: string }`
- Response: `{ message: string }`

**GET /syntheses/{id}/export/markdown**
- Auth: Authenticated
- Response: Markdown file download

**GET /syntheses/{id}/export/pdf**
- Auth: Authenticated
- Response: PDF file download

### 4.8 Article History & Provenance

**GET /article-history?limit=&offset=&job_id=&status=**
- Auth: Authenticated
- Response: `{ items: ArticleHistoryEntry[], total: number }`

**DELETE /article-history**
- Auth: Authenticated
- Response: `{ deleted: number }`

**GET /syntheses/{id}/provenance**
- Auth: Authenticated
- Response: `ArticleHistoryEntry[]` (articles with status "used" for this synthesis's job_id)

### 4.9 LLM Call Logs

**GET /llm-logs/{job_id}**
- Auth: Authenticated
- Response: `LlmCallLogEntry[]`

### 4.10 User API Keys

**GET /user/api-keys**
- Auth: Authenticated
- Response: `ApiKeyResponse[]` (id, provider_name, key_prefix, timestamps; never the full key)

**POST /user/api-keys**
- Auth: Authenticated
- Body: `{ provider_name, api_key }`
- Validation: provider in (gemini, openai, anthropic, brave_search), key 8-500 chars.
- Response: `ApiKeyResponse`
- Encrypts key with AES-256-GCM before storage; upserts (one key per user per provider).

**DELETE /user/api-keys/{provider}**
- Auth: Authenticated
- Response: 204 No Content

**POST /user/api-keys/{provider}/test**
- Auth: Authenticated
- Response: `{ success: boolean, message: string }`
- Decrypts key, calls provider test endpoint.

**POST /user/api-keys/export**
- Auth: Authenticated
- Response: `{ keys: [{ provider_name, api_key }] }`
- Decrypts and returns all keys (used for backup/migration).

### 4.11 Public Configuration

**GET /config/providers**
- Auth: Authenticated
- Response: `ProviderConfigResponse[]` (enabled providers with model lists for scraping and websearch)

### 4.12 Admin Endpoints

All admin endpoints require `AdminUser` extractor (role = admin).

**GET /admin/providers**
- Response: `AdminProviderResponse[]`

**POST /admin/providers**
- Body: `CreateProviderRequest`
- Validation: provider_name in (gemini, openai, anthropic), at least one model per list, at most one default per list.
- Response: `AdminProviderResponse`

**PUT /admin/providers/{id}**
- Body: `UpdateProviderRequest` (all fields optional)
- Response: `AdminProviderResponse`

**DELETE /admin/providers/{id}**
- Response: 204 No Content

**GET /admin/rate-limits**
- Response: `RateLimitResponse[]`

**PUT /admin/rate-limits/{provider_name}**
- Body: `{ max_requests: 1-1000, time_window_seconds: 1-3600 }`
- Response: `RateLimitResponse`
- Hot-reloads the in-memory provider rate limiter.

**GET /admin/users**
- Response: `AdminUserResponse[]`

**PUT /admin/users/{id}/role**
- Body: `{ role: "user" | "admin" }`
- Response: `{ message: string }`

**GET /health**
- Auth: Public
- Response: `{ status: "ok" }`

---

## 5. Generation Pipeline Technical Flow

### Overview

The pipeline runs as a background tokio task spawned by `POST /syntheses/generate`. It has a 15-minute global timeout and supports cooperative cancellation via `AtomicBool`.
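
Cooperative cancellation means the task checks a shared flag between units of work rather than being killed from outside. The following is a minimal, synchronous sketch of that pattern using only the standard library; `run_steps` and `Outcome` are hypothetical names, and the real pipeline is async and additionally wrapped in the 15-minute timeout, which this sketch does not model.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

#[derive(Debug, PartialEq)]
pub enum Outcome {
    /// All steps ran; carries the number of steps executed.
    Completed(usize),
    /// The flag was observed set; carries the number of steps executed before stopping.
    Cancelled(usize),
}

/// Runs `steps` units of work, checking the cancellation flag before each one.
pub fn run_steps<F: FnMut(usize)>(steps: usize, cancel: &AtomicBool, mut step: F) -> Outcome {
    for i in 0..steps {
        // A step already in flight is never interrupted; the flag is only
        // consulted at step boundaries.
        if cancel.load(Ordering::Relaxed) {
            return Outcome::Cancelled(i);
        }
        step(i);
    }
    Outcome::Completed(steps)
}
```

In this model the stop endpoint would simply call `cancel.store(true, Ordering::Relaxed)` on a clone of the same `Arc<AtomicBool>`.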

### Initialization

1. Load `UserSettings` from DB (or create defaults)
2. Cleanup old article history (entries older than `article_history_days` with dropped status) and truncate old LLM call logs
3. Load the target `Theme` (categories, max_items, max_age_days, summary_length)
4. Load user `Sources` for the theme
5. Decrypt user's LLM API key, create `Arc<dyn LlmProvider>` via factory
6. Resolve models: `ai_model` (for scraping/classification) and `ai_model_websearch` (for web search); user override or admin default fallback
7. Initialize per-user rate limiter (from settings or admin defaults)
8. Initialize tracking structures: `article_scraped` (category -> Vec<NewsItem>), `source_counts`, `url_source`, `filled_counts`, `seen_urls`, `pending_traces`

### Phase 1: Personalized Sources

Skipped if user has 0 sources for the theme.

**1a. Windowed source extraction**

- Query article_history for the last source used; reorder sources in a rolling window starting after that source
- Select up to `source_extraction_window` sources per generation
- For each source (bounded concurrency of 5): fetch page HTML, extract up to `max_links_per_source` article URLs via HTML parsing (same-domain, non-homepage, no static assets)
- Deduplicate URLs cross-source via `seen_urls`
- Batch-check `article_history` for already-seen URL hashes; filter matches (traced as `filtered_history`)
- Shuffle remaining candidates to interleave sources
- Track url -> source in `url_source`
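
The rolling-window selection in the first two bullets can be sketched as follows. `select_window` is a hypothetical helper, and the fallback to the first source when no previous source is known (or it was deleted) is an assumption; sources are represented as plain strings for brevity.

```rust
/// Picks up to `window` sources, starting just after `last_used` and wrapping
/// around the end of the list. Falls back to the first source when `last_used`
/// is `None` or no longer present (assumed behavior).
pub fn select_window<'a>(
    sources: &[&'a str],
    last_used: Option<&str>,
    window: usize,
) -> Vec<&'a str> {
    if sources.is_empty() {
        return Vec::new();
    }
    let start = last_used
        .and_then(|last| sources.iter().position(|s| *s == last))
        .map(|idx| (idx + 1) % sources.len())
        .unwrap_or(0);
    sources
        .iter()
        .cycle()
        .skip(start)
        // Never return the same source twice in one generation.
        .take(window.min(sources.len()))
        .copied()
        .collect()
}
```

Rotating the start position this way lets a user with more sources than `source_extraction_window` eventually cover all of them across successive generations.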

**1b. Batch scrape + classify**

Processing in batches of `settings.batch_size`:

- **Batch assembly**: Pull up to batch_size candidates, skip if `source_counts[domain] >= max_articles_per_source` (traced as `filtered_diversity`)
- **Scrape** (JoinSet, parallel): SSRF check, 15s timeout, 5MB limit, HTML parsing, title/date/body extraction, soft-404 detection. Skip empty/too-old articles.
- **Classify** (JoinSet, parallel): Rate limit check (60s wait), send title + first 500 chars to LLM with categories list. LLM returns `{title, summary, category}`. Validate category via `assign_category()` (fallback to "Autre", drop if full).
- **LLM call logging**: Every LLM call is logged with full prompt, response, timing, and article URL.
- **Early exit**: Stop when total articles >= `(num_categories + 1) * max_items_per_category`.
- Batch-flush pending traces to `article_history`.
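
One possible reading of the category validation and early-exit rules is sketched below. The signature of `assign_category` shown here is hypothetical (only the function name appears above), as is `should_stop`. The sketch assumes that an unknown label falls back to "Autre" and that an article whose target bucket is already full is dropped.

```rust
use std::collections::HashMap;

const FALLBACK_CATEGORY: &str = "Autre";

/// Returns the bucket the article lands in, or `None` if it is dropped.
/// Hypothetical signature; the real function may take different arguments.
pub fn assign_category(
    proposed: &str,
    categories: &[&str],
    filled_counts: &mut HashMap<String, usize>,
    max_items_per_category: usize,
) -> Option<String> {
    // Labels the LLM invented fall back to the catch-all bucket.
    let target = if categories.contains(&proposed) {
        proposed
    } else {
        FALLBACK_CATEGORY
    };
    let count = filled_counts.entry(target.to_string()).or_insert(0);
    if *count >= max_items_per_category {
        return None; // bucket already full: drop the article
    }
    *count += 1;
    Some(target.to_string())
}

/// Early-exit test: every user category plus the fallback bucket is full.
pub fn should_stop(total_articles: usize, num_categories: usize, max_items_per_category: usize) -> bool {
    total_articles >= (num_categories + 1) * max_items_per_category
}
```

The `+ 1` in the early-exit bound accounts for the "Autre" bucket: with 2 categories and 2 items each, processing stops at 6 articles.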

### Phase 2: Web Search Fallback

Skipped if all categories are filled to `max_items_per_category`.

**2a. Compute gaps**: For each category, `needed = max_items - filled`.
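
As a small worked example of the gap computation, the sketch below returns only the categories that still need articles. `compute_gaps` is a hypothetical helper name; `filled_counts` mirrors the tracking structure named in the initialization step.

```rust
use std::collections::HashMap;

/// Returns `(category, needed)` for every category still below the cap,
/// preserving the order of `categories`.
pub fn compute_gaps<'a>(
    categories: &[&'a str],
    filled_counts: &HashMap<String, usize>,
    max_items: usize,
) -> Vec<(&'a str, usize)> {
    categories
        .iter()
        .filter_map(|&cat| {
            let filled = filled_counts.get(cat).copied().unwrap_or(0);
            // saturating_sub guards against a count that overshot the cap.
            let needed = max_items.saturating_sub(filled);
            (needed > 0).then_some((cat, needed))
        })
        .collect()
}
```

An empty result corresponds to the "all categories are filled" case in which Phase 2 is skipped.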

**2b. Path selection** based on `settings.use_brave_search`:

**Path A -- Brave Search** (`use_brave_search = true`):
- Decrypt user's Brave Search API key
- Query: `"{theme} actualites"`, up to 20 results, freshness mapped from `max_age_days` (pd/pw/pm/py)
- Filter results through `filter_phase2_url()`: homepage filter, cross-phase dedup, article history check, source diversity check
- Batch scrape + classify (same logic as Phase 1b, source_type = "brave_search")
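
The freshness mapping could look like the sketch below. The four codes (`pd`, `pw`, `pm`, `py`) come from the text above; the day thresholds at which one code gives way to the next are assumptions chosen for illustration, and `freshness_code` is a hypothetical name.

```rust
/// Maps a theme's `max_age_days` to a Brave Search freshness code.
/// The cut-off values are assumed, not taken from the codebase.
pub fn freshness_code(max_age_days: u32) -> &'static str {
    match max_age_days {
        0..=1 => "pd",  // past day
        2..=7 => "pw",  // past week
        8..=31 => "pm", // past month
        _ => "py",      // past year
    }
}
```

Because the codes are coarser than `max_age_days`, results can still be older than the theme allows; the scrape step's too-old check remains the authoritative filter.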

**Path B -- LLM Web Search** (`use_brave_search = false`):
- Build search prompt with theme, categories, and gap counts
- Call LLM with `ai_model_websearch` model; returns structured JSON: `{category_0: [{title, url, summary}], ...}`
- Filter URLs through `filter_phase2_url()`
- Scrape each result sequentially to validate; keep LLM-provided title/summary (no re-classification)
- source_type = "web_search"

### Save & Record

1. Error if all article lists are empty
2. Order sections: user-defined categories first (in order), then "Autre" if non-empty
3. Sanitize: strip `\u0000` null bytes from JSON (PostgreSQL JSONB requirement)
4. Insert synthesis row: job_id, week (ISO week string), sections (JSONB), status "completed", theme_id
5. Record used articles: batch-insert `article_history` entries with status "used", synthesis_id, and correct source_type
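
Steps 2 and 3 can be illustrated with a short sketch. Both function names are hypothetical, articles are reduced to plain strings, and the sanitizer is assumed to run on string values before they are serialized to JSON.

```rust
use std::collections::HashMap;

/// Orders sections: user-defined categories in their configured order,
/// then the "Autre" bucket if it holds at least one article.
pub fn order_sections(
    categories: &[&str],
    mut by_category: HashMap<String, Vec<String>>,
) -> Vec<(String, Vec<String>)> {
    let mut ordered = Vec::new();
    for cat in categories {
        if let Some(items) = by_category.remove(*cat) {
            ordered.push((cat.to_string(), items));
        }
    }
    if let Some(items) = by_category.remove("Autre") {
        if !items.is_empty() {
            ordered.push(("Autre".to_string(), items));
        }
    }
    ordered
}

/// Removes NUL characters, which PostgreSQL rejects inside JSONB text.
pub fn strip_nul(text: &str) -> String {
    text.replace('\u{0}', "")
}
```

Using the configured category list as the ordering key keeps the output stable regardless of `HashMap` iteration order.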

---

## 6. LLM Provider Abstraction

### Trait Definition

```rust
#[async_trait]
pub trait LlmProvider: Send + Sync {
    fn provider_id(&self) -> &str;
    async fn call_llm(&self, model: &str, system_prompt: &str,
                      user_prompt: &str, response_schema: &Value)
        -> Result<Value, AppError>;
}
```

All calls use structured JSON output (response_schema defines the expected shape).

### Implementations

| Provider | Module | API Endpoint | Auth Method |
|---|---|---|---|
| Google Gemini | `llm/gemini.rs` | `generativelanguage.googleapis.com` | Query param `?key=` |
| OpenAI | `llm/openai.rs` | `api.openai.com/v1/chat/completions` | Bearer token |
| Anthropic | `llm/anthropic.rs` | `api.anthropic.com/v1/messages` | `x-api-key` header |
| Mock | `llm/mock.rs` | N/A (in-memory) | N/A |

### Factory

`llm/factory.rs` provides `create_provider(provider_name, api_key, http_client) -> Arc<dyn LlmProvider>`. Matches on provider name string.

### Response Schema

`llm/schema.rs` builds JSON Schema definitions for:
- Classification/summarization: `{title, summary, category, is_article}`
- Web search: `{category_0: [{title, url, summary}], ...}` with per-category arrays
- Source link extraction: `{links: [{url}]}`

### Error Mapping

`map_provider_http_error()` translates HTTP status codes to `AppError` variants:
- 400 -> BadRequest
- 401/403 -> BadRequest (invalid key)
- 404 -> BadRequest (model not found)
- 429/529 -> RateLimited
- Other -> Internal
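
The mapping above translates naturally into a `match`. The sketch below defines a stand-in `AppError` with only the three variants named in the list; the payload types and message strings are assumptions, and the real function likely takes more context than a bare status code.

```rust
/// Stand-in for the application's error type; payloads are assumed.
#[derive(Debug, PartialEq)]
pub enum AppError {
    BadRequest(String),
    RateLimited,
    Internal(String),
}

/// Maps a provider's HTTP status code to an application error.
pub fn map_provider_http_error(status: u16) -> AppError {
    match status {
        400 => AppError::BadRequest("bad request".to_string()),
        401 | 403 => AppError::BadRequest("invalid API key".to_string()),
        404 => AppError::BadRequest("model not found".to_string()),
        // 529 is a non-standard "overloaded" status some providers return.
        429 | 529 => AppError::RateLimited,
        other => AppError::Internal(format!("provider returned HTTP {}", other)),
    }
}
```

Mapping 401/403/404 to `BadRequest` rather than `Internal` signals that the user can fix the problem (wrong key or model name) instead of treating it as a server fault.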

---

## 7. Background Tasks

### Session Cleanup

Runs hourly via `tokio::spawn`. Calls `db::sessions::delete_expired` to remove sessions past their `expires_at` timestamp.

### Job Store Cleanup

`JobStore::cleanup_expired` removes job entries older than 1 hour (the TTL constant). Called periodically. Releases user locks for expired jobs.
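
A simplified model of the job store shows how the one-active-job-per-user rule and the TTL cleanup interact. This is a sketch under several assumptions: IDs are `u64` instead of UUIDs, the store is single-threaded (the real one is presumably shared behind a lock), and `try_start` is a hypothetical method name. Only `cleanup_expired` and the 1-hour TTL come from the text above.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Entries older than this are removed by `cleanup_expired`.
const JOB_TTL: Duration = Duration::from_secs(60 * 60);

pub struct JobStore {
    /// user_id -> (job_id, created_at); at most one active job per user.
    active: HashMap<u64, (u64, Instant)>,
}

impl JobStore {
    pub fn new() -> Self {
        Self { active: HashMap::new() }
    }

    /// Registers a job for the user. `Err(existing_job_id)` corresponds to
    /// the 409 response of `POST /syntheses/generate`.
    pub fn try_start(&mut self, user_id: u64, job_id: u64, now: Instant) -> Result<(), u64> {
        if let Some(&(existing, _)) = self.active.get(&user_id) {
            return Err(existing);
        }
        self.active.insert(user_id, (job_id, now));
        Ok(())
    }

    /// Drops entries older than the TTL, which also releases the user's lock.
    /// Returns the number of entries removed.
    pub fn cleanup_expired(&mut self, now: Instant) -> usize {
        let before = self.active.len();
        self.active
            .retain(|_, (_, created)| now.duration_since(*created) < JOB_TTL);
        before - self.active.len()
    }
}
```

Without the TTL sweep, a job that crashed before releasing its entry would block that user from generating again until the process restarts.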

### Scheduler

Runs every minute via `tokio::spawn` with a 60-second interval. For each tick:

1. `current_day_code()` -> "mon" through "sun"
2. `find_due_schedules(pool, day, time)` -> queries enabled schedules matching current day and time (HH:MM)
3. For each due schedule:
   - Skip if `job_store.has_active_job(user_id)` returns Some (manual generation in progress)
   - Create a temporary `watch::channel` and `AtomicBool`
   - Call `synthesis::run_generation_inner` directly (bypasses job store)
   - On success: send emails to configured recipients (up to 3), mark schedule as run
   - On failure: log error, do not mark as run
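
The matching performed in step 2 happens in SQL, but the predicate itself is simple enough to express in memory. The sketch below is an illustrative equivalent, not the query: `Schedule` carries only the three columns involved, and `is_due` is a hypothetical name.

```rust
/// The subset of `theme_schedules` columns that decide whether a run is due.
pub struct Schedule {
    pub enabled: bool,
    pub days: Vec<String>,
    pub time_utc: String,
}

/// True when the schedule is enabled, lists today's day code, and its
/// `time_utc` equals the current HH:MM.
pub fn is_due(schedule: &Schedule, day_code: &str, hhmm: &str) -> bool {
    schedule.enabled
        && schedule.time_utc == hhmm
        && schedule.days.iter().any(|d| d == day_code)
}
```

Because the comparison is an exact HH:MM match and the scheduler ticks every 60 seconds, each schedule has a single one-minute window per listed day in which it can fire.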

---

## 8. Configuration

### Environment Variables

| Variable | Required | Default | Description |
|---|---|---|---|
| DATABASE_URL | Yes | - | PostgreSQL connection string |
| MASTER_ENCRYPTION_KEY | Yes | - | 64 hex chars (32 bytes) for AES-256-GCM |
| APP_URL | Yes | - | Public URL (CORS, magic links, cookies). No trailing slash. |
| PORT | No | 8080 | HTTP server port |
| RUST_LOG | No | - | Logging filter (e.g., "info,ai_synth_backend=debug") |
| STATIC_DIR | No | ../frontend/dist | Path to built SolidJS files |
| RESEND_API_KEY | Yes | - | Resend email service API key |
| EMAIL_FROM | Yes | - | Sender address for emails |
| TURNSTILE_SECRET_KEY | Yes | - | Cloudflare Turnstile server secret |
| TURNSTILE_SITE_KEY | Yes | - | Cloudflare Turnstile client key |
| POSTGRES_PASSWORD | Yes | - | Used by docker-compose for DB container |

### Startup Validation

`AppConfig::validate()` checks at startup:
- `MASTER_ENCRYPTION_KEY` is exactly 64 hex characters
- `APP_URL` starts with http:// or https:// and has no trailing slash

The application refuses to start with invalid configuration.
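
The two startup checks can be sketched as a free function. `validate_config` is a hypothetical stand-in for `AppConfig::validate()`, whose real signature and error type are not shown in this document; the error messages are illustrative.

```rust
/// Applies the two documented startup checks and reports the first failure.
pub fn validate_config(master_key: &str, app_url: &str) -> Result<(), String> {
    // 64 hex characters decode to the 32-byte key AES-256-GCM requires.
    if master_key.len() != 64 || !master_key.chars().all(|c| c.is_ascii_hexdigit()) {
        return Err("MASTER_ENCRYPTION_KEY must be exactly 64 hex characters".to_string());
    }
    if !(app_url.starts_with("http://") || app_url.starts_with("https://")) {
        return Err("APP_URL must start with http:// or https://".to_string());
    }
    if app_url.ends_with('/') {
        return Err("APP_URL must not have a trailing slash".to_string());
    }
    Ok(())
}
```

Failing fast here avoids harder-to-diagnose errors later, such as a decryption failure on the first API key lookup or malformed magic links.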

### User Settings Model

Default values applied when a user has no saved settings:

| Setting | Default | Range |
|---|---|---|
| max_articles_per_source | 3 | 1-10 |
| max_links_per_source | 8 | 1-30 |
| use_brave_search | false | boolean |
| article_history_days | 90 | 0-365 |
| batch_size | 5 | 1-20 |
| source_extraction_window | 3 | 1-10 |
| search_agent_behavior | "" | max 2000 chars |
| ai_provider | "" | max 100 chars |
| ai_model | "" | max 100 chars |
| ai_model_websearch | "" | max 100 chars |
| rate_limit_max_requests | null | >= 1 if set |
| rate_limit_time_window_seconds | null | >= 1 if set |