fix: resolve all markdownlint errors (blank lines, table spacing, bare URLs)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
master
oabrivard 2 months ago
parent 0963559e0f
commit ad613aa001

@ -0,0 +1,7 @@
{
"MD013": false,
"MD024": { "siblings_only": true },
"MD033": false,
"MD036": false,
"MD040": false
}

@ -1,15 +1,18 @@
# AI Weekly Synth
## Overview
AI Weekly Synth is a self-hosted web application that generates AI-powered weekly news syntheses. Users create themes (topics), configure categories and sources, then the app scrapes sources, classifies articles via LLM, and produces structured summaries. Supports scheduled generation with email delivery.
## Architecture
- **Backend**: Rust (Axum) — `backend/`
- **Frontend**: SolidJS + Tailwind CSS v4 — `frontend/`
- **Database**: PostgreSQL (via sqlx with runtime-checked queries)
- **Deployment**: Docker (`docker-compose.yml`, `restart: unless-stopped`)
## Project Structure
```
ai_synth/
├── backend/ Rust/Axum backend
@ -51,6 +54,7 @@ ai_synth/
```
## Documentation
- [`docs/requirements.md`](docs/requirements.md) — Product vision, features, user roles, non-functional requirements
- [`docs/functional_specs.md`](docs/functional_specs.md) — User journeys, feature details, pipeline description
- [`docs/architecture.md`](docs/architecture.md) — System design, layers, data model, security, concurrency
@ -60,6 +64,7 @@ ai_synth/
- [`docs/deployment.md`](docs/deployment.md) — Docker setup, env vars, monitoring, security
## Key Features
- **Multi-Theme**: Users create multiple themes, each with its own categories, sources, and schedule
- **LLM Providers**: Google Gemini, OpenAI, Anthropic — users bring their own API keys
- **Generation Pipeline**: Two-phase (personalized sources → web search fallback), windowed extraction, batched scrape+classify
@ -76,12 +81,14 @@ ai_synth/
## Running Locally
### Docker (production)
```bash
cp .env.example .env # Fill in values
docker compose up -d
```
### Development
```bash
# Backend (requires Postgres running)
cd backend && cargo run -- serve
@ -91,11 +98,13 @@ cd frontend && npm install && npm run dev
```
### CLI
```bash
cd backend && cargo run -- create-admin admin@example.com
```
## Testing
```bash
# Backend unit tests (no Postgres needed)
cd backend && cargo test --lib
@ -114,10 +123,13 @@ cd frontend && npx tsc --noEmit
```
## Database (30 migrations)
Tables: `users`, `sessions`, `magic_link_tokens`, `settings`, `themes`, `theme_schedules`, `sources`, `syntheses`, `article_history`, `llm_call_log`, `admin_providers`, `admin_rate_limits`, `user_api_keys`, `audit_log`
## Environment Variables
See `.env.example` for the complete list. Key ones:
- `DATABASE_URL` — Postgres connection string
- `MASTER_ENCRYPTION_KEY` — 64 hex chars for AES-256-GCM
- `RESEND_API_KEY` — for email sending
@ -125,6 +137,7 @@ See `.env.example` for the complete list. Key ones:
- `APP_URL` — public URL (for CORS, magic links, cookies)
## Design Decisions
- Idiomatic Rust (learning project) — no unwrap() in production code
- Users bring their own LLM API keys (encrypted at rest)
- Admin curates available providers/models, users select from the list

@ -122,6 +122,7 @@ All mutating API endpoints require the `X-Requested-With` header (checked by `cs
### Encryption at Rest
User LLM API keys are encrypted with AES-256-GCM before storage:
- 32-byte master key from `MASTER_ENCRYPTION_KEY` env var (64 hex chars)
- Random 12-byte nonce per encryption (stored alongside ciphertext)
- Key bytes are zeroized on drop (`zeroize` crate)
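
The 64-hex-character requirement on the master key can be illustrated with a small standard-library sketch. The function name `parse_master_key` and its error text are assumptions made for this example, not code from the repository; the actual AES-256-GCM encryption and the `zeroize` handling are not shown.

```rust
/// Decode a 64-character hex string into a 32-byte key.
/// Hypothetical helper for illustration; not taken from the project.
fn parse_master_key(hex: &str) -> Result<[u8; 32], String> {
    // Reject wrong lengths and any non-hex byte up front (this also rules
    // out signs such as '+', which `from_str_radix` would otherwise accept).
    if hex.len() != 64 || !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
        return Err("MASTER_ENCRYPTION_KEY must be exactly 64 hex characters".to_string());
    }
    let mut key = [0u8; 32];
    for (i, byte) in key.iter_mut().enumerate() {
        // Each output byte comes from two consecutive hex characters.
        *byte = u8::from_str_radix(&hex[2 * i..2 * i + 2], 16).map_err(|e| e.to_string())?;
    }
    Ok(key)
}
```

Failing at startup on a malformed key is usually preferable to discovering the problem at the first encryption call.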
@ -130,6 +131,7 @@ User LLM API keys are encrypted with AES-256-GCM before storage:
### SSRF Prevention
Both `scraper.rs` and `source_scraper.rs` validate URLs before fetching:
- DNS resolution check against private/loopback IP ranges
- Redirect chain validation (no redirects to private IPs)
- Only HTTP/HTTPS schemes allowed
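
A minimal sketch of the kind of address and scheme checks listed above, using only `std::net`. The function names are hypothetical and the exact set of blocked ranges is an assumption; the project's real checks in `scraper.rs` may differ, and DNS resolution and redirect handling are not shown.

```rust
use std::net::{IpAddr, Ipv4Addr, Ipv6Addr};

/// Returns true when a resolved address should not be fetched.
/// Hypothetical helper; the blocked ranges are an illustrative choice.
fn is_blocked_ip(ip: IpAddr) -> bool {
    match ip {
        IpAddr::V4(v4) => is_blocked_v4(v4),
        // IPv4-mapped IPv6 (::ffff:a.b.c.d) is checked as the embedded IPv4 address.
        IpAddr::V6(v6) => match v6.to_ipv4_mapped() {
            Some(v4) => is_blocked_v4(v4),
            None => is_blocked_v6(v6),
        },
    }
}

fn is_blocked_v4(ip: Ipv4Addr) -> bool {
    ip.is_private()
        || ip.is_loopback()
        || ip.is_link_local()
        || ip.is_unspecified()
        || ip.is_broadcast()
}

fn is_blocked_v6(ip: Ipv6Addr) -> bool {
    let first = ip.segments()[0];
    ip.is_loopback()
        || ip.is_unspecified()
        || (first & 0xfe00) == 0xfc00 // unique local, fc00::/7
        || (first & 0xffc0) == 0xfe80 // link-local, fe80::/10
}

/// Only HTTP and HTTPS URLs pass the scheme check.
fn has_allowed_scheme(url: &str) -> bool {
    let lower = url.to_ascii_lowercase();
    lower.starts_with("http://") || lower.starts_with("https://")
}
```

In a redirect chain, the same check would be applied to the address of every hop, not only the first URL.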
@ -137,6 +139,7 @@ Both `scraper.rs` and `source_scraper.rs` validate URLs before fetching:
### Security Headers
Applied as global middleware layers:
- `Content-Security-Policy` (self + Cloudflare Turnstile)
- `X-Content-Type-Options: nosniff`
- `X-Frame-Options: DENY`

@ -29,6 +29,7 @@ The application will be available at `http://localhost:8080` (or the port config
The `docker-compose.yml` defines two services:
**app** (AI Weekly Synth backend + frontend):
- Multi-stage Docker image: Node.js builds the frontend, Rust builds the backend, then both are combined into a minimal Debian runtime
- Runs as a non-root user (`appuser`)
- Depends on `db` with a health check condition (waits for Postgres to be ready)
@ -36,6 +37,7 @@ The `docker-compose.yml` defines two services:
- Restart policy: `unless-stopped`
**db** (PostgreSQL 17 Alpine):
- Data persisted to a named Docker volume (`postgres_data`)
- Exposed on `127.0.0.1:5432` (localhost only, not accessible from external networks)
- Health check: `pg_isready` every 10 seconds
@ -59,20 +61,20 @@ All environment variables are documented in `.env.example`. The `.env` file is l
### Required
| Variable | Description | Example |
-|----------|-------------|---------|
+| ---------- | ------------- | --------- |
| `DATABASE_URL` | PostgreSQL connection string. In docker-compose, the hostname is `db`. | `postgres://ai_synth:secret@db:5432/ai_synth` |
| `POSTGRES_PASSWORD` | Password for the PostgreSQL user. Used by both the `db` service and in `DATABASE_URL`. | `a-strong-random-password` |
| `MASTER_ENCRYPTION_KEY` | 256-bit key for AES-256-GCM encryption of user API keys at rest. Must be exactly 64 hex characters. Generate with `openssl rand -hex 32`. **Back this up securely -- losing it means all stored API keys become unreadable.** | `ab12cd34...` (64 hex chars) |
| `APP_URL` | Public URL where the app is accessible (no trailing slash). Used for magic link URLs, CORS origin, and cookie domain. | `https://synth.example.com` |
-| `RESEND_API_KEY` | API key for Resend (email service). Required for magic link emails and synthesis email export. Sign up at https://resend.com. | `re_xxxxx` |
+| `RESEND_API_KEY` | API key for Resend (email service). Required for magic link emails and synthesis email export. Sign up at <https://resend.com>. | `re_xxxxx` |
| `EMAIL_FROM` | Sender address for emails. Must be a verified domain in Resend. | `AI Weekly Synth <noreply@synth.example.com>` |
-| `TURNSTILE_SECRET_KEY` | Server-side secret key for Cloudflare Turnstile captcha. Sign up at https://dash.cloudflare.com/turnstile. | `0x4AAAAAAA...` |
+| `TURNSTILE_SECRET_KEY` | Server-side secret key for Cloudflare Turnstile captcha. Sign up at <https://dash.cloudflare.com/turnstile>. | `0x4AAAAAAA...` |
| `TURNSTILE_SITE_KEY` | Client-side site key for Cloudflare Turnstile. | `0x4BBBBBB...` |
### Optional
| Variable | Description | Default |
-|----------|-------------|---------|
+| ---------- | ------------- | --------- |
| `PORT` | Port for the backend HTTP server (inside the container). The docker-compose maps this to the host. | `8080` |
| `RUST_LOG` | Logging level. Format: `level` or `level,crate=level`. | `info,ai_synth_backend=debug` |
| `STATIC_DIR` | Path to the built frontend files. In Docker, this is `./static` (set by docker-compose). For local dev, use `../frontend/dist`. | `./static` (Docker) |
@ -87,6 +89,7 @@ All environment variables are documented in `.env.example`. The `.env` file is l
The application uses PostgreSQL 17. The `docker-compose.yml` runs it as the `db` service with a named volume for data persistence.
Key configuration:
- User: `ai_synth` (configurable via `POSTGRES_PASSWORD`)
- Database: `ai_synth`
- Shared memory: 128 MB (for complex queries)
@ -103,7 +106,7 @@ No manual migration step is needed. The application will not start serving reque
The database contains the following tables:
| Table | Purpose |
-|-------|---------|
+| ------- | --------- |
| `users` | User accounts (email, display name, role) |
| `sessions` | Active sessions (hashed tokens, expiry) |
| `magic_link_tokens` | Passwordless login tokens |
@ -165,6 +168,7 @@ RUST_LOG=info,ai_synth_backend=debug
```
This provides:
- `info` level for all crates (HTTP requests, startup/shutdown, background tasks)
- `debug` level for the application code (detailed pipeline progress, LLM call timing)
@ -217,6 +221,7 @@ docker compose up -d --build
```
This will:
1. Rebuild the Docker image (frontend build + Rust compilation)
2. Restart the `app` container with the new image
3. Automatically run any new migrations on startup

@ -127,6 +127,7 @@ pub enum AppError {
```
Key rules:
- **Never use `unwrap()` in production code.** Use `?`, `ok_or_else`, `map_err`, or `unwrap_or_default` with appropriate logging. `unwrap()` is only acceptable in `#[cfg(test)]` blocks and `LazyLock` static initializers.
- **`AppError::Internal` hides details** from the client. The full error is logged via `tracing::error!` but the response body only contains `"An internal error occurred"`.
- **`From<sqlx::Error>` and `From<anyhow::Error>`** conversions are implemented, so you can use `?` with both types.
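
The rules above can be sketched with a standard-library-only example. `SketchError` and `parse_max_items` are hypothetical stand-ins, not the project's `AppError`; `ParseIntError` takes the place of `sqlx::Error` and `anyhow::Error` so that the snippet needs no external crates.

```rust
use std::fmt;
use std::num::ParseIntError;

/// Hypothetical error type mirroring the "hide internal details" rule.
#[derive(Debug)]
enum SketchError {
    BadRequest(String),
    Internal(String),
}

impl fmt::Display for SketchError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            SketchError::BadRequest(msg) => write!(f, "{msg}"),
            // The detail stays inside the variant for logging; clients see a generic message.
            SketchError::Internal(_) => write!(f, "An internal error occurred"),
        }
    }
}

// A `From` conversion is what lets `?` turn the library error into ours.
impl From<ParseIntError> for SketchError {
    fn from(err: ParseIntError) -> Self {
        SketchError::Internal(err.to_string())
    }
}

/// No `unwrap()`: `ok_or_else` handles the missing value, `?` handles the parse error.
fn parse_max_items(raw: Option<&str>) -> Result<u32, SketchError> {
    let raw = raw.ok_or_else(|| SketchError::BadRequest("max_items is required".to_string()))?;
    let value: u32 = raw.trim().parse()?;
    Ok(value)
}
```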
@ -135,6 +136,7 @@ Key rules:
#### Arc Usage
`Arc` is used to share data across `tokio::spawn` boundaries. Common patterns:
- `Arc<dyn LlmProvider>` for the LLM provider (shared across classify tasks)
- `Arc<AtomicBool>` for cancellation flags
- `Arc<watch::Sender<ProgressEvent>>` for SSE progress channels
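
As an illustration of the `Arc<AtomicBool>` cancellation pattern, here is a dependency-free sketch. It uses `std::thread::spawn` in place of `tokio::spawn` so it compiles without external crates; the function names and the item list are assumptions made for the example.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;

/// Process items until done or until the shared flag is set.
/// Returns how many items were processed.
fn process_until_cancelled(items: &[u32], cancelled: &AtomicBool) -> usize {
    let mut done = 0;
    for _item in items {
        // Check the flag before each unit of work.
        if cancelled.load(Ordering::Relaxed) {
            break;
        }
        done += 1;
    }
    done
}

/// Spawn a worker that shares the flag through a cloned `Arc`.
fn run_with_cancellation(cancel_before_start: bool) -> usize {
    let cancelled = Arc::new(AtomicBool::new(cancel_before_start));
    // The clone is moved into the worker; the original stays with the caller,
    // which could call `cancelled.store(true, ...)` at any time.
    let worker_flag = Arc::clone(&cancelled);
    let handle = thread::spawn(move || process_until_cancelled(&[1, 2, 3], &worker_flag));
    handle.join().unwrap_or(0)
}
```

The same shape carries over to async tasks: clone the `Arc`, move the clone into the task, and poll the flag between units of work.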
@ -298,6 +300,7 @@ Longer explanation if needed.
Types: `feat`, `fix`, `docs`, `refactor`, `test`, `chore`.
Examples from the repo:
- `fix: rewrite pass schema uses actual scraped item counts, not max setting`
- `fix: filter empty scraped articles + restore URLs after rewrite + E2E assertions`
- `docs: add spec and plan for source priority pipeline redesign`
@ -329,6 +332,7 @@ The `PUT /settings` endpoint requires the **complete** settings object, not a pa
### Pipeline Test Requirements
Pipeline integration tests require:
- A running Postgres instance (via `TEST_DATABASE_URL`)
- `SKIP_SSRF_CHECK=1` (to allow wiremock on localhost)
- Wiremock for mocking HTTP responses from source websites

@ -66,6 +66,7 @@
### 1.7 Export a Synthesis
From the synthesis detail page:
- **Email**: enter a recipient address or click "S'envoyer a soi-meme". The synthesis is sent as a formatted email via Resend.
- **Markdown**: download as a `.md` file.
- **PDF**: download as a `.pdf` file.
@ -75,6 +76,7 @@ From the synthesis detail page:
### 2.1 Multi-Theme
Each user can create multiple themes. A theme groups together:
- Content settings (search topic, categories, max items, max age, summary length)
- Personalized sources
- Generated syntheses
@ -88,6 +90,7 @@ The generate page requires selecting a theme before launching. The home page sho
Categories are user-defined per theme. Users add and remove category names in the theme editor after creating a theme.
The system always includes two default categories:
- `Divers`: overflow category for unmatched or full categories.
- `Sans date`: category for articles without a usable publication date.
@ -100,6 +103,7 @@ Sources can be marked as preferred. Preference is stored per theme. During gener
### 2.4 Scheduled Generation
Each theme can have an optional schedule with:
- **Enabled/disabled toggle**
- **Days**: selection of days of the week (Mon-Sun)
- **Time**: execution time in UTC (HH:MM)
@ -112,6 +116,7 @@ Changes to the schedule are saved immediately (auto-save).
### 2.5 Brave Search
An optional alternative to LLM-powered web search in Phase 2. When enabled:
- The user provides a Brave Search API key (stored encrypted alongside LLM keys).
- Phase 2 queries the Brave Search API with the theme topic, filtered by article freshness.
- Results are scraped and classified/summarized by the LLM, following the same pipeline as Phase 1.
@ -127,6 +132,7 @@ Generation follows a two-phase pipeline. Phase 1 processes the user's personaliz
### 3.2 Initialization
Before generation starts:
1. Load theme settings (user-defined categories plus defaults `Divers` and `Sans date`, search topic, max items, max age, summary length) and global user settings (provider, models, batch size, rate limits, etc.).
2. Decrypt the user's LLM API key and create the provider instance.
3. Clean up old article history and LLM call logs.
@ -141,6 +147,7 @@ Skipped if the user has no sources for the theme.
Sources are split into waves of `source_extraction_window` size (default 3). Sources are rotated so extraction starts after the last source used in a previous generation (rolling window). Preferred sources are placed before non-preferred sources within the rotation order.
For each wave:
1. Extract article links from all sources in the wave in parallel (bounded concurrency of 5). Link extraction uses HTML `<a>` tag parsing.
2. Deduplicate candidate URLs and filter against article history (previously seen articles are skipped).
3. Shuffle remaining candidates, with URLs from preferred sources placed first.
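
The rotation and wave split described at the start of this section can be sketched as follows. This is one possible reading, not the project's implementation: the `Source` struct, the `plan_waves` name, and the interpretation that preferred sources move to the front while rotation order is kept within each group are all assumptions.

```rust
/// Hypothetical source record for illustration.
#[derive(Debug, Clone, PartialEq)]
struct Source {
    url: String,
    preferred: bool,
}

/// Rotate sources to start after `last_used`, put preferred ones first,
/// then split the result into waves of `window` sources.
fn plan_waves(sources: &[Source], last_used: Option<usize>, window: usize) -> Vec<Vec<Source>> {
    if sources.is_empty() || window == 0 {
        return Vec::new();
    }
    let mut ordered = sources.to_vec();
    // Rolling window: begin with the source after the last one used previously.
    let start = last_used.map_or(0, |i| (i + 1) % ordered.len());
    ordered.rotate_left(start);
    // Stable sort: preferred sources first, rotation order kept within each group.
    ordered.sort_by_key(|s| !s.preferred);
    ordered.chunks(window).map(|wave| wave.to_vec()).collect()
}
```

With sources `a`, `b` (preferred), `c`, `d`, a previous run that ended on `b`, and a window of 3, the rotation yields `c, d, a, b`, the stable sort yields `b, c, d, a`, and the waves are `[b, c, d]` and `[a]`.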
@ -158,11 +165,13 @@ Skipped if all user-defined categories are already filled.
The system computes category gaps (how many articles each category still needs), then follows one of two paths:
**Path A -- Brave Search** (when `use_brave_search` is enabled):
1. Query the Brave Search API with the theme topic and freshness filter.
2. Filter results: reject homepage URLs, deduplicate against Phase 1, check article history, apply source diversity cap.
3. Scrape and classify/summarize results using the same batched pipeline as Phase 1.
**Path B -- LLM Web Search** (default):
1. Send a search prompt to the LLM with the theme, categories, and gap counts. The LLM uses web grounding to find articles and returns structured results.
2. Filter results using the same filters as Path A.
3. Scrape each result to validate it. Keep the LLM-provided title and summary (no re-classification).
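
The category gap computation that precedes both paths might look like the sketch below. The function name, the use of a single `max_items` cap for every category, and the category names in the usage note are assumptions for illustration only.

```rust
use std::collections::BTreeMap;

/// For each category, how many more articles are needed to reach `max_items`.
/// Categories that are already full are left out of the result.
fn category_gaps(
    categories: &[&str],
    filled: &BTreeMap<String, usize>,
    max_items: usize,
) -> BTreeMap<String, usize> {
    categories
        .iter()
        .filter_map(|name| {
            let have = filled.get(*name).copied().unwrap_or(0);
            // `saturating_sub` keeps the gap at zero if a category is over-filled.
            let gap = max_items.saturating_sub(have);
            (gap > 0).then(|| ((*name).to_string(), gap))
        })
        .collect()
}
```

For example, with a cap of 3, one category holding 3 articles, one holding 1, and one holding none, the result contains gaps of 2 and 3 for the last two categories. An empty result corresponds to the case where every category is filled and Phase 2 is skipped.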
@ -183,7 +192,7 @@ For the complete technical algorithm, see `technical_specs.md` Section 5.
Managed on the theme management page. Each theme has its own values.
| Setting | Description | Default |
-|---------|-------------|---------|
+| --------- | ------------- | --------- |
| Name | Display label for the theme | -- |
| Search topic | Subject for AI search queries | -- |
| Categories | Ordered list of user-defined category names (`Divers` and `Sans date` are always included by the system) | [] |
@ -196,7 +205,7 @@ Managed on the theme management page. Each theme has its own values.
Managed on the settings page. Apply across all themes.
| Setting | Description | Default |
-|---------|-------------|---------|
+| --------- | ------------- | --------- |
| Provider | LLM provider (Gemini, OpenAI, Anthropic) | -- |
| Research model | Model for scraping/classification | Admin default |
| Web search model | Model for web search | Admin default |
@ -219,6 +228,7 @@ Users can export their global settings as a JSON file and import settings from a
### 5.1 Provider Management
Admins configure which LLM providers and models are available to users:
- Add providers with a unique identifier and display name.
- For each provider, configure two model lists: scraping/extraction models and web search models.
- Set a default model for each category.
@ -234,6 +244,7 @@ Admins set default rate limits per provider (max requests / time window in secon
### 5.3 User Management
Admins can:
- View all registered users (email, name, role, registration date).
- Promote a user to admin or demote an admin to user.
- Admins cannot modify their own role.
@ -259,6 +270,7 @@ A Markdown export is available from the synthesis detail page. The file can be s
### 7.1 Article History
Every article encountered during generation is recorded in the article history with its status:
- **used**: included in the final synthesis.
- **filtered_history**: skipped because it was seen in a previous generation.
- **filtered_diversity**: skipped due to per-domain cap.
@ -272,6 +284,7 @@ Users can view the article history per synthesis (provenance view) or globally.
### 7.2 LLM Call Logs
Every LLM call during generation is logged with:
- Call type (link extraction, classify/summarize, web search)
- Model used
- System prompt and user prompt

@ -3,7 +3,7 @@
## Test Inventory
| Type | Count | Status | Location |
-|------|-------|--------|----------|
+| ------ | ------- | -------- | ---------- |
| Backend unit tests | 358 | All passing | `backend/src/**/*.rs` (inline `#[cfg(test)]`) |
| Backend integration tests | 183 | All passing | `backend/tests/*.rs` |
| Frontend unit tests | 141 | 131 passing, 10 failing | `frontend/src/**/*.test.{ts,tsx}` |
@ -21,7 +21,7 @@
### Backend Unit Test Breakdown
| Source file | Tests | Coverage area |
-|---|---|---|
+| --- | --- | --- |
| `services/scraper.rs` | 74 | SSRF IP checks, soft-404, redirect, HTML parsing |
| `services/synthesis.rs` | 36 | Pipeline logic, schema building, category overflow |
| `services/llm/anthropic.rs` | 20 | Response parsing, error handling |
@ -51,7 +51,7 @@
### Backend Integration Test Breakdown
| File | Tests | Coverage area |
-|---|---|---|
+| --- | --- | --- |
| `api_sources_test.rs` | 36 | Sources CRUD, validation, CSV, bulk import, max limit |
| `api_admin_test.rs` | 30 | Provider CRUD, rate limits, user management, audit log |
| `api_keys_test.rs` | 18 | API key CRUD, encryption, ownership, test endpoint |
@ -73,7 +73,7 @@
### E2E Test Breakdown
| File | Coverage area |
-|---|---|
+| --- | --- |
| `registration.spec.ts` | Full magic link registration flow |
| `settings.spec.ts` | Settings persistence across reloads |
| `settings-export.spec.ts` | Settings export/import roundtrip |
@ -107,6 +107,7 @@ Requires a running Postgres instance. Use the helper script:
```
The script automatically:
- Starts the test Postgres container on port 5433 (via `e2e/docker-compose.test.yml`)
- Sets `TEST_DATABASE_URL` and `SKIP_SSRF_CHECK=1`
- Runs `cargo test` with the specified arguments
@ -144,6 +145,7 @@ Use the helper script, which builds the Docker image, starts the full stack, see
```
The script:
1. Builds the test Docker image (`docker compose -f docker-compose.test.yml build`)
2. Starts the full stack (app + Postgres)
3. Waits for the app health check to pass
@ -163,6 +165,7 @@ The `generation-live.spec.ts` test requires `OPENAI_TEST_API_KEY` to be set (in
`backend/tests/common/mod.rs` provides the `TestApp` struct, which is the foundation for all integration tests.

**What it does:**

- Creates a unique temporary Postgres database per test (named `ai_synth_test_{uuid}`)
- Runs all migrations
- Builds the full Axum router with test configuration (bypassed Turnstile and Resend)

@ -172,6 +175,7 @@ The `generation-live.spec.ts` test requires `OPENAI_TEST_API_KEY` to be set (in

- Handles cleanup via `Drop` (fire-and-forget) or explicit `cleanup().await`

**Request helpers** automatically:

- Set `Content-Type: application/json` for requests with a body
- Set `X-Requested-With: XMLHttpRequest` (CSRF header) for mutating methods (POST, PUT, DELETE, PATCH)
- Set the session cookie when `session_cookie` is provided
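The three request-helper rules above can be sketched as one pure function. This is an illustration only, not the actual `TestApp` code: the function name `default_headers`, its parameters, and the `session=<value>` cookie format are assumptions made for the example.

```rust
/// Hypothetical helper: returns the headers a test request would carry
/// under the three rules listed above.
fn default_headers(
    method: &str,
    has_body: bool,
    session_cookie: Option<&str>,
) -> Vec<(String, String)> {
    let mut headers = Vec::new();

    // Rule 1: JSON content type only when a body is sent.
    if has_body {
        headers.push(("Content-Type".to_string(), "application/json".to_string()));
    }

    // Rule 2: CSRF header on mutating methods only.
    if matches!(method, "POST" | "PUT" | "DELETE" | "PATCH") {
        headers.push(("X-Requested-With".to_string(), "XMLHttpRequest".to_string()));
    }

    // Rule 3: session cookie when one is provided (cookie format is assumed).
    if let Some(token) = session_cookie {
        headers.push(("Cookie".to_string(), format!("session={token}")));
    }

    headers
}
```

A `GET` without body or cookie yields no headers at all, which makes it easy to spot a test that forgot to pass its session.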
@ -347,6 +351,7 @@ The `TestApp::Drop` implementation spawns a background thread to drop the test d
### Flaky generation-live Test

The `generation-live.spec.ts` test depends on a real OpenAI API call. It may fail due to:

- API rate limits
- Slow responses exceeding the 30-second timeout
- Changes in model behavior affecting output format

@ -374,7 +379,7 @@ As of the last audit, 10 of 141 frontend unit tests are failing. Investigate wit

The following gaps must be addressed to satisfy the release gate policy.

| Gap | Priority | Description |
| ----- | ---------- | ------------- |
| Scheduled execution | Critical | `scheduler.rs` has zero tests. Autonomous process that generates syntheses and sends emails. |
| Brave Search pipeline | High | Only 1 unit test. The Brave Search code path in the pipeline is untested in integration. |
| Date filtering | High | No tests verify that `max_age_days` actually filters old articles. |

@ -3,7 +3,7 @@
## 1. Backend Tech Stack

| Dependency | Version | Purpose |
| --- | --- | --- |
| axum | 0.8 | Web framework (macros, multipart) |
| tokio | 1 | Async runtime (full features) |
| tower | 0.5 | Middleware composition |

@ -43,7 +43,7 @@

## 2. Frontend Tech Stack

| Dependency | Version | Purpose |
| --- | --- | --- |
| solid-js | ^1.9.0 | Reactive UI framework |
| @solidjs/router | ^0.15.0 | Client-side routing |
| lucide-solid | ^0.475.0 | Icon library |

@ -60,7 +60,7 @@

### Frontend Routes

| Path | Component | Auth | Description |
| --- | --- | --- | --- |
| /login | Login | Public | Login page |
| /register | Register | Public | Registration page |
| /auth/verify | AuthVerify | Public | Magic link verification |
@ -82,7 +82,7 @@
### 3.1 `users`

| Column | Type | Constraints |
| --- | --- | --- |
| id | UUID | PK, DEFAULT gen_random_uuid() |
| email | TEXT | NOT NULL, UNIQUE |
| display_name | TEXT | nullable |

@ -95,7 +95,7 @@ Indexes: `idx_users_email` on (email).

### 3.2 `sessions`

| Column | Type | Constraints |
| --- | --- | --- |
| session_hash | TEXT | PK (SHA-256 of raw token) |
| user_id | UUID | NOT NULL, FK users(id) CASCADE |
| created_at | TIMESTAMPTZ | NOT NULL, DEFAULT now() |

@ -109,7 +109,7 @@ Indexes: `idx_sessions_user_id`, `idx_sessions_expires_at`.

### 3.3 `magic_tokens`

| Column | Type | Constraints |
| --- | --- | --- |
| id | UUID | PK, DEFAULT gen_random_uuid() |
| email | TEXT | NOT NULL |
| token_hash | TEXT | NOT NULL, UNIQUE |

@ -124,7 +124,7 @@ Indexes: `idx_magic_tokens_email`, `idx_magic_tokens_expires`.

Per-user pipeline configuration. One row per user (user_id is the PK).

| Column | Type | Constraints |
| --- | --- | --- |
| user_id | UUID | PK, FK users(id) CASCADE |
| max_articles_per_source | INTEGER | NOT NULL, DEFAULT 3 |
| max_links_per_source | INTEGER | NOT NULL, DEFAULT 8 |
@ -145,7 +145,7 @@ Per-user pipeline configuration. One row per user (user_id is the PK).
Per-user topic configurations with content settings.

| Column | Type | Constraints |
| --- | --- | --- |
| id | UUID | PK, DEFAULT gen_random_uuid() |
| user_id | UUID | NOT NULL, FK users(id) CASCADE |
| name | TEXT | NOT NULL |

@ -166,7 +166,7 @@ Indexes: `idx_themes_user_id`.

User-curated news source URLs, always tied to a theme.

| Column | Type | Constraints |
| --- | --- | --- |
| id | UUID | PK, DEFAULT gen_random_uuid() |
| user_id | UUID | NOT NULL, FK users(id) CASCADE |
| title | VARCHAR(200) | NOT NULL, CHECK length 1-200 |

@ -182,7 +182,7 @@ Indexes: `idx_sources_user_id`, UNIQUE `idx_sources_user_id_url` on (user_id, ur

Generated synthesis results with JSONB section data.

| Column | Type | Constraints |
| --- | --- | --- |
| id | UUID | PK, DEFAULT gen_random_uuid() |
| user_id | UUID | NOT NULL, FK users(id) CASCADE |
| week | VARCHAR(10) | NOT NULL (ISO week string) |

@ -195,6 +195,7 @@ Generated synthesis results with JSONB section data.

Indexes: `idx_syntheses_user_id_created_at` on (user_id, created_at DESC).

JSONB structure for `sections`:

```json
[
  {
@ -211,7 +212,7 @@ JSONB structure for `sections`:
Automated generation schedules, one per theme.

| Column | Type | Constraints |
| --- | --- | --- |
| id | UUID | PK, DEFAULT gen_random_uuid() |
| theme_id | UUID | NOT NULL, UNIQUE, FK themes(id) CASCADE |
| user_id | UUID | NOT NULL, FK users(id) CASCADE |

@ -230,7 +231,7 @@ Indexes: `idx_theme_schedules_enabled` (partial, WHERE enabled = true).

Article URL deduplication and full provenance tracing.

| Column | Type | Constraints |
| --- | --- | --- |
| id | UUID | PK, DEFAULT gen_random_uuid() |
| user_id | UUID | NOT NULL, FK users(id) CASCADE |
| url_hash | TEXT | NOT NULL (SHA-256 of normalized URL) |

@ -257,7 +258,7 @@ Source type values: `personalized_source`, `brave_search`, `web_search`.

Full LLM interaction logging for debugging and analysis.

| Column | Type | Constraints |
| --- | --- | --- |
| id | UUID | PK, DEFAULT gen_random_uuid() |
| user_id | UUID | NOT NULL, FK users(id) CASCADE |
| job_id | UUID | NOT NULL |

@ -277,7 +278,7 @@ Indexes: `idx_llm_call_log_job_id`, `idx_llm_call_log_user_id` on (user_id, crea

Admin-curated catalog of LLM providers and their models.

| Column | Type | Constraints |
| --- | --- | --- |
| id | UUID | PK, DEFAULT gen_random_uuid() |
| provider_name | VARCHAR(50) | NOT NULL, UNIQUE |
| display_name | VARCHAR(100) | NOT NULL |

@ -292,6 +293,7 @@ Indexes: `idx_admin_providers_enabled` (partial, WHERE is_enabled = true).

Seeded with: gemini, openai, anthropic.

JSONB model structure:

```json
[{"model_id": "gemini-2.5-pro", "display_name": "Gemini 2.5 Pro", "is_default": true}]
```
@ -301,7 +303,7 @@ JSONB model structure:
Per-provider rate limit configuration.

| Column | Type | Constraints |
| --- | --- | --- |
| id | UUID | PK, DEFAULT gen_random_uuid() |
| provider_name | VARCHAR(50) | NOT NULL, UNIQUE, FK admin_providers(provider_name) CASCADE |
| max_requests | INTEGER | NOT NULL, DEFAULT 30 |

@ -315,7 +317,7 @@ Seeded defaults: gemini 29/60s, openai 50/60s, anthropic 40/60s.

Encrypted user LLM API keys.

| Column | Type | Constraints |
| --- | --- | --- |
| id | UUID | PK, DEFAULT gen_random_uuid() |
| user_id | UUID | NOT NULL, FK users(id) CASCADE |
| provider_name | VARCHAR(50) | NOT NULL |

@ -332,7 +334,7 @@ Constraint: UNIQUE(user_id, provider_name). Valid providers: gemini, openai, ant

Admin mutation audit trail.

| Column | Type | Constraints |
| --- | --- | --- |
| id | UUID | PK, DEFAULT gen_random_uuid() |
| admin_user_id | UUID | nullable, FK users(id) SET NULL |
| action | VARCHAR(100) | NOT NULL |
@ -352,43 +354,51 @@ All endpoints are prefixed with `/api/v1`. Responses are JSON. Errors follow the
### 4.1 Authentication

**POST /auth/register**

- Auth: Public
- Body: `{ email: string, display_name?: string, turnstile_token: string }`
- Response: `{ message: string }`
- Sends magic link email. Rate limited.

**POST /auth/login**

- Auth: Public
- Body: `{ email: string, turnstile_token: string }`
- Response: `{ message: string }`
- Sends magic link email. Rate limited.

**GET /auth/verify?token=...&email=...**

- Auth: Public
- Response: Redirect to frontend with session cookie set.

**POST /auth/verify**

- Auth: Public
- Body: `{ token: string, email: string }`
- Response: `{ message: string, user: User }`
- Sets `session` HttpOnly cookie (30-day expiry).

**POST /auth/logout**

- Auth: Authenticated
- Response: `{ message: string }`
- Clears session cookie and deletes DB session.

**GET /auth/me**

- Auth: Authenticated
- Response: `{ id, email, display_name, role, created_at }`

### 4.2 Settings

**GET /settings**

- Auth: Authenticated
- Response: `UserSettings` (creates defaults if not exists)

**PUT /settings**

- Auth: Authenticated
- Body: `UpdateSettingsRequest` (all fields required)
- Validation: max_articles_per_source 1-10, max_links_per_source 1-30, batch_size 1-20, source_extraction_window 1-10, article_history_days 0-365, search_agent_behavior max 2000 chars, ai_provider/ai_model/ai_model_websearch max 100 chars.
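The `PUT /settings` validation rule can be read as a table of numeric ranges plus a table of length caps. The sketch below encodes it that way. `SettingsInput` and `validate_settings` are hypothetical names standing in for `UpdateSettingsRequest` and its real validator, and counting Unicode characters (rather than bytes) for the length caps is an assumption.

```rust
/// Stand-in for `UpdateSettingsRequest`; only the validated fields are shown.
struct SettingsInput {
    max_articles_per_source: i32,
    max_links_per_source: i32,
    batch_size: i32,
    source_extraction_window: i32,
    article_history_days: i32,
    search_agent_behavior: String,
    ai_provider: String,
    ai_model: String,
    ai_model_websearch: String,
}

/// Collects every violation instead of stopping at the first one.
fn validate_settings(s: &SettingsInput) -> Result<(), Vec<String>> {
    let mut errors = Vec::new();

    // (field name, value, inclusive min, inclusive max)
    let ranges = [
        ("max_articles_per_source", s.max_articles_per_source, 1, 10),
        ("max_links_per_source", s.max_links_per_source, 1, 30),
        ("batch_size", s.batch_size, 1, 20),
        ("source_extraction_window", s.source_extraction_window, 1, 10),
        ("article_history_days", s.article_history_days, 0, 365),
    ];
    for (name, value, min, max) in ranges {
        if value < min || value > max {
            errors.push(format!("{name} must be between {min} and {max}"));
        }
    }

    // (field name, text, maximum length in characters)
    let lengths = [
        ("search_agent_behavior", &s.search_agent_behavior, 2000),
        ("ai_provider", &s.ai_provider, 100),
        ("ai_model", &s.ai_model, 100),
        ("ai_model_websearch", &s.ai_model_websearch, 100),
    ];
    for (name, text, max) in lengths {
        if text.chars().count() > max {
            errors.push(format!("{name} must be at most {max} characters"));
        }
    }

    if errors.is_empty() { Ok(()) } else { Err(errors) }
}
```

Returning all violations at once suits a settings form, where the user can fix every field in one pass; whether the real endpoint does this or fails fast is not stated here.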
@ -397,10 +407,12 @@ All endpoints are prefixed with `/api/v1`. Responses are JSON. Errors follow the
### 4.3 Themes

**GET /themes**

- Auth: Authenticated
- Response: `ThemeResponse[]`

**POST /themes**

- Auth: Authenticated
- Body: `{ name, theme, categories: string[], max_items_per_category?, max_age_days?, summary_length? }`
- Validation: name non-empty max 200 chars, categories 0-20 non-empty entries, max_items 1-50, max_age 1-365, summary_length 1-3.

@ -408,64 +420,76 @@ All endpoints are prefixed with `/api/v1`. Responses are JSON. Errors follow the

- Response: `ThemeResponse`

**PUT /themes/{id}**

- Auth: Authenticated (owner only)
- Body: `UpdateThemeRequest` (all fields optional)
- Response: `ThemeResponse`

**DELETE /themes/{id}**

- Auth: Authenticated (owner only)
- Response: 204 No Content

### 4.4 Schedules

**GET /themes/{id}/schedule**

- Auth: Authenticated (theme owner)
- Response: `ScheduleResponse | null` with HTTP 200

**PUT /themes/{id}/schedule**

- Auth: Authenticated (theme owner)
- Body: `{ enabled, days: string[], time_utc: "HH:MM", emails: string[] }`
- Validation: days from mon-sun, time HH:MM format, max 3 emails.
- Response: `ScheduleResponse`
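A small sketch of the schedule validation described for `PUT /themes/{id}/schedule`. It assumes the day tokens are the lowercase three-letter forms `mon` through `sun` and that `HH:MM` means exactly two digits on each side; `parse_time_utc` and `validate_schedule` are illustrative names, not the backend's.

```rust
const VALID_DAYS: [&str; 7] = ["mon", "tue", "wed", "thu", "fri", "sat", "sun"];

/// Parses a strict "HH:MM" string into (hour, minute).
fn parse_time_utc(s: &str) -> Option<(u8, u8)> {
    let (h, m) = s.split_once(':')?;
    // Require exactly two ASCII digits per part, so "7:30" or "+7:30" fail.
    if h.len() != 2 || m.len() != 2 {
        return None;
    }
    if !h.bytes().chain(m.bytes()).all(|b| b.is_ascii_digit()) {
        return None;
    }
    let hour: u8 = h.parse().ok()?;
    let minute: u8 = m.parse().ok()?;
    if hour < 24 && minute < 60 {
        Some((hour, minute))
    } else {
        None
    }
}

/// Applies the three documented rules: valid days, HH:MM time, at most 3 emails.
fn validate_schedule(days: &[&str], time_utc: &str, emails: &[&str]) -> Result<(), String> {
    if let Some(bad) = days.iter().find(|d| !VALID_DAYS.contains(*d)) {
        return Err(format!("invalid day: {bad}"));
    }
    if parse_time_utc(time_utc).is_none() {
        return Err(format!("invalid time_utc: {time_utc}"));
    }
    if emails.len() > 3 {
        return Err("at most 3 emails are allowed".to_string());
    }
    Ok(())
}
```

The explicit digit check matters because Rust's integer parser accepts a leading `+`, which would otherwise let `+7:30` through.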
**DELETE /themes/{id}/schedule**

- Auth: Authenticated (theme owner)
- Response: 204 No Content

### 4.5 Sources

**GET /sources?theme_id=...**

- Auth: Authenticated
- Query: `theme_id` is required
- Response: `SourceResponse[]`

**POST /sources**

- Auth: Authenticated
- Body: `{ title, url, theme_id }`
- Validation: title non-empty max 200, URL http(s) max 1000 chars.
- Response: `SourceResponse`

**PUT /sources/preferred**

- Auth: Authenticated
- Body: `{ theme_id: UUID, source_ids: UUID[] }`
- Note: preferred state is scoped per theme.
- Response: `{ updated: number }`

**DELETE /sources/{id}**

- Auth: Authenticated (owner only)
- Response: 204 No Content

**POST /sources/bulk**

- Auth: Authenticated
- Body: `{ sources: CreateSourceRequest[], theme_id: UUID }`
- Response: `{ imported, skipped, errors }`

**POST /sources/import-csv**

- Auth: Authenticated
- Body: Multipart file upload (CSV: title,url) + required `theme_id`
- Response: `{ imported, skipped, errors }`
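To make the `{ imported, skipped, errors }` response concrete, here is a rough sketch of a CSV import pass. Several things are assumptions rather than documented behavior: that `skipped` counts rows whose URL already exists, that an optional `title,url` header row is ignored, and that rows are split on the last comma without CSV quoting support. The title and URL checks reuse the `POST /sources` rules; `import_csv` and `ImportReport` are hypothetical names.

```rust
use std::collections::HashSet;

#[derive(Debug, Default, PartialEq)]
struct ImportReport {
    imported: usize,
    skipped: usize,
    errors: Vec<String>,
}

/// Naive `title,url` import. `existing_urls` stands in for the URLs the
/// theme already has; it is updated as rows are accepted.
fn import_csv(csv: &str, existing_urls: &mut HashSet<String>) -> ImportReport {
    let mut report = ImportReport::default();
    for (idx, line) in csv.lines().enumerate() {
        let line = line.trim();
        // Skip blank lines and an optional header row (assumed format).
        if line.is_empty() || (idx == 0 && line.eq_ignore_ascii_case("title,url")) {
            continue;
        }
        // Split on the last comma so commas inside the title survive.
        let (title, url) = match line.rsplit_once(',') {
            Some((t, u)) => (t.trim(), u.trim()),
            None => {
                report.errors.push(format!("line {}: expected title,url", idx + 1));
                continue;
            }
        };
        let title_ok = !title.is_empty() && title.chars().count() <= 200;
        let url_ok = (url.starts_with("http://") || url.starts_with("https://"))
            && url.chars().count() <= 1000;
        if !title_ok || !url_ok {
            report.errors.push(format!("line {}: invalid title or URL", idx + 1));
        } else if !existing_urls.insert(url.to_string()) {
            report.skipped += 1; // URL already present
        } else {
            report.imported += 1;
        }
    }
    report
}
```

A production importer would more likely use a proper CSV parser so that quoted fields and URLs containing commas are handled correctly.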
**GET /sources/export-csv**

- Auth: Authenticated
- Query: `theme_id` is required
- Scope: exports sources for the selected theme only

@ -474,17 +498,20 @@ All endpoints are prefixed with `/api/v1`. Responses are JSON. Errors follow the

### 4.6 Generation

**POST /syntheses/generate**

- Auth: Authenticated
- Body: `{ theme_id: UUID }`
- Response: `{ job_id: UUID }`
- Creates job in JobStore, spawns background generation task. Returns 409 if user already has active job.

**GET /syntheses/generate/{job_id}/progress**

- Auth: Authenticated (job owner)
- Response: SSE stream of `ProgressEvent`
- Events: `progress` (step, message, percent), `complete` (synthesis_id), `error` (message).

**POST /syntheses/generate/{job_id}/stop**

- Auth: Authenticated (job owner)
- Response: `{ message: string }`
- Sets cooperative cancellation flag.
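Two behaviors from the generation endpoints, "one active job per user" (the 409 case) and the cooperative cancellation flag, can be sketched with standard-library primitives. This is not the real `JobStore`: the struct layout, method names, and keying by user ID instead of job ID are simplifying assumptions.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for the JobStore: at most one active job per user.
#[derive(Default)]
struct JobStore {
    active: Mutex<HashMap<String, Arc<AtomicBool>>>, // user_id -> cancel flag
}

impl JobStore {
    /// Returns the cancel flag for a new job, or None when the user already
    /// has one running (the case the API reports as 409).
    fn try_start(&self, user_id: &str) -> Option<Arc<AtomicBool>> {
        let mut active = self.active.lock().unwrap();
        if active.contains_key(user_id) {
            return None;
        }
        let flag = Arc::new(AtomicBool::new(false));
        active.insert(user_id.to_string(), Arc::clone(&flag));
        Some(flag)
    }

    /// Stop request: only sets the flag; the worker decides when to exit.
    fn request_stop(&self, user_id: &str) -> bool {
        let active = self.active.lock().unwrap();
        let found = active.get(user_id);
        if let Some(flag) = found {
            flag.store(true, Ordering::SeqCst);
        }
        found.is_some()
    }

    /// Releases the user lock once the job ends, whatever the outcome.
    fn finish(&self, user_id: &str) {
        self.active.lock().unwrap().remove(user_id);
    }
}

/// Worker loop: checks the flag between steps and stops early when it is set.
fn run_steps(steps: usize, cancel: &AtomicBool) -> usize {
    let mut done = 0;
    for _ in 0..steps {
        if cancel.load(Ordering::SeqCst) {
            break;
        }
        done += 1;
    }
    done
}
```

"Cooperative" here means the stop endpoint never kills the task; a step that is already running finishes, and the worker exits at its next check.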
@ -492,57 +519,69 @@ All endpoints are prefixed with `/api/v1`. Responses are JSON. Errors follow the
### 4.7 Syntheses

**GET /syntheses**

- Auth: Authenticated
- Response: `SynthesisListItem[]` (with section summaries, theme info)

**GET /syntheses/{id}**

- Auth: Authenticated (owner only)
- Response: `SynthesisResponse` (full sections data)

**DELETE /syntheses/{id}**

- Auth: Authenticated (owner only)
- Response: 204 No Content

**POST /syntheses/{id}/send-email**

- Auth: Authenticated
- Body: `{ email: string }`
- Response: `{ message: string }`

**GET /syntheses/{id}/export/markdown**

- Auth: Authenticated
- Response: Markdown file download

**GET /syntheses/{id}/export/pdf**

- Auth: Authenticated
- Response: PDF file download

### 4.8 Article History & Provenance

**GET /article-history?limit=&offset=&job_id=&status=**

- Auth: Authenticated
- Response: `{ items: ArticleHistoryEntry[], total: number }`

**DELETE /article-history**

- Auth: Authenticated
- Response: `{ deleted: number }`

**GET /syntheses/{id}/provenance**

- Auth: Authenticated
- Response: `ArticleHistoryEntry[]` (articles with status "used" for this synthesis's job_id)

### 4.9 LLM Call Logs

**GET /llm-logs/{job_id}**

- Auth: Authenticated
- Response: `LlmCallLogEntry[]`
### 4.10 User API Keys

**GET /user/api-keys**

- Auth: Authenticated
- Response: `ApiKeyResponse[]` (id, provider_name, key_prefix, timestamps; never the full key)

**POST /user/api-keys**

- Auth: Authenticated
- Body: `{ provider_name, api_key }`
- Validation: provider in (gemini, openai, anthropic, brave_search), key 8-500 chars.

@ -550,15 +589,18 @@ All endpoints are prefixed with `/api/v1`. Responses are JSON. Errors follow the

- Encrypts key with AES-256-GCM before storage; upserts (one key per user per provider).

**DELETE /user/api-keys/{provider}**

- Auth: Authenticated
- Response: 204 No Content

**POST /user/api-keys/{provider}/test**

- Auth: Authenticated
- Response: `{ success: boolean, message: string }`
- Decrypts key, calls provider test endpoint.

**POST /user/api-keys/export**

- Auth: Authenticated
- Response: `{ keys: [{ provider_name, api_key }] }`
- Decrypts and returns all keys (used for backup/migration).

@ -566,6 +608,7 @@ All endpoints are prefixed with `/api/v1`. Responses are JSON. Errors follow the

### 4.11 Public Configuration

**GET /config/providers**

- Auth: Authenticated
- Response: `ProviderConfigResponse[]` (enabled providers with model lists for scraping and websearch)
@ -574,36 +617,45 @@ All endpoints are prefixed with `/api/v1`. Responses are JSON. Errors follow the
All admin endpoints require `AdminUser` extractor (role = admin).

**GET /admin/providers**

- Response: `AdminProviderResponse[]`

**POST /admin/providers**

- Body: `CreateProviderRequest`
- Validation: provider_name in (gemini, openai, anthropic), at least one model per list, at most one default per list.
- Response: `AdminProviderResponse`
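The `POST /admin/providers` validation rule reads naturally as a per-list check applied twice. In the sketch below, `ModelEntry` mirrors two fields of the JSONB model structure, while `validate_provider`, `validate_model_list`, and the interpretation of the two lists as the scraping and websearch model lists are assumptions for illustration.

```rust
/// Mirrors part of the JSONB model entry (`model_id`, `is_default`).
struct ModelEntry {
    #[allow(dead_code)] // kept for readability; the checks only need `is_default`
    model_id: String,
    is_default: bool,
}

const ALLOWED_PROVIDERS: [&str; 3] = ["gemini", "openai", "anthropic"];

/// One list must be non-empty and carry at most one default model.
fn validate_model_list(label: &str, models: &[ModelEntry]) -> Result<(), String> {
    if models.is_empty() {
        return Err(format!("{label}: at least one model is required"));
    }
    let defaults = models.iter().filter(|m| m.is_default).count();
    if defaults > 1 {
        return Err(format!("{label}: at most one default model is allowed"));
    }
    Ok(())
}

/// Applies the provider-name check, then the list check to both lists.
fn validate_provider(
    provider_name: &str,
    scraping_models: &[ModelEntry],
    websearch_models: &[ModelEntry],
) -> Result<(), String> {
    if !ALLOWED_PROVIDERS.contains(&provider_name) {
        return Err(format!("unknown provider: {provider_name}"));
    }
    validate_model_list("scraping models", scraping_models)?;
    validate_model_list("websearch models", websearch_models)?;
    Ok(())
}
```

Note that "at most one default" allows a list with no default at all, which is how the rule is worded above.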
**PUT /admin/providers/{id}**

- Body: `UpdateProviderRequest` (all fields optional)
- Response: `AdminProviderResponse`

**DELETE /admin/providers/{id}**

- Response: 204 No Content

**GET /admin/rate-limits**

- Response: `RateLimitResponse[]`

**PUT /admin/rate-limits/{provider_name}**

- Body: `{ max_requests: 1-1000, time_window_seconds: 1-3600 }`
- Response: `RateLimitResponse`
- Hot-reloads the in-memory provider rate limiter.
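One way to picture a limiter that can be hot-reloaded is a sliding-window log whose limits are plain fields. The sketch below is an assumption about the general technique, not the project's limiter: the algorithm, the `WindowLimiter` type, and its methods are all illustrative. Only the `max_requests` and `time_window_seconds` ranges come from the endpoint description.

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

/// Sliding-window log: remembers the timestamps of recent requests.
struct WindowLimiter {
    max_requests: usize,
    window: Duration,
    hits: VecDeque<Instant>,
}

impl WindowLimiter {
    fn new(max_requests: usize, window_secs: u64) -> Self {
        WindowLimiter {
            max_requests,
            window: Duration::from_secs(window_secs),
            hits: VecDeque::new(),
        }
    }

    /// "Hot reload": swaps the limits in place and keeps the request history.
    fn reload(&mut self, max_requests: usize, window_secs: u64) -> Result<(), String> {
        if !(1..=1000).contains(&max_requests) || !(1..=3600).contains(&window_secs) {
            return Err("max_requests must be 1-1000 and time_window_seconds 1-3600".to_string());
        }
        self.max_requests = max_requests;
        self.window = Duration::from_secs(window_secs);
        Ok(())
    }

    /// Records a request at `now` if the window still has room.
    fn try_acquire(&mut self, now: Instant) -> bool {
        // Drop timestamps that have fallen out of the window.
        while let Some(&oldest) = self.hits.front() {
            if now.duration_since(oldest) >= self.window {
                self.hits.pop_front();
            } else {
                break;
            }
        }
        if self.hits.len() < self.max_requests {
            self.hits.push_back(now);
            true
        } else {
            false
        }
    }
}
```

Passing `now` in as a parameter keeps the limiter deterministic in tests; a caller in the running service would pass `Instant::now()`.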
**GET /admin/users**

- Response: `AdminUserResponse[]`

**PUT /admin/users/{id}/role**

- Body: `{ role: "user" | "admin" }`
- Response: `{ message: string }`

**GET /health**

- Auth: Public
- Response: `{ status: "ok" }`
@ -619,6 +671,7 @@ All admin endpoints require `AdminUser` extractor (role = admin).
### Generation Lifecycle

`POST /api/v1/syntheses/generate` creates a job in the `JobStore`, then spawns two nested tasks:

- Inner task: wraps `run_generation` in a **15-minute `tokio::time::timeout`**. If the timeout fires, sends an `Error` progress event and releases the user lock.
- Outer task: monitors the inner task's `JoinHandle` for panics. If the inner task panics, sends an `Error` progress event and releases the user lock.
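The two failure paths above (timeout and panic) can be illustrated without an async runtime. The sketch below deliberately swaps Tokio tasks for standard-library threads and a channel, so it shows the shape of the guard rather than the real implementation; `run_guarded` and `Outcome` are hypothetical names.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

#[derive(Debug, PartialEq)]
enum Outcome {
    Completed(String),
    TimedOut,
    Panicked,
}

/// Runs `work` on a separate thread and classifies how it ended.
fn run_guarded<F>(work: F, limit: Duration) -> Outcome
where
    F: FnOnce() -> String + Send + 'static,
{
    let (tx, rx) = mpsc::channel();

    // "Inner task": runs the work and reports its result. If `work` panics,
    // `tx` is dropped during unwinding without sending anything.
    thread::spawn(move || {
        let _ = tx.send(work());
    });

    // "Outer task": waits with a deadline and tells a timeout from a panic.
    // In the real service, both failure paths would emit an `Error` progress
    // event and release the user lock.
    match rx.recv_timeout(limit) {
        Ok(result) => Outcome::Completed(result),
        Err(mpsc::RecvTimeoutError::Timeout) => Outcome::TimedOut,
        Err(mpsc::RecvTimeoutError::Disconnected) => Outcome::Panicked,
    }
}
```

One difference worth keeping in mind: a timed-out thread keeps running in the background, whereas `tokio::time::timeout` drops the future it wraps.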
@ -660,11 +713,13 @@ Processing in batches of `settings.batch_size` (minimum 1). For each batch:
**Batch assembly**: Pull up to `batch_size` candidates, skipping any where `source_counts[domain] >= max_articles_per_source` (traced as `filtered_diversity`).
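A minimal sketch of the batch assembly rule, under stated assumptions: candidates sit in a queue, `source_counts` is maintained elsewhere as articles are accepted, and candidates picked into the current batch do not yet count toward the cap. `Candidate` and `assemble_batch` are illustrative names.

```rust
use std::collections::{HashMap, VecDeque};

struct Candidate {
    url: String,
    domain: String,
}

/// Pulls up to `batch_size` candidates, setting aside those whose domain has
/// already reached `max_articles_per_source`.
/// Returns (batch, filtered_for_diversity).
fn assemble_batch(
    queue: &mut VecDeque<Candidate>,
    source_counts: &HashMap<String, usize>,
    batch_size: usize,
    max_articles_per_source: usize,
) -> (Vec<Candidate>, Vec<Candidate>) {
    let batch_size = batch_size.max(1); // the pipeline uses a minimum of 1
    let mut batch = Vec::new();
    let mut filtered = Vec::new();

    while batch.len() < batch_size {
        let candidate = match queue.pop_front() {
            Some(c) => c,
            None => break, // queue exhausted
        };
        let used = source_counts.get(&candidate.domain).copied().unwrap_or(0);
        if used >= max_articles_per_source {
            filtered.push(candidate); // would be traced as `filtered_diversity`
        } else {
            batch.push(candidate);
        }
    }

    (batch, filtered)
}
```

Filtered candidates do not consume batch slots, so a batch can still fill up even when the head of the queue is dominated by one domain.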
**Phase A — Scrape batch in parallel** (`JoinSet`):

- SSRF check (no private IPs), 15s timeout, 5MB body limit.
- HTML parsing for title (`<title>`, `og:title`), date (meta tags, JSON-LD, `<time>`), body (strip scripts/nav), soft-404 detection.
- If article body is empty, is a soft-404, or is too old: trace as `filtered_empty` / `filtered_too_old` and skip.
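The "no private IPs" part of the SSRF check can be sketched with the standard library's address types. This covers only the address classification step; the exact set of blocked ranges in the real scraper is not documented here, so the list below (private, loopback, link-local, unspecified, broadcast, IPv6 unique-local, IPv4-mapped IPv6) is an assumption, and `is_public_ip` is a hypothetical name.

```rust
use std::net::IpAddr;

/// Returns true when the address is safe to fetch from, i.e. it is not a
/// private, loopback, link-local, or otherwise internal address.
fn is_public_ip(ip: IpAddr) -> bool {
    match ip {
        IpAddr::V4(v4) => {
            !(v4.is_private()
                || v4.is_loopback()
                || v4.is_link_local()
                || v4.is_unspecified()
                || v4.is_broadcast())
        }
        IpAddr::V6(v6) => {
            // An IPv4-mapped address (::ffff:a.b.c.d) is judged by its IPv4 part.
            if let Some(v4) = v6.to_ipv4_mapped() {
                return is_public_ip(IpAddr::V4(v4));
            }
            let first = v6.segments()[0];
            let unique_local = (first & 0xfe00) == 0xfc00; // fc00::/7
            let link_local = (first & 0xffc0) == 0xfe80; // fe80::/10
            !(v6.is_loopback() || v6.is_unspecified() || unique_local || link_local)
        }
    }
}
```

A complete check would also resolve the hostname first and validate every resolved address, since a public-looking name can point at an internal IP.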
**Phase B — Classify/summarize batch in parallel** (`JoinSet`):

- Check rate limit before classifying (waits up to 60s, then errors).
- Send article (title + body snippet based on `summary_length`: 500/2000/4000 chars) + categories + "Divers" to LLM.
- LLM returns `{title, summary, category, date, is_article}`.
@ -746,7 +801,7 @@ All calls use structured JSON output (response_schema defines the expected shape
### Implementations ### Implementations
| Provider | Module | API Endpoint | Auth Method | | Provider | Module | API Endpoint | Auth Method |
|---|---|---|---| | --- | --- | --- | --- |
| Google Gemini | `llm/gemini.rs` | `generativelanguage.googleapis.com` | Query param `?key=` | | Google Gemini | `llm/gemini.rs` | `generativelanguage.googleapis.com` | Query param `?key=` |
| OpenAI | `llm/openai.rs` | `api.openai.com/v1/chat/completions` | Bearer token | | OpenAI | `llm/openai.rs` | `api.openai.com/v1/chat/completions` | Bearer token |
| Anthropic | `llm/anthropic.rs` | `api.anthropic.com/v1/messages` | `x-api-key` header | | Anthropic | `llm/anthropic.rs` | `api.anthropic.com/v1/messages` | `x-api-key` header |
@ -759,6 +814,7 @@ All calls use structured JSON output (response_schema defines the expected shape
### Response Schema

`llm/schema.rs` builds JSON Schema definitions for:

- Classification/summarization: `{title, summary, category, is_article}`
- Web search: `{category_0: [{title, url, summary}], ...}` with per-category arrays
- Source link extraction: handled via heuristic HTML parsing (no LLM schema).
@ -766,6 +822,7 @@ All calls use structured JSON output (response_schema defines the expected shape
### Error Mapping

`map_provider_http_error()` translates HTTP status codes to `AppError` variants:

- 400 -> BadRequest
- 401/403 -> BadRequest (invalid key)
- 404 -> BadRequest (model not found)
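
The three mappings visible here could look roughly like the sketch below. Only the function name, the `AppError` type name, and the `BadRequest` variant come from the text. The signature, the message strings, and the `Upstream` fallback variant are assumptions made for illustration, and the mappings for other status codes are not shown in this excerpt.

```rust
/// Simplified stand-in for the application's error type.
#[derive(Debug, PartialEq)]
pub enum AppError {
    BadRequest(String),
    /// Hypothetical catch-all; the real variants for other statuses are not shown here.
    Upstream(String),
}

/// Sketch of the status-code translation (signature assumed).
pub fn map_provider_http_error(status: u16) -> AppError {
    match status {
        400 => AppError::BadRequest("provider rejected the request".into()),
        401 | 403 => AppError::BadRequest("invalid API key".into()),
        404 => AppError::BadRequest("model not found".into()),
        // Placeholder for every status this excerpt does not cover.
        other => AppError::Upstream(format!("unexpected provider status {other}")),
    }
}
```

Mapping 401/403 and 404 to `BadRequest` treats a bad key or unknown model as a user configuration problem rather than a server fault.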
@ -804,7 +861,7 @@ Runs every minute via `tokio::spawn` with a 60-second interval. For each tick:
### Environment Variables

| Variable | Required | Default | Description |
| --- | --- | --- | --- |
| DATABASE_URL | Yes | - | PostgreSQL connection string |
| MASTER_ENCRYPTION_KEY | Yes | - | 64 hex chars (32 bytes) for AES-256-GCM |
| APP_URL | Yes | - | Public URL (CORS, magic links, cookies). No trailing slash. |
@ -820,6 +877,7 @@ Runs every minute via `tokio::spawn` with a 60-second interval. For each tick:
### Startup Validation

`AppConfig::validate()` checks at startup:

- `MASTER_ENCRYPTION_KEY` is exactly 64 hex characters
- `APP_URL` starts with http:// or https:// and has no trailing slash
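
A minimal sketch of these two checks, assuming a stripped-down `AppConfig` with only the two relevant fields. The field names, the `Result<(), String>` return type, and the error messages are hypothetical. The real `validate()` may perform further checks not shown in this excerpt.

```rust
/// Reduced stand-in for the real configuration struct (field names assumed).
pub struct AppConfig {
    pub master_encryption_key: String,
    pub app_url: String,
}

impl AppConfig {
    /// Returns an error message describing the first failed check.
    pub fn validate(&self) -> Result<(), String> {
        let key = &self.master_encryption_key;
        // 64 hex characters encode the 32 bytes needed for an AES-256 key.
        if key.len() != 64 || !key.chars().all(|c| c.is_ascii_hexdigit()) {
            return Err("MASTER_ENCRYPTION_KEY must be exactly 64 hex characters".to_string());
        }

        let url = &self.app_url;
        if !(url.starts_with("http://") || url.starts_with("https://")) {
            return Err("APP_URL must start with http:// or https://".to_string());
        }
        if url.ends_with('/') {
            return Err("APP_URL must not end with a trailing slash".to_string());
        }
        Ok(())
    }
}
```

Calling `validate()` before binding the listener and exiting on `Err` gives the fail-fast behaviour described for invalid configuration.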
@ -830,7 +888,7 @@ The application refuses to start with invalid configuration.
Default values applied when a user has no saved settings:

| Setting | Default | Range |
| --- | --- | --- |
| max_articles_per_source | 3 | 1-10 |
| max_links_per_source | 8 | 1-30 |
| use_brave_search | false | boolean |
