AI Weekly Synth -- Technical Specifications

1. Backend Tech Stack

Dependency Version Purpose
axum 0.8 Web framework (macros, multipart)
tokio 1 Async runtime (full features)
tower 0.5 Middleware composition
tower-http 0.6 CORS, static files, tracing, headers
sqlx 0.8 Async Postgres driver (runtime-tokio, tls-rustls, uuid, chrono, json, migrate)
reqwest 0.12 HTTP client (JSON)
serde / serde_json 1 Serialization/deserialization
chrono 0.4 Date/time handling (serde feature)
aes-gcm 0.10 AES-256-GCM encryption
zeroize 1 Secure memory zeroing
sha2 0.10 SHA-256 hashing
rand 0.8 Random number generation
base64 0.22 Base64 encoding
hex 0.4 Hex encoding/decoding
async-trait 0.1 Async trait objects
tracing / tracing-subscriber 0.1 / 0.3 Structured logging (env-filter, json)
dotenvy 0.15 .env file loading
clap 4 CLI argument parsing
scraper 0.22 HTML parsing (CSS selectors)
ego-tree 0.10 Tree data structure (used by scraper)
url 2 URL parsing and validation
email_address 0.2 Email validation
anyhow 1 Error context
thiserror 2 Error type derivation
uuid 1 UUID v4 generation (serde feature)
dashmap 6 Concurrent hash maps
tokio-stream 0.1 Stream utilities for SSE
futures 0.3 Async stream combinators
printpdf 0.7 PDF generation

Dev dependencies: tower (util), http-body-util, wiremock 0.6.

Rust edition: 2021.


2. Frontend Tech Stack

Dependency Version Purpose
solid-js ^1.9.0 Reactive UI framework
@solidjs/router ^0.15.0 Client-side routing
lucide-solid ^0.475.0 Icon library
date-fns ^4.1.0 Date formatting
tailwindcss ^4.1.0 Utility-first CSS (v4)
@tailwindcss/vite ^4.1.0 Tailwind Vite plugin
vite ^6.2.0 Build tool and dev server
vite-plugin-solid ^2.11.0 SolidJS Vite integration
typescript ~5.8.0 Type checking
vitest ^3.0.0 Unit testing
@solidjs/testing-library ^0.8.0 Component testing
jsdom ^25.0.0 DOM environment for tests

Frontend Routes

| Path | Component | Auth | Description |
|------|-----------|------|-------------|
| /login | Login | Public | Login page |
| /register | Register | Public | Registration page |
| /auth/verify | AuthVerify | Public | Magic link verification |
| / | Home | Protected | Dashboard / synthesis list |
| /settings | Settings | Protected | User settings |
| /themes | ThemeManager | Protected | Theme CRUD + source management |
| /generate | GenerateSynthesis | Protected | Generation trigger + progress |
| /synthesis/:id | SynthesisDetail | Protected | Full synthesis view |
| /article-history | ArticleHistory | Protected | Article history browser |
| /llm-logs/:jobId | LlmLogs | Protected | LLM call log viewer |
| /admin/providers | AdminProviders | Admin | Provider configuration |
| /admin/rate-limits | AdminRateLimits | Admin | Rate limit configuration |
| /admin/users | AdminUsers | Admin | User management |

3. Database Schema

3.1 users

Column Type Constraints
id UUID PK, DEFAULT gen_random_uuid()
email TEXT NOT NULL, UNIQUE
display_name TEXT nullable
role TEXT NOT NULL, DEFAULT 'user', CHECK (user/admin)
created_at TIMESTAMPTZ NOT NULL, DEFAULT now()
updated_at TIMESTAMPTZ NOT NULL, DEFAULT now()

Indexes: idx_users_email on (email).

3.2 sessions

Column Type Constraints
session_hash TEXT PK (SHA-256 of raw token)
user_id UUID NOT NULL, FK users(id) CASCADE
created_at TIMESTAMPTZ NOT NULL, DEFAULT now()
expires_at TIMESTAMPTZ NOT NULL
last_active_at TIMESTAMPTZ NOT NULL, DEFAULT now()
ip_address TEXT nullable
user_agent TEXT nullable

Indexes: idx_sessions_user_id, idx_sessions_expires_at.

3.3 magic_tokens

Column Type Constraints
id UUID PK, DEFAULT gen_random_uuid()
email TEXT NOT NULL
token_hash TEXT NOT NULL, UNIQUE
created_at TIMESTAMPTZ NOT NULL, DEFAULT now()
expires_at TIMESTAMPTZ NOT NULL
used BOOLEAN NOT NULL, DEFAULT false

Indexes: idx_magic_tokens_email, idx_magic_tokens_expires.

3.4 settings

Per-user pipeline configuration. One row per user (user_id is the PK).

Column Type Constraints
user_id UUID PK, FK users(id) CASCADE
max_articles_per_source INTEGER NOT NULL, DEFAULT 3
max_links_per_source INTEGER NOT NULL, DEFAULT 8
use_brave_search BOOLEAN NOT NULL, DEFAULT false
article_history_days INTEGER NOT NULL, DEFAULT 90
batch_size INTEGER NOT NULL, DEFAULT 5
source_extraction_window INTEGER NOT NULL, DEFAULT 3
search_agent_behavior TEXT NOT NULL, DEFAULT ''
ai_provider TEXT NOT NULL, DEFAULT ''
ai_model TEXT NOT NULL, DEFAULT ''
ai_model_websearch TEXT NOT NULL, DEFAULT ''
rate_limit_max_requests INTEGER nullable
rate_limit_time_window_seconds INTEGER nullable
updated_at TIMESTAMPTZ NOT NULL, DEFAULT now()

3.5 themes

Per-user topic configurations with content settings.

Column Type Constraints
id UUID PK, DEFAULT gen_random_uuid()
user_id UUID NOT NULL, FK users(id) CASCADE
name TEXT NOT NULL
theme TEXT NOT NULL (search topic)
categories JSONB NOT NULL, DEFAULT '[]'
max_items_per_category INTEGER NOT NULL, DEFAULT 4
max_age_days INTEGER NOT NULL, DEFAULT 7
summary_length INTEGER NOT NULL, DEFAULT 3
created_at TIMESTAMPTZ NOT NULL, DEFAULT now()
updated_at TIMESTAMPTZ NOT NULL, DEFAULT now()

Indexes: idx_themes_user_id.

3.6 sources

User-curated news source URLs, optionally tied to a theme.

Column Type Constraints
id UUID PK, DEFAULT gen_random_uuid()
user_id UUID NOT NULL, FK users(id) CASCADE
title VARCHAR(200) NOT NULL, CHECK length 1-200
url VARCHAR(1000) NOT NULL, CHECK length <= 1000
theme_id UUID nullable, FK themes(id) CASCADE
is_preferred BOOLEAN NOT NULL, DEFAULT false
created_at TIMESTAMPTZ NOT NULL, DEFAULT now()

Indexes: idx_sources_user_id, UNIQUE idx_sources_user_id_url on (user_id, url).

3.7 syntheses

Generated synthesis results with JSONB section data.

Column Type Constraints
id UUID PK, DEFAULT gen_random_uuid()
user_id UUID NOT NULL, FK users(id) CASCADE
week VARCHAR(10) NOT NULL (ISO week string)
sections JSONB NOT NULL, DEFAULT '[]'
status VARCHAR(20) NOT NULL, DEFAULT 'completed'
job_id UUID nullable
theme_id UUID nullable, FK themes(id) SET NULL
created_at TIMESTAMPTZ NOT NULL, DEFAULT now()

Indexes: idx_syntheses_user_id_created_at on (user_id, created_at DESC).

JSONB structure for sections:

[
  {
    "title": "Category Name",
    "items": [
      { "title": "Article Title", "url": "https://...", "summary": "...", "date": "2026-03-25" }
    ]
  }
]

3.8 theme_schedules

Automated generation schedules, one per theme.

Column Type Constraints
id UUID PK, DEFAULT gen_random_uuid()
theme_id UUID NOT NULL, UNIQUE, FK themes(id) CASCADE
user_id UUID NOT NULL, FK users(id) CASCADE
enabled BOOLEAN NOT NULL, DEFAULT true
days JSONB NOT NULL, DEFAULT '[]' (e.g. ["mon","fri"])
time_utc TEXT NOT NULL, DEFAULT '08:00' (HH:MM)
emails JSONB NOT NULL, DEFAULT '[]' (up to 3 addresses)
last_run_at TIMESTAMPTZ nullable
created_at TIMESTAMPTZ NOT NULL, DEFAULT now()
updated_at TIMESTAMPTZ NOT NULL, DEFAULT now()

Indexes: idx_theme_schedules_enabled (partial, WHERE enabled = true).

3.9 article_history

Article URL deduplication and full provenance tracing.

Column Type Constraints
id UUID PK, DEFAULT gen_random_uuid()
user_id UUID NOT NULL, FK users(id) CASCADE
url_hash TEXT NOT NULL (SHA-256 of normalized URL)
url TEXT NOT NULL
title TEXT NOT NULL, DEFAULT ''
source_type TEXT NOT NULL, DEFAULT 'unknown'
source_url TEXT nullable
category TEXT nullable
synthesis_id UUID nullable, FK syntheses(id) SET NULL
status TEXT NOT NULL, DEFAULT 'used'
scraped_ok BOOLEAN NOT NULL, DEFAULT true
job_id UUID NOT NULL
published_date TEXT nullable
created_at TIMESTAMPTZ NOT NULL, DEFAULT now()

Indexes: idx_article_history_user_url on (user_id, url_hash), idx_article_history_job_id.

Status values: used, filtered_history, filtered_diversity, filtered_not_article, filtered_too_old, filtered_empty, filtered_homepage, filtered_cross_phase_dedup.

Source type values: personalized_source, brave_search, web_search.

3.10 llm_call_log

Full LLM interaction logging for debugging and analysis.

Column Type Constraints
id UUID PK, DEFAULT gen_random_uuid()
user_id UUID NOT NULL, FK users(id) CASCADE
job_id UUID NOT NULL
call_type TEXT NOT NULL
model TEXT NOT NULL
system_prompt TEXT NOT NULL, DEFAULT ''
user_prompt TEXT NOT NULL, DEFAULT ''
response_body TEXT NOT NULL, DEFAULT ''
duration_ms INTEGER NOT NULL, DEFAULT 0
article_url TEXT nullable
created_at TIMESTAMPTZ NOT NULL, DEFAULT now()

Indexes: idx_llm_call_log_job_id, idx_llm_call_log_user_id on (user_id, created_at).

3.11 admin_providers

Admin-curated catalog of LLM providers and their models.

Column Type Constraints
id UUID PK, DEFAULT gen_random_uuid()
provider_name VARCHAR(50) NOT NULL, UNIQUE
display_name VARCHAR(100) NOT NULL
models_scraping JSONB NOT NULL, DEFAULT '[]'
models_websearch JSONB NOT NULL, DEFAULT '[]'
is_enabled BOOLEAN NOT NULL, DEFAULT true
created_at TIMESTAMPTZ NOT NULL, DEFAULT now()
updated_at TIMESTAMPTZ NOT NULL, DEFAULT now()

Indexes: idx_admin_providers_enabled (partial, WHERE is_enabled = true).

Seeded with: gemini, openai, anthropic.

JSONB model structure:

[{"model_id": "gemini-2.5-pro", "display_name": "Gemini 2.5 Pro", "is_default": true}]

3.12 admin_rate_limits

Per-provider rate limit configuration.

Column Type Constraints
id UUID PK, DEFAULT gen_random_uuid()
provider_name VARCHAR(50) NOT NULL, UNIQUE, FK admin_providers(provider_name) CASCADE
max_requests INTEGER NOT NULL, DEFAULT 30
time_window_seconds INTEGER NOT NULL, DEFAULT 60
updated_at TIMESTAMPTZ NOT NULL, DEFAULT now()

Seeded defaults: gemini 29/60s, openai 50/60s, anthropic 40/60s.

3.13 user_api_keys

Encrypted user LLM API keys.

Column Type Constraints
id UUID PK, DEFAULT gen_random_uuid()
user_id UUID NOT NULL, FK users(id) CASCADE
provider_name VARCHAR(50) NOT NULL
encrypted_key BYTEA NOT NULL
nonce BYTEA NOT NULL
key_prefix VARCHAR(20) NOT NULL
created_at TIMESTAMPTZ NOT NULL, DEFAULT now()
updated_at TIMESTAMPTZ NOT NULL, DEFAULT now()

Constraint: UNIQUE(user_id, provider_name). Valid providers: gemini, openai, anthropic, brave_search.

3.14 audit_log

Admin mutation audit trail.

Column Type Constraints
id UUID PK, DEFAULT gen_random_uuid()
admin_user_id UUID nullable, FK users(id) SET NULL
action VARCHAR(100) NOT NULL
target_type VARCHAR(50) nullable
target_id VARCHAR(255) nullable
details JSONB nullable
created_at TIMESTAMPTZ NOT NULL, DEFAULT now()

Indexes: idx_audit_log_created_at (DESC), idx_audit_log_admin_user.


4. API Endpoints

All endpoints are prefixed with /api/v1. Responses are JSON. Errors follow the shape { "error": "message" }.

4.1 Authentication

POST /auth/register

  • Auth: Public
  • Body: { email: string, display_name?: string, turnstile_token: string }
  • Response: { message: string }
  • Sends magic link email. Rate limited.

POST /auth/login

  • Auth: Public
  • Body: { email: string, turnstile_token: string }
  • Response: { message: string }
  • Sends magic link email. Rate limited.

GET /auth/verify?token=...&email=...

  • Auth: Public
  • Response: Redirect to frontend with session cookie set.

POST /auth/verify

  • Auth: Public
  • Body: { token: string, email: string }
  • Response: { message: string, user: User }
  • Sets session HttpOnly cookie (30-day expiry).

POST /auth/logout

  • Auth: Authenticated
  • Response: { message: string }
  • Clears session cookie and deletes DB session.

GET /auth/me

  • Auth: Authenticated
  • Response: { id, email, display_name, role, created_at }

4.2 Settings

GET /settings

  • Auth: Authenticated
  • Response: UserSettings (creates defaults if not exists)

PUT /settings

  • Auth: Authenticated
  • Body: UpdateSettingsRequest (all fields required)
  • Validation: max_articles_per_source 1-10, max_links_per_source 1-30, batch_size 1-20, source_extraction_window 1-10, article_history_days 0-365, search_agent_behavior max 2000 chars, ai_provider/ai_model/ai_model_websearch max 100 chars.
  • Response: Updated UserSettings
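
The validation rules above can be sketched as a plain function. This is a minimal illustration, not the project's handler: `SettingsInput`, `check_range`, `check_len`, and `validate_settings` are hypothetical names, errors are plain strings rather than the real error type, and lengths are counted in characters (the actual check may count bytes).

```rust
/// Hypothetical mirror of the validated fields of UpdateSettingsRequest.
pub struct SettingsInput {
    pub max_articles_per_source: i32,
    pub max_links_per_source: i32,
    pub batch_size: i32,
    pub source_extraction_window: i32,
    pub article_history_days: i32,
    pub search_agent_behavior: String,
    pub ai_provider: String,
    pub ai_model: String,
    pub ai_model_websearch: String,
}

impl Default for SettingsInput {
    /// Defaults copied from the settings table (section 3.4).
    fn default() -> Self {
        SettingsInput {
            max_articles_per_source: 3,
            max_links_per_source: 8,
            batch_size: 5,
            source_extraction_window: 3,
            article_history_days: 90,
            search_agent_behavior: String::new(),
            ai_provider: String::new(),
            ai_model: String::new(),
            ai_model_websearch: String::new(),
        }
    }
}

/// Accepts `value` only when it lies in the inclusive range [min, max].
fn check_range(name: &str, value: i32, min: i32, max: i32) -> Result<(), String> {
    if value >= min && value <= max {
        Ok(())
    } else {
        Err(format!("{} must be between {} and {}", name, min, max))
    }
}

/// Accepts `value` only when it has at most `max_chars` characters.
fn check_len(name: &str, value: &str, max_chars: usize) -> Result<(), String> {
    if value.chars().count() <= max_chars {
        Ok(())
    } else {
        Err(format!("{} must be at most {} characters", name, max_chars))
    }
}

/// Applies the documented ranges and returns the first violation found.
pub fn validate_settings(s: &SettingsInput) -> Result<(), String> {
    check_range("max_articles_per_source", s.max_articles_per_source, 1, 10)?;
    check_range("max_links_per_source", s.max_links_per_source, 1, 30)?;
    check_range("batch_size", s.batch_size, 1, 20)?;
    check_range("source_extraction_window", s.source_extraction_window, 1, 10)?;
    check_range("article_history_days", s.article_history_days, 0, 365)?;
    check_len("search_agent_behavior", &s.search_agent_behavior, 2000)?;
    check_len("ai_provider", &s.ai_provider, 100)?;
    check_len("ai_model", &s.ai_model, 100)?;
    check_len("ai_model_websearch", &s.ai_model_websearch, 100)?;
    Ok(())
}
```

Returning the first violation keeps the sketch short; a handler could just as well collect every failure before responding.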

4.3 Themes

GET /themes

  • Auth: Authenticated
  • Response: ThemeResponse[]

POST /themes

  • Auth: Authenticated
  • Body: { name, theme, categories: string[], max_items_per_category?, max_age_days?, summary_length? }
  • Validation: name non-empty max 200 chars, categories 1-20 non-empty entries, max_items 1-50, max_age 1-365, summary_length 1-3.
  • Response: ThemeResponse

PUT /themes/{id}

  • Auth: Authenticated (owner only)
  • Body: UpdateThemeRequest (all fields optional)
  • Response: ThemeResponse

DELETE /themes/{id}

  • Auth: Authenticated (owner only)
  • Response: 204 No Content

4.4 Schedules

GET /themes/{id}/schedule

  • Auth: Authenticated (theme owner)
  • Response: ScheduleResponse or 404

PUT /themes/{id}/schedule

  • Auth: Authenticated (theme owner)
  • Body: { enabled, days: string[], time_utc: "HH:MM", emails: string[] }
  • Validation: days from mon-sun, time HH:MM format, max 3 emails.
  • Response: ScheduleResponse
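
A rough sketch of the schedule validation follows. The function names are hypothetical, and the day codes between "mon" and "sun" are assumed to be the usual three-letter abbreviations (the document only shows "mon" and "fri"). Email syntax checking, which the backend presumably delegates to the email_address crate, is left out; only the count is enforced here.

```rust
/// Assumed day codes; the document confirms only the mon-sun range.
const DAY_CODES: [&str; 7] = ["mon", "tue", "wed", "thu", "fri", "sat", "sun"];

/// Returns true for a zero-padded 24-hour "HH:MM" string.
fn is_valid_time_utc(s: &str) -> bool {
    let bytes = s.as_bytes();
    if bytes.len() != 5 || bytes[2] != b':' {
        return false;
    }
    if ![0usize, 1, 3, 4].iter().all(|&i| bytes[i].is_ascii_digit()) {
        return false;
    }
    let hour = (bytes[0] - b'0') * 10 + (bytes[1] - b'0');
    let minute = (bytes[3] - b'0') * 10 + (bytes[4] - b'0');
    hour < 24 && minute < 60
}

/// Checks day codes, time format, and the three-recipient limit.
pub fn validate_schedule(days: &[&str], time_utc: &str, emails: &[&str]) -> Result<(), String> {
    for day in days {
        if !DAY_CODES.contains(day) {
            return Err(format!("invalid day code: {}", day));
        }
    }
    if !is_valid_time_utc(time_utc) {
        return Err("time_utc must use the HH:MM format".to_string());
    }
    if emails.len() > 3 {
        return Err("at most 3 email addresses are allowed".to_string());
    }
    Ok(())
}
```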

DELETE /themes/{id}/schedule

  • Auth: Authenticated (theme owner)
  • Response: 204 No Content

4.5 Sources

GET /sources?theme_id=...

  • Auth: Authenticated
  • Response: SourceResponse[]

POST /sources

  • Auth: Authenticated
  • Body: { title, url, theme_id? }
  • Validation: title non-empty max 200, URL http(s) max 1000 chars.
  • Response: SourceResponse

PUT /sources/preferred

  • Auth: Authenticated
  • Body: { source_ids: UUID[] }
  • Response: { updated: number }

DELETE /sources/{id}

  • Auth: Authenticated (owner only)
  • Response: 204 No Content

POST /sources/bulk

  • Auth: Authenticated
  • Body: { sources: CreateSourceRequest[], theme_id? }
  • Response: { imported, skipped, errors }

POST /sources/import-csv

  • Auth: Authenticated
  • Body: Multipart file upload (CSV: title,url)
  • Response: { imported, skipped, errors }

GET /sources/export-csv

  • Auth: Authenticated
  • Response: CSV file download

4.6 Generation

POST /syntheses/generate

  • Auth: Authenticated
  • Body: { theme_id: UUID }
  • Response: { job_id: UUID }
  • Creates job in JobStore, spawns background generation task. Returns 409 if user already has active job.
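
The one-active-job-per-user rule behind the 409 response can be illustrated with a small in-memory store. This is a sketch under simplifying assumptions: IDs are `u64` instead of UUIDs, a `Mutex<HashMap>` stands in for whatever concurrent map the real JobStore uses, and `try_start`, `finish`, and `StartError` are hypothetical names. Only `has_active_job` is named in this document.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

#[derive(Debug, PartialEq)]
pub enum StartError {
    /// The user already has a running job (a handler would map this to HTTP 409).
    AlreadyActive(u64),
}

pub struct JobStore {
    /// user_id -> job_id of the active job.
    active: Mutex<HashMap<u64, u64>>,
}

impl JobStore {
    pub fn new() -> Self {
        JobStore { active: Mutex::new(HashMap::new()) }
    }

    /// Registers a job unless the user already has one in flight.
    pub fn try_start(&self, user_id: u64, job_id: u64) -> Result<(), StartError> {
        let mut active = self.active.lock().unwrap();
        if let Some(existing) = active.get(&user_id) {
            return Err(StartError::AlreadyActive(*existing));
        }
        active.insert(user_id, job_id);
        Ok(())
    }

    /// Returns the active job for this user, if any.
    pub fn has_active_job(&self, user_id: u64) -> Option<u64> {
        self.active.lock().unwrap().get(&user_id).copied()
    }

    /// Releases the user's lock once the job ends.
    pub fn finish(&self, user_id: u64) {
        self.active.lock().unwrap().remove(&user_id);
    }
}
```

Checking and inserting under the same lock is what prevents two concurrent requests from both starting a job.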

GET /syntheses/generate/{job_id}/progress

  • Auth: Authenticated (job owner)
  • Response: SSE stream of ProgressEvent
  • Events: progress (step, message, percent), complete (synthesis_id), error (message).

POST /syntheses/generate/{job_id}/stop

  • Auth: Authenticated (job owner)
  • Response: { message: string }
  • Sets cooperative cancellation flag.

4.7 Syntheses

GET /syntheses

  • Auth: Authenticated
  • Response: SynthesisListItem[] (with section summaries, theme info)

GET /syntheses/{id}

  • Auth: Authenticated (owner only)
  • Response: SynthesisResponse (full sections data)

DELETE /syntheses/{id}

  • Auth: Authenticated (owner only)
  • Response: 204 No Content

POST /syntheses/{id}/send-email

  • Auth: Authenticated
  • Body: { email: string }
  • Response: { message: string }

GET /syntheses/{id}/export/markdown

  • Auth: Authenticated
  • Response: Markdown file download

GET /syntheses/{id}/export/pdf

  • Auth: Authenticated
  • Response: PDF file download

4.8 Article History & Provenance

GET /article-history?limit=&offset=&job_id=&status=

  • Auth: Authenticated
  • Response: { items: ArticleHistoryEntry[], total: number }

DELETE /article-history

  • Auth: Authenticated
  • Response: { deleted: number }

GET /syntheses/{id}/provenance

  • Auth: Authenticated
  • Response: ArticleHistoryEntry[] (articles with status "used" for this synthesis's job_id)

4.9 LLM Call Logs

GET /llm-logs/{job_id}

  • Auth: Authenticated
  • Response: LlmCallLogEntry[]

4.10 User API Keys

GET /user/api-keys

  • Auth: Authenticated
  • Response: ApiKeyResponse[] (id, provider_name, key_prefix, timestamps; never the full key)

POST /user/api-keys

  • Auth: Authenticated
  • Body: { provider_name, api_key }
  • Validation: provider in (gemini, openai, anthropic, brave_search), key 8-500 chars.
  • Response: ApiKeyResponse
  • Encrypts key with AES-256-GCM before storage; upserts (one key per user per provider).

DELETE /user/api-keys/{provider}

  • Auth: Authenticated
  • Response: 204 No Content

POST /user/api-keys/{provider}/test

  • Auth: Authenticated
  • Response: { success: boolean, message: string }
  • Decrypts key, calls provider test endpoint.

POST /user/api-keys/export

  • Auth: Authenticated
  • Response: { keys: [{ provider_name, api_key }] }
  • Decrypts and returns all keys (used for backup/migration).

4.11 Public Configuration

GET /config/providers

  • Auth: Authenticated
  • Response: ProviderConfigResponse[] (enabled providers with model lists for scraping and websearch)

4.12 Admin Endpoints

All admin endpoints require the AdminUser extractor (role = admin). The one exception in this section is GET /health, which is public.

GET /admin/providers

  • Response: AdminProviderResponse[]

POST /admin/providers

  • Body: CreateProviderRequest
  • Validation: provider_name in (gemini, openai, anthropic), at least one model per list, at most one default per list.
  • Response: AdminProviderResponse

PUT /admin/providers/{id}

  • Body: UpdateProviderRequest (all fields optional)
  • Response: AdminProviderResponse

DELETE /admin/providers/{id}

  • Response: 204 No Content

GET /admin/rate-limits

  • Response: RateLimitResponse[]

PUT /admin/rate-limits/{provider_name}

  • Body: { max_requests: 1-1000, time_window_seconds: 1-3600 }
  • Response: RateLimitResponse
  • Hot-reloads the in-memory provider rate limiter.

GET /admin/users

  • Response: AdminUserResponse[]

PUT /admin/users/{id}/role

  • Body: { role: "user" | "admin" }
  • Response: { message: string }

GET /health

  • Auth: Public
  • Response: { status: "ok" }

5. Generation Pipeline Technical Flow

Overview

The pipeline runs as a background tokio task spawned by POST /syntheses/generate. It has a 15-minute global timeout and supports cooperative cancellation via AtomicBool.
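
The combination of a global deadline and a cooperative cancellation flag can be sketched without an async runtime. The real pipeline runs on tokio; here a synchronous loop and `std::time::Instant` stand in for it, and `run_pipeline` and `Outcome` are hypothetical names. The point is only that the flag and the deadline are checked between units of work, never in the middle of one.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::time::Instant;

#[derive(Debug, PartialEq)]
pub enum Outcome {
    Completed,
    /// Cancelled after this many steps had finished.
    Cancelled(usize),
    /// Deadline passed after this many steps had finished.
    TimedOut(usize),
}

/// Runs `steps` units of work, checking the flag and deadline before each one.
pub fn run_pipeline<F: FnMut(usize)>(
    cancel: &AtomicBool,
    deadline: Instant,
    steps: usize,
    mut step: F,
) -> Outcome {
    for i in 0..steps {
        if cancel.load(Ordering::SeqCst) {
            return Outcome::Cancelled(i);
        }
        if Instant::now() >= deadline {
            return Outcome::TimedOut(i);
        }
        step(i);
    }
    Outcome::Completed
}
```

A stop request only has to store `true` into the shared flag; the worker notices it at the next checkpoint.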

Initialization

  1. Load UserSettings from DB (or create defaults)
  2. Clean up old article history (entries older than article_history_days with a dropped status) and truncate old LLM call logs
  3. Load the target Theme (categories, max_items, max_age_days, summary_length)
  4. Load user Sources for the theme
  5. Decrypt user's LLM API key, create Arc<dyn LlmProvider> via factory
  6. Resolve models: ai_model (for scraping/classification) and ai_model_websearch (for web search); user override or admin default fallback
  7. Initialize per-user rate limiter (from settings or admin defaults)
  8. Initialize tracking structures: article_scraped (category -> Vec), source_counts, url_source, filled_counts, seen_urls, pending_traces

Phase 1: Personalized Sources

Skipped if user has 0 sources for the theme.

1a. Windowed source extraction

  • Query article_history for the last source used; reorder sources in a rolling window starting after that source
  • Select up to source_extraction_window sources per generation
  • For each source (bounded concurrency of 5): fetch page HTML, extract up to max_links_per_source article URLs via HTML parsing (same-domain, non-homepage, no static assets)
  • Deduplicate URLs cross-source via seen_urls
  • Batch-check article_history for already-seen URL hashes; filter matches (traced as filtered_history)
  • Shuffle remaining candidates to interleave sources
  • Track url -> source in url_source
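
The rolling window over sources might look like the sketch below. It assumes sources are identified by URL strings, that the window wraps around the end of the list, and that an unknown or missing last-used source restarts from the beginning; `select_window` is a hypothetical name.

```rust
/// Returns up to `window` sources, starting just after `last_used`
/// and wrapping around the end of the list.
pub fn select_window<'a>(
    sources: &[&'a str],
    last_used: Option<&str>,
    window: usize,
) -> Vec<&'a str> {
    if sources.is_empty() {
        return Vec::new();
    }
    // Start after the last used source; fall back to the first source.
    let start = last_used
        .and_then(|last| sources.iter().position(|s| *s == last))
        .map(|i| (i + 1) % sources.len())
        .unwrap_or(0);
    sources
        .iter()
        .cycle()
        .skip(start)
        .take(window.min(sources.len()))
        .copied()
        .collect()
}
```

With five sources a through e and a window of 3, a previous run that ended on d yields e, a, b, so every source gets visited over successive generations.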

1b. Batch scrape + classify

Processing in batches of settings.batch_size:

  • Batch assembly: Pull up to batch_size candidates, skip if source_counts[domain] >= max_articles_per_source (traced as filtered_diversity)
  • Scrape (JoinSet, parallel): SSRF check, 15s timeout, 5MB limit, HTML parsing, title/date/body extraction, soft-404 detection. Skip empty/too-old articles.
  • Classify (JoinSet, parallel): Rate limit check (60s wait), send title + first 500 chars to LLM with categories list. LLM returns {title, summary, category}. Validate category via assign_category() (fallback to "Autre", drop if full).
  • LLM call logging: Every LLM call is logged with full prompt, response, timing, and article URL.
  • Early exit: Stop when total articles >= (num_categories + 1) * max_items_per_category.
  • Batch-flush pending traces to article_history.
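
The category assignment and early-exit rules can be illustrated as follows. This is one plausible reading of "fallback to "Autre", drop if full": an unknown category is redirected to "Autre", and an article whose target category is already full is dropped. The `+ 1` in the exit threshold is assumed to account for the "Autre" bucket. The signatures are hypothetical; only the name assign_category comes from this document.

```rust
use std::collections::HashMap;

pub const FALLBACK_CATEGORY: &str = "Autre";

/// Returns the category the article lands in, or None when it is dropped
/// because that category already holds `max_items` articles.
pub fn assign_category(
    proposed: &str,
    categories: &[&str],
    filled: &mut HashMap<String, usize>,
    max_items: usize,
) -> Option<String> {
    // Unknown categories from the LLM are redirected to the fallback.
    let target = if categories.contains(&proposed) {
        proposed
    } else {
        FALLBACK_CATEGORY
    };
    let count = filled.entry(target.to_string()).or_insert(0);
    if *count >= max_items {
        return None;
    }
    *count += 1;
    Some(target.to_string())
}

/// Early-exit test: user categories plus the fallback are all at capacity.
pub fn should_stop(total_articles: usize, num_categories: usize, max_items: usize) -> bool {
    total_articles >= (num_categories + 1) * max_items
}
```

For example, with 2 categories and max_items_per_category = 4, batching stops once 12 articles have been kept.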

Phase 2: Web Search Fallback

Skipped if all categories are filled to max_items_per_category.

2a. Compute gaps: For each category, needed = max_items - filled.

2b. Path selection based on settings.use_brave_search:

Path A -- Brave Search (use_brave_search = true):

  • Decrypt user's Brave Search API key
  • Query: "{theme} actualites", up to 20 results, freshness mapped from max_age_days to one of Brave's codes: pd (past day), pw (past week), pm (past month), py (past year)
  • Filter results through filter_phase2_url(): homepage filter, cross-phase dedup, article history check, source diversity check
  • Batch scrape + classify (same logic as Phase 1b, source_type = "brave_search")
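
A possible mapping from max_age_days to a freshness code is sketched below. The thresholds (1, 7, and 31 days) are assumptions chosen to match the meaning of each code; the document does not state the actual cut-offs. `brave_freshness` and `brave_query` are hypothetical names.

```rust
/// Maps a maximum article age to a Brave freshness code.
/// The day thresholds are illustrative assumptions.
pub fn brave_freshness(max_age_days: u32) -> &'static str {
    match max_age_days {
        0..=1 => "pd",  // past day
        2..=7 => "pw",  // past week
        8..=31 => "pm", // past month
        _ => "py",      // past year
    }
}

/// Builds the search query in the "{theme} actualites" shape described above.
pub fn brave_query(theme: &str) -> String {
    format!("{} actualites", theme)
}
```

With the default max_age_days of 7, this mapping selects the past-week code.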

Path B -- LLM Web Search (use_brave_search = false):

  • Build search prompt with theme, categories, and gap counts
  • Call LLM with ai_model_websearch model; returns structured JSON: {category_0: [{title, url, summary}], ...}
  • Filter URLs through filter_phase2_url()
  • Scrape each result sequentially to validate; keep LLM-provided title/summary (no re-classification)
  • source_type = "web_search"

Save & Record

  1. Error if all article lists are empty
  2. Order sections: user-defined categories first (in order), then "Autre" if non-empty
  3. Sanitize: strip \u0000 null bytes from JSON (PostgreSQL JSONB requirement)
  4. Insert synthesis row: job_id, week (ISO week string), sections (JSONB), status "completed", theme_id
  5. Record used articles: batch-insert article_history entries with status "used", synthesis_id, and correct source_type
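
Steps 2 and 3 can be sketched as two small helpers. Articles are reduced to plain strings here, and the function names are hypothetical. One assumption to note: user-defined categories are emitted even when empty, since the document only says that "Autre" is conditional.

```rust
use std::collections::HashMap;

/// Orders sections: user-defined categories in their configured order,
/// then "Autre" only when it holds at least one item.
pub fn order_sections(
    categories: &[&str],
    mut by_category: HashMap<String, Vec<String>>,
) -> Vec<(String, Vec<String>)> {
    let mut ordered = Vec::new();
    for name in categories {
        // Assumption: an empty user category is kept as an empty section.
        let items = by_category.remove(*name).unwrap_or_default();
        ordered.push(((*name).to_string(), items));
    }
    if let Some(items) = by_category.remove("Autre") {
        if !items.is_empty() {
            ordered.push(("Autre".to_string(), items));
        }
    }
    ordered
}

/// Removes NUL characters, which PostgreSQL rejects in JSONB text.
pub fn strip_nul(s: &str) -> String {
    s.replace('\0', "")
}
```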

6. LLM Provider Abstraction

Trait Definition

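use async_trait::async_trait;
use serde_json::Value;
// AppError is the application's own error type (see Error Mapping).

#[async_trait]
pub trait LlmProvider: Send + Sync {
    /// Identifier of the provider behind this implementation.
    fn provider_id(&self) -> &str;

    /// Sends one prompt pair to `model`; the reply is JSON shaped by `response_schema`.
    async fn call_llm(&self, model: &str, system_prompt: &str,
                       user_prompt: &str, response_schema: &Value)
        -> Result<Value, AppError>;
}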

All calls use structured JSON output (response_schema defines the expected shape).

Implementations

| Provider | Module | API Endpoint | Auth Method |
|----------|--------|--------------|-------------|
| Google Gemini | llm/gemini.rs | generativelanguage.googleapis.com | Query param ?key= |
| OpenAI | llm/openai.rs | api.openai.com/v1/chat/completions | Bearer token |
| Anthropic | llm/anthropic.rs | api.anthropic.com/v1/messages | x-api-key header |
| Mock | llm/mock.rs | N/A (in-memory) | N/A |

Factory

llm/factory.rs provides create_provider(provider_name, api_key, http_client) -> Arc<dyn LlmProvider>. Matches on provider name string.

Response Schema

llm/schema.rs builds JSON Schema definitions for:

  • Classification/summarization: {title, summary, category, is_article}
  • Web search: {category_0: [{title, url, summary}], ...} with per-category arrays
  • Source link extraction: {links: [{url}]}

Error Mapping

map_provider_http_error() translates HTTP status codes to AppError variants:

  • 400 -> BadRequest
  • 401/403 -> BadRequest (invalid key)
  • 404 -> BadRequest (model not found)
  • 429/529 -> RateLimited
  • Other -> Internal
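
The status-code table above translates directly into a match expression. The sketch below assumes a simplified `AppError` with three variants and a `(status, body)` signature; the real type and signature are not shown in this document.

```rust
/// Simplified stand-in for the application's error type.
#[derive(Debug, PartialEq)]
pub enum AppError {
    BadRequest(String),
    RateLimited,
    Internal(String),
}

/// Maps a provider's HTTP status code to an AppError variant.
pub fn map_provider_http_error(status: u16, body: &str) -> AppError {
    match status {
        400 => AppError::BadRequest(format!("provider rejected the request: {}", body)),
        401 | 403 => AppError::BadRequest("invalid API key".to_string()),
        404 => AppError::BadRequest("model not found".to_string()),
        429 | 529 => AppError::RateLimited,
        other => AppError::Internal(format!("provider returned HTTP {}", other)),
    }
}
```

Mapping 401/403/404 to BadRequest rather than Internal signals that the problem lies in the user's key or model choice, not in the server.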

7. Background Tasks

Session Cleanup

Runs hourly via tokio::spawn. Calls db::sessions::delete_expired to remove sessions past their expires_at timestamp.

Job Store Cleanup

JobStore::cleanup_expired is called periodically. It removes job entries older than 1 hour (the TTL constant) and releases the user locks held by expired jobs.

Scheduler

Runs in a tokio::spawn task that ticks on a 60-second interval. On each tick:

  1. current_day_code() -> "mon" through "sun"
  2. find_due_schedules(pool, day, time) -> queries enabled schedules matching current day and time (HH:MM)
  3. For each due schedule:
    • Skip if job_store.has_active_job(user_id) returns Some (manual generation in progress)
    • Create a temporary watch::channel and AtomicBool
    • Call synthesis::run_generation_inner directly (bypasses job store)
    • On success: send emails to configured recipients (up to 3), mark schedule as run
    • On failure: log error, do not mark as run
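
The day-and-time matching in steps 1 and 2 can be sketched with integer arithmetic on a Unix timestamp. The real code queries Postgres through find_due_schedules; the in-memory `Schedule` struct, `day_code_and_time`, and `is_due` below are hypothetical stand-ins, and the intermediate day codes are assumed to be the usual three-letter abbreviations.

```rust
/// Assumed day codes, Monday first.
const DAY_CODES: [&str; 7] = ["mon", "tue", "wed", "thu", "fri", "sat", "sun"];

/// Derives the UTC day code and "HH:MM" string from a Unix timestamp.
pub fn day_code_and_time(unix_secs: u64) -> (&'static str, String) {
    let days = unix_secs / 86_400;
    // 1970-01-01 was a Thursday, which is index 3 in DAY_CODES.
    let day_index = ((days + 3) % 7) as usize;
    let secs_of_day = unix_secs % 86_400;
    let hhmm = format!("{:02}:{:02}", secs_of_day / 3600, (secs_of_day % 3600) / 60);
    (DAY_CODES[day_index], hhmm)
}

/// In-memory stand-in for a theme_schedules row.
pub struct Schedule<'a> {
    pub enabled: bool,
    pub days: &'a [&'a str],
    pub time_utc: &'a str,
}

/// A schedule is due when it is enabled and both day and minute match.
pub fn is_due(s: &Schedule, day: &str, hhmm: &str) -> bool {
    s.enabled && s.days.contains(&day) && s.time_utc == hhmm
}
```

Because the match is exact to the minute, a tick that drifts past the minute boundary would skip that schedule for the day.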

8. Configuration

Environment Variables

| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| DATABASE_URL | Yes | - | PostgreSQL connection string |
| MASTER_ENCRYPTION_KEY | Yes | - | 64 hex chars (32 bytes) for AES-256-GCM |
| APP_URL | Yes | - | Public URL (CORS, magic links, cookies). No trailing slash. |
| PORT | No | 8080 | HTTP server port |
| RUST_LOG | No | - | Logging filter (e.g., "info,ai_synth_backend=debug") |
| STATIC_DIR | No | ../frontend/dist | Path to built SolidJS files |
| RESEND_API_KEY | Yes | - | Resend email service API key |
| EMAIL_FROM | Yes | - | Sender address for emails |
| TURNSTILE_SECRET_KEY | Yes | - | Cloudflare Turnstile server secret |
| TURNSTILE_SITE_KEY | Yes | - | Cloudflare Turnstile client key |
| POSTGRES_PASSWORD | Yes | - | Used by docker-compose for DB container |

Startup Validation

AppConfig::validate() checks at startup:

  • MASTER_ENCRYPTION_KEY is exactly 64 hex characters
  • APP_URL starts with http:// or https:// and has no trailing slash

The application refuses to start with invalid configuration.
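
The two startup checks could be written roughly as below. `validate_config` is a hypothetical free function standing in for AppConfig::validate(), and the error messages are illustrative.

```rust
/// Checks the two documented startup invariants and reports the first failure.
pub fn validate_config(master_key: &str, app_url: &str) -> Result<(), String> {
    // 64 hex characters encode the 32-byte AES-256 key.
    if master_key.len() != 64 || !master_key.chars().all(|c| c.is_ascii_hexdigit()) {
        return Err("MASTER_ENCRYPTION_KEY must be exactly 64 hex characters".to_string());
    }
    if !(app_url.starts_with("http://") || app_url.starts_with("https://")) {
        return Err("APP_URL must start with http:// or https://".to_string());
    }
    if app_url.ends_with('/') {
        return Err("APP_URL must not end with a trailing slash".to_string());
    }
    Ok(())
}
```

Failing fast at startup surfaces a malformed key before any encryption is attempted with it.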

User Settings Model

Default values applied when a user has no saved settings:

Setting Default Range
max_articles_per_source 3 1-10
max_links_per_source 8 1-30
use_brave_search false boolean
article_history_days 90 0-365
batch_size 5 1-20
source_extraction_window 3 1-10
search_agent_behavior "" max 2000 chars
ai_provider "" max 100 chars
ai_model "" max 100 chars
ai_model_websearch "" max 100 chars
rate_limit_max_requests null >= 1 if set
rate_limit_time_window_seconds null >= 1 if set