AI Weekly Synth -- Requirements
1. Product Vision
AI Weekly Synth is a self-hosted web application that generates AI-powered weekly news syntheses. Users define topics of interest (themes), add personalized sources, and let the application search the web, validate articles, and produce structured summaries organized by category.
The application is designed for individuals or small teams who want an automated, curated news digest without relying on third-party newsletter services.
2. Target Users
- End users: professionals or enthusiasts who follow one or more topics (e.g. AI, cybersecurity, finance) and want a weekly summary delivered by email or available on-demand.
- Administrators: the instance operator who manages available LLM providers, rate limits, and user accounts.
3. Core Features
3.1 Multi-Theme Support
Users create multiple independent themes, each with its own search topic, categories, personalized sources, and content settings. Syntheses are generated and tagged per theme. Deleting a theme preserves its existing syntheses.
See functional_specs.md Section 2 for detailed behavior.
3.2 Synthesis Generation
- On-demand generation triggered by the user for a selected theme.
- Two-phase pipeline:
  - Phase 1 (Personalized Sources): extracts article links from user-configured sources, scrapes their content, then classifies and summarizes each article into the theme's categories.
  - Phase 2 (Web Search Fallback): fills any categories still short of articles, using either the Brave Search API or LLM-powered web search.
- Real-time progress streaming via SSE so the user can monitor generation status.
- Generation is capped at 15 minutes with automatic timeout.
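The two-phase fill described above can be sketched as follows. This is a minimal illustration, not the application's actual pipeline: the `Article` shape, the `phase1` and `phase2` callables, and the assumption that "max items" applies per category are all hypothetical.

```python
from typing import Callable

# Hypothetical article shape: {"title": ..., "category": ...}
Article = dict[str, str]


def generate_synthesis(
    categories: list[str],
    max_items: int,
    phase1: Callable[[], list[Article]],
    phase2: Callable[[str, int], list[Article]],
) -> dict[str, list[Article]]:
    """Fill categories from personalized sources, then from web search."""
    result: dict[str, list[Article]] = {c: [] for c in categories}

    # Phase 1: articles from personalized sources, already classified.
    for article in phase1():
        bucket = result.get(article["category"])
        # Ignore unknown categories and categories that are already full.
        if bucket is not None and len(bucket) < max_items:
            bucket.append(article)

    # Phase 2: ask the fallback search only for what is still missing.
    for category, bucket in result.items():
        missing = max_items - len(bucket)
        if missing > 0:
            bucket.extend(phase2(category, missing)[:missing])

    return result
```

Passing the two phases as callables keeps the gap-filling logic independent of whether Phase 2 is backed by Brave Search or by LLM web search.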
3.3 Scheduled Generation
- Users configure a per-theme schedule: selected days of the week, time (UTC), and up to 3 email recipients.
- The application runs scheduled jobs automatically in the background, generating the synthesis and emailing it to all configured recipients.
- No external cron required; the scheduler is an internal background task.
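As an illustration of how an internal scheduler might decide whether a theme's schedule is due, here is a minimal sketch. The `Schedule` class and `is_due` function are hypothetical names; the only details taken from the requirements are the weekday selection, the UTC time, the three-recipient limit, and the use of a `last_run_at` timestamp to avoid running the same slot twice.

```python
from dataclasses import dataclass
from datetime import datetime, time, timezone
from typing import Optional

MAX_RECIPIENTS = 3  # limit stated in the requirements


@dataclass
class Schedule:
    days: set[int]            # 0 = Monday ... 6 = Sunday (datetime.weekday())
    at: time                  # naive time of day, interpreted as UTC
    recipients: list[str]
    last_run_at: Optional[datetime] = None  # timezone-aware, UTC

    def __post_init__(self) -> None:
        if len(self.recipients) > MAX_RECIPIENTS:
            raise ValueError(f"at most {MAX_RECIPIENTS} recipients allowed")


def is_due(schedule: Schedule, now: datetime) -> bool:
    """Return True if today's slot has passed and has not run yet."""
    now = now.astimezone(timezone.utc)
    if now.weekday() not in schedule.days:
        return False
    slot = datetime.combine(now.date(), schedule.at, tzinfo=timezone.utc)
    if now < slot:
        return False
    # Double-run prevention: skip if a run was already recorded for this slot.
    return schedule.last_run_at is None or schedule.last_run_at < slot
```

In this sketch a slot missed while the scheduler was down is still picked up later the same day; whether the real scheduler catches up like this is not stated in the requirements.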
3.4 Personalized Sources
- Users add web sources (blogs, news sites) per theme.
- Sources can be added individually or imported in bulk (via text input or CSV upload); in every case they are bound to the selected theme.
- Sources can be exported as CSV, always scoped to the selected theme.
- Sources can be marked as preferred; preferred sources are processed before non-preferred ones during generation. Preference state is scoped per theme.
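The theme scoping and preferred-first ordering above could look roughly like this. The `Source` fields and the CSV column layout (`url`, `preferred`) are assumptions made for illustration; the requirements do not specify the export format.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Source:
    url: str
    theme_id: int
    preferred: bool = False


def processing_order(sources: list[Source], theme_id: int) -> list[Source]:
    """Sources of one theme, preferred ones first, original order otherwise."""
    scoped = [s for s in sources if s.theme_id == theme_id]
    # sorted() is stable, so ties keep their insertion order.
    return sorted(scoped, key=lambda s: not s.preferred)


def export_csv(sources: list[Source], theme_id: int) -> str:
    """Export one theme's sources as CSV text (hypothetical column layout)."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["url", "preferred"])
    for s in processing_order(sources, theme_id):
        writer.writerow([s.url, "true" if s.preferred else "false"])
    return buf.getvalue()
```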
3.5 Brave Search Integration
Optional alternative to LLM web search for Phase 2. Users provide their own Brave Search API key; when enabled, Phase 2 queries Brave instead of using LLM web grounding. See functional_specs.md Section 2.5.
3.6 Export and Sharing
Syntheses can be exported as email (via Resend), PDF, or Markdown. See functional_specs.md Section 6.
3.7 Settings
Settings are split into two levels: per-theme content settings (search topic, categories, max age, max items, summary length) and global pipeline settings (LLM provider/model, Brave Search, batch size, rate limits, article history retention, import/export). See functional_specs.md Section 4 for the complete settings reference.
3.8 Authentication
Passwordless authentication via magic link emails with Cloudflare Turnstile captcha. Sessions use 30-day HttpOnly/SameSite cookies. See architecture.md Section 6 for the full security model.
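For illustration, a 30-day HttpOnly/SameSite session cookie can be built with Python's standard library as sketched below. The cookie name `session`, the `SameSite=Lax` value, and the token length are assumptions; the authoritative security model is the one in architecture.md.

```python
import secrets
from http.cookies import SimpleCookie

SESSION_MAX_AGE = 30 * 24 * 3600  # 30 days, in seconds


def new_session_token() -> str:
    """Random, URL-safe token (the length is an assumption)."""
    return secrets.token_urlsafe(32)


def build_session_cookie(token: str) -> str:
    """Return the value of a Set-Cookie header for the session."""
    cookie = SimpleCookie()
    cookie["session"] = token                 # hypothetical cookie name
    cookie["session"]["httponly"] = True      # not readable from JavaScript
    cookie["session"]["samesite"] = "Lax"     # assumed; could also be Strict
    cookie["session"]["max-age"] = SESSION_MAX_AGE
    cookie["session"]["path"] = "/"
    return cookie["session"].OutputString()
```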
4. User Roles
4.1 User (default)
- Register and log in via magic link.
- Create and manage themes (CRUD).
- Add and manage personalized sources per theme.
- Configure generation settings and API keys.
- Generate syntheses on demand or via schedule.
- View, delete, and export syntheses.
- View article history and LLM call logs per synthesis.
4.2 Admin
All user capabilities, plus provider management (add/edit/enable/disable LLM providers and models), rate limit configuration (defaults per provider), and user management (view all users, promote/demote roles). The first admin is created via the create-admin CLI command. See functional_specs.md Section 5.
5. Non-Functional Requirements
5.1 Security
- API keys (LLM, Brave Search) encrypted at rest with AES-256-GCM using a master encryption key.
- SSRF prevention in the scraper (rejects private/loopback IPs).
- CSRF protection via X-Requested-With header validation.
- Session-based authentication with HttpOnly/SameSite cookies.
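The SSRF rule above (reject private and loopback addresses) can be sketched with the standard `ipaddress` module. The function names are hypothetical, and the extra rejected ranges (link-local, multicast, reserved, unspecified) go beyond what the requirement states; they are included here as a conservative assumption.

```python
import ipaddress
import socket
from typing import Callable
from urllib.parse import urlsplit


def is_public_ip(addr: str) -> bool:
    """True only for addresses that are safe to fetch from the scraper."""
    try:
        ip = ipaddress.ip_address(addr)
    except ValueError:
        return False  # unparsable address: treat as unsafe
    return not (
        ip.is_private
        or ip.is_loopback
        or ip.is_link_local
        or ip.is_multicast
        or ip.is_reserved
        or ip.is_unspecified
    )


def resolve_host(host: str) -> list[str]:
    """Resolve a hostname to all of its IP addresses."""
    return [info[4][0] for info in socket.getaddrinfo(host, None)]


def is_safe_url(url: str, resolve: Callable[[str], list[str]] = resolve_host) -> bool:
    """Accept only http(s) URLs whose host resolves to public addresses."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    try:
        addrs = resolve(parts.hostname)
    except OSError:
        return False
    # Every resolved address must be public, not just the first one.
    return bool(addrs) and all(is_public_ip(a) for a in addrs)
```

A check like this is only effective if the subsequent request connects to the address that was validated; otherwise a second DNS lookup could return a different, internal address.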
5.2 Performance
- Configurable rate limiting for LLM API calls (per-user override or admin default).
- Batched parallel scraping and classification to maximize throughput.
- Windowed source extraction to avoid unnecessary work when the synthesis fills early.
- Source diversity cap to prevent a single domain from dominating results.
- Article history deduplication to avoid re-processing previously seen articles.
- 15-minute generation timeout.
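As a rough sketch of how the source diversity cap and article history deduplication might combine, the function below filters candidate URLs in order. The name `select_articles`, the use of the URL as the history key, and the hostname as the "domain" are assumptions for illustration.

```python
from collections import Counter
from urllib.parse import urlsplit


def select_articles(
    urls: list[str],
    history: set[str],
    max_per_domain: int,
) -> list[str]:
    """Keep unseen URLs, allowing at most max_per_domain per hostname."""
    seen = set(history)          # copy, so the caller's history is untouched
    per_domain: Counter[str] = Counter()
    picked: list[str] = []
    for url in urls:
        if url in seen:
            continue             # already processed in a previous run or batch
        domain = (urlsplit(url).hostname or "").lower()
        if per_domain[domain] >= max_per_domain:
            continue             # diversity cap reached for this domain
        per_domain[domain] += 1
        seen.add(url)
        picked.append(url)
    return picked
```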
5.3 Self-Hosted
- Single Docker Compose deployment (application + PostgreSQL).
- No external dependencies beyond user-provided API keys, the Resend email service, and Cloudflare Turnstile for the login captcha (see Section 3.8).
- Single-tenant: one instance per deployment.
- Users bring their own LLM API keys (no shared API key).
5.4 Internationalization
- i18n-ready architecture (all UI strings externalized).
- French is the only language currently supported.
5.5 Reliability
- Hourly session cleanup background task.
- Job store with TTL for expired generation jobs.
- Scheduled generation with double-run prevention (last_run_at tracking).
- Panic recovery and timeout handling for generation tasks.
- Release gating in CI requires deterministic coverage for critical autonomous flows (notably scheduler execution and SSE progress behavior).
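A job store with TTL, as listed above, can be sketched as an in-memory map with expiry timestamps. The `JobStore` class and its method names are hypothetical; the injectable clock is a design choice that makes expiry deterministic under test, in the spirit of the CI requirement above.

```python
import time
from typing import Any, Callable, Optional


class JobStore:
    """In-memory store whose entries expire after ttl_seconds."""

    def __init__(
        self,
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._jobs: dict[str, tuple[float, Any]] = {}

    def put(self, job_id: str, job: Any) -> None:
        self._jobs[job_id] = (self._clock() + self._ttl, job)

    def get(self, job_id: str) -> Optional[Any]:
        entry = self._jobs.get(job_id)
        if entry is None:
            return None
        expires_at, job = entry
        if self._clock() >= expires_at:
            del self._jobs[job_id]   # lazily drop expired entries on read
            return None
        return job

    def sweep(self) -> int:
        """Remove all expired jobs; return how many were removed."""
        now = self._clock()
        expired = [k for k, (exp, _) in self._jobs.items() if now >= exp]
        for k in expired:
            del self._jobs[k]
        return len(expired)
```

A periodic background task could call `sweep()` so that jobs nobody reads again are still cleaned up.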