# AI Weekly Synth -- Requirements

## 1. Product Vision

AI Weekly Synth is a self-hosted web application that generates AI-powered weekly news syntheses. Users define topics of interest (themes), add personalized sources, and let the application search the web, validate articles, and produce structured summaries organized by category.

The application is designed for individuals or small teams who want an automated, curated news digest without relying on third-party newsletter services.

## 2. Target Users

- **End users**: professionals or enthusiasts who follow one or more topics (e.g. AI, cybersecurity, finance) and want a weekly summary delivered by email or available on-demand.
- **Administrators**: the instance operator who manages available LLM providers, rate limits, and user accounts.

## 3. Core Features

### 3.1 Multi-Theme Support

- Users create multiple themes, each with its own search topic, categories, and content settings.
- Each theme has its own set of personalized sources.
- Syntheses are generated per theme and tagged accordingly.
- Themes can be created, edited, and deleted independently. Deleting a theme preserves its existing syntheses.

### 3.2 Synthesis Generation

- On-demand generation triggered by the user for a selected theme.
- Two-phase pipeline:
  - **Phase 1 (Personalized Sources)**: extracts article links from user-configured sources, scrapes content, classifies and summarizes each article into the theme's categories.
  - **Phase 2 (Web Search Fallback)**: fills remaining category gaps using either Brave Search API or LLM-powered web search.
- Real-time progress streaming via SSE so the user can monitor generation status.
- Generation is capped at 15 minutes with automatic timeout.

### 3.3 Scheduled Generation

- Users configure a per-theme schedule: selected days of the week, time (UTC), and up to 3 email recipients.
- The application runs scheduled jobs automatically in the background, generating the synthesis and emailing it to all configured recipients.
- No external cron required; the scheduler is an internal background task.

### 3.4 Personalized Sources

- Users add web sources (blogs, news sites) per theme.
- Sources can be added individually or imported in bulk via text input or CSV upload.
- Sources can be exported as CSV.
- Sources can be marked as **preferred** (prioritized during generation -- processed before non-preferred sources).

### 3.5 Brave Search Integration

- Optional alternative to LLM web search for Phase 2.
- Users provide their own Brave Search API key.
- When enabled, Phase 2 queries the Brave Search API instead of using LLM web grounding, then scrapes and classifies the results.

### 3.6 Export and Sharing

- **Email**: send a synthesis to any email address (or to self) via Resend.
- **PDF**: download a synthesis as a PDF file.
- **Markdown**: download a synthesis as a Markdown file.

### 3.7 Settings

#### Per-theme settings (content)

- Theme name and search topic
- Categories (user-defined list)
- Max age of articles (days)
- Max items per category
- Summary detail level (short / medium / detailed)

#### Global settings (pipeline and AI)

- LLM provider and model selection (research model + web search model)
- Search agent behavior (custom instructions for the AI research prompt)
- Brave Search toggle and API key
- Batch size (articles processed in parallel)
- Source extraction window (number of sources per extraction wave)
- Max articles per source (diversity cap)
- Max links extracted per source
- Rate limiting (max requests / time window)
- Article history retention (days)
- Settings import/export (JSON)

### 3.8 Authentication

- Passwordless authentication via magic link emails.
- Cloudflare Turnstile captcha on login and registration.
- 30-day session cookies (HttpOnly, SameSite).

## 4. User Roles

### 4.1 User (default)

- Register and log in via magic link.
- Create and manage themes (CRUD).
- Add and manage personalized sources per theme.
- Configure generation settings and API keys.
- Generate syntheses on demand or via schedule.
- View, delete, and export syntheses.
- View article history and LLM call logs per synthesis.

### 4.2 Admin

All user capabilities, plus:

- **Provider management**: add, edit, enable/disable, and remove LLM providers and their available models. Users select from admin-curated providers.
- **Rate limit configuration**: set default rate limits per provider (max requests / time window). Users can override with their own values.
- **User management**: view all users, promote users to admin or demote admins to user.

The first admin is created via a CLI command (`create-admin`).

## 5. Non-Functional Requirements

### 5.1 Security

- API keys (LLM, Brave Search) encrypted at rest with AES-256-GCM using a master encryption key.
- SSRF prevention in the scraper (rejects private/loopback IPs).
- CSRF protection via `X-Requested-With` header validation.
- Session-based authentication with HttpOnly/SameSite cookies.

### 5.2 Performance

- Configurable rate limiting for LLM API calls (per-user override or admin default).
- Batched parallel scraping and classification to maximize throughput.
- Windowed source extraction to avoid unnecessary work when the synthesis fills early.
- Source diversity cap to prevent a single domain from dominating results.
- Article history deduplication to avoid re-processing previously seen articles.
- 15-minute generation timeout.

### 5.3 Self-Hosted

- Single Docker Compose deployment (application + PostgreSQL).
- No external dependencies beyond user-provided API keys and the Resend email service.
- Single-tenant: one instance per deployment.
- Users bring their own LLM API keys (no shared API key).

### 5.4 Internationalization

- i18n-ready architecture (all UI strings externalized).
- French is the only language currently supported.

### 5.5 Reliability

- Hourly session cleanup background task.
- Job store with TTL for expired generation jobs.
- Scheduled generation with double-run prevention (`last_run_at` tracking).
- Panic recovery and timeout handling for generation tasks.
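
The double-run prevention listed above could look roughly like the following sketch. It is an illustration only, written in Python for brevity and not meant to reflect the application's actual implementation language or schema: the `Schedule` class, its `days` and `run_time` fields, and the `is_due` and `mark_run` helpers are all hypothetical names. Only `last_run_at` comes from this document. The sketch assumes every timestamp is timezone-aware UTC, matching the UTC schedule time described in section 3.3.

```python
from dataclasses import dataclass
from datetime import datetime, time, timezone
from typing import Optional


@dataclass
class Schedule:
    """Hypothetical per-theme schedule; field names are illustrative."""
    days: set[int]                          # weekdays, Monday=0 ... Sunday=6
    run_time: time                          # scheduled time of day, in UTC
    last_run_at: Optional[datetime] = None  # when the last run was recorded


def is_due(schedule: Schedule, now: datetime) -> bool:
    """Return True if the schedule should fire at `now` (timezone-aware UTC)."""
    if now.weekday() not in schedule.days:
        return False
    # The slot is today's scheduled instant in UTC.
    slot = datetime.combine(now.date(), schedule.run_time, tzinfo=timezone.utc)
    if now < slot:
        return False
    # Double-run prevention: skip if a run was already recorded for this slot.
    return schedule.last_run_at is None or schedule.last_run_at < slot


def mark_run(schedule: Schedule, now: datetime) -> None:
    """Record the run so later scheduler ticks skip the same slot."""
    schedule.last_run_at = now


# Usage: the first tick after the slot fires, a repeated tick does not.
monday_8 = Schedule(days={0}, run_time=time(8, 0))
tick = datetime(2024, 1, 1, 8, 5, tzinfo=timezone.utc)  # a Monday
first = is_due(monday_8, tick)
mark_run(monday_8, tick)
second = is_due(monday_8, tick)
```

In a real deployment the check and the `last_run_at` update would presumably need to happen atomically (for example inside a single database transaction), so that two overlapping scheduler ticks cannot both pass the check.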