# AI Weekly Synth -- Requirements

## 1. Product Vision

AI Weekly Synth is a self-hosted web application that generates AI-powered weekly news syntheses. Users define topics of interest (themes), add personalized sources, and let the application search the web, validate articles, and produce structured summaries organized by category.

The application is designed for individuals or small teams who want an automated, curated news digest without relying on third-party newsletter services.
## 2. Target Users

- **End users**: professionals or enthusiasts who follow one or more topics (e.g. AI, cybersecurity, finance) and want a weekly summary delivered by email or available on demand.
- **Administrators**: instance operators who manage available LLM providers, rate limits, and user accounts.
## 3. Core Features

### 3.1 Multi-Theme Support

- Users create multiple themes, each with its own search topic, categories, and content settings.
- Each theme has its own set of personalized sources.
- Syntheses are generated per theme and tagged accordingly.
- Themes can be created, edited, and deleted independently. Deleting a theme preserves its existing syntheses.
### 3.2 Synthesis Generation

- On-demand generation triggered by the user for a selected theme.
- Two-phase pipeline:
  - **Phase 1 (Personalized Sources)**: extracts article links from user-configured sources, scrapes content, classifies and summarizes each article into the theme's categories.
  - **Phase 2 (Web Search Fallback)**: fills remaining category gaps using either Brave Search API or LLM-powered web search.
- Real-time progress streaming via SSE so the user can monitor generation status.
- Generation is capped at 15 minutes with automatic timeout.
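The two-phase flow described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the application's actual code: `Article`, `fill_categories`, and the `web_search` callback are hypothetical names, and articles are assumed to arrive already classified.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Article:
    url: str
    category: str  # category assigned by the (assumed) classification step


def fill_categories(
    categories: list[str],
    source_articles: Iterable[Article],
    web_search: Callable[[str, int], list[Article]],
    max_items: int,
) -> dict[str, list[Article]]:
    """Phase 1 fills categories from personalized sources; Phase 2 fills the gaps."""
    result: dict[str, list[Article]] = {c: [] for c in categories}

    # Phase 1: keep classified articles from the user's sources, up to the cap.
    for article in source_articles:
        bucket = result.get(article.category)
        if bucket is not None and len(bucket) < max_items:
            bucket.append(article)

    # Phase 2: query the fallback search only for categories still under the cap.
    for category, bucket in result.items():
        missing = max_items - len(bucket)
        if missing > 0:
            bucket.extend(web_search(category, missing)[:missing])

    return result
```

In this sketch the fallback search is never called for a category that Phase 1 already filled, which is the point of running personalized sources first.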
### 3.3 Scheduled Generation

- Users configure a per-theme schedule: selected days of the week, time (UTC), and up to 3 email recipients.
- The application runs scheduled jobs automatically in the background, generating the synthesis and emailing it to all configured recipients.
- No external cron required; the scheduler is an internal background task.
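One way an internal scheduler could decide whether a theme is due is sketched below. The `Schedule` class and `is_due` function are hypothetical; the only details taken from this document are the weekday selection, the UTC time, and the `last_run_at` tracking used for double-run prevention (section 5.5).

```python
from dataclasses import dataclass
from datetime import datetime, time, timezone
from typing import Optional


@dataclass
class Schedule:
    weekdays: set[int]                       # 0 = Monday ... 6 = Sunday
    run_at: time                             # time of day, interpreted as UTC
    last_run_at: Optional[datetime] = None   # timezone-aware, set after each run


def is_due(schedule: Schedule, now: datetime) -> bool:
    """Return True if the schedule should fire at `now` (timezone-aware)."""
    now = now.astimezone(timezone.utc)
    if now.weekday() not in schedule.weekdays:
        return False
    # Today's scheduled slot, expressed in UTC.
    slot = datetime.combine(now.date(), schedule.run_at, tzinfo=timezone.utc)
    if now < slot:
        return False
    # Double-run prevention: skip if today's slot has already been served.
    last = schedule.last_run_at
    return last is None or last < slot
```

A background task would call `is_due` periodically and set `last_run_at` once a run has started, so a later tick on the same day does not trigger a second generation.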
### 3.4 Personalized Sources

- Users add web sources (blogs, news sites) per theme.
- Sources can be added individually or imported in bulk via text input or CSV upload.
- Sources can be exported as CSV.
- Sources can be marked as **preferred**: preferred sources are prioritized during generation and processed before non-preferred sources.
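As an illustration of bulk import and preferred-first ordering, here is a small sketch. The CSV layout (`url` and `preferred` columns) and the names `Source`, `parse_sources_csv`, and `processing_order` are assumptions made for the example; this document does not specify the file format.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Source:
    url: str
    preferred: bool = False


def parse_sources_csv(text: str) -> list[Source]:
    """Parse CSV text with `url` and `preferred` columns (assumed layout)."""
    reader = csv.DictReader(io.StringIO(text))
    sources = []
    for row in reader:
        url = (row.get("url") or "").strip()
        if not url:
            continue  # skip rows without a URL
        flag = (row.get("preferred") or "").strip().lower()
        sources.append(Source(url=url, preferred=flag in {"1", "true", "yes"}))
    return sources


def processing_order(sources: list[Source]) -> list[Source]:
    """Preferred sources first; the stable sort keeps the original order otherwise."""
    return sorted(sources, key=lambda s: not s.preferred)
```

Because Python's sort is stable, sources keep their relative order within the preferred and non-preferred groups.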
### 3.5 Brave Search Integration

- Optional alternative to LLM web search for Phase 2.
- Users provide their own Brave Search API key.
- When enabled, Phase 2 queries the Brave Search API instead of using LLM web grounding, then scrapes and classifies the results.
### 3.6 Export and Sharing

- **Email**: send a synthesis to any email address, including the user's own, via Resend.
- **PDF**: download a synthesis as a PDF file.
- **Markdown**: download a synthesis as a Markdown file.
### 3.7 Settings

#### Per-theme settings (content)
- Theme name and search topic
- Categories (user-defined list)
- Max age of articles (days)
- Max items per category
- Summary detail level (short / medium / detailed)

#### Global settings (pipeline and AI)
- LLM provider and model selection (research model + web search model)
- Search agent behavior (custom instructions for the AI research prompt)
- Brave Search toggle and API key
- Batch size (articles processed in parallel)
- Source extraction window (number of sources per extraction wave)
- Max articles per source (diversity cap)
- Max links extracted per source
- Rate limiting (max requests / time window)
- Article history retention (days)
- Settings import/export (JSON)
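A subset of the global pipeline settings could be modeled and round-tripped through JSON as in the sketch below. The class name, field names, and default values are all assumptions chosen for illustration; only the list of settings comes from this document.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class PipelineSettings:
    # All defaults below are illustrative assumptions, not documented values.
    batch_size: int = 5
    extraction_window: int = 3
    max_articles_per_source: int = 2
    max_links_per_source: int = 20
    history_retention_days: int = 30
    brave_search_enabled: bool = False

    def __post_init__(self) -> None:
        # Every numeric limit must be a positive count.
        for name in (
            "batch_size",
            "extraction_window",
            "max_articles_per_source",
            "max_links_per_source",
            "history_retention_days",
        ):
            if getattr(self, name) < 1:
                raise ValueError(f"{name} must be at least 1")


def export_settings(settings: PipelineSettings) -> str:
    """Serialize settings to a JSON document."""
    return json.dumps(asdict(settings), indent=2)


def import_settings(payload: str) -> PipelineSettings:
    """Rebuild settings from JSON; validation runs again in __post_init__."""
    return PipelineSettings(**json.loads(payload))
```

Validating in `__post_init__` means an imported file is checked the same way as values entered through the UI.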
### 3.8 Authentication

- Passwordless authentication via magic link emails.
- Cloudflare Turnstile captcha on login and registration.
- 30-day session cookies (HttpOnly, SameSite).
## 4. User Roles

### 4.1 User (default)

- Register and log in via magic link.
- Create and manage themes (CRUD).
- Add and manage personalized sources per theme.
- Configure generation settings and API keys.
- Generate syntheses on demand or via schedule.
- View, delete, and export syntheses.
- View article history and LLM call logs per synthesis.
### 4.2 Admin

All user capabilities, plus:

- **Provider management**: add, edit, enable/disable, and remove LLM providers and their available models. Users select from admin-curated providers.
- **Rate limit configuration**: set default rate limits per provider (max requests / time window). Users can override with their own values.
- **User management**: view all users, promote users to admin or demote admins to user.

The first admin is created via a CLI command (`create-admin`).
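The rate-limit rule in this section (an admin default per provider that a user may override) can be sketched with a simple sliding-window limiter. `RateLimit`, `effective_limit`, and `SlidingWindowLimiter` are hypothetical names, and the sliding-window strategy is an assumption; this document only states "max requests / time window".

```python
import time
from collections import deque
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class RateLimit:
    max_requests: int
    window_seconds: float


def effective_limit(admin_default: RateLimit, user_override: Optional[RateLimit]) -> RateLimit:
    """The user's own values win when present; otherwise the admin default applies."""
    return user_override if user_override is not None else admin_default


class SlidingWindowLimiter:
    def __init__(self, limit: RateLimit, clock: Callable[[], float] = time.monotonic):
        self.limit = limit
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._stamps: deque = deque()

    def try_acquire(self) -> bool:
        """Record a request and return True, or return False if the window is full."""
        now = self.clock()
        # Drop timestamps that have left the window.
        while self._stamps and now - self._stamps[0] >= self.limit.window_seconds:
            self._stamps.popleft()
        if len(self._stamps) >= self.limit.max_requests:
            return False
        self._stamps.append(now)
        return True
```

A caller would check `try_acquire()` before each LLM API call and wait or queue the request when it returns `False`.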
## 5. Non-Functional Requirements

### 5.1 Security

- API keys (LLM, Brave Search) encrypted at rest with AES-256-GCM using a master encryption key.
- SSRF prevention in the scraper (rejects private/loopback IPs).
- CSRF protection via `X-Requested-With` header validation.
- Session-based authentication with HttpOnly/SameSite cookies.
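The SSRF rule above could be enforced with a pre-fetch check along these lines. This is a sketch using only the Python standard library; `is_public_ip` and `is_safe_url` are hypothetical helpers, and also rejecting link-local addresses is an assumption that goes slightly beyond the private/loopback wording.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def is_public_ip(value: str) -> bool:
    """Return False for private, loopback, link-local, or unparsable addresses."""
    try:
        ip = ipaddress.ip_address(value)
    except ValueError:
        return False
    return not (ip.is_private or ip.is_loopback or ip.is_link_local)


def is_safe_url(url: str) -> bool:
    """Hypothetical check run by the scraper before fetching a URL."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    try:
        port = parts.port or (443 if parts.scheme == "https" else 80)
        infos = socket.getaddrinfo(parts.hostname, port, proto=socket.IPPROTO_TCP)
    except (socket.gaierror, ValueError):
        return False
    # Every resolved address must be public, otherwise the URL is rejected.
    return bool(infos) and all(is_public_ip(info[4][0]) for info in infos)
```

A check like this only covers the address seen at validation time; a real scraper would also need to connect to the address it validated and re-check redirects.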
### 5.2 Performance

- Configurable rate limiting for LLM API calls (per-user override or admin default).
- Batched parallel scraping and classification to maximize throughput.
- Windowed source extraction to avoid unnecessary work when the synthesis fills early.
- Source diversity cap to prevent a single domain from dominating results.
- Article history deduplication to avoid re-processing previously seen articles.
- 15-minute generation timeout.
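The diversity cap and history deduplication can be combined in a single filtering pass, as in this sketch. `select_articles` is a hypothetical helper, and keying the cap on the URL's hostname is an assumption about how "domain" is determined.

```python
from collections import Counter
from urllib.parse import urlsplit


def select_articles(candidate_urls, seen_urls, max_per_source):
    """Drop URLs already in history, then cap how many come from each domain."""
    per_domain = Counter()
    selected = []
    seen = set(seen_urls)
    for url in candidate_urls:
        if url in seen:
            continue  # processed in an earlier run, or a duplicate within this one
        seen.add(url)
        domain = urlsplit(url).hostname or ""
        if per_domain[domain] >= max_per_source:
            continue  # diversity cap reached for this domain
        per_domain[domain] += 1
        selected.append(url)
    return selected
```

Filtering before scraping keeps the expensive steps (fetching, classification, summarization) from running on articles that would be discarded anyway.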
### 5.3 Self-Hosted

- Single Docker Compose deployment (application + PostgreSQL).
- No external dependencies beyond user-provided API keys, the Resend email service, and the Cloudflare Turnstile captcha used at login and registration.
- Single-tenant: one instance per deployment.
- Users bring their own LLM API keys (no shared API key).
### 5.4 Internationalization

- i18n-ready architecture (all UI strings externalized).
- French is the only language currently supported.
### 5.5 Reliability

- Hourly session cleanup background task.
- Job store with TTL for expired generation jobs.
- Scheduled generation with double-run prevention (`last_run_at` tracking).
- Panic recovery and timeout handling for generation tasks.
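A job store with TTL might look like the following in-memory sketch. `JobStore` and its methods are hypothetical, and both the lazy eviction on read and the periodic `purge_expired` sweep are assumptions about how expiry could be handled.

```python
import time
from typing import Any, Callable, Optional


class JobStore:
    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._jobs: dict = {}  # job_id -> (deadline, state)

    def put(self, job_id: str, state: Any) -> None:
        """Store or refresh a job; its deadline is now + TTL."""
        self._jobs[job_id] = (self.clock() + self.ttl, state)

    def get(self, job_id: str) -> Optional[Any]:
        """Return the job state, or None if it is unknown or expired."""
        entry = self._jobs.get(job_id)
        if entry is None or entry[0] <= self.clock():
            self._jobs.pop(job_id, None)  # lazily evict an expired entry
            return None
        return entry[1]

    def purge_expired(self) -> int:
        """Remove all expired jobs and return how many were dropped."""
        now = self.clock()
        expired = [k for k, (deadline, _) in self._jobs.items() if deadline <= now]
        for k in expired:
            del self._jobs[k]
        return len(expired)
```

A background task, similar to the hourly session cleanup, could call `purge_expired()` so that jobs nobody polls again are still released.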