# AI Weekly Synth -- Requirements

## 1. Product Vision

AI Weekly Synth is a self-hosted web application that generates AI-powered weekly news syntheses. Users define topics of interest (themes), add personalized sources, and let the application search the web, validate articles, and produce structured summaries organized by category.

The application is designed for individuals or small teams who want an automated, curated news digest without relying on third-party newsletter services.

## 2. Target Users

- **End users**: professionals or enthusiasts who follow one or more topics (e.g. AI, cybersecurity, finance) and want a weekly summary delivered by email or available on demand.
- **Administrators**: instance operators who manage available LLM providers, rate limits, and user accounts.

## 3. Core Features

### 3.1 Multi-Theme Support

Users create multiple independent themes, each with its own search topic, categories, personalized sources, and content settings. Syntheses are generated and tagged per theme. Deleting a theme preserves its existing syntheses.

See `functional_specs.md` Section 2 for detailed behavior.

### 3.2 Synthesis Generation

- On-demand generation triggered by the user for a selected theme.
- Two-phase pipeline:
  - **Phase 1 (Personalized Sources)**: extracts article links from user-configured sources, scrapes content, classifies and summarizes each article into the theme's categories.
  - **Phase 2 (Web Search Fallback)**: fills remaining category gaps using either Brave Search API or LLM-powered web search.
- Real-time progress streaming via SSE so the user can monitor generation status.
- Generation is capped at 15 minutes with automatic timeout.
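
The two-phase flow and its SSE progress stream can be sketched roughly as follows. This is a minimal illustration, not the application's actual pipeline: the names `generate`, `sse_frame`, and `web_search`, the event names `progress` and `done`, and the payload shapes are all assumptions. Scraping, classification, and the 15-minute timeout are left out.

```python
import json
from typing import Callable, Dict, Iterator, List, Tuple


def sse_frame(event: str, payload: dict) -> str:
    """Format one Server-Sent Events frame: event name, JSON data, blank line."""
    return f"event: {event}\ndata: {json.dumps(payload)}\n\n"


def generate(
    categories: List[str],
    max_items: int,
    source_articles: List[Tuple[str, str]],       # (category, title), already classified
    web_search: Callable[[str, int], List[str]],  # hypothetical Phase 2 backend
) -> Iterator[str]:
    """Yield SSE frames while filling each category up to max_items."""
    synthesis: Dict[str, List[str]] = {c: [] for c in categories}

    # Phase 1: place articles from personalized sources, capped per category.
    for category, title in source_articles:
        if category in synthesis and len(synthesis[category]) < max_items:
            synthesis[category].append(title)
    counts = {c: len(items) for c, items in synthesis.items()}
    yield sse_frame("progress", {"phase": 1, "counts": counts})

    # Phase 2: query the fallback search only for categories that still have gaps.
    for category in categories:
        missing = max_items - len(synthesis[category])
        if missing > 0:
            synthesis[category].extend(web_search(category, missing)[:missing])
    counts = {c: len(items) for c, items in synthesis.items()}
    yield sse_frame("progress", {"phase": 2, "counts": counts})

    yield sse_frame("done", synthesis)
```

Because Phase 2 only runs for categories with remaining gaps, a theme whose personalized sources already fill every category never triggers a web search in this sketch.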

### 3.3 Scheduled Generation

- Users configure a per-theme schedule: selected days of the week, time (UTC), and up to 3 email recipients.
- The application runs scheduled jobs automatically in the background, generating the synthesis and emailing it to all configured recipients.
- No external cron required; the scheduler is an internal background task.
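
As an illustration of how an internal scheduler might decide whether a theme is due, here is a minimal sketch. The `Schedule` type, its field names, and the `is_due` helper are hypothetical. The `last_run_at` field mirrors the double-run prevention mentioned under Reliability, but the comparison logic shown here is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, time, timezone
from typing import List, Optional, Set

MAX_RECIPIENTS = 3  # limit stated in the requirements


@dataclass
class Schedule:
    weekdays: Set[int]            # 0 = Monday ... 6 = Sunday
    run_at: time                  # time of day, interpreted as UTC
    recipients: List[str]
    last_run_at: Optional[datetime] = None

    def __post_init__(self) -> None:
        if len(self.recipients) > MAX_RECIPIENTS:
            raise ValueError(f"at most {MAX_RECIPIENTS} recipients allowed")


def is_due(schedule: Schedule, now: datetime) -> bool:
    """Return True if today's slot has passed and has not been run yet."""
    if now.weekday() not in schedule.weekdays:
        return False
    slot = datetime.combine(now.date(), schedule.run_at, tzinfo=timezone.utc)
    if now < slot:
        return False
    # Double-run prevention: skip if a run was already recorded for this slot.
    return schedule.last_run_at is None or schedule.last_run_at < slot
```

A background loop would call `is_due` periodically with the current UTC time and set `last_run_at` once a job has been started.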

### 3.4 Personalized Sources

- Users add web sources (blogs, news sites) per theme.
- Sources can be added individually or imported in bulk via text input or CSV upload, always bound to the selected theme.
- Sources can be exported as CSV, always scoped to the selected theme.
- Sources can be marked as **preferred** (prioritized during generation -- processed before non-preferred sources), with preference state scoped per theme.
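
One simple way to process preferred sources first is a stable sort on the preference flag. The sketch below assumes a hypothetical `Source` record; the real data model may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Source:
    url: str
    preferred: bool = False


def processing_order(sources: List[Source]) -> List[Source]:
    """Return preferred sources first, keeping the original order within each group."""
    # sorted() is stable and False sorts before True, so the key is "not preferred".
    return sorted(sources, key=lambda s: not s.preferred)
```

Because the sort is stable, the order in which the user added the sources is preserved within the preferred and non-preferred groups.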

### 3.5 Brave Search Integration

Optional alternative to LLM web search for Phase 2. Users provide their own Brave Search API key; when enabled, Phase 2 queries Brave instead of using LLM web grounding. See `functional_specs.md` Section 2.5.

### 3.6 Export and Sharing

Syntheses can be exported as email (via Resend), PDF, or Markdown. See `functional_specs.md` Section 6.

### 3.7 Settings

Settings are split into two levels: per-theme content settings (search topic, categories, max age, max items, summary length) and global pipeline settings (LLM provider/model, Brave Search, batch size, rate limits, article history retention, import/export). See `functional_specs.md` Section 4 for the complete settings reference.

### 3.8 Authentication

Passwordless authentication via magic link emails with Cloudflare Turnstile captcha. Sessions use 30-day HttpOnly/SameSite cookies. See `architecture.md` Section 6 for the full security model.
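
To make the cookie attributes concrete, the following sketch builds a `Set-Cookie` value with Python's standard `http.cookies` module. The cookie name `session`, the `SameSite=Lax` value, the `Secure` flag, and the `Path` are assumptions; the requirements only state a 30-day lifetime with HttpOnly and SameSite.

```python
from http.cookies import SimpleCookie

SESSION_MAX_AGE = 30 * 24 * 3600  # 30 days, in seconds


def session_cookie_header(token: str) -> str:
    """Build the value of a Set-Cookie header for a session token."""
    cookie = SimpleCookie()
    cookie["session"] = token           # cookie name is an assumption
    morsel = cookie["session"]
    morsel["max-age"] = SESSION_MAX_AGE
    morsel["httponly"] = True           # not readable from JavaScript
    morsel["samesite"] = "Lax"          # assumed; "Strict" is also possible
    morsel["secure"] = True             # assumed: instance served over HTTPS
    morsel["path"] = "/"
    return morsel.OutputString()
```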

## 4. User Roles

### 4.1 User (default)

- Register and log in via magic link.
- Create and manage themes (CRUD).
- Add and manage personalized sources per theme.
- Configure generation settings and API keys.
- Generate syntheses on demand or via schedule.
- View, delete, and export syntheses.
- View article history and LLM call logs per synthesis.

### 4.2 Admin

All user capabilities, plus provider management (add/edit/enable/disable LLM providers and models), rate limit configuration (defaults per provider), and user management (view all users, promote/demote roles). The first admin is created via the `create-admin` CLI command. See `functional_specs.md` Section 5.

## 5. Non-Functional Requirements

### 5.1 Security

- API keys (LLM, Brave Search) encrypted at rest with AES-256-GCM using a master encryption key.
- SSRF prevention in the scraper (rejects private/loopback IPs).
- CSRF protection via `X-Requested-With` header validation.
- Session-based authentication with HttpOnly/SameSite cookies.
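
The SSRF rule above (reject private and loopback addresses) can be illustrated with the standard `ipaddress` module. This is a sketch of the address check only; the function name is hypothetical, and the extra categories rejected here (link-local, multicast, reserved, unspecified) go beyond what the requirements state. A real scraper would also need to resolve the hostname and check every resolved address before connecting.

```python
import ipaddress


def is_forbidden_address(ip_text: str) -> bool:
    """Return True if the scraper should refuse to connect to this IP address."""
    ip = ipaddress.ip_address(ip_text)
    # Unwrap IPv4-mapped IPv6 addresses such as ::ffff:127.0.0.1.
    mapped = getattr(ip, "ipv4_mapped", None)
    if mapped is not None:
        ip = mapped
    return (
        ip.is_private
        or ip.is_loopback
        or ip.is_link_local      # includes 169.254.0.0/16 (cloud metadata range)
        or ip.is_multicast
        or ip.is_reserved
        or ip.is_unspecified
    )
```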

### 5.2 Performance

- Configurable rate limiting for LLM API calls (per-user override or admin default).
- Batched parallel scraping and classification to maximize throughput.
- Windowed source extraction to avoid unnecessary work when the synthesis fills early.
- Source diversity cap to prevent a single domain from dominating results.
- Article history deduplication to avoid re-processing previously seen articles.
- 15-minute generation timeout.
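
The diversity cap and history deduplication can be combined in a single filtering pass, sketched below. The function name, the exact-URL matching against history, and the use of the hostname as the "domain" are simplifying assumptions; the application may normalize URLs or group subdomains differently.

```python
from collections import Counter
from typing import Iterable, List, Set
from urllib.parse import urlsplit


def select_articles(urls: Iterable[str], seen: Set[str], max_per_domain: int) -> List[str]:
    """Drop previously seen URLs and keep at most max_per_domain per hostname."""
    per_domain: Counter = Counter()
    already = set(seen)  # copy so the caller's history is not mutated
    kept: List[str] = []
    for url in urls:
        if url in already:
            continue  # article history deduplication
        domain = (urlsplit(url).hostname or "").lower()
        if per_domain[domain] >= max_per_domain:
            continue  # source diversity cap
        per_domain[domain] += 1
        already.add(url)
        kept.append(url)
    return kept
```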

### 5.3 Self-Hosted

- Single Docker Compose deployment (application + PostgreSQL).
- No external dependencies beyond user-provided API keys and the Resend email service.
- Single-tenant: one instance per deployment.
- Users bring their own LLM API keys (no shared API key).

### 5.4 Internationalization

- i18n-ready architecture (all UI strings externalized).
- French is the only language currently supported.

### 5.5 Reliability

- Hourly session cleanup background task.
- Job store with TTL for expired generation jobs.
- Scheduled generation with double-run prevention (`last_run_at` tracking).
- Panic recovery and timeout handling for generation tasks.
- Release gating in CI requires deterministic coverage for critical autonomous flows (notably scheduler execution and SSE progress behavior).
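
A job store with TTL could look roughly like the in-memory sketch below. The class and method names are hypothetical, and the injectable `clock` parameter is an assumption added so that expiry can be tested deterministically, in the spirit of the CI requirement above.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class JobStore:
    """In-memory store whose entries expire ttl_seconds after insertion."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._jobs: Dict[str, Tuple[float, Any]] = {}  # job_id -> (expires_at, job)

    def put(self, job_id: str, job: Any) -> None:
        self._jobs[job_id] = (self._clock() + self._ttl, job)

    def get(self, job_id: str) -> Optional[Any]:
        entry = self._jobs.get(job_id)
        if entry is None or entry[0] <= self._clock():
            return None  # unknown or expired
        return entry[1]

    def purge_expired(self) -> int:
        """Delete expired entries and return how many were removed."""
        now = self._clock()
        expired = [job_id for job_id, (expires_at, _) in self._jobs.items() if expires_at <= now]
        for job_id in expired:
            del self._jobs[job_id]
        return len(expired)
```

A periodic background task would call `purge_expired()` so that finished or abandoned generation jobs do not accumulate in memory.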