AI Weekly Synth -- Functional Specification
1. User Journeys
1.1 Registration
- User navigates to the registration page and enters their email and optional display name.
- A Cloudflare Turnstile captcha is completed.
- The system sends a magic link email.
- User clicks the link to verify their account and is logged in automatically.
- A 30-day session cookie is set. The user is redirected to the home page.
1.2 Login
- User enters their email on the login page.
- A Turnstile captcha is completed.
- A magic link is sent. The user clicks it and is authenticated.
- If the link expires or is invalid, the user is prompted to request a new one.
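The magic link flow in 1.1 and 1.2 can be sketched as follows. This is a minimal Python illustration, not the application's code. Three details are assumptions: the 15-minute lifetime, the single-use rule, and storing only a SHA-256 hash of the token. The spec only says that links can expire.

```python
import hashlib
import hmac
import secrets
from datetime import datetime, timedelta, timezone

# Hypothetical lifetime: the spec says links expire but gives no duration.
LINK_TTL = timedelta(minutes=15)


def _digest(token: str) -> str:
    return hashlib.sha256(token.encode("utf-8")).hexdigest()


def issue_magic_link(email: str, now: datetime) -> tuple[str, dict]:
    """Return the raw token (embedded in the emailed link) and the record to store."""
    token = secrets.token_urlsafe(32)
    record = {
        "email": email,
        "token_hash": _digest(token),  # only the hash is persisted (assumption)
        "expires_at": now + LINK_TTL,
        "used": False,
    }
    return token, record


def verify_magic_link(token: str, record: dict, now: datetime) -> bool:
    """Accept a token once, before it expires; otherwise the user must request a new link."""
    if record["used"] or now >= record["expires_at"]:
        return False
    # Constant-time comparison of the presented token's hash with the stored one.
    if not hmac.compare_digest(_digest(token), record["token_hash"]):
        return False
    record["used"] = True  # single use (assumption)
    return True


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    token, record = issue_magic_link("user@example.com", now)
    print(verify_magic_link(token, record, now))
```

On a successful verification, the server would then set the 30-day session cookie described in 1.1.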
1.3 Configure a Theme
- User navigates to "Personnaliser les syntheses" (theme management page).
- User selects an existing theme from the dropdown or clicks "Creer un nouveau theme".
- The theme form shows:
  - Name: a display label for the theme.
  - Search topic: the subject the AI uses to search for news (e.g. "Intelligence Artificielle").
  - Categories: an ordered list of user-defined category names. Themes can be created without user-defined categories. The system always includes Divers (overflow) and Sans date (undated articles).
  - Max age (days): the maximum age, in days, of an article that can be included.
  - Max items per category: the maximum number of articles per category.
  - Summary length: a slider with three positions -- Court (3-4 lines), Moyen (6-8 lines), Detaille (12-15 lines).
- User saves the theme.
1.4 Add Personalized Sources
- On the theme management page, below theme settings, the sources section shows sources scoped to the selected theme.
- User adds sources individually (title + URL) or via:
  - CSV import: upload a `.csv` file with `Titre,URL` columns. Comma and semicolon delimiters are auto-detected, header rows are skipped, and `https://` is prepended to bare URLs. Import is always applied to the selected theme.
  - Bulk text import: paste multiple sources in `Nom;URL` format, one per line. Import is always applied to the selected theme.
- CSV export: download the sources of the selected theme only.
- Sources can be marked as preferred (prioritaire) via checkboxes. Preferred sources are scoped per theme and do not affect other themes.
- Sources can be deleted individually.
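The CSV import rules listed above can be sketched in a few lines of Python. Two details are assumptions. The delimiter heuristic picks whichever of `,` or `;` occurs more often on the first non-empty line. The header test treats a `Titre`/`URL` or `Nom`/`URL` row as a header. The spec only states that delimiters are auto-detected and header rows are skipped. The same function also accepts the `Nom;URL` bulk-text format.

```python
import csv


def parse_sources_csv(text: str) -> list[tuple[str, str]]:
    """Parse Titre,URL (or Nom;URL) rows into (title, url) pairs."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return []
    # Assumed heuristic: use whichever delimiter is more frequent on the first line.
    delimiter = ";" if lines[0].count(";") > lines[0].count(",") else ","
    sources = []
    for row in csv.reader(lines, delimiter=delimiter):
        if len(row) < 2:
            continue  # malformed row
        title, url = row[0].strip(), row[1].strip()
        if title.lower() in ("titre", "nom") and url.lower() == "url":
            continue  # header row
        if not url:
            continue
        if not url.lower().startswith(("http://", "https://")):
            url = "https://" + url  # bare URL such as "lemonde.fr"
        sources.append((title, url))
    return sources
```

The caller would then attach every returned pair to the currently selected theme.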
1.5 Generate a Synthesis
- User navigates to "Nouvelle Synthese" and selects a theme from the dropdown.
- The page shows the active provider and model.
- User clicks "Lancer la generation".
- Progress is streamed in real time via Server-Sent Events (SSE). The page shows the current step:
  - "Sources personnalisees" (Phase 1)
  - "Recherche web" (Phase 2)
  - "Sauvegarde" (final step)
- User can leave the page; generation continues in the background.
- User can stop the generation early; articles collected so far are saved.
- On completion, user is redirected to the synthesis detail page.
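In the Server-Sent Events wire format, each message is a set of `field: value` lines terminated by a blank line. The sketch below shows one way to frame a progress message. The `progress` event name, the step keys, and the JSON payload shape are hypothetical; only the UI labels come from the spec.

```python
import json

# UI labels from the spec; the step keys are hypothetical.
STEP_LABELS = {
    "phase1": "Sources personnalisees",
    "phase2": "Recherche web",
    "save": "Sauvegarde",
}


def sse_message(event: str, payload: dict) -> str:
    """Frame one Server-Sent Events message: named event, one data line, blank line."""
    # json.dumps escapes newlines inside strings, so the payload fits on one data: line.
    data = json.dumps(payload, ensure_ascii=False)
    return f"event: {event}\ndata: {data}\n\n"


def progress_message(step: str, detail: str = "") -> str:
    return sse_message("progress", {"step": step, "label": STEP_LABELS[step], "detail": detail})


if __name__ == "__main__":
    print(progress_message("phase1", "3 sources in wave 1"), end="")
```

Because the generation job runs independently of the HTTP connection, the server can keep producing these messages after the user leaves the page. A reconnecting browser would simply start receiving them again.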
1.6 View a Synthesis
- The home page lists all syntheses as cards, showing the week number, theme name badge, a preview of articles, and article count.
- Syntheses can be sorted by date or by theme.
- Clicking a card opens the detail page.
- The detail page displays sections (categories) with article titles, summaries, and links. Two display modes are available: compact and full.
- From the detail page, the user can:
  - View article provenance (history of candidate articles processed during generation).
  - View LLM call logs (every AI call made during generation, with prompts, responses, and timing).
1.7 Export a Synthesis
From the synthesis detail page:
- Email: enter a recipient address or click "S'envoyer a soi-meme" (send to myself). The synthesis is sent as a formatted email via Resend.
- Markdown: download as a `.md` file.
- PDF: download as a `.pdf` file.
2. Feature Details
2.1 Multi-Theme
Each user can create multiple themes. A theme groups together:
- Content settings (search topic, categories, max items, max age, summary length)
- Personalized sources
- Generated syntheses
Themes are fully independent. Deleting a theme preserves its existing syntheses (displayed with a "Theme supprime" badge).
The generate page requires selecting a theme before launching. The home page shows a theme badge on each synthesis card and supports sorting by theme.
2.2 Categories
Categories are user-defined per theme. Users add and remove category names in the theme editor after creating a theme.
The system always includes two default categories:
- Divers: overflow category for articles that match no user-defined category, or whose category is already full.
- Sans date: category for articles without a usable publication date.
If no user-defined categories are configured, the available categories are still Divers and Sans date.
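The routing rules for the two system categories can be summarized in a hypothetical helper. The spec does not say whether the date check runs before the category check. It also does not say whether Divers and Sans date have their own caps. The sketch therefore checks the date first and leaves both system categories uncapped.

```python
from typing import Optional

DIVERS = "Divers"
SANS_DATE = "Sans date"


def route_article(category: Optional[str], has_date: bool,
                  user_categories: list[str], counts: dict[str, int],
                  max_items: int) -> str:
    """Return the section an article is filed under.

    Assumptions: the date check comes first, and the two system
    categories are not capped here.
    """
    if not has_date:
        return SANS_DATE
    if category not in user_categories:
        return DIVERS  # the LLM's category matches no user-defined category
    if counts.get(category, 0) >= max_items:
        return DIVERS  # the matched category is already full
    return category
```

With no user-defined categories, every dated article lands in Divers, which matches the rule above.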
2.3 Preferred Sources
Sources can be marked as preferred. Preference is stored per theme. During generation, preferred sources are extracted and processed before non-preferred sources. Within each extraction wave, URLs from preferred sources are also shuffled and placed before other URLs. This maximizes the chance that articles from preferred sources fill the synthesis.
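Ordering within a wave can be sketched as two independent shuffles, with preferred URLs first. Two details are assumptions: the `(url, source_id)` pair representation and the injected `random.Random`, which makes tests reproducible.

```python
import random


def order_candidates(candidates: list[tuple[str, str]],
                     preferred_sources: set[str],
                     rng: random.Random) -> list[str]:
    """candidates are (article_url, source_id) pairs from one extraction wave."""
    first = [url for url, src in candidates if src in preferred_sources]
    rest = [url for url, src in candidates if src not in preferred_sources]
    # Each group is shuffled on its own, so preferred URLs always come first.
    rng.shuffle(first)
    rng.shuffle(rest)
    return first + rest
```

Shuffling inside each group keeps a single prolific preferred source from always being processed before the other preferred sources.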
2.4 Scheduled Generation
Each theme can have an optional schedule with:
- Enabled/disabled toggle
- Days: selection of days of the week (Mon-Sun)
- Time: execution time in UTC (HH:MM)
- Email recipients: up to 3 email addresses
When a schedule fires, the system generates the synthesis and emails it to all listed recipients. Schedules are checked every 60 seconds. A last_run_at timestamp prevents a schedule from running twice on the same day. Jobs run sequentially to avoid exceeding LLM rate limits.
Changes to the schedule are saved immediately (auto-save).
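A minimal sketch of the due-check run on each 60-second tick. It assumes timezone-aware UTC datetimes and `datetime.weekday()` numbering for the selected days. The catch-up behavior is also an assumption: a schedule missed at 08:00 still fires at 08:05 if it has not run that day. The spec only states that last_run_at prevents a second run on the same day.

```python
from datetime import datetime, timezone
from typing import Optional


def is_due(enabled: bool, days: set[int], time_hhmm: str,
           last_run_at: Optional[datetime] = None,
           now: Optional[datetime] = None) -> bool:
    """Decide whether a theme's schedule fires on this tick.

    days uses datetime.weekday() numbering (0 = Monday, 6 = Sunday);
    now and last_run_at are timezone-aware UTC datetimes.
    """
    now = now or datetime.now(timezone.utc)
    if not enabled or now.weekday() not in days:
        return False
    hour, minute = (int(part) for part in time_hhmm.split(":"))
    scheduled = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if now < scheduled:
        return False
    # At most one run per UTC day (the role of last_run_at in the spec).
    return last_run_at is None or last_run_at.date() != now.date()
```

A scheduler loop would call this for each theme, then run the due jobs one after another to match the sequential execution described above.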
2.5 Brave Search
An optional alternative to LLM-powered web search in Phase 2. When enabled:
- The user provides a Brave Search API key (stored encrypted alongside LLM keys).
- Phase 2 queries the Brave Search API with the theme topic, filtered by article freshness.
- Results are scraped and classified/summarized by the LLM, following the same pipeline as Phase 1.
When the Brave key is deleted, the toggle is automatically disabled. If the toggle is on but no key is present at generation time, the system returns an error.
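The sketch below shows how the theme topic and max age could map onto a Brave web search request. The endpoint, the `X-Subscription-Token` header, and the `q` and `freshness` parameters (`pd`, `pw`, `pm`, `py`) follow Brave's published Web Search API documentation. They should still be checked against its current version. Two details are assumptions: rounding max age up to the next freshness bucket, and raising `ValueError` for the missing-key error described above. The code only builds the request; it sends nothing.

```python
from typing import Optional
from urllib.parse import urlencode

BRAVE_WEB_SEARCH = "https://api.search.brave.com/res/v1/web/search"


def brave_freshness(max_age_days: int) -> str:
    """Round the theme's max age up to one of Brave's freshness buckets (assumption)."""
    if max_age_days <= 1:
        return "pd"  # past day
    if max_age_days <= 7:
        return "pw"  # past week
    if max_age_days <= 31:
        return "pm"  # past month
    return "py"  # past year


def build_brave_request(topic: str, max_age_days: int,
                        api_key: Optional[str]) -> tuple[str, dict[str, str]]:
    """Return (url, headers) for a Brave web search; nothing is sent here."""
    if not api_key:
        # Mirrors the spec: toggle on but no key at generation time is an error.
        raise ValueError("Brave Search is enabled but no API key is configured")
    query = urlencode({"q": topic, "freshness": brave_freshness(max_age_days)})
    headers = {"Accept": "application/json", "X-Subscription-Token": api_key}
    return f"{BRAVE_WEB_SEARCH}?{query}", headers
```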
3. Generation Pipeline
3.1 Overview
Generation follows a two-phase pipeline. Phase 1 processes the user's personalized sources. Phase 2 fills remaining category gaps via web search. Both phases produce articles classified into user-defined categories with titles, summaries, and source URLs.
3.2 Initialization
Before generation starts:
- Load theme settings (user-defined categories plus the defaults Divers and Sans date, search topic, max items, max age, summary length) and global user settings (provider, models, batch size, rate limits, etc.).
- Decrypt the user's LLM API key and create the provider instance.
- Clean up old article history and LLM call logs.
- Load personalized sources for the selected theme.
- Initialize tracking: per-category article counts, per-domain source counts, seen URLs set, article history hashes.
3.3 Phase 1: Personalized Sources
Skipped if the user has no sources for the theme.
Step 1 -- Windowed source extraction:
Sources are split into waves of source_extraction_window size (default 3). Sources are rotated so extraction starts after the last source used in a previous generation (rolling window). Preferred sources are placed before non-preferred sources within the rotation order.
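The rotation and wave split can be sketched as follows. The spec does not say how the last-used source is persisted, so the hypothetical `last_used` argument simply names it. A stable partition keeps the rotated order within the preferred and non-preferred groups.

```python
from typing import Optional


def plan_waves(source_ids: list[str], preferred: set[str],
               last_used: Optional[str], window: int = 3) -> list[list[str]]:
    """Rotate past the previously used source, put preferred first, chunk into waves."""
    start = source_ids.index(last_used) + 1 if last_used in source_ids else 0
    rotated = source_ids[start:] + source_ids[:start]
    # Stable partition: relative rotation order is kept inside each group.
    ordered = ([s for s in rotated if s in preferred]
               + [s for s in rotated if s not in preferred])
    return [ordered[i:i + window] for i in range(0, len(ordered), window)]
```

For example, with sources `a` to `e` and `b` used last time, the rotation starts at `c`, so the sources at the end of the list also get extracted regularly.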
For each wave:
- Extract article links from all sources in the wave in parallel (bounded concurrency of 5). Link extraction parses HTML `<a>` tags.
- Deduplicate candidate URLs and filter them against article history (previously seen articles are skipped).
- Shuffle remaining candidates, with URLs from preferred sources placed first.
- Process articles in batches of `batch_size`:
  - Scrape: fetch article pages in parallel. Validate content (reject empty pages, soft-404s, and pages that are too old). Extract the original title from `og:title`, `<h1>`, or `<title>`.
  - Classify/summarize: send article content to the LLM, which assigns a category and generates a title and summary. Summary length varies with the `summary_length` setting (more detail means more of the article body is sent to the LLM).
- Check if the synthesis is full (total articles across all categories reaches the cap). If full, skip remaining waves.
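Both link extraction and title extraction are HTML parsing tasks. The sketch below handles both in a single pass with Python's standard `html.parser`. Three details are assumptions. The title priority (`og:title`, then `<h1>`, then `<title>`) is read from the order the spec lists them. Only http(s) links are kept. The default cap of 15 links comes from the "Max links per source" default in 4.2.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urljoin


class PageParser(HTMLParser):
    """One pass over a page: collects <a href> targets and title candidates."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []
        self.og_title: Optional[str] = None
        self.h1: Optional[str] = None
        self.title: Optional[str] = None
        self._capturing: Optional[str] = None  # "h1" or "title" while inside one
        self._buf: list[str] = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "a" and a.get("href"):
            self.links.append(urljoin(self.base_url, a["href"]))  # resolve relative links
        elif tag == "meta" and a.get("property") == "og:title" and a.get("content"):
            self.og_title = self.og_title or a["content"].strip()
        elif tag in ("h1", "title") and self._capturing is None:
            self._capturing, self._buf = tag, []

    def handle_data(self, data):
        if self._capturing:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag != self._capturing:
            return
        text = " ".join("".join(self._buf).split()) or None  # collapse whitespace
        if tag == "h1" and self.h1 is None:
            self.h1 = text
        elif tag == "title" and self.title is None:
            self.title = text
        self._capturing = None

    def best_title(self) -> Optional[str]:
        return self.og_title or self.h1 or self.title


def parse_page(html: str, base_url: str,
               max_links: int = 15) -> tuple[list[str], Optional[str]]:
    """Return up to max_links unique http(s) links and the preferred title."""
    parser = PageParser(base_url)
    parser.feed(html)
    parser.close()
    web_links = (u for u in parser.links if u.startswith(("http://", "https://")))
    # dict.fromkeys deduplicates while keeping page order.
    return list(dict.fromkeys(web_links))[:max_links], parser.best_title()
```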
Source diversity: a per-domain cap (max_articles_per_source) prevents any single source from dominating.
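A per-domain cap can be sketched with a `Counter` keyed by host name. Two details are assumptions: treating `www.` hosts as the same domain, and counting an article when it is admitted.

```python
from collections import Counter
from urllib.parse import urlparse


def source_domain(url: str) -> str:
    """Host name used as the diversity key; www.example.com and example.com match."""
    host = urlparse(url).hostname or ""  # urlparse already lowercases the host
    return host[4:] if host.startswith("www.") else host


def try_admit(url: str, per_domain: Counter, max_per_source: int) -> bool:
    """Count the article against its domain if the cap allows it.

    False corresponds to the filtered_diversity status in article history.
    """
    domain = source_domain(url)
    if per_domain[domain] >= max_per_source:
        return False
    per_domain[domain] += 1
    return True
```

Calling `try_admit` only for articles that are actually kept means a failed scrape does not use up a domain's quota.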
3.4 Phase 2: Web Search Fallback
Skipped if all user-defined categories are already filled.
The system computes category gaps (how many articles each category still needs), then follows one of two paths:
Path A -- Brave Search (when use_brave_search is enabled):
- Query the Brave Search API with the theme topic and freshness filter.
- Filter results: reject homepage URLs, deduplicate against Phase 1, check article history, apply source diversity cap.
- Scrape and classify/summarize results using the same batched pipeline as Phase 1.
Path B -- LLM Web Search (default):
- Send a search prompt to the LLM with the theme, categories, and gap counts. The LLM uses web grounding to find articles and returns structured results.
- Filter results using the same filters as Path A.
- Scrape each result to validate it. Keep the LLM-provided title and summary (no re-classification).
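Two Phase 2 helpers can be sketched as follows. The first computes the gap counts that are sent in the Path B prompt; an empty result corresponds to the skip condition, where every user-defined category is full. The second is the homepage filter shared by both paths. The homepage rule (no path beyond "/" and no query string) is an assumed heuristic; the spec only says homepage URLs are rejected.

```python
from urllib.parse import urlparse


def category_gaps(user_categories: list[str], counts: dict[str, int],
                  max_items: int) -> dict[str, int]:
    """Articles still needed per user-defined category; an empty dict means all are full."""
    gaps = {name: max_items - counts.get(name, 0) for name in user_categories}
    return {name: gap for name, gap in gaps.items() if gap > 0}


def is_homepage(url: str) -> bool:
    """Heuristic: no path beyond "/" and no query string means a site's front page."""
    parsed = urlparse(url)
    return parsed.path.strip("/") == "" and not parsed.query
```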
3.5 Finalization
- If no articles were collected across both phases, return an error.
- Order sections: user-defined categories first (in their configured order), then Divers if non-empty, then Sans date if non-empty.
- Save the synthesis to the database with status "completed".
- Record all used articles in article history for future deduplication.
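The ordering and empty-result rules above can be sketched as one function. Two details are assumptions: empty user-defined categories are dropped as well, and the error is raised as `ValueError`. The spec only states that Divers and Sans date appear when non-empty.

```python
def finalize_sections(sections: dict[str, list],
                      user_categories: list[str]) -> list[tuple[str, list]]:
    """Order sections as described above and drop empty ones; raise if nothing was collected."""
    # dict.fromkeys deduplicates while keeping order, in case a user category reuses a system name.
    order = list(dict.fromkeys(user_categories + ["Divers", "Sans date"]))
    ordered = [(name, sections[name]) for name in order if sections.get(name)]
    if not ordered:
        raise ValueError("no articles were collected in either phase")
    return ordered
```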
For the complete technical algorithm, see technical_specs.md Section 5.
4. Settings Overview
4.1 Per-Theme Settings
Managed on the theme management page. Each theme has its own values.
| Setting | Description | Default |
|---|---|---|
| Name | Display label for the theme | -- |
| Search topic | Subject for AI search queries | -- |
| Categories | Ordered list of user-defined category names (Divers and Sans date are always included by the system) | [] |
| Max age (days) | Article recency filter | 7 |
| Max items per category | Cap per category | 4 |
| Summary length | Detail level: 1=Court, 2=Moyen, 3=Detaille | 3 |
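The per-theme defaults in the table above can be captured in a small dataclass. The field names and the validation rules are assumptions; only the defaults and the 1/2/3 summary-length mapping come from the table.

```python
from dataclasses import dataclass, field

SUMMARY_LENGTHS = {1: "Court", 2: "Moyen", 3: "Detaille"}


@dataclass
class ThemeSettings:
    name: str
    search_topic: str
    # User-defined categories only; Divers and Sans date are added by the system.
    categories: list[str] = field(default_factory=list)
    max_age_days: int = 7
    max_items_per_category: int = 4
    summary_length: int = 3  # 1=Court, 2=Moyen, 3=Detaille

    def __post_init__(self):
        # Hypothetical validation rules.
        if self.summary_length not in SUMMARY_LENGTHS:
            raise ValueError("summary_length must be 1, 2 or 3")
        if self.max_age_days < 1 or self.max_items_per_category < 1:
            raise ValueError("max age and max items must be positive")
```

`field(default_factory=list)` gives each theme its own category list instead of one shared mutable default.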
4.2 Global User Settings
Managed on the settings page. Apply across all themes.
| Setting | Description | Default |
|---|---|---|
| Provider | LLM provider (Gemini, OpenAI, Anthropic) | -- |
| Research model | Model for scraping/classification | Admin default |
| Web search model | Model for web search | Admin default |
| Search agent behavior | Custom instructions for AI research | Default prompt |
| Use Brave Search | Enable Brave Search for Phase 2 | false |
| Batch size | Articles processed in parallel | 5 |
| Source extraction window | Sources per extraction wave | 3 |
| Max articles per source | Per-domain diversity cap | -- |
| Max links per source | Links extracted per source page | 15 |
| Rate limit (max requests) | LLM call throttling | Admin default |
| Rate limit (time window) | Throttling window | Admin default |
| Article history (days) | History retention period | -- |
4.3 Settings Import/Export
Users can export their global settings as a JSON file and import settings from a previously exported file. Import merges uploaded values over defaults; missing fields fall back to default values. API keys can optionally be included in the export (with a warning that they will be in plaintext).
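The merge-over-defaults import can be sketched as a dictionary overlay. A few key names appear in the spec (use_brave_search, batch_size, source_extraction_window); `max_links_per_source` is a hypothetical name. Dropping unknown keys is also an assumption.

```python
import json

# Defaults from the table in 4.2; max_links_per_source is a hypothetical key name.
DEFAULT_SETTINGS = {
    "use_brave_search": False,
    "batch_size": 5,
    "source_extraction_window": 3,
    "max_links_per_source": 15,
}


def import_settings(raw: str) -> dict:
    """Merge an exported JSON file over the defaults.

    Missing keys keep their default; unknown keys are dropped (an assumption).
    """
    uploaded = json.loads(raw)
    if not isinstance(uploaded, dict):
        raise ValueError("settings file must contain a JSON object")
    known = {k: v for k, v in uploaded.items() if k in DEFAULT_SETTINGS}
    return {**DEFAULT_SETTINGS, **known}
```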
5. Admin Features
5.1 Provider Management
Admins configure which LLM providers and models are available to users:
- Add providers with a unique identifier and display name.
- For each provider, configure two model lists: scraping/extraction models and web search models.
- Set a default model for each list.
- Enable or disable providers.
- Delete providers entirely.
Users select from the admin-curated list. If a user's selected provider is removed, they see a warning to select another.
5.2 Rate Limit Configuration
Admins set default rate limits per provider (max requests / time window in seconds). These defaults apply to users who have not overridden the values in their own settings.
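A "max requests per time window" limit can be enforced in several ways. The sketch below uses a sliding-window log; this technique is an assumption, since the spec does not name one. The injectable clock is there so the example can be tested without sleeping. `effective_limit` shows the override rule from this section: a user value wins, otherwise the admin default applies.

```python
import time
from collections import deque
from typing import Callable, Optional


def effective_limit(user_value: Optional[int], admin_default: int) -> int:
    """User overrides win; otherwise the provider default set by the admin applies."""
    return admin_default if user_value is None else user_value


class SlidingWindowLimiter:
    """Allow at most max_requests LLM calls in any window_seconds interval."""

    def __init__(self, max_requests: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.max_requests = max_requests
        self.window = window_seconds
        self.clock = clock
        self.calls: deque[float] = deque()

    def try_acquire(self) -> bool:
        now = self.clock()
        # Forget calls that have left the window.
        while self.calls and now - self.calls[0] >= self.window:
            self.calls.popleft()
        if len(self.calls) < self.max_requests:
            self.calls.append(now)
            return True
        return False
```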
5.3 User Management
Admins can:
- View all registered users (email, name, role, registration date).
- Promote a user to admin or demote an admin to user.
Admins cannot modify their own role.
6. Export and Sharing
6.1 Email
From the synthesis detail page, users enter a recipient email address or click "S'envoyer a soi-meme" to use their own address. The synthesis is sent as a formatted HTML email via the Resend API.
Scheduled generation also sends emails automatically to up to 3 configured addresses per theme.
6.2 PDF
A PDF export is available from the synthesis detail page. The PDF contains all sections with article titles, summaries, and source URLs.
6.3 Markdown
A Markdown export is available from the synthesis detail page. The file can be saved, or its content pasted into other tools.
7. Article History and Observability
7.1 Article History
Every article encountered during generation is recorded in the article history with its status:
- used: included in the final synthesis.
- filtered_history: skipped because it was seen in a previous generation.
- filtered_diversity: skipped due to per-domain cap.
- filtered_empty: scrape returned no content or a soft-404.
- filtered_too_old: article older than the max age setting.
- filtered_homepage: URL was a homepage, not a specific article.
- filtered_cross_phase_dedup: URL already seen in a previous phase.
Users can view the article history per synthesis (provenance view) or globally. History can be cleared entirely.
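The status list above maps naturally onto an enum. The sketch also derives the deduplication keys from the history. Three details are hypothetical: the URL normalization rules, the SHA-256 hashing, and the choice to let only `used` entries block later runs. That last choice follows 3.5, which records used articles for future deduplication.

```python
import hashlib
from enum import Enum
from urllib.parse import urlsplit, urlunsplit


class HistoryStatus(str, Enum):
    USED = "used"
    FILTERED_HISTORY = "filtered_history"
    FILTERED_DIVERSITY = "filtered_diversity"
    FILTERED_EMPTY = "filtered_empty"
    FILTERED_TOO_OLD = "filtered_too_old"
    FILTERED_HOMEPAGE = "filtered_homepage"
    FILTERED_CROSS_PHASE_DEDUP = "filtered_cross_phase_dedup"


def normalize_url(url: str) -> str:
    """Lowercase scheme and host, drop the fragment and any trailing slash (hypothetical rules)."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def url_hash(url: str) -> str:
    return hashlib.sha256(normalize_url(url).encode("utf-8")).hexdigest()


def dedup_keys(records: list[tuple[str, HistoryStatus]]) -> set[str]:
    """Hashes that block future reuse: only articles recorded as used."""
    return {url_hash(url) for url, status in records if status is HistoryStatus.USED}
```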
7.2 LLM Call Logs
Every LLM call during generation is logged with:
- Call type (link extraction, classify/summarize, web search)
- Model used
- System prompt and user prompt
- Response
- Duration
- Associated article URL (for classify calls)
Logs are viewable per synthesis from the detail page.