177 Commits (a09973f5699472148c3f3cccbfd80579b528da4b)

Author SHA1 Message Date
oabrivard 1cb7bf6c6f test: add integration test for RSS feed discovery and persistence in pipeline
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2 months ago
oabrivard e2ce401ea6 fix: add body size limit to feed_parser to prevent memory exhaustion
Adds chunked reading with a 5 MB cap (matching the scraper limit) to
both parse_feed and discover_feed, with fast rejection via Content-Length
header when available. Includes a unit test covering the oversize path.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2 months ago
oabrivard 7d3dfa37a9 fix: add user_id ownership check to update_source_rss
Adds `AND user_id = $4` to the UPDATE query in `update_source_rss` and
threads the `user_id` parameter through from `run_generation_inner`,
consistent with every other mutation in db/sources.rs.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2 months ago
oabrivard 7e1ab0996b test: add end-to-end RSS flow test for feed_parser
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2 months ago
oabrivard 027c576302 feat: integrate feed_parser into Phase 1 pipeline with HTML fallback
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2 months ago
oabrivard d4cbcf47ae feat: add detect_and_parse_feed orchestration function
Adds detect_and_parse_feed which orchestrates feed caching/freshness logic:
uses cached feed URL directly if fresh (< 30 days), otherwise re-discovers
from source URL via discover_feed. Returns FeedResult::Found or NotFound.
Includes 4 new tests covering fresh cache, no cache, no feed, and stale cache cases.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2 months ago
oabrivard 96b39814bb feat: add discover_feed function for RSS/Atom auto-discovery
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2 months ago
oabrivard fcdc7ca4a6 feat: add feed_parser service with parse_feed function and tests
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2 months ago
oabrivard cd5f5434b2 feat: add rss_url and rss_discovered_at columns to sources
Add nullable rss_url (TEXT) and rss_discovered_at (TIMESTAMPTZ) columns
to the sources table for RSS feed URL caching. Update the Source struct,
all query_as SELECT/RETURNING queries, and add update_source_rss db function.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2 months ago
oabrivard ef23596c5a deps: add feed-rs crate for RSS/Atom feed parsing 2 months ago
oabrivard da8603c57c fix: allow Turnstile connect-src in CSP to prevent hanging requests
The CSP had connect-src 'self' which blocked Cloudflare Turnstile's
internal fetch requests to challenges.cloudflare.com, causing them to
hang indefinitely and triggering a page reload loop.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2 months ago
oabrivard 6d5dd23a6b fix: SPA fallback returns 200 instead of 404 for client-side routes
ServeDir::not_found_service serves index.html but preserves the 404
status code. Switch to ServeDir::fallback which returns 200, fixing
client-side routes like /login returning 404 to the browser.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2 months ago
oabrivard 1a6773b159 fix: fresh Turnstile token for resend + improved magic link logging
Turnstile tokens are single-use. The resend flow reused the consumed token,
causing "timeout-or-duplicate" errors from Cloudflare.

Frontend:
- Add Turnstile widget to resend view on Login and Register pages
- Add resetSignal prop to Turnstile component to re-solve after each resend
- Clear token after each successful API call, guard resend against null token
- Add test for resetSignal behavior

Backend:
- Add entry log when magic link email sending begins
- Add explicit warning when rate limit prevents sending
- Add error log with rollback context when email delivery fails

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2 months ago
oabrivard bc68434ed8 fix: pass Turnstile sitekey to frontend Docker build
The frontend Vite build was not receiving VITE_TURNSTILE_SITE_KEY during
Docker builds, causing the production bundle to fall back to the Cloudflare
test sitekey (1x00000000000000000000AA) which returns 503 in production.

- Add ARG/ENV for VITE_TURNSTILE_SITE_KEY in Dockerfile frontend stage
  (placed after npm ci to preserve dependency cache)
- Pass TURNSTILE_SITE_KEY from .env as build arg in docker-compose.yml
- Add post-change workflow section to CLAUDE.md

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2 months ago
oabrivard 0963559e0f fix: resolve all clippy warnings in test files (zero warnings remaining)
- api_auth_test.rs: replace len() > 0 with !is_empty() (3 occurrences)
- api_admin_test.rs: suppress type_complexity on complex tuple Vec annotation
- api_sources_preferred_test.rs: replace assert_eq!(x, false) with assert!(!x)
- api_sources_test.rs: remove needless & on format!() in .uri() calls (5 occurrences)
- api_syntheses_test.rs: remove needless & on format!() in .uri() call

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2 months ago
oabrivard 9a734f136e fix: resolve all clippy warnings (0 remaining)
- db/themes: pass CreateThemeRequest/UpdateThemeRequest structs instead
  of 8-9 individual parameters
- llm/mock: add Default impl for MockLlmProvider
- middleware/auth: suppress manual_async_fn (Axum extractor constraint)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2 months ago
oabrivard 3d790e7ce7 feat: extract article URLs from JSON-LD structured data in source pages
Many modern sites (Hugo, WordPress, Next.js) load articles via JavaScript
but include full article URLs in JSON-LD schema.org markup in the <head>.
The scraper now extracts these first (highest quality), then falls back
to <a href> heuristic extraction. Supports ItemList, BlogPosting,
NewsArticle, @graph arrays, and mainEntity wrappers.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2 months ago
oabrivard 9a310bbf19 feat: add French/European/US date formats + remove "Articles sans date" category
Date parser now supports: 25/03/2026, 25-03-2026, March 25 2026,
25 mars 2026, 15 février 2026, and short month variants.

Articles without dates are no longer routed to a separate category —
they stay in their LLM-classified category with date shown as empty.
This prevents losing good articles in a catch-all section.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2 months ago
oabrivard f0b60f3f13 fix: return 204 No Content from preferred sources endpoint
The API client expects empty responses to use 204, not 200.
Returning 200 with no body caused JSON parse error in the frontend.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2 months ago
oabrivard 2822baf50d fix: add theme_id to preferred sources pipeline test
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2 months ago
oabrivard 6cf2b9f5a4 fix: update sources integration tests for multi-theme (add theme_id everywhere)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2 months ago
oabrivard 68b1956059 refactor: extract synthesis helpers (assign_category, filter_phase2_url, tracing) into helpers.rs
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2 months ago
oabrivard b124d73c2a fix: P1 audit items — CSV export theme filter, theme validation, ownership checks, history enums, i18n
- export_csv now accepts optional theme_id query param and filters accordingly
- Add UpdateThemeRequest::validate() with bounds checking; call it in the update handler
- Verify theme ownership in sources::create when theme_id is provided
- Update STATUS_OPTIONS (add filtered_too_old, filtered_not_article; remove filtered_duplicate) and SOURCE_TYPE_OPTIONS (add brave_search; remove overflow) in ArticleHistory
- Replace hardcoded French strings ('Confirmer', 'Erreur inconnue') with t() calls; add settings.apiKeys.unknownError key to fr.ts

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2 months ago
oabrivard d5d624b896 fix: P0 audit bugs — theme-scoped imports/preferred, creation flow, scheduler timeout, job cleanup
- Bulk/CSV import now passes theme_id through to DB
- Preferred source update scoped by theme_id (no cross-theme reset)
- Theme creation sends sensible defaults from frontend
- Scheduler wraps generation in 15-minute timeout
- Job store cleanup runs every 5 minutes

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2 months ago
oabrivard fa793de8bf test: add scheduler unit test and find_due_schedules integration tests
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
3 months ago
oabrivard 7f647bc656 refactor: extract JobStore to services/job_store.rs
Moves JobEntry, JobStore, ProgressEvent, JOB_TTL, and emit_progress
to a dedicated module. Updates imports in synthesis.rs, generation.rs,
scheduler.rs, and app_state.rs. synthesis.rs re-exports for backward
compatibility.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
3 months ago
oabrivard 1ab9c817e4 refactor: extract scrape_and_classify_batch from synthesis pipeline
Replaces ~200 duplicated lines in Phase 1 (personalized sources) and
Phase 2 (Brave Search) with a shared scrape_and_classify_batch function.
Uses ScrapeClassifyCtx to bundle shared parameters. Also prepares
synthesis.rs for JobStore extraction by re-exporting from job_store.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
3 months ago
oabrivard 7d58b6e019 fix: enable article history in preferred sources test
article_history_days=0 disables "used" trace entries, so the test
found 0 entries. Changed to 90 to enable tracing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
3 months ago
oabrivard c16c588752 test: add stop generation, LLM logs, and preferred ordering tests (GAP-1,2,3)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
3 months ago
oabrivard e97b01c819 test: add schedule CRUD integration tests and E2E
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
3 months ago
oabrivard a068d04fa8 feat: add background scheduler for automated synthesis generation
Spawns a tokio task that checks for due schedules every 60 seconds,
runs generation via run_generation_inner, and sends emails to configured
recipients before marking each schedule as run.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
3 months ago
oabrivard 384649b2b6 feat: add theme schedules — model, DB, CRUD handler, routes
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
3 months ago
oabrivard e43a4d2180 feat: add preferred sources — prioritized during synthesis generation
Users can mark sources as preferred via star buttons on the theme page.
Preferred sources are processed first in the pipeline (ordered before
non-preferred in waves, shuffled separately then merged).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
3 months ago
oabrivard 78844c4ebe chore: remove unused test_settings() function from prompts.rs
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
3 months ago
oabrivard 6f3e6883c9 feat: add stop generation button — saves partial synthesis on cancel
Adds Arc<AtomicBool> cancellation flag to JobStore/JobEntry. The pipeline
checks the flag before each wave and after each batch, then saves whatever
articles have been collected. A new POST /syntheses/generate/:job_id/stop
endpoint sets the flag. The frontend shows a red stop button during generation
and POSTs to the stop endpoint on click.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
3 months ago
oabrivard 7ba1da4d92 fix: use batch_size=1 for diversity test + rename Autre to Divers in overflow test
Diversity filter works across batches (source_counts updated after classify).
With batch_size=5, all 3 articles fit in one batch, bypassing the filter.
batch_size=1 forces per-article processing so the filter triggers.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
3 months ago
oabrivard e444f79c0b fix: mock provider returns today's date in classify response
Without a date, articles are routed to "Articles sans date" instead
of their classified category, breaking pipeline tests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
3 months ago
oabrivard 20b298b926 fix: update syntheses tests for theme_id requirement in generate endpoint
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
3 months ago
oabrivard da602d850d test: add pipeline tests for source diversity and history dedup (GAP-05, GAP-07)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
3 months ago
oabrivard 14908cf603 test: add themes CRUD, article history, and assign_category tests
Covers GAP-01 (themes API), GAP-02 (article history API), and
GAP-04 (assign_category unit tests).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
3 months ago
oabrivard 196005a27b feat: multi-theme Phase 1 — settings split, sources/syntheses theme_id, pipeline theme-aware
Remove content settings from settings table (moved to themes).
Add theme_id to sources and syntheses. Pipeline loads content
settings from the selected theme. Generate endpoint requires theme_id.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
3 months ago
oabrivard 10b8d950b9 feat: add themes CRUD endpoints
Implements GET/POST/PUT/DELETE /api/v1/themes handlers following the same patterns as sources.rs, registers the module, and wires up routes in the router.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
3 months ago
oabrivard 467ad550a5 feat: add Theme model and DB queries
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
3 months ago
oabrivard cad61fadfc feat: create themes table and migrate content settings from settings
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
3 months ago
oabrivard 6b84c335d0 feat: improve synthesis list cards with time, all categories, and uniform height
- Add generation time below date in synthesis cards
- Show all categories with article count in parentheses
- Use flex-col layout for uniform card height
- Add sections_summary to SynthesisListItem API response
- Add formatTime utility

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
3 months ago
oabrivard a89c61c5b6 feat: add "Articles sans date" category for articles without publication date
Articles where neither the scraper nor the LLM could extract a date
are now placed in a separate "Articles sans date" section instead of
their classified category. This makes undated articles visible without
mixing them with properly dated content.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
3 months ago
oabrivard fb086a706f feat: rename fallback category "Autre" to "Divers"
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
3 months ago
oabrivard e24236a069 feat: add max_links_per_source setting (default 8, was hardcoded 15)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
3 months ago
oabrivard 2c3c6008a3 fix: monotonic progress bar with 3 clean phases (sources, websearch, saving)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
3 months ago
oabrivard d234fa9b24 feat: add is_article LLM check + remove use_llm_for_source_links option
The LLM now determines if scraped content is a real article during
classify (zero extra cost). The separate LLM link extraction option
is removed — heuristic extraction is sufficient.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
3 months ago