334 Commits (master)
 

Author SHA1 Message Date
oabrivard a9b60648ee feat: make generation timeout configurable via GENERATION_TIMEOUT_MINUTES
The hardcoded 15-minute timeout was too short for some syntheses.
Now configurable via env var with a default of 30 minutes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2 months ago
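
A minimal sketch of how such a setting could be read, assuming the value is a whole number of minutes. Only the variable name `GENERATION_TIMEOUT_MINUTES` and the 30-minute default come from the commit message; the helper names and the fallback on invalid or zero values are assumptions, not the repository's actual code.

```rust
use std::time::Duration;

/// Default taken from the commit message: 30 minutes.
const DEFAULT_TIMEOUT_MINUTES: u64 = 30;

/// Hypothetical helper: convert the raw env value into a Duration.
/// Missing, non-numeric, or zero values fall back to the default (an assumption).
fn parse_generation_timeout(raw: Option<&str>) -> Duration {
    let minutes = raw
        .and_then(|v| v.trim().parse::<u64>().ok())
        .filter(|m| *m > 0)
        .unwrap_or(DEFAULT_TIMEOUT_MINUTES);
    // saturating_mul avoids an overflow panic on absurdly large inputs.
    Duration::from_secs(minutes.saturating_mul(60))
}

/// Hypothetical entry point: read the env var once and parse it.
fn generation_timeout() -> Duration {
    let raw = std::env::var("GENERATION_TIMEOUT_MINUTES").ok();
    parse_generation_timeout(raw.as_deref())
}
```

Keeping the parsing separate from the environment lookup makes the fallback rules testable without touching process state.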
oabrivard 1bac084d98 test: add integration test for site_search fallback in pipeline
Verifies that when a source page returns no article links (blocked/empty),
the pipeline does not crash and still produces article_history entries via
the site_search fallback path or Phase 2.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2 months ago
oabrivard 75ab2470f2 test: add unit tests for site_search Brave and LLM paths
Add 3 tests covering: Brave error path returns empty vec (no panic),
LLM integration with MockLlmProvider returns empty (non-array response),
and prompt construction contains domain/theme/max_results/max_age_days.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2 months ago
oabrivard 926c7ec709 feat: integrate site_search fallback into Phase 1 pipeline
Build SiteSearchProvider before wave loop, chain as third fallback
after RSS + HTML when both return 0 links.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2 months ago
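
The fallback order described above (RSS, then HTML, then site_search when the earlier stages return 0 links) can be sketched as a simple ordered chain. This is a synchronous, std-only illustration; `first_non_empty` and the stage signature are hypothetical, and the real pipeline presumably runs async inside the Phase 1 wave loop.

```rust
/// Hypothetical helper: run link-extraction stages in order and return the
/// first non-empty result together with the name of the stage that produced it.
/// Later stages are not called once an earlier one yields links.
fn first_non_empty(
    stages: &[(&'static str, fn() -> Vec<String>)],
) -> Option<(&'static str, Vec<String>)> {
    for (name, extract) in stages {
        let links = extract();
        if !links.is_empty() {
            return Some((*name, links));
        }
    }
    // Every stage returned 0 links.
    None
}
```

Called with stages named `"rss"`, `"html"`, and `"site_search"` in that order, the third stage only runs when the first two both come back empty.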
oabrivard 71c791fec0 feat: implement LLM websearch path in site_search service
Replace placeholder search_llm with real implementation: builds a French
prompt asking the LLM for recent articles from a domain, calls call_llm
with a JSON-array schema, and filters results through url_matches_domain
to guard against hallucinated URLs. Add build_site_search_prompt and
parse_llm_url_response helpers with 4 unit tests (valid array, non-array,
mixed types, wrong-domain filtering).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2 months ago
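
A rough sketch of the kind of domain guard described above. The name `url_matches_domain` comes from the commit message, but its signature and matching rules here are assumptions: the host must equal the domain or be a subdomain of it. `host_of` and `filter_llm_urls` are hypothetical helpers, and the std-only host parsing ignores IPv6 literals and uppercase schemes.

```rust
/// Hypothetical helper: extract the lowercase host from an absolute http(s) URL.
fn host_of(url: &str) -> Option<String> {
    let rest = url
        .strip_prefix("https://")
        .or_else(|| url.strip_prefix("http://"))?;
    // The authority ends at the first '/', '?' or '#'.
    let authority = rest.split(|c: char| c == '/' || c == '?' || c == '#').next()?;
    // Drop optional userinfo ("user@") and port (":443").
    let host_port = authority.rsplit('@').next()?;
    let host = host_port.split(':').next()?;
    if host.is_empty() {
        None
    } else {
        Some(host.to_ascii_lowercase())
    }
}

/// Assumed semantics: true when the URL's host is the domain or one of its subdomains.
fn url_matches_domain(url: &str, domain: &str) -> bool {
    let lowered = domain.trim().to_ascii_lowercase();
    let bare = lowered.strip_prefix("www.").unwrap_or(lowered.as_str());
    if bare.is_empty() {
        return false;
    }
    match host_of(url) {
        Some(host) => host == bare || host.ends_with(&format!(".{}", bare)),
        None => false,
    }
}

/// Hypothetical helper: keep only the LLM-returned URLs that pass the domain guard.
fn filter_llm_urls(candidates: &[&str], domain: &str) -> Vec<String> {
    candidates
        .iter()
        .filter(|u| url_matches_domain(**u, domain))
        .map(|u| (*u).to_string())
        .collect()
}
```

Matching on the dot-prefixed suffix rather than a plain substring is what rejects lookalike hosts such as `example.com.evil.io` or `notexample.com`.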
oabrivard a4f008bc42 feat: add site_search service with Brave path and domain filtering
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2 months ago
oabrivard c45050ce3c docs: add site search fallback implementation plan
6-task plan: site_search service (Brave + LLM paths), pipeline
integration as third fallback after RSS + HTML, tests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2 months ago
oabrivard a09973f569 docs: add site search fallback design spec
Spec for automatic site:{domain} search fallback when RSS + HTML
extraction both return 0 links for a personalized source. Uses
Brave Search or LLM websearch. Inline in Phase 1 spawn.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2 months ago
oabrivard 1cb7bf6c6f test: add integration test for RSS feed discovery and persistence in pipeline
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2 months ago
oabrivard e2ce401ea6 fix: add body size limit to feed_parser to prevent memory exhaustion
Adds chunked reading with a 5 MB cap (matching the scraper limit) to
both parse_feed and discover_feed, with fast rejection via Content-Length
header when available. Includes a unit test covering the oversize path.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2 months ago
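
The two checks described above (fast rejection on a declared Content-Length, then a running cap while reading chunks) can be sketched with the standard `Read` trait. This is a synchronous stand-in, not the feed_parser code: `read_capped`, the 8 KiB chunk size, and the error kind are assumptions, and 5 MB is taken here as 5 * 1024 * 1024 bytes.

```rust
use std::io::{self, Read};

/// 5 MB cap mentioned in the commit message (exact byte count is an assumption).
const MAX_BODY_BYTES: usize = 5 * 1024 * 1024;

/// Hypothetical helper: read a body in chunks, refusing anything over `max_bytes`.
fn read_capped<R: Read>(
    mut reader: R,
    content_length: Option<u64>,
    max_bytes: usize,
) -> io::Result<Vec<u8>> {
    // Fast path: reject before reading when the declared length is already too large.
    if let Some(len) = content_length {
        if len > max_bytes as u64 {
            return Err(io::Error::new(
                io::ErrorKind::InvalidData,
                "declared body exceeds size limit",
            ));
        }
    }
    let mut body = Vec::new();
    let mut chunk = [0u8; 8192];
    loop {
        let n = match reader.read(&mut chunk) {
            Ok(n) => n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        if n == 0 {
            break; // end of stream
        }
        // Slow path: the header may be absent or wrong, so enforce the cap while reading.
        if body.len() + n > max_bytes {
            return Err(io::Error::new(
                io::ErrorKind::InvalidData,
                "body exceeds size limit",
            ));
        }
        body.extend_from_slice(&chunk[..n]);
    }
    Ok(body)
}
```

The running check matters because a server can omit Content-Length or understate it, so the header alone cannot bound memory use.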
oabrivard 7d3dfa37a9 fix: add user_id ownership check to update_source_rss
Adds `AND user_id = $4` to the UPDATE query in `update_source_rss` and
threads the `user_id` parameter through from `run_generation_inner`,
consistent with every other mutation in db/sources.rs.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2 months ago
oabrivard 7e1ab0996b test: add end-to-end RSS flow test for feed_parser
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2 months ago
oabrivard 027c576302 feat: integrate feed_parser into Phase 1 pipeline with HTML fallback
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2 months ago
oabrivard d4cbcf47ae feat: add detect_and_parse_feed orchestration function
Adds detect_and_parse_feed which orchestrates feed caching/freshness logic:
uses cached feed URL directly if fresh (< 30 days), otherwise re-discovers
from source URL via discover_feed. Returns FeedResult::Found or NotFound.
Includes 4 new tests covering fresh cache, no cache, no feed, and stale cache cases.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2 months ago
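
The freshness rule above (use the cached feed URL if it was discovered less than 30 days ago, otherwise re-discover) reduces to a small pure decision. The sketch below is an illustration only: `FeedLookup` and `plan_feed_lookup` are hypothetical names, `SystemTime` stands in for whatever timestamp type the `rss_discovered_at` column maps to, and a discovery time in the future is treated as stale.

```rust
use std::time::{Duration, SystemTime};

/// 30-day freshness window from the commit message.
const FEED_CACHE_TTL: Duration = Duration::from_secs(30 * 24 * 60 * 60);

/// Hypothetical outcome of the cache check.
#[derive(Debug, PartialEq)]
enum FeedLookup {
    /// The cached feed URL is fresh and can be parsed directly.
    UseCached(String),
    /// No usable cache: run discovery from the source URL again.
    Rediscover,
}

/// Hypothetical helper: decide whether the cached feed URL can be reused.
fn plan_feed_lookup(
    rss_url: Option<&str>,
    rss_discovered_at: Option<SystemTime>,
    now: SystemTime,
) -> FeedLookup {
    match (rss_url, rss_discovered_at) {
        (Some(url), Some(discovered_at)) => match now.duration_since(discovered_at) {
            Ok(age) if age < FEED_CACHE_TTL => FeedLookup::UseCached(url.to_string()),
            // Stale cache, or a timestamp later than `now` (clock skew).
            _ => FeedLookup::Rediscover,
        },
        // Missing URL or missing timestamp: nothing to reuse.
        _ => FeedLookup::Rediscover,
    }
}
```

Passing `now` in as a parameter keeps the fresh-cache and stale-cache cases easy to cover in unit tests.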
oabrivard 96b39814bb feat: add discover_feed function for RSS/Atom auto-discovery
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2 months ago
oabrivard fcdc7ca4a6 feat: add feed_parser service with parse_feed function and tests
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2 months ago
oabrivard cd5f5434b2 feat: add rss_url and rss_discovered_at columns to sources
Add nullable rss_url (TEXT) and rss_discovered_at (TIMESTAMPTZ) columns
to the sources table for RSS feed URL caching. Update the Source struct,
all query_as SELECT/RETURNING queries, and add update_source_rss db function.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2 months ago
oabrivard ef23596c5a deps: add feed-rs crate for RSS/Atom feed parsing
2 months ago
oabrivard 2e94057822 docs: add RSS feed integration implementation plan
7-task plan covering: feed-rs dependency, DB migration, feed_parser
service (parse, discover, orchestrate), pipeline integration, and tests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2 months ago
oabrivard 8e56fcdb3a docs: add RSS feed integration design spec
Spec for adding RSS/Atom feed support to personalized sources in Phase 1
of the synthesis pipeline — auto-discovery, persistence with 30-day
re-check, and fallback to HTML extraction.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2 months ago
oabrivard f8588a57a3 fix: skip 401 redirect for auth endpoints to prevent login interference
The API client's 401 handler was intercepting responses from /auth/*
endpoints (login, register, me), throwing "Session expired" before the
actual response could reach the caller. This prevented the login form
from working — the AuthProvider's me() call returned 401, threw, and
the error propagated into the login flow.

Now the 401 redirect only triggers for non-auth API calls (where it
genuinely indicates an expired session). Auth endpoints handle their
own error responses normally.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2 months ago
oabrivard d081c47c7f fix: prevent 401 redirect loop on login and register pages
The API client redirected to /login on any 401 response, including the
GET /auth/me call made by AuthProvider on the login page itself. This
caused an infinite hard-navigation reload loop.

Skip the redirect when already on /login or /register — the AuthContext
route guards handle unauthenticated routing for those pages.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2 months ago
oabrivard da8603c57c fix: allow Turnstile connect-src in CSP to prevent hanging requests
The CSP had connect-src 'self' which blocked Cloudflare Turnstile's
internal fetch requests to challenges.cloudflare.com, causing them to
hang indefinitely and triggering a page reload loop.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2 months ago
oabrivard 6d5dd23a6b fix: SPA fallback returns 200 instead of 404 for client-side routes
ServeDir::not_found_service serves index.html but preserves the 404
status code. Switch to ServeDir::fallback which returns 200, fixing
client-side routes like /login returning 404 to the browser.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2 months ago
oabrivard 1a6773b159 fix: fresh Turnstile token for resend + improved magic link logging
Turnstile tokens are single-use. The resend flow reused the consumed token,
causing "timeout-or-duplicate" errors from Cloudflare.

Frontend:
- Add Turnstile widget to resend view on Login and Register pages
- Add resetSignal prop to Turnstile component to re-solve after each resend
- Clear token after each successful API call, guard resend against null token
- Add test for resetSignal behavior

Backend:
- Add entry log when magic link email sending begins
- Add explicit warning when rate limit prevents sending
- Add error log with rollback context when email delivery fails

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2 months ago
oabrivard b08d65c53c fix: use fresh Turnstile token for resend on Login and Register pages
Turnstile tokens are single-use. The resend flow was reusing the consumed
token from the initial submission, causing "timeout-or-duplicate" errors.

- Add Turnstile widget to the resend view so a fresh token is obtained
- Add resetSignal prop to Turnstile component to re-solve after each resend
- Clear token after each successful API call to prevent stale reuse
- Guard handleResend against null token
- Add test for resetSignal behavior

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2 months ago
oabrivard d2851c019e fix: add timeout to Turnstile polling loop to prevent infinite retries
When the Cloudflare Turnstile script fails to load (e.g., 503 from CDN),
the polling interval ran forever, causing the page to appear stuck in a
refresh loop. Now stops after 100 attempts (10s) and calls onError.

Also adds dedicated unit tests for the Turnstile component covering
immediate render, delayed load, timeout, and cleanup-during-polling.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2 months ago
oabrivard bc68434ed8 fix: pass Turnstile sitekey to frontend Docker build
The frontend Vite build was not receiving VITE_TURNSTILE_SITE_KEY during
Docker builds, causing the production bundle to fall back to the Cloudflare
test sitekey (1x00000000000000000000AA) which returns 503 in production.

- Add ARG/ENV for VITE_TURNSTILE_SITE_KEY in Dockerfile frontend stage
  (placed after npm ci to preserve dependency cache)
- Pass TURNSTILE_SITE_KEY from .env as build arg in docker-compose.yml
- Add post-change workflow section to CLAUDE.md

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2 months ago
oabrivard 7c22647d89 Updated port mapping in docker config
2 months ago
oabrivard 4fdc17917d fix: bind app to localhost:8005 for Caddy reverse proxy
Host port changed from 8080 to 8005 and bound to 127.0.0.1 only
so traffic goes through Caddy (HTTPS) instead of being exposed directly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2 months ago
oabrivard ab643c8e4c Fix Markdown lint issues
2 months ago
oabrivard ad613aa001 fix: resolve all markdownlint errors (blank lines, table spacing, bare URLs)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2 months ago
oabrivard 0963559e0f fix: resolve all clippy warnings in test files (zero warnings remaining)
- api_auth_test.rs: replace len() > 0 with !is_empty() (3 occurrences)
- api_admin_test.rs: suppress type_complexity on complex tuple Vec annotation
- api_sources_preferred_test.rs: replace assert_eq!(x, false) with assert!(!x)
- api_sources_test.rs: remove needless & on format!() in .uri() calls (5 occurrences)
- api_syntheses_test.rs: remove needless & on format!() in .uri() call

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2 months ago
oabrivard 9a734f136e fix: resolve all clippy warnings (0 remaining)
- db/themes: pass CreateThemeRequest/UpdateThemeRequest structs instead
  of 8-9 individual parameters
- llm/mock: add Default impl for MockLlmProvider
- middleware/auth: suppress manual_async_fn (Axum extractor constraint)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2 months ago
oabrivard c6aa1afdc5 refactor: decompose ThemeSourceList into SourceAddForm + SourceImport
ThemeSourceList: 477 → 222 lines (source list + preferred + delete)
SourceAddForm: 114 lines (title + URL form)
SourceImport: 186 lines (CSV import/export + bulk text import)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2 months ago
oabrivard 3d790e7ce7 feat: extract article URLs from JSON-LD structured data in source pages
Many modern sites (Hugo, WordPress, Next.js) load articles via JavaScript
but include full article URLs in JSON-LD schema.org markup in the <head>.
The scraper now extracts these first (highest quality), then falls back
to <a href> heuristic extraction. Supports ItemList, BlogPosting,
NewsArticle, @graph arrays, and mainEntity wrappers.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2 months ago
oabrivard 9a310bbf19 feat: add French/European/US date formats + remove "Articles sans date" category
Date parser now supports: 25/03/2026, 25-03-2026, March 25 2026,
25 mars 2026, 15 février 2026, and short month variants.

Articles without dates are no longer routed to a separate category —
they stay in their LLM-classified category with date shown as empty.
This prevents losing good articles in a catch-all section.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2 months ago
oabrivard 42ced9cfee fix: swap date and source URL positions in article cards
2 months ago
oabrivard 598211167d feat: show source URL next to date in synthesis article cards
Date aligned left, source URL aligned right. URL stripped of protocol
and truncated to 40 chars with "..." if too long. Full URL on hover.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2 months ago
oabrivard 48cad8144b fix: show schedule panel before sources in theme settings
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2 months ago
oabrivard f0b60f3f13 fix: return 204 No Content from preferred sources endpoint
The API client expects empty responses to use 204, not 200.
Returning 200 with no body caused JSON parse error in the frontend.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2 months ago
oabrivard a566a60663 Code improvement after a code review with Codex
2 months ago
oabrivard 2822baf50d fix: add theme_id to preferred sources pipeline test
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2 months ago
oabrivard 6cf2b9f5a4 fix: update sources integration tests for multi-theme (add theme_id everywhere)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2 months ago
oabrivard 0a0684e42e refactor: decompose ThemeManager into ThemeContentForm + ThemeSourceList sub-components
Extract content settings card and sources card into dedicated components,
reducing ThemeManager from 938 to 233 lines while keeping theme list CRUD
and selector in the parent.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2 months ago
oabrivard 68b1956059 refactor: extract synthesis helpers (assign_category, filter_phase2_url, tracing) into helpers.rs
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2 months ago
oabrivard b60a55993c fix: P2 audit items — use API client for stop, replace raw buttons, remove deprecated doc refs
- Replace raw fetch in handleStop with synthesesApi.stop()
- Add stop() method to synthesesApi
- Replace raw <button> elements in GenerateSynthesis with Button component (generate, retry, stop)
- Remove deprecated LLM link extraction schema reference from technical_specs.md

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2 months ago
oabrivard b124d73c2a fix: P1 audit items — CSV export theme filter, theme validation, ownership checks, history enums, i18n
- export_csv now accepts optional theme_id query param and filters accordingly
- Add UpdateThemeRequest::validate() with bounds checking; call it in the update handler
- Verify theme ownership in sources::create when theme_id is provided
- Update STATUS_OPTIONS (add filtered_too_old, filtered_not_article; remove filtered_duplicate) and SOURCE_TYPE_OPTIONS (add brave_search; remove overflow) in ArticleHistory
- Replace hardcoded French strings ('Confirmer', 'Erreur inconnue') with t() calls; add settings.apiKeys.unknownError key to fr.ts

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2 months ago
oabrivard d5d624b896 fix: P0 audit bugs — theme-scoped imports/preferred, creation flow, scheduler timeout, job cleanup
- Bulk/CSV import now passes theme_id through to DB
- Preferred source update scoped by theme_id (no cross-theme reset)
- Theme creation sends sensible defaults from frontend
- Scheduler wraps generation in 15-minute timeout
- Job store cleanup runs every 5 minutes

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2 months ago
oabrivard 58f42d0a87 docs: remove redundancy across documentation — cross-references instead of duplication
Trim architecture.md significantly (section 1 overview, technology stack, deployment topology,
module inventory lists, LLM trait block, pipeline details, data model table, full API tables,
background task list). Replace section 5 API tables with a one-liner. Requirements.md sections
3.1/3.5/3.6/3.7/3.8 and 4.2 condensed with cross-references. deployment.md security feature
list replaced by cross-reference to architecture.md Section 6. functional_specs.md Section 3
gains a cross-reference to technical_specs.md Section 5.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2 months ago