The hardcoded 15-minute timeout was too short for some syntheses.
Now configurable via env var with a default of 30 minutes.
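A minimal sketch of the env-var lookup described above. The variable name `GENERATION_TIMEOUT_MINUTES` and both function names are hypothetical; only the 30-minute default comes from this commit.

```rust
use std::time::Duration;

/// Default used when the variable is unset or invalid (30 minutes, per this commit).
const DEFAULT_TIMEOUT_MINUTES: u64 = 30;

/// Turns an optional raw value into a timeout.
/// Falls back to the default when the value is missing, not a number, or zero.
fn timeout_from_raw(raw: Option<&str>) -> Duration {
    let minutes = raw
        .and_then(|v| v.trim().parse::<u64>().ok())
        .filter(|m| *m > 0)
        .unwrap_or(DEFAULT_TIMEOUT_MINUTES);
    Duration::from_secs(minutes.saturating_mul(60))
}

/// Reads the timeout from the environment.
/// `GENERATION_TIMEOUT_MINUTES` is an assumed name, not taken from the codebase.
fn generation_timeout() -> Duration {
    let raw = std::env::var("GENERATION_TIMEOUT_MINUTES").ok();
    timeout_from_raw(raw.as_deref())
}
```

Keeping the parsing in a function that takes `Option<&str>` lets it be tested without mutating the process environment.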
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Verifies that when a source page returns no article links (blocked/empty),
the pipeline does not crash and still produces article_history entries via
the site_search fallback path or Phase 2.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Build SiteSearchProvider before wave loop, chain as third fallback
after RSS + HTML when both return 0 links.
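The fallback order can be sketched as follows. The closures stand in for the real RSS, HTML, and SiteSearchProvider calls; `LinkSource` and `collect_links` are hypothetical names used only for illustration.

```rust
/// Which stage produced the links (illustrative type, not from the codebase).
#[derive(Debug, PartialEq)]
enum LinkSource {
    Rss,
    Html,
    SiteSearch,
    Nothing,
}

/// Tries each stage in order and stops at the first one that returns links.
/// Later stages are not invoked once an earlier stage succeeds.
fn collect_links(
    rss: impl FnOnce() -> Vec<String>,
    html: impl FnOnce() -> Vec<String>,
    site_search: impl FnOnce() -> Vec<String>,
) -> (LinkSource, Vec<String>) {
    let links = rss();
    if !links.is_empty() {
        return (LinkSource::Rss, links);
    }
    let links = html();
    if !links.is_empty() {
        return (LinkSource::Html, links);
    }
    // Third fallback: only reached when RSS and HTML both returned 0 links.
    let links = site_search();
    if !links.is_empty() {
        return (LinkSource::SiteSearch, links);
    }
    (LinkSource::Nothing, Vec::new())
}
```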
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace placeholder search_llm with real implementation: builds a French
prompt asking the LLM for recent articles from a domain, calls call_llm
with a JSON-array schema, and filters results through url_matches_domain
to guard against hallucinated URLs. Add build_site_search_prompt and
parse_llm_url_response helpers with 4 unit tests (valid array, non-array,
mixed types, wrong-domain filtering).
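A rough sketch of the wrong-domain guard, assuming that a match means "same host or a subdomain of it". The real url_matches_domain may differ; `host_of` and `filter_to_domain` are hypothetical helpers, and this version only handles lowercase http/https schemes and ignores IPv6 literals.

```rust
/// Extracts the lowercase host from an http(s) URL, without userinfo or port.
fn host_of(url: &str) -> Option<String> {
    let rest = url
        .strip_prefix("https://")
        .or_else(|| url.strip_prefix("http://"))?;
    // The authority ends at the first '/', '?' or '#'.
    let authority = rest.split(|c: char| c == '/' || c == '?' || c == '#').next()?;
    // Drop optional userinfo ("user@") and port (":443").
    let host = authority.rsplit('@').next()?;
    let host = host.split(':').next()?;
    if host.is_empty() {
        None
    } else {
        Some(host.to_ascii_lowercase())
    }
}

/// True when the URL's host equals `domain` or is a subdomain of it.
/// A plain suffix check is avoided so "notexample.com" does not match "example.com".
fn url_matches_domain(url: &str, domain: &str) -> bool {
    let domain = domain.trim_start_matches("www.").to_ascii_lowercase();
    match host_of(url) {
        Some(host) => host == domain || host.ends_with(&format!(".{domain}")),
        None => false,
    }
}

/// Keeps only the URLs that belong to `domain`, dropping hallucinated ones.
fn filter_to_domain(urls: Vec<String>, domain: &str) -> Vec<String> {
    urls.into_iter()
        .filter(|u| url_matches_domain(u, domain))
        .collect()
}
```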
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds chunked reading with a 5 MB cap (matching the scraper limit) to
both parse_feed and discover_feed, with fast rejection via Content-Length
header when available. Includes a unit test covering the oversize path.
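A minimal sketch of the capped read, written against `std::io::Read` rather than the HTTP client the project uses. `read_capped` and `MAX_FEED_BYTES` are hypothetical names, and treating 5 MB as 5 * 1024 * 1024 bytes is an assumption.

```rust
use std::io::{self, Read};

/// Assumed cap: 5 MB read as 5 * 1024 * 1024 bytes.
const MAX_FEED_BYTES: u64 = 5 * 1024 * 1024;

/// Reads a body in chunks and fails as soon as it would exceed `cap`.
/// `content_length` is the header value, if the server sent one.
fn read_capped<R: Read>(
    mut reader: R,
    content_length: Option<u64>,
    cap: u64,
) -> io::Result<Vec<u8>> {
    // Fast rejection: trust a declared length that is already too large.
    if let Some(len) = content_length {
        if len > cap {
            return Err(io::Error::new(
                io::ErrorKind::InvalidData,
                "body exceeds size cap (Content-Length)",
            ));
        }
    }
    let mut body = Vec::new();
    let mut chunk = [0u8; 8192];
    loop {
        let n = reader.read(&mut chunk)?;
        if n == 0 {
            break; // end of stream
        }
        // The header may be missing or wrong, so the cap is enforced while reading too.
        let total = body.len() as u64 + n as u64;
        if total > cap {
            return Err(io::Error::new(
                io::ErrorKind::InvalidData,
                "body exceeds size cap",
            ));
        }
        body.extend_from_slice(&chunk[..n]);
    }
    Ok(body)
}
```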
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds `AND user_id = $4` to the UPDATE query in `update_source_rss` and
threads the `user_id` parameter through from `run_generation_inner`,
consistent with every other mutation in db/sources.rs.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds detect_and_parse_feed, which orchestrates feed caching/freshness logic:
uses cached feed URL directly if fresh (< 30 days), otherwise re-discovers
from source URL via discover_feed. Returns FeedResult::Found or NotFound.
Includes 4 new tests covering fresh cache, no cache, no feed, and stale cache cases.
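The freshness decision can be sketched as below. `FeedPlan` and `plan_feed_lookup` are hypothetical names; the real detect_and_parse_feed also fetches and parses the feed, which is left out here.

```rust
use std::time::{Duration, SystemTime};

/// A cached feed URL is considered fresh for 30 days.
const FEED_CACHE_TTL: Duration = Duration::from_secs(30 * 24 * 60 * 60);

/// What to do next (illustrative type, not from the codebase).
#[derive(Debug, PartialEq)]
enum FeedPlan {
    UseCached(String),
    Rediscover,
}

/// Uses the cached URL only when both the URL and its timestamp exist
/// and the entry is younger than the TTL.
fn plan_feed_lookup(
    cached_url: Option<&str>,
    discovered_at: Option<SystemTime>,
    now: SystemTime,
) -> FeedPlan {
    match (cached_url, discovered_at) {
        (Some(url), Some(at)) => match now.duration_since(at) {
            Ok(age) if age < FEED_CACHE_TTL => FeedPlan::UseCached(url.to_string()),
            // Stale entry, or a timestamp in the future (clock skew): rediscover.
            _ => FeedPlan::Rediscover,
        },
        _ => FeedPlan::Rediscover,
    }
}
```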
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add nullable rss_url (TEXT) and rss_discovered_at (TIMESTAMPTZ) columns
to the sources table for RSS feed URL caching. Update the Source struct,
all query_as SELECT/RETURNING queries, and add update_source_rss db function.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The CSP had connect-src 'self' which blocked Cloudflare Turnstile's
internal fetch requests to challenges.cloudflare.com, causing them to
hang indefinitely and triggering a page reload loop.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ServeDir::not_found_service serves index.html but preserves the 404
status code. Switch to ServeDir::fallback which returns 200, fixing
client-side routes like /login returning 404 to the browser.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Turnstile tokens are single-use. The resend flow reused the consumed token,
causing "timeout-or-duplicate" errors from Cloudflare.
Frontend:
- Add Turnstile widget to resend view on Login and Register pages
- Add resetSignal prop to Turnstile component to re-solve after each resend
- Clear token after each successful API call, guard resend against null token
- Add test for resetSignal behavior
Backend:
- Add entry log when magic link email sending begins
- Add explicit warning when rate limit prevents sending
- Add error log with rollback context when email delivery fails
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The frontend Vite build was not receiving VITE_TURNSTILE_SITE_KEY during
Docker builds, causing the production bundle to fall back to the Cloudflare
test sitekey (1x00000000000000000000AA) which returns 503 in production.
- Add ARG/ENV for VITE_TURNSTILE_SITE_KEY in Dockerfile frontend stage
  (placed after npm ci to preserve dependency cache)
- Pass TURNSTILE_SITE_KEY from .env as build arg in docker-compose.yml
- Add post-change workflow section to CLAUDE.md
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Many modern sites (Hugo, WordPress, Next.js) load articles via JavaScript
but include full article URLs in JSON-LD schema.org markup in the <head>.
The scraper now extracts these first (highest quality), then falls back
to <a href> heuristic extraction. Supports ItemList, BlogPosting,
NewsArticle, @graph arrays, and mainEntity wrappers.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Date parser now supports: 25/03/2026, 25-03-2026, March 25 2026,
25 mars 2026, 15 février 2026, and short month variants.
Articles without dates are no longer routed to a separate category —
they stay in their LLM-classified category with date shown as empty.
This prevents losing good articles in a catch-all section.
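A simplified sketch of parsing the date formats listed above into (year, month, day). It is not the project's parser: `month_number`, `checked`, and `parse_date` are hypothetical names, the list of short month forms is a guess, ISO dates are not handled, and only coarse range checks are applied.

```rust
/// Maps an English or French month name (full or short form) to 1..=12.
fn month_number(name: &str) -> Option<u32> {
    let lower = name.trim_end_matches('.').to_lowercase();
    let m = match lower.as_str() {
        "january" | "jan" | "janvier" | "janv" => 1,
        "february" | "feb" | "février" | "fevrier" | "févr" | "fevr" => 2,
        "march" | "mar" | "mars" => 3,
        "april" | "apr" | "avril" | "avr" => 4,
        "may" | "mai" => 5,
        "june" | "jun" | "juin" => 6,
        "july" | "jul" | "juillet" | "juil" => 7,
        "august" | "aug" | "août" | "aout" => 8,
        "september" | "sep" | "sept" | "septembre" => 9,
        "october" | "oct" | "octobre" => 10,
        "november" | "nov" | "novembre" => 11,
        "december" | "dec" | "décembre" | "decembre" | "déc" => 12,
        _ => return None,
    };
    Some(m)
}

/// Coarse range check only; it does not reject dates such as 31 February.
fn checked(y: i32, m: u32, d: u32) -> Option<(i32, u32, u32)> {
    if (1..=12).contains(&m) && (1..=31).contains(&d) {
        Some((y, m, d))
    } else {
        None
    }
}

/// Returns (year, month, day), or None when no supported format matches.
fn parse_date(input: &str) -> Option<(i32, u32, u32)> {
    let s = input.trim();
    // Numeric, day first: 25/03/2026 or 25-03-2026.
    let parts: Vec<&str> = s.split(|c: char| c == '/' || c == '-').collect();
    if parts.len() == 3 {
        if let (Ok(d), Ok(m), Ok(y)) = (
            parts[0].parse::<u32>(),
            parts[1].parse::<u32>(),
            parts[2].parse::<i32>(),
        ) {
            return checked(y, m, d);
        }
    }
    // Textual forms always have three words ending with the year.
    let words: Vec<&str> = s.split_whitespace().collect();
    if words.len() != 3 {
        return None;
    }
    let year = words[2].parse::<i32>().ok()?;
    // Day first: "25 mars 2026", "15 février 2026".
    if let (Ok(d), Some(m)) = (words[0].parse::<u32>(), month_number(words[1])) {
        return checked(year, m, d);
    }
    // Month first: "March 25 2026" (a trailing comma after the day is tolerated).
    if let (Some(m), Ok(d)) = (
        month_number(words[0]),
        words[1].trim_end_matches(',').parse::<u32>(),
    ) {
        return checked(year, m, d);
    }
    None
}
```

Returning `None` for unparsed input matches the behavior described above: the article keeps its LLM-classified category and the date is simply left empty.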
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The API client expects empty responses to use 204, not 200.
Returning 200 with no body caused a JSON parse error in the frontend.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- export_csv now accepts optional theme_id query param and filters accordingly
- Add UpdateThemeRequest::validate() with bounds checking; call it in the update handler
- Verify theme ownership in sources::create when theme_id is provided
- Update STATUS_OPTIONS (add filtered_too_old, filtered_not_article; remove filtered_duplicate) and SOURCE_TYPE_OPTIONS (add brave_search; remove overflow) in ArticleHistory
- Replace hardcoded French strings ('Confirmer', 'Erreur inconnue') with t() calls; add settings.apiKeys.unknownError key to fr.ts
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Bulk/CSV import now passes theme_id through to DB
- Preferred source update scoped by theme_id (no cross-theme reset)
- Theme creation sends sensible defaults from frontend
- Scheduler wraps generation in 15-minute timeout
- Job store cleanup runs every 5 minutes
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Moves JobEntry, JobStore, ProgressEvent, JOB_TTL, and emit_progress
to a dedicated module. Updates imports in synthesis.rs, generation.rs,
scheduler.rs, and app_state.rs. synthesis.rs re-exports for backward
compatibility.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replaces ~200 duplicated lines in Phase 1 (personalized sources) and
Phase 2 (Brave Search) with a shared scrape_and_classify_batch function.
Uses ScrapeClassifyCtx to bundle shared parameters. Also prepares
synthesis.rs for JobStore extraction by re-exporting from job_store.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
article_history_days=0 disables "used" trace entries, so the test
found 0 entries. Changed to 90 to enable tracing.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Spawns a tokio task that checks for due schedules every 60 seconds,
runs generation via run_generation_inner, and sends emails to configured
recipients before marking each schedule as run.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Users can mark sources as preferred via star buttons on the theme page.
Preferred sources are processed first in the pipeline (ordered before
non-preferred in waves, shuffled separately then merged).
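The ordering can be sketched as a partition followed by two independent shuffles. `order_sources` is a hypothetical name, and the shuffle is passed in as a closure because the standard library has no random shuffle; the real pipeline presumably uses an RNG-based one.

```rust
/// Splits sources into preferred and non-preferred, shuffles each group
/// on its own, then places the preferred group first.
fn order_sources<T>(
    sources: Vec<T>,
    is_preferred: impl Fn(&T) -> bool,
    mut shuffle: impl FnMut(&mut [T]),
) -> Vec<T> {
    let (mut preferred, mut others): (Vec<T>, Vec<T>) =
        sources.into_iter().partition(|s| is_preferred(s));
    // Shuffling separately keeps randomness within each group
    // without ever moving a non-preferred source ahead of a preferred one.
    shuffle(&mut preferred);
    shuffle(&mut others);
    preferred.extend(others);
    preferred
}
```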
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds Arc<AtomicBool> cancellation flag to JobStore/JobEntry. The pipeline
checks the flag before each wave and after each batch, then saves whatever
articles have been collected. A new POST /syntheses/generate/:job_id/stop
endpoint sets the flag. The frontend shows a red stop button during generation
and POSTs to the stop endpoint on click.
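A minimal sketch of the cancellation checks, assuming waves are made of batches of article identifiers. `run_waves` and the `process_batch` closure are hypothetical stand-ins for the real pipeline stages; only the Arc<AtomicBool> flag and the check points come from this commit.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Runs waves of batches and returns whatever was collected,
/// stopping early when the shared flag is set.
fn run_waves(
    waves: Vec<Vec<Vec<String>>>,
    cancel: &Arc<AtomicBool>,
    mut process_batch: impl FnMut(Vec<String>) -> Vec<String>,
) -> Vec<String> {
    let mut collected = Vec::new();
    'waves: for wave in waves {
        // Check before each wave.
        if cancel.load(Ordering::SeqCst) {
            break;
        }
        for batch in wave {
            collected.extend(process_batch(batch));
            // Check after each batch so a stop request takes effect quickly.
            if cancel.load(Ordering::SeqCst) {
                break 'waves;
            }
        }
    }
    // The caller saves the partial result instead of discarding it.
    collected
}
```

The stop endpoint would only need to call `store(true, ...)` on a clone of the same `Arc`.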
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Diversity filter works across batches (source_counts updated after classify).
With batch_size=5, all 3 articles fit in one batch, bypassing the filter.
batch_size=1 forces per-article processing so the filter triggers.
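The effect of batch size on the filter can be reproduced with a small model. `select_with_diversity` and `max_per_source` are hypothetical; the sketch only assumes, as stated above, that source_counts is updated after a batch has been classified.

```rust
use std::collections::HashMap;

/// Processes (source, title) pairs in batches. The diversity check for a batch
/// only sees counts from earlier batches, because counts are updated afterwards.
fn select_with_diversity(
    articles: &[(&str, &str)],
    batch_size: usize,
    max_per_source: usize,
) -> Vec<String> {
    let mut source_counts: HashMap<String, usize> = HashMap::new();
    let mut kept = Vec::new();
    for batch in articles.chunks(batch_size.max(1)) {
        // Filter step: uses counts accumulated before this batch.
        let admitted: Vec<&(&str, &str)> = batch
            .iter()
            .filter(|(source, _)| {
                source_counts.get(*source).copied().unwrap_or(0) < max_per_source
            })
            .collect();
        // "Classify" step, then update the counts for later batches.
        for (source, title) in admitted {
            *source_counts.entry(source.to_string()).or_insert(0) += 1;
            kept.push(title.to_string());
        }
    }
    kept
}
```

With three articles from one source and a limit of one per source, a batch size of 5 keeps all three, while a batch size of 1 keeps only the first.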
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Without a date, articles are routed to "Articles sans date" instead
of their classified category, breaking pipeline tests.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Covers GAP-01 (themes API), GAP-02 (article history API), and
GAP-04 (assign_category unit tests).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove content settings from settings table (moved to themes).
Add theme_id to sources and syntheses. Pipeline loads content
settings from the selected theme. Generate endpoint requires theme_id.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Implements GET/POST/PUT/DELETE /api/v1/themes handlers following the same patterns as sources.rs, registers the module, and wires up routes in the router.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>