Adds detect_and_parse_feed which orchestrates feed caching/freshness logic:
uses cached feed URL directly if fresh (< 30 days), otherwise re-discovers
from source URL via discover_feed. Returns FeedResult::Found or NotFound.
Includes 4 new tests covering fresh cache, no cache, no feed, and stale cache cases.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add nullable rss_url (TEXT) and rss_discovered_at (TIMESTAMPTZ) columns
to the sources table for RSS feed URL caching. Update the Source struct,
all query_as SELECT/RETURNING queries, and add update_source_rss db function.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Turnstile tokens are single-use. The resend flow reused the consumed token,
causing "timeout-or-duplicate" errors from Cloudflare.
Frontend:
- Add Turnstile widget to resend view on Login and Register pages
- Add resetSignal prop to Turnstile component to re-solve after each resend
- Clear token after each successful API call, guard resend against null token
- Add test for resetSignal behavior
Backend:
- Add entry log when magic link email sending begins
- Add explicit warning when rate limit prevents sending
- Add error log with rollback context when email delivery fails
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Many modern sites (Hugo, WordPress, Next.js) load articles via JavaScript
but include full article URLs in JSON-LD schema.org markup in the <head>.
The scraper now extracts these first (highest quality), then falls back
to <a href> heuristic extraction. Supports ItemList, BlogPosting,
NewsArticle, @graph arrays, and mainEntity wrappers.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Date parser now supports: 25/03/2026, 25-03-2026, March 25 2026,
25 mars 2026, 15 février 2026, and short month variants.
Articles without dates are no longer routed to a separate category —
they stay in their LLM-classified category with date shown as empty.
This prevents losing good articles in a catch-all section.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Bulk/CSV import now passes theme_id through to DB
- Preferred source update scoped by theme_id (no cross-theme reset)
- Theme creation sends sensible defaults from frontend
- Scheduler wraps generation in 15-minute timeout
- Job store cleanup runs every 5 minutes
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Moves JobEntry, JobStore, ProgressEvent, JOB_TTL, and emit_progress
to a dedicated module. Updates imports in synthesis.rs, generation.rs,
scheduler.rs, and app_state.rs. synthesis.rs re-exports for backward
compatibility.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replaces ~200 duplicated lines in Phase 1 (personalized sources) and
Phase 2 (Brave Search) with a shared scrape_and_classify_batch function.
Uses ScrapeClassifyCtx to bundle shared parameters. Also prepares
synthesis.rs for JobStore extraction by re-exporting from job_store.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Spawns a tokio task that checks for due schedules every 60 seconds,
runs generation via run_generation_inner, and sends emails to configured
recipients before marking each schedule as run.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Users can mark sources as preferred via star buttons on the theme page.
Preferred sources are processed first in the pipeline (ordered before
non-preferred in waves, shuffled separately then merged).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds Arc<AtomicBool> cancellation flag to JobStore/JobEntry. The pipeline
checks the flag before each wave and after each batch, then saves whatever
articles have been collected. A new POST /syntheses/generate/:job_id/stop
endpoint sets the flag. The frontend shows a red stop button during generation
and POSTs to the stop endpoint on click.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Without a date, articles are routed to "Articles sans date" instead
of their classified category, breaking pipeline tests.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Covers GAP-01 (themes API), GAP-02 (article history API), and
GAP-04 (assign_category unit tests).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove content settings from settings table (moved to themes).
Add theme_id to sources and syntheses. Pipeline loads content
settings from the selected theme. Generate endpoint requires theme_id.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Articles where neither the scraper nor the LLM could extract a date
are now placed in a separate "Articles sans date" section instead of
their classified category. This makes undated articles visible without
mixing them with properly dated content.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The LLM now determines if scraped content is a real article during
classify (zero extra cost). The separate LLM link extraction option
is removed — heuristic extraction is sufficient.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add published_date column to article_history table
- Add date field to NewsItem (serialized in synthesis JSONB)
- Pass LLM-extracted date through ArticleTrace to article history
- Display date below article title in SynthesisDetail page
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The classify prompt now asks the LLM to return a date field (YYYY-MM-DD).
When the scraper couldn't find a date, the LLM-extracted date is used to
filter articles that exceed max_age_days.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add SKIP_SSRF_CHECK env var to bypass SSRF in test environments
- Use wiremock server as source URL (same domain as article URLs)
- Add source page mock to wiremock setup
- Set SKIP_SSRF_CHECK=1 in integration test script
- Fix unused import warning
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds an optional LlmProvider override to run_generation and
run_generation_inner, allowing tests to inject a mock provider without
touching real credentials or the provider-resolution path. Makes
run_generation_inner pub so integration tests can call it directly.
Production callers pass None and behaviour is unchanged.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Wrap `model_research` (String), `classify_schema` (Value), and
`classification_categories` (Vec<String>) in Arc before the batch
loops so spawned tasks clone a cheap pointer instead of the full
heap data on every iteration. Also removes the redundant intermediate
`mdl`/`class_sys`/`class_user` bindings in both classify loops.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace runtime Selector::parse calls on static strings with module-level
LazyLock statics in source_scraper.rs (ANCHOR_SELECTOR) and scraper.rs
(SEL_TITLE, SEL_H1, SEL_BODY), so each selector is compiled once at
first use instead of on every function call.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Replace iterating DashMap check with atomic DashSet insert in create_job to
eliminate the race condition where double-click could create two concurrent
jobs for the same user
- Add release_user method called at end of generation task (normal, timeout,
and panic paths) so the generating slot is always freed
- Wrap run_generation in tokio::time::timeout(900s) to prevent hung LLM calls
from blocking the generation slot forever
- Spawn a second task to await the JoinHandle and call release_user + send
error event if the generation task panics, preventing SSE clients from
hanging indefinitely
- Update cleanup_expired to also remove users from generating_users set
- Update tests to call release_user after completion/error to match new contract
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add use_brave_search boolean field to all settings structs, DB layer,
SQL queries, frontend types, i18n labels, and test fixtures following
the same pattern as use_llm_for_source_links.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds an optional article_url column to llm_call_log so classify_summarize
entries are traceable back to their source article in the LLM Logs UI.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>