Articles where neither the scraper nor the LLM could extract a date
are now placed in a separate "Articles sans date" section instead of
their classified category. This makes undated articles visible without
mixing them with properly dated content.
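
A minimal sketch of the routing rule, assuming a simplified article
record. `Article`, its fields, and `section_for` are hypothetical names
for illustration, not the project's actual types:

```rust
/// Section heading used for articles with no usable date.
const UNDATED_SECTION: &str = "Articles sans date";

/// Hypothetical, simplified article record (illustration only).
pub struct Article {
    pub title: String,
    pub category: String,
    /// Date from the scraper or the LLM, if either found one.
    pub date: Option<String>,
}

/// Returns the section an article is listed under: its classified
/// category when a date is known, the undated section otherwise.
pub fn section_for(article: &Article) -> &str {
    match article.date {
        Some(_) => &article.category,
        None => UNDATED_SECTION,
    }
}
```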
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The LLM now determines whether scraped content is a real article during
the classify call, at no extra cost. The separate LLM link extraction
option is removed because heuristic extraction is sufficient.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add published_date column to article_history table
- Add date field to NewsItem (serialized in synthesis JSONB)
- Pass LLM-extracted date through ArticleTrace to article history
- Display date below article title in SynthesisDetail page
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The classify prompt now asks the LLM to return a date field (YYYY-MM-DD).
When the scraper couldn't find a date, the LLM-extracted date is used to
filter articles that exceed max_age_days.
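
A std-only sketch of the fallback and age check, not the project's code.
`exceeds_max_age`, `parse_ymd`, and the `today` parameter are
hypothetical; it also assumes "exceeds" means strictly more than
max_age_days and that an article with no usable date is not filtered:

```rust
/// Days since 1970-01-01 for a proleptic Gregorian date
/// (the well-known "days from civil" algorithm).
fn days_from_civil(y: i64, m: i64, d: i64) -> i64 {
    let y = if m <= 2 { y - 1 } else { y };
    let era = y.div_euclid(400);
    let yoe = y.rem_euclid(400);
    let mp = if m > 2 { m - 3 } else { m + 9 };
    let doy = (153 * mp + 2) / 5 + d - 1;
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    era * 146_097 + doe - 719_468
}

/// Parses "YYYY-MM-DD" into a day number; None if malformed.
fn parse_ymd(s: &str) -> Option<i64> {
    let mut parts = s.trim().split('-');
    let y: i64 = parts.next()?.parse().ok()?;
    let m: i64 = parts.next()?.parse().ok()?;
    let d: i64 = parts.next()?.parse().ok()?;
    if parts.next().is_some() || !(1..=12).contains(&m) || !(1..=31).contains(&d) {
        return None;
    }
    Some(days_from_civil(y, m, d))
}

/// True when the article is older than `max_age_days`.
/// The scraper date wins; the LLM date is used only as a fallback.
pub fn exceeds_max_age(
    scraper_date: Option<&str>,
    llm_date: Option<&str>,
    today: &str,
    max_age_days: i64,
) -> bool {
    let published = scraper_date
        .and_then(parse_ymd)
        .or_else(|| llm_date.and_then(parse_ymd));
    match (published, parse_ymd(today)) {
        (Some(p), Some(now)) => now - p > max_age_days,
        // Assumption: without a usable date the article is kept.
        _ => false,
    }
}
```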
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add SKIP_SSRF_CHECK env var to bypass the SSRF check in test environments
- Use wiremock server as source URL (same domain as article URLs)
- Add source page mock to wiremock setup
- Set SKIP_SSRF_CHECK=1 in integration test script
- Fix unused import warning
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Rewrite run-integration-tests.sh to use the e2e docker-compose config
(Postgres on port 5433). Add --db-check flag for connectivity debugging.
Remove build_test_router (reverted to build_router). Keep minimal_test
for oneshot debugging.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Unauthenticated requests were hanging in integration tests due to
tower middleware layers interacting with oneshot(). Add build_test_router()
that only includes API routes + CSRF middleware.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds an optional LlmProvider override to run_generation and
run_generation_inner, allowing tests to inject a mock provider without
touching real credentials or the provider-resolution path. Makes
run_generation_inner pub so integration tests can call it directly.
Production callers pass None and behaviour is unchanged.
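
A simplified, synchronous sketch of the override pattern. The trait
method `complete`, `ResolvedProvider`, `MockProvider`, and the reduced
signature are assumptions for illustration; the real functions are
async and take more parameters:

```rust
use std::sync::Arc;

/// Hypothetical, simplified stand-in for the project's LlmProvider trait.
pub trait LlmProvider: Send + Sync {
    fn complete(&self, prompt: &str) -> String;
}

/// Stand-in for whatever the normal provider-resolution path returns.
struct ResolvedProvider;

impl LlmProvider for ResolvedProvider {
    fn complete(&self, prompt: &str) -> String {
        format!("real:{prompt}")
    }
}

/// Test double that never touches credentials or the network.
pub struct MockProvider;

impl LlmProvider for MockProvider {
    fn complete(&self, prompt: &str) -> String {
        format!("mock:{prompt}")
    }
}

/// Uses the injected provider when one is given; otherwise falls back
/// to the normal resolution path (production callers pass `None`).
pub fn run_generation_inner(
    prompt: &str,
    provider_override: Option<Arc<dyn LlmProvider>>,
) -> String {
    let provider: Arc<dyn LlmProvider> = match provider_override {
        Some(p) => p,
        None => Arc::new(ResolvedProvider),
    };
    provider.complete(prompt)
}
```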
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Wrap `model_research` (String), `classify_schema` (Value), and
`classification_categories` (Vec<String>) in Arc before the batch
loops so spawned tasks clone a cheap pointer instead of the full
heap data on every iteration. Also removes the redundant intermediate
`mdl`/`class_sys`/`class_user` bindings in both classify loops.
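
A sketch of the same idea using std threads in place of the spawned
async tasks. `classify_batch` and its parameters are hypothetical; the
point is that each worker gets an `Arc::clone` (a reference-count bump)
rather than a deep copy of the String or Vec:

```rust
use std::sync::Arc;
use std::thread;

/// Spawns one worker per article; each worker receives Arc clones
/// of the shared data instead of its own heap copy.
pub fn classify_batch(
    articles: Vec<String>,
    model: String,
    categories: Vec<String>,
) -> Vec<String> {
    // Wrap once, before the loop.
    let model = Arc::new(model);
    let categories = Arc::new(categories);

    let handles: Vec<_> = articles
        .into_iter()
        .map(|article| {
            // Cheap pointer clones, moved into the worker.
            let model = Arc::clone(&model);
            let categories = Arc::clone(&categories);
            thread::spawn(move || {
                format!("{article} via {model} ({} categories)", categories.len())
            })
        })
        .collect();

    // Join in spawn order so results line up with the input.
    handles
        .into_iter()
        .map(|h| h.join().expect("worker panicked"))
        .collect()
}
```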
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace runtime Selector::parse calls on static strings with module-level
LazyLock statics in source_scraper.rs (ANCHOR_SELECTOR) and scraper.rs
(SEL_TITLE, SEL_H1, SEL_BODY), so each selector is compiled once at
first use instead of on every function call.
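
A std-only sketch of the LazyLock pattern (requires Rust 1.80+).
`CompiledSelector` and `compile` are hypothetical stand-ins for the
scraper crate's selector type and its parse step, which are not
reproduced here; the counter exists only to show the initializer runs
once:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::LazyLock;

/// Counts how often the (stand-in) parse step runs.
static PARSE_COUNT: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for a compiled selector.
struct CompiledSelector {
    tag: String,
}

/// Stand-in for the per-call parse that the statics replace.
fn compile(src: &str) -> CompiledSelector {
    PARSE_COUNT.fetch_add(1, Ordering::SeqCst);
    CompiledSelector {
        tag: src.trim().to_lowercase(),
    }
}

/// Compiled on first access, then reused by every later call.
static SEL_TITLE: LazyLock<CompiledSelector> = LazyLock::new(|| compile("title"));

/// Reads the shared selector; never re-parses after the first use.
pub fn title_tag() -> &'static str {
    &SEL_TITLE.tag
}

/// Number of times the parse step has run so far.
pub fn parse_count() -> usize {
    PARSE_COUNT.load(Ordering::SeqCst)
}
```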
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
SESSION_SECRET was loaded and validated but never used anywhere in the
codebase. master_encryption_key is now wrapped in Arc<String> to avoid
cloning the secret string on every AppState clone.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Replace the iterating DashMap check in create_job with an atomic DashSet
insert to eliminate the race condition where a double-click could create two
concurrent jobs for the same user
- Add release_user method called at end of generation task (normal, timeout,
and panic paths) so the generating slot is always freed
- Wrap run_generation in tokio::time::timeout(900s) to prevent hung LLM calls
from blocking the generation slot forever
- Spawn a second task to await the JoinHandle and call release_user + send
error event if the generation task panics, preventing SSE clients from
hanging indefinitely
- Update cleanup_expired to also remove users from generating_users set
- Update tests to call release_user after completion/error to match new contract
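
A std-only sketch of the slot contract, with a Mutex<HashSet> standing
in for the DashSet. `JobManager`, `try_acquire`, and the u64 user id are
hypothetical; only `release_user` and `generating_users` are names from
this change. The check and the insert are a single operation, so two
near-simultaneous requests cannot both succeed:

```rust
use std::collections::HashSet;
use std::sync::Mutex;

/// Tracks which users currently hold a generation slot.
pub struct JobManager {
    generating_users: Mutex<HashSet<u64>>,
}

impl JobManager {
    pub fn new() -> Self {
        JobManager {
            generating_users: Mutex::new(HashSet::new()),
        }
    }

    /// Claims the slot for `user_id`. `insert` returns false when the
    /// id was already present, so check-and-claim is one atomic step.
    pub fn try_acquire(&self, user_id: u64) -> bool {
        self.generating_users.lock().unwrap().insert(user_id)
    }

    /// Frees the slot; must run on every exit path of the generation
    /// task (normal completion, timeout, panic).
    pub fn release_user(&self, user_id: u64) {
        self.generating_users.lock().unwrap().remove(&user_id);
    }
}
```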
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Split VALID_PROVIDERS (LLM only) from VALID_API_KEY_PROVIDERS (includes
brave_search) so Brave keys can be stored without allowing brave_search
as an admin LLM provider.
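
A sketch of the split. The LLM provider names listed here are
placeholders (the actual list is not shown in this change), and the two
helper functions are hypothetical:

```rust
/// Providers accepted as an admin LLM provider.
/// The names below are placeholders, not the project's actual list.
pub const VALID_PROVIDERS: &[&str] = &["openai", "anthropic"];

/// Providers for which an API key may be stored: the LLM providers
/// plus brave_search.
pub const VALID_API_KEY_PROVIDERS: &[&str] = &["openai", "anthropic", "brave_search"];

/// True when a key for `provider` may be stored.
pub fn can_store_api_key(provider: &str) -> bool {
    VALID_API_KEY_PROVIDERS.contains(&provider)
}

/// True when `provider` may be selected as the admin LLM provider.
pub fn can_be_llm_provider(provider: &str) -> bool {
    VALID_PROVIDERS.contains(&provider)
}
```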
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add a branch in test_key to route brave_search provider to
crate::services::brave_search::test_api_key instead of the LLM factory.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add use_brave_search boolean field to all settings structs, DB layer,
SQL queries, frontend types, i18n labels, and test fixtures following
the same pattern as use_llm_for_source_links.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds an optional article_url column to llm_call_log so classify_summarize
entries are traceable back to their source article in the LLM Logs UI.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add a user-configurable batch_size setting (default 5, range 1-20)
that controls how many articles are processed in parallel during
Phase 1 scrape+classify. Previously hardcoded to 5.
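
A sketch of how the setting could drive batching. `effective_batch_size`
and `plan_batches` are hypothetical names, and clamping out-of-range
values (rather than rejecting them at validation time) is an assumption:

```rust
/// Default used when the user has not configured a value.
pub const DEFAULT_BATCH_SIZE: usize = 5;

/// Resolves the configured value to the documented 1-20 range.
/// Assumption: out-of-range values are clamped, not rejected.
pub fn effective_batch_size(configured: Option<usize>) -> usize {
    configured.unwrap_or(DEFAULT_BATCH_SIZE).clamp(1, 20)
}

/// Splits candidate URLs into batches; the articles within one batch
/// are the ones processed in parallel.
pub fn plan_batches(urls: &[String], configured: Option<usize>) -> Vec<Vec<String>> {
    urls.chunks(effective_batch_size(configured))
        .map(|chunk| chunk.to_vec())
        .collect()
}
```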
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Remove 10-source cap; all sources are now processed
- Increase max links per source from 10 to 15
- Extract article links in parallel (up to 5 concurrent) using JoinSet
- Shuffle candidate URLs after history filtering to interleave sources
- Add DELETE /api/v1/article-history endpoint to clear all history for a user
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add get_last_source_url() to article_history db module for source rotation
- Remove head_html field from ScrapedContent struct and scrape_url function
- Fix synthesis.rs scrape_single_article_with_llm to pass an empty string instead of the removed field
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>