The LLM now determines if scraped content is a real article during
classify (zero extra cost). The separate LLM link extraction option
is removed — heuristic extraction is sufficient.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The classify prompt now asks the LLM to return a date field (YYYY-MM-DD).
When the scraper couldn't find a date, the LLM-extracted date is used to
filter articles that exceed max_age_days.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add use_brave_search boolean field to all settings structs, DB layer,
SQL queries, frontend types, i18n labels, and test fixtures following
the same pattern as use_llm_for_source_links.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add a user-configurable batch_size setting (default 5, range 1-20)
that controls how many articles are processed in parallel during
Phase 1 scrape+classify. Previously hardcoded to 5.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add source_url field to ScrapedNewsItem and a trace_article helper that
inserts into article_history with full provenance metadata. Instrument
Phase 1 (empty content, history dedup, source diversity) and Phase 2
(homepage filter, cross-phase dedup, history dedup, empty content) so
every dropped article is recorded with its filter reason. Replace the
old insert_urls call with per-article trace_article calls for used
articles, preserving dedup semantics via url_hash.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The LLM was returning only 1 article per category despite the user setting 4.
- Added minItems/maxItems to the category array schema (enforced by OpenAI strict mode)
- Changed prompt from "au maximum N actualites" to "exactement N actualites"
- Schema builder now takes max_items_per_category parameter
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- resolve_provider_and_key() now respects user ai_provider preference
- Dual model resolution: ai_model for search pass, ai_model_writing for rewrite pass
- Per-generation rate limiter with user override support
- Homepage URL filter removes domain-only URLs after search pass
- ScrapedNewsItem gains original_title field populated from page <title>
- SynthesisResponse::try_from handles null sections gracefully (returns empty vec)
- Search prompt warns LLM against returning homepage URLs
- Rewrite prompt instructs LLM to use originalTitle with language preservation rules
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>