- Add ArticleHistoryEntry/ArticleHistoryResponse types
- Add articleHistoryApi client (list + getProvenance endpoints)
- Add ArticleHistory page with status/source_type filters and pagination
- Add collapsible provenance section to SynthesisDetail
- Register /article-history route in App.tsx
- Add viewHistory link in Settings near articleHistoryDays input
- Add all French i18n strings for article history feature
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add source_url field to ScrapedNewsItem and a trace_article helper that
inserts into article_history with full provenance metadata. Instrument
Phase 1 (empty content, history dedup, source diversity) and Phase 2
(homepage filter, cross-phase dedup, history dedup, empty content) so
every dropped article is recorded with its filter reason. Replace the
old insert_urls call with per-article trace_article calls for used
articles, preserving dedup semantics via url_hash.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Accumulates overflow articles from both classification phases and redistributes
them into the Autre category when total articles fall below 75% of the configured
max, respecting per-source diversity limits.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Returns a (result, overflow) tuple so callers can access articles that
could not fit in any category or Autre. Also adds the
SYNTHESIS_MIN_FILL_RATIO constant for the upcoming fill-up logic.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add article_history_days (defaulting to 90) to UserSettings interface and DEFAULT_SETTINGS, French translation, and Settings page number input.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two optional LLM enhancements: link extraction from source pages and article
content extraction. Plan needs revision for Arc<dyn LlmProvider> threading
and <head> HTML preservation before implementation.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two-phase pipeline: scrape personalized sources first, classify with LLM,
fall back to web search for gaps. Tasks 1-3 ready, Task 4 needs elaboration.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The rewrite pass shared the search pass schema which enforced minItems/maxItems
equal to max_items_per_category. After filter_empty_scraped_articles removed
old/failed articles, the scraped data had fewer items than the schema required,
causing the LLM to duplicate content to fill the quota.
Now build_rewrite_schema counts actual items per category from the scraped data
and sets minItems/maxItems accordingly. Also removed dead domain_counts variable.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- filter_empty_scraped_articles: removes articles with empty scraped content
(too old, soft 404, scrape failure) before the rewrite pass, preventing
empty articles in the final synthesis
- restore_scraped_urls: already existed, now has unit tests
- E2E test: added assertions for no Wikipedia URLs, no empty summaries,
and updated settings payload with new fields (max_articles_per_source,
source_diversity_window)
- 4 new unit tests for filter_empty + restore_scraped_urls
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The rewrite pass can replace validated URLs with hallucinated ones (Wikipedia,
corporate sites) despite being instructed to preserve them. After the rewrite,
restore_scraped_urls() replaces each article's URL with the original scraped
URL by matching on position (category + item index). Logs when a URL is
restored so hallucination patterns can be monitored.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
OpenAI's default output limit (4096 tokens) was too low for structured
synthesis output with multiple categories and articles per category,
causing truncated JSON. Set 16384 for both OpenAI APIs (Responses +
Chat Completions) and Gemini. Anthropic already had 16384.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>