ai_synth

Author	SHA1	Message	Date
oabrivard	cbe1cd6507	feat: LLM logs types, API client, and i18n labels Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	3 months ago
oabrivard	dafec2591b	feat: API endpoint for LLM call logs by job_id Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	3 months ago
oabrivard	9fffde8312	feat: log LLM calls with timing at search, classification, and rewrite steps Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	3 months ago
oabrivard	b2b0b286c0	feat: create llm_call_log table + DB module Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	3 months ago
oabrivard	88c16c5d67	docs: add LLM call logging implementation plan (6 tasks)	3 months ago
oabrivard	314fb7a037	docs: add spec for LLM call logging per synthesis	3 months ago
oabrivard	f7428191ec	test: verify provenance endpoint returns tracing data after generation	3 months ago
oabrivard	6fc6fff1f3	feat: article history page + provenance section in synthesis detail - Add ArticleHistoryEntry/ArticleHistoryResponse types - Add articleHistoryApi client (list + getProvenance endpoints) - Add ArticleHistory page with status/source_type filters and pagination - Add collapsible provenance section to SynthesisDetail - Register /article-history route in App.tsx - Add viewHistory link in Settings near articleHistoryDays input - Add all French i18n strings for article history feature Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	3 months ago
oabrivard	55fe828e58	feat: API endpoints for article history listing and provenance Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	3 months ago
oabrivard	b9003cde54	feat: instrument pipeline with article tracing at every filtering step Add source_url field to ScrapedNewsItem and a trace_article helper that inserts into article_history with full provenance metadata. Instrument Phase 1 (empty content, history dedup, source diversity) and Phase 2 (homepage filter, cross-phase dedup, history dedup, empty content) so every dropped article is recorded with its filter reason. Replace the old insert_urls call with per-article trace_article calls for used articles, preserving dedup semantics via url_hash. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	3 months ago
oabrivard	0e2c69edf7	feat: save job_id on syntheses for provenance lookup Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	3 months ago
oabrivard	eba721266f	feat: article history entry struct + insert/query/cleanup functions Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	3 months ago
oabrivard	d7afd08eaf	feat: enrich article_history with tracing metadata + syntheses.job_id	3 months ago
oabrivard	5a0495b02a	docs: add article tracing implementation plan (7 tasks)	3 months ago
oabrivard	445dad9963	docs: add spec for article tracing — enriched history with provenance views	3 months ago
oabrivard	7cbb2853ce	feat: Autre fill-up to 75% synthesis target with source diversity enforcement Accumulates overflow articles from both classification phases and redistributes them into the Autre category when total articles fall below 75% of the configured max, respecting per-source diversity limits. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	3 months ago
oabrivard	c3e6103ef1	feat: parse_classification_response collects overflow articles Returns a (result, overflow) tuple so callers can access articles that could not fit in any category or Autre. Also adds the SYNTHESIS_MIN_FILL_RATIO constant for the upcoming fill-up logic. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	3 months ago
oabrivard	f5f0656604	docs: add Autre fill-up implementation plan	3 months ago
oabrivard	aebd436a91	docs: add spec for Autre fill-up to 75% synthesis target	3 months ago
oabrivard	cea723f7d7	test: update E2E and integration tests with article_history_days setting	3 months ago
oabrivard	708a641223	feat: add article_history_days setting to frontend Add article_history_days (defaulting to 90) to UserSettings interface and DEFAULT_SETTINGS, French translation, and Settings page number input. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	3 months ago
oabrivard	65eb6004d2	feat: article history filtering in pipeline — cleanup, Phase 1/2 filter, retry, insert Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	3 months ago
oabrivard	0a87b7ed8f	feat: add normalize_article_url and hash_article_url utilities Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	3 months ago
oabrivard	5a928aa990	feat: add article_history DB module (check, insert, cleanup) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	3 months ago
oabrivard	c271c240a2	feat: add article_history table and article_history_days setting Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	3 months ago
oabrivard	d7c91c956f	docs: add article history implementation plan with retry logic	3 months ago
oabrivard	633a51dc8c	docs: add spec for article history to prevent cross-synthesis duplicates	3 months ago
oabrivard	8e06357b47	test: update integration test with LLM scraping settings	3 months ago
oabrivard	a7599e512a	test: comprehensive E2E synthesis validation (duplicates, links, counts, domains) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	3 months ago
oabrivard	c8779f6ca2	feat: add LLM scraping toggles to Settings page Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	3 months ago
oabrivard	8a061c98db	feat: LLM-assisted article extraction with Arc provider, concurrency control, and progress Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	3 months ago
oabrivard	357f06e405	feat: LLM-assisted source link extraction with heuristic fallback Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	3 months ago
oabrivard	e6e8aa1eeb	feat: add LLM prompts and schemas for link and article extraction Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	3 months ago
oabrivard	23f121a58d	feat: ScrapedContent url+head_html fields, Arc<dyn LlmProvider>, 3-tuple scrape returns Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	3 months ago
oabrivard	e483789d1b	feat: add use_llm_for_source_links and use_llm_for_article_extraction settings Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	3 months ago
oabrivard	d508ea9620	docs: revise LLM scraping plan — fix Arc provider, head_html, concurrency, tests	3 months ago
oabrivard	175483dfe3	docs: add spec and plan for LLM-assisted scraping Two optional LLM enhancements: link extraction from source pages and article content extraction. Plan needs revision for Arc<dyn LlmProvider> threading and <head> HTML preservation before implementation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	3 months ago
oabrivard	420603d76a	Updated specifications of source diversity functionality	3 months ago
oabrivard	53ecce84b0	feat: two-phase generation pipeline — personalized sources first, web search fallback Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	3 months ago
oabrivard	51ea032838	feat: add scrape_flat_urls helper and gap-aware search prompt Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	3 months ago
oabrivard	d508b5b4ab	feat: Autre category support in rewrite schema, final sections, URL restore + remove dead code Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	3 months ago
oabrivard	ba7024e280	feat: add classification response parsing with category filling and Autre fallback Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	3 months ago
oabrivard	104b6a0d7b	feat: add classification prompt and schema for article categorization Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	3 months ago
oabrivard	c06b5ba454	feat: add source_scraper module for extracting article links from source pages Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	3 months ago
oabrivard	7c8a19616d	docs: add spec and plan for source priority pipeline redesign Two-phase pipeline: scrape personalized sources first, classify with LLM, fall back to web search for gaps. Tasks 1-3 ready, Task 4 needs elaboration. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	3 months ago
oabrivard	45e5ee8a7d	fix: rewrite pass schema uses actual scraped item counts, not max setting The rewrite pass shared the search pass schema which enforced minItems/maxItems equal to max_items_per_category. After filter_empty_scraped_articles removed old/failed articles, the scraped data had fewer items than the schema required, causing the LLM to duplicate content to fill the quota. Now build_rewrite_schema counts actual items per category from the scraped data and sets minItems/maxItems accordingly. Also removed dead domain_counts variable. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	3 months ago
oabrivard	13894a8f50	fix: filter empty scraped articles + restore URLs after rewrite + E2E assertions - filter_empty_scraped_articles: removes articles with empty scraped content (too old, soft 404, scrape failure) before the rewrite pass, preventing empty articles in the final synthesis - restore_scraped_urls: already existed, now has unit tests - E2E test: added assertions for no Wikipedia URLs, no empty summaries, and updated settings payload with new fields (max_articles_per_source, source_diversity_window) - 4 new unit tests for filter_empty + restore_scraped_urls Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	3 months ago
oabrivard	a9be1ce435	fix: restore scraped URLs after LLM rewrite pass to prevent hallucination The rewrite pass can replace validated URLs with hallucinated ones (Wikipedia, corporate sites) despite being instructed to preserve them. After the rewrite, restore_scraped_urls() replaces each article's URL with the original scraped URL by matching on position (category + item index). Logs when a URL is restored so hallucination patterns can be monitored. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	3 months ago
oabrivard	8a18b70aff	fix: set max output tokens to 16384 for all LLM providers OpenAI's default output limit (4096 tokens) was too low for structured synthesis output with multiple categories and articles per category, causing truncated JSON. Set 16384 for both OpenAI APIs (Responses + Chat Completions) and Gemini. Anthropic already had 16384. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	3 months ago
oabrivard	fdb3110407	feat: add source_diversity_window setting to frontend Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	3 months ago

269 Commits (9ee372aef3d11ab04fe535a84ff701acddf1ada3)