ai_synth

Commit Graph

Author	SHA1	Message	Date
oabrivard	fb086a706f	feat: rename fallback category "Autre" to "Divers" Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	3 months ago
oabrivard	e24236a069	feat: add max_links_per_source setting (default 8, was hardcoded 15) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	3 months ago
oabrivard	d234fa9b24	feat: add is_article LLM check + remove use_llm_for_source_links option The LLM now determines if scraped content is a real article during classify (zero extra cost). The separate LLM link extraction option is removed — heuristic extraction is sufficient. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	3 months ago
oabrivard	0f1b0306e4	feat: add source_extraction_window setting (default 3) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	3 months ago
oabrivard	de25a08d51	feat: LLM extracts publication date as fallback for article age filtering The classify prompt now asks the LLM to return a date field (YYYY-MM-DD). When the scraper couldn't find a date, the LLM-extracted date is used to filter articles that exceed max_age_days. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	3 months ago
oabrivard	91272ddfc4	feat: dynamic summary length and body snippet size based on setting Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	3 months ago
oabrivard	1b63afd12a	feat: add summary_length setting (1=court, 2=moyen, 3=detaille) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	3 months ago
oabrivard	f414ff0f58	feat: add use_brave_search setting Add use_brave_search boolean field to all settings structs, DB layer, SQL queries, frontend types, i18n labels, and test fixtures following the same pattern as use_llm_for_source_links. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	3 months ago
oabrivard	41109b3d93	feat: send structured link pairs to LLM instead of raw HTML body Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	3 months ago
oabrivard	4c6381b09a	feat: add batch_size setting for Phase 1 parallelism Add a user-configurable batch_size setting (default 5, range 1-20) that controls how many articles are processed in parallel during Phase 1 scrape+classify. Previously hardcoded to 5. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	3 months ago
oabrivard	14b0a0b7e8	refactor: LLM link extraction uses body only (no head), increased to 12000 chars	3 months ago
oabrivard	8d232c1ade	feat: split model selection — scraping vs websearch with GPT-5 models Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	3 months ago
oabrivard	0b180eb75c	refactor: remove old classification, rewrite, and article extraction prompts/schemas Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	3 months ago
oabrivard	b2dbc3847a	feat: add per-article classify/summarize prompt and schema Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	3 months ago
oabrivard	825b793387	feat: drop source_diversity_window and use_llm_for_article_extraction settings Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	3 months ago
oabrivard	b9003cde54	feat: instrument pipeline with article tracing at every filtering step Add source_url field to ScrapedNewsItem and a trace_article helper that inserts into article_history with full provenance metadata. Instrument Phase 1 (empty content, history dedup, source diversity) and Phase 2 (homepage filter, cross-phase dedup, history dedup, empty content) so every dropped article is recorded with its filter reason. Replace the old insert_urls call with per-article trace_article calls for used articles, preserving dedup semantics via url_hash. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	3 months ago
oabrivard	c271c240a2	feat: add article_history table and article_history_days setting Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	3 months ago
oabrivard	e6e8aa1eeb	feat: add LLM prompts and schemas for link and article extraction Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	3 months ago
oabrivard	e483789d1b	feat: add use_llm_for_source_links and use_llm_for_article_extraction settings Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	3 months ago
oabrivard	51ea032838	feat: add scrape_flat_urls helper and gap-aware search prompt Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	3 months ago
oabrivard	104b6a0d7b	feat: add classification prompt and schema for article categorization Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	3 months ago
oabrivard	3f6ad9853c	feat: build_search_prompt accepts recent_domains for source diversity Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	3 months ago
oabrivard	a31915d3ce	feat: add source_diversity_window setting (migration + model + DB + validation tests) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	3 months ago
oabrivard	c1ee79bcf6	feat: add max_articles_per_source setting (migration + model + DB) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	3 months ago
oabrivard	45c9e71589	fix: enforce max_items_per_category in JSON schema and prompt The LLM was returning only 1 article per category despite the user setting 4. - Added minItems/maxItems to the category array schema (enforced by OpenAI strict mode) - Changed prompt from "au maximum N actualites" to "exactement N actualites" - Schema builder now takes max_items_per_category parameter Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	3 months ago
oabrivard	9b994e0528	v2: pipeline user model selection, rate limiter, URL filter, original title, null-safe sections - resolve_provider_and_key() now respects user ai_provider preference - Dual model resolution: ai_model for search pass, ai_model_writing for rewrite pass - Per-generation rate limiter with user override support - Homepage URL filter removes domain-only URLs after search pass - ScrapedNewsItem gains original_title field populated from page <title> - SynthesisResponse::try_from handles null sections gracefully (returns empty vec) - Search prompt warns LLM against returning homepage URLs - Rewrite prompt instructs LLM to use originalTitle with language preservation rules Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	3 months ago
oabrivard	ed6b41fe52	v2: add settings migration, model expansion, DB queries (provider, models, rate limits) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	3 months ago
oabrivard	aa6f1ba76b	Phase 5: Generation pipeline with SSE progress, syntheses CRUD Backend: - Full 2-pass generation pipeline: LLM search -> URL scraping -> LLM rewrite - Async generation with tokio::spawn, JobStore with per-user concurrency limit - SSE progress streaming via axum::response::Sse + tokio::sync::watch - Syntheses CRUD: list (paginated), get (ownership check), delete - Prompt construction ported from original geminiService.ts - Parallel URL scraping with bounded concurrency (max 10) - Graceful partial failure handling (some URLs fail -> continue) - 36 new unit tests, 16 integration tests Frontend: - Home dashboard: synthesis card grid, week badges, delete with confirmation - Generate page: SSE-driven progress bar, step checklist, auto-redirect - Synthesis detail: section-by-section display, external links, delete - SSE client helper with auto-reconnect (exponential backoff) - Date utilities with French locale formatting Critical fixes applied: - SSE EventSource now sends credentials (withCredentials: true) - Gemini error logging sanitized to prevent API key leak in logs Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	3 months ago
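
Commit 9b994e0528 above mentions a homepage URL filter that removes domain-only URLs after the search pass. The log does not show how it is implemented, so the following is only a minimal sketch of what such a filter could look like. The names `is_homepage_url` and `filter_homepages`, and the exact rules (scheme and fragment ignored, empty or "/" path treated as a homepage), are assumptions for illustration and are not taken from the repository.

```rust
/// Returns true when `url` looks like a bare domain (a homepage) rather than
/// a specific page. Hypothetical helper: name and rules are assumptions.
fn is_homepage_url(url: &str) -> bool {
    // Drop the scheme ("https://", "http://") if one is present.
    let rest = url.split_once("://").map_or(url, |(_, r)| r);
    // Ignore any fragment; it does not identify a separate page.
    let rest = rest.split('#').next().unwrap_or(rest);
    // Look for the start of a path or query after the host.
    match rest.find(|c: char| c == '/' || c == '?') {
        // No path and no query at all: "example.com".
        None => true,
        // Only a trailing slash after the host: "example.com/".
        Some(i) => &rest[i..] == "/",
    }
}

/// Keeps only the URLs that point at something more specific than a homepage.
fn filter_homepages(urls: Vec<String>) -> Vec<String> {
    urls.into_iter().filter(|u| !is_homepage_url(u)).collect()
}
```

Under these assumed rules, `https://example.com` and `https://example.com/` would be dropped, while `https://example.com/news/article-1` would be kept. A URL with a query string but no path is kept, on the assumption that it may address a specific page.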

28 Commits (fb086a706f97b57f03ffa07ea686b941593b2985)
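
Commit 4c6381b09a describes a user-configurable batch_size setting (default 5, range 1-20) that controls how many articles are processed in parallel during Phase 1. As a rough illustration only, the sketch below shows one way such a setting could be normalized and used to group URLs into batches. The function names are hypothetical, and clamping out-of-range values (rather than rejecting them) is an assumption; the commit message does not say which approach the project takes.

```rust
/// Resolve the batch size to use. Assumption: a missing value falls back to
/// the default of 5, and out-of-range values are clamped into 1..=20.
fn effective_batch_size(user_value: Option<usize>) -> usize {
    user_value.unwrap_or(5).clamp(1, 20)
}

/// Group article URLs into batches of at most `batch_size` items. In a real
/// pipeline each batch would be scraped and classified concurrently; here we
/// only compute the grouping.
fn plan_batches(urls: &[String], batch_size: usize) -> Vec<Vec<String>> {
    // `chunks` panics on a size of 0, so guard against it defensively.
    urls.chunks(batch_size.max(1))
        .map(|chunk| chunk.to_vec())
        .collect()
}
```

With seven URLs and a batch size of 3, `plan_batches` yields groups of 3, 3, and 1. Keeping the normalization in one small function means the rest of the pipeline never has to handle a zero or oversized batch.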