ai_synth

Commit Graph

Author	SHA1	Message	Date
oabrivard	4c6381b09a	feat: add batch_size setting for Phase 1 parallelism. Add a user-configurable batch_size setting (default 5, range 1-20) that controls how many articles are processed in parallel during Phase 1 scrape+classify. Previously hardcoded to 5. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	3 months ago
oabrivard	14b0a0b7e8	refactor: LLM link extraction uses body only (no head), increased to 12000 chars	3 months ago
oabrivard	8d232c1ade	feat: split model selection — scraping vs websearch with GPT-5 models Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	3 months ago
oabrivard	0b180eb75c	refactor: remove old classification, rewrite, and article extraction prompts/schemas Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	3 months ago
oabrivard	b2dbc3847a	feat: add per-article classify/summarize prompt and schema Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	3 months ago
oabrivard	825b793387	feat: drop source_diversity_window and use_llm_for_article_extraction settings Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	3 months ago
oabrivard	b9003cde54	feat: instrument pipeline with article tracing at every filtering step. Add source_url field to ScrapedNewsItem and a trace_article helper that inserts into article_history with full provenance metadata. Instrument Phase 1 (empty content, history dedup, source diversity) and Phase 2 (homepage filter, cross-phase dedup, history dedup, empty content) so every dropped article is recorded with its filter reason. Replace the old insert_urls call with per-article trace_article calls for used articles, preserving dedup semantics via url_hash. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	3 months ago
oabrivard	c271c240a2	feat: add article_history table and article_history_days setting Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	3 months ago
oabrivard	e6e8aa1eeb	feat: add LLM prompts and schemas for link and article extraction Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	3 months ago
oabrivard	e483789d1b	feat: add use_llm_for_source_links and use_llm_for_article_extraction settings Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	3 months ago
oabrivard	51ea032838	feat: add scrape_flat_urls helper and gap-aware search prompt Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	3 months ago
oabrivard	104b6a0d7b	feat: add classification prompt and schema for article categorization Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	3 months ago
oabrivard	3f6ad9853c	feat: build_search_prompt accepts recent_domains for source diversity Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	3 months ago
oabrivard	a31915d3ce	feat: add source_diversity_window setting (migration + model + DB + validation tests) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	3 months ago
oabrivard	c1ee79bcf6	feat: add max_articles_per_source setting (migration + model + DB) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	3 months ago
oabrivard	45c9e71589	fix: enforce max_items_per_category in JSON schema and prompt. The LLM was returning only 1 article per category despite the user setting 4. Changes: added minItems/maxItems to the category array schema (enforced by OpenAI strict mode); changed prompt from "au maximum N actualites" to "exactement N actualites"; schema builder now takes max_items_per_category parameter. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	3 months ago
oabrivard	9b994e0528	v2: pipeline user model selection, rate limiter, URL filter, original title, null-safe sections. Changes: resolve_provider_and_key() now respects user ai_provider preference; dual model resolution (ai_model for search pass, ai_model_writing for rewrite pass); per-generation rate limiter with user override support; homepage URL filter removes domain-only URLs after search pass; ScrapedNewsItem gains original_title field populated from page <title>; SynthesisResponse::try_from handles null sections gracefully (returns empty vec); search prompt warns LLM against returning homepage URLs; rewrite prompt instructs LLM to use originalTitle with language preservation rules. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	3 months ago
oabrivard	ed6b41fe52	v2: add settings migration, model expansion, DB queries (provider, models, rate limits) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	3 months ago
oabrivard	aa6f1ba76b	Phase 5: Generation pipeline with SSE progress, syntheses CRUD. Backend: full 2-pass generation pipeline (LLM search -> URL scraping -> LLM rewrite); async generation with tokio::spawn, JobStore with per-user concurrency limit; SSE progress streaming via axum::response::Sse + tokio::sync::watch; syntheses CRUD: list (paginated), get (ownership check), delete; prompt construction ported from original geminiService.ts; parallel URL scraping with bounded concurrency (max 10); graceful partial failure handling (some URLs fail -> continue); 36 new unit tests, 16 integration tests. Frontend: home dashboard with synthesis card grid, week badges, delete with confirmation; generate page with SSE-driven progress bar, step checklist, auto-redirect; synthesis detail with section-by-section display, external links, delete; SSE client helper with auto-reconnect (exponential backoff); date utilities with French locale formatting. Critical fixes applied: SSE EventSource now sends credentials (withCredentials: true); Gemini error logging sanitized to prevent API key leak in logs. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	3 months ago

19 Commits (4c6381b09a0751073845a9d4696cd5061f40db5f)
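
Commit 9b994e0528 mentions a homepage URL filter that removes domain-only URLs after the search pass. The repository code is not shown on this page, so the following is only a minimal sketch of how such a filter might look. The names is_homepage_url and filter_homepages are hypothetical, and treating a URL as "domain-only" when its path is empty or "/" (ignoring query and fragment) is an assumption, not the project's confirmed rule.

```rust
/// Hypothetical helper: returns true when `url` has no path beyond the domain.
/// Assumption: query strings and fragments do not count as a path.
fn is_homepage_url(url: &str) -> bool {
    // Drop the scheme if present ("https://example.com/x" -> "example.com/x").
    let rest = url.split_once("://").map_or(url, |(_, r)| r);
    // Ignore any query string or fragment.
    let rest = rest
        .split(|c: char| c == '?' || c == '#')
        .next()
        .unwrap_or("");
    // Whatever follows the first '/' is the path; none or only slashes means homepage.
    match rest.split_once('/') {
        None => true,
        Some((_, path)) => path.trim_matches('/').is_empty(),
    }
}

/// Hypothetical helper: keeps only URLs that point past the domain root.
fn filter_homepages(urls: Vec<String>) -> Vec<String> {
    urls.into_iter().filter(|u| !is_homepage_url(u)).collect()
}

fn main() {
    let urls = vec![
        "https://example.com/".to_string(),
        "https://example.com/news/article-1".to_string(),
        "https://example.org".to_string(),
    ];
    // Only the article URL survives the filter.
    let kept = filter_homepages(urls);
    println!("{:?}", kept);
}
```

A string-based check like this avoids extra dependencies; a real implementation would more likely parse URLs with a dedicated crate to handle ports, userinfo, and malformed input.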