ai_synth

Commit Graph

Author	SHA1	Message	Date
oabrivard	a89c61c5b6	feat: add "Articles sans date" category for articles without publication date Articles where neither the scraper nor the LLM could extract a date are now placed in a separate "Articles sans date" section instead of their classified category. This makes undated articles visible without mixing them with properly dated content. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	3 months ago
oabrivard	fb086a706f	feat: rename fallback category "Autre" to "Divers" Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	3 months ago
oabrivard	e24236a069	feat: add max_links_per_source setting (default 8, was hardcoded 15) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	3 months ago
oabrivard	2c3c6008a3	fix: monotonic progress bar with 3 clean phases (sources, websearch, saving) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	3 months ago
oabrivard	d234fa9b24	feat: add is_article LLM check + remove use_llm_for_source_links option The LLM now determines if scraped content is a real article during classify (zero extra cost). The separate LLM link extraction option is removed — heuristic extraction is sufficient. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	3 months ago
oabrivard	37d17e577a	feat: restructure Phase 1 into windowed source extraction waves Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	3 months ago
oabrivard	0f1b0306e4	feat: add source_extraction_window setting (default 3) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	3 months ago
oabrivard	c5a56c8fb8	feat: save publication date in article history and show in synthesis - Add published_date column to article_history table - Add date field to NewsItem (serialized in synthesis JSONB) - Pass LLM-extracted date through ArticleTrace to article history - Display date below article title in SynthesisDetail page Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	3 months ago
oabrivard	de25a08d51	feat: LLM extracts publication date as fallback for article age filtering The classify prompt now asks the LLM to return a date field (YYYY-MM-DD). When the scraper couldn't find a date, the LLM-extracted date is used to filter articles that exceed max_age_days. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	3 months ago
oabrivard	91272ddfc4	feat: dynamic summary length and body snippet size based on setting Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	3 months ago
oabrivard	1b63afd12a	feat: add summary_length setting (1=court, 2=moyen, 3=detaille) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	3 months ago
oabrivard	0874650a7f	fix: pipeline tests use wiremock URLs + skip SSRF for localhost - Add SKIP_SSRF_CHECK env var to bypass SSRF in test environments - Use wiremock server as source URL (same domain as article URLs) - Add source page mock to wiremock setup - Set SKIP_SSRF_CHECK=1 in integration test script - Fix unused import warning Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	3 months ago
oabrivard	aee70b37d4	fix: use docker-compose.test.yml for integration test DB Rewrite run-integration-tests.sh to use the e2e docker-compose config (Postgres on port 5433). Add --db-check flag for connectivity debugging. Remove build_test_router (reverted to build_router). Keep minimal_test for oneshot debugging. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	3 months ago
oabrivard	53813007c6	fix: use lightweight test router without SPA fallback and TraceLayer Unauthenticated requests were hanging in integration tests due to tower middleware layers interacting with oneshot(). Add build_test_router() that only includes API routes + CSRF middleware. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	3 months ago
oabrivard	ccecaa2d13	refactor: add provider_override for pipeline dependency injection Adds an optional LlmProvider override to run_generation and run_generation_inner, allowing tests to inject a mock provider without touching real credentials or the provider-resolution path. Makes run_generation_inner pub so integration tests can call it directly. Production callers pass None and behaviour is unchanged. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	3 months ago
oabrivard	17e054c257	feat: add MockLlmProvider for integration testing Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	3 months ago
oabrivard	4bbdd5c4d1	perf: batch article history INSERTs to reduce DB round-trips Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	3 months ago
oabrivard	f37e0b42a0	perf: use Arc for immutable values in pipeline to reduce cloning Wrap `model_research` (String), `classify_schema` (Value), and `classification_categories` (Vec<String>) in Arc before the batch loops so spawned tasks clone a cheap pointer instead of the full heap data on every iteration. Also removes the redundant intermediate `mdl`/`class_sys`/`class_user` bindings in both classify loops. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	3 months ago
oabrivard	60494aeceb	perf: cache CSS selectors with LazyLock to avoid re-parsing Replace runtime Selector::parse calls on static strings with module-level LazyLock statics in source_scraper.rs (ANCHOR_SELECTOR) and scraper.rs (SEL_TITLE, SEL_H1, SEL_BODY), so each selector is compiled once at first use instead of on every function call. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	3 months ago
oabrivard	69c1688bc7	chore: remove SESSION_SECRET and wrap master_encryption_key in Arc SESSION_SECRET was loaded and validated but never used anywhere in the codebase. master_encryption_key is now wrapped in Arc<String> to avoid cloning the secret string on every AppState clone. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	3 months ago
oabrivard	f44aa44c48	refactor: replace trace_article 11 parameters with ArticleTrace struct Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	3 months ago
oabrivard	f5466a6bd5	refactor: extract shared LLM error mapping to reduce duplication Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	3 months ago
oabrivard	2036c12b24	refactor: eliminate SettingsResponse struct, serialize UserSettings directly Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	3 months ago
oabrivard	e056ef9d3e	refactor: extract assign_category and filter_phase2_url helpers from synthesis pipeline Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	3 months ago
oabrivard	24d53a01d1	fix: block SSRF via IPv4-mapped IPv6 and add check to source page fetching Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	3 months ago
oabrivard	93003229f1	fix: add periodic expired session cleanup (hourly) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	3 months ago
oabrivard	347558a278	fix: atomic job creation, 15min timeout, and panic handling - Replace iterating DashMap check with atomic DashSet insert in create_job to eliminate the race condition where double-click could create two concurrent jobs for the same user - Add release_user method called at end of generation task (normal, timeout, and panic paths) so the generating slot is always freed - Wrap run_generation in tokio::time::timeout(900s) to prevent hung LLM calls from blocking the generation slot forever - Spawn a second task to await the JoinHandle and call release_user + send error event if the generation task panics, preventing SSE clients from hanging indefinitely - Update cleanup_expired to also remove users from generating_users set - Update tests to call release_user after completion/error to match new contract Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	3 months ago
oabrivard	59932589cc	fix: prevent UTF-8 panic in error message truncation Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	3 months ago
oabrivard	e74a1850bf	fix: log source URL in link_extraction LLM call logs Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	3 months ago
oabrivard	a968fdc308	fix: allow brave_search as valid API key provider Split VALID_PROVIDERS (LLM only) from VALID_API_KEY_PROVIDERS (includes brave_search) so Brave keys can be stored without allowing brave_search as an admin LLM provider. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	3 months ago
oabrivard	f124b056fe	feat: add Brave Search Phase 2 pipeline path Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	3 months ago
oabrivard	e05c2ae75a	feat: handle brave_search in API key test endpoint Add a branch in test_key to route brave_search provider to crate::services::brave_search::test_api_key instead of the LLM factory. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	3 months ago
oabrivard	f414ff0f58	feat: add use_brave_search setting Add use_brave_search boolean field to all settings structs, DB layer, SQL queries, frontend types, i18n labels, and test fixtures following the same pattern as use_llm_for_source_links. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	3 months ago
oabrivard	fa03c60339	feat: add Brave Search API client module Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	3 months ago
oabrivard	41109b3d93	feat: send structured link pairs to LLM instead of raw HTML body Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	3 months ago
oabrivard	a5332f0996	feat: add article_url to LLM call logs for classify tracing Adds an optional article_url column to llm_call_log so classify_summarize entries are traceable back to their source article in the LLM Logs UI. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	3 months ago
oabrivard	b062e81218	fix: remove personalized sources from Phase 2 web search prompt Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	3 months ago
oabrivard	4c6381b09a	feat: add batch_size setting for Phase 1 parallelism Add a user-configurable batch_size setting (default 5, range 1-20) that controls how many articles are processed in parallel during Phase 1 scrape+classify. Previously hardcoded to 5. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	3 months ago
oabrivard	7cd867c650	fix: resolve all clippy warnings Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	3 months ago
oabrivard	fa9375233e	fix: remove 3 compiler warnings (unreachable code, unused variables)	3 months ago
oabrivard	14b0a0b7e8	refactor: LLM link extraction uses body only (no head), increased to 12000 chars	3 months ago
oabrivard	3353e5261f	feat: rate limiter waits instead of failing — sleeps until window passes (max 60s)	3 months ago
oabrivard	ed399e9a6e	feat: parallelize Phase 1 scrape+classify in batches of 5	3 months ago
oabrivard	a5f4239157	fix: distinguish filtered_too_old from filtered_empty in article tracing	3 months ago
oabrivard	a760220d44	fix: log LLM calls for source link extraction in llm_call_log	3 months ago
oabrivard	8d232c1ade	feat: split model selection — scraping vs websearch with GPT-5 models Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	3 months ago
oabrivard	7f7d584314	feat: parallel source extraction, shuffle candidates, clear history endpoint - Remove 10-source cap; all sources are now processed - Increase max links per source from 10 to 15 - Extract article links in parallel (up to 5 concurrent) using JoinSet - Shuffle candidate URLs after history filtering to interleave sources - Add DELETE /api/v1/article-history endpoint to clear all history for a user Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	3 months ago
oabrivard	7a8427316c	feat: rewrite synthesis pipeline — per-article classify/summarize, no rewrite pass Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	3 months ago
oabrivard	0b180eb75c	refactor: remove old classification, rewrite, and article extraction prompts/schemas Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	3 months ago
oabrivard	bb716b5dc2	feat: add get_last_source_url + remove head_html from ScrapedContent - Add get_last_source_url() to article_history db module for source rotation - Remove head_html field from ScrapedContent struct and scrape_url function - Fix synthesis.rs scrape_single_article_with_llm to pass empty string instead of removed field Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	3 months ago


116 Commits (03f26601638b25a7d0851dbbd8e09ae6bdf5d117)