91 Commits (114f2912dd34a1117584aae040f91bb716837fc5)

Author SHA1 Message Date
oabrivard e74a1850bf fix: log source URL in link_extraction LLM call logs
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
3 months ago
oabrivard a968fdc308 fix: allow brave_search as valid API key provider
Split VALID_PROVIDERS (LLM only) from VALID_API_KEY_PROVIDERS (includes
brave_search) so Brave keys can be stored without allowing brave_search
as an admin LLM provider.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
3 months ago
oabrivard f124b056fe feat: add Brave Search Phase 2 pipeline path
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
3 months ago
oabrivard e05c2ae75a feat: handle brave_search in API key test endpoint
Add a branch in test_key to route brave_search provider to
crate::services::brave_search::test_api_key instead of the LLM factory.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
3 months ago
oabrivard f414ff0f58 feat: add use_brave_search setting
Add use_brave_search boolean field to all settings structs, DB layer,
SQL queries, frontend types, i18n labels, and test fixtures following
the same pattern as use_llm_for_source_links.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
3 months ago
oabrivard fa03c60339 feat: add Brave Search API client module
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
3 months ago
oabrivard 41109b3d93 feat: send structured link pairs to LLM instead of raw HTML body
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
3 months ago
oabrivard a5332f0996 feat: add article_url to LLM call logs for classify tracing
Adds an optional article_url column to llm_call_log so classify_summarize
entries are traceable back to their source article in the LLM Logs UI.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
3 months ago
oabrivard b062e81218 fix: remove personalized sources from Phase 2 web search prompt
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
3 months ago
oabrivard 4c6381b09a feat: add batch_size setting for Phase 1 parallelism
Add a user-configurable batch_size setting (default 5, range 1-20)
that controls how many articles are processed in parallel during
Phase 1 scrape+classify. Previously hardcoded to 5.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
3 months ago
oabrivard 7cd867c650 fix: resolve all clippy warnings
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
3 months ago
oabrivard fa9375233e fix: remove 3 compiler warnings (unreachable code, unused variables) 3 months ago
oabrivard 14b0a0b7e8 refactor: LLM link extraction uses body only (no head), increased to 12000 chars 3 months ago
oabrivard 3353e5261f feat: rate limiter waits instead of failing — sleeps until window passes (max 60s) 3 months ago
oabrivard ed399e9a6e feat: parallelize Phase 1 scrape+classify in batches of 5 3 months ago
oabrivard a5f4239157 fix: distinguish filtered_too_old from filtered_empty in article tracing 3 months ago
oabrivard a760220d44 fix: log LLM calls for source link extraction in llm_call_log 3 months ago
oabrivard 8d232c1ade feat: split model selection — scraping vs websearch with GPT-5 models
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
3 months ago
oabrivard 7f7d584314 feat: parallel source extraction, shuffle candidates, clear history endpoint
- Remove 10-source cap; all sources are now processed
- Increase max links per source from 10 to 15
- Extract article links in parallel (up to 5 concurrent) using JoinSet
- Shuffle candidate URLs after history filtering to interleave sources
- Add DELETE /api/v1/article-history endpoint to clear all history for a user

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
3 months ago
oabrivard 7a8427316c feat: rewrite synthesis pipeline — per-article classify/summarize, no rewrite pass
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
3 months ago
oabrivard 0b180eb75c refactor: remove old classification, rewrite, and article extraction prompts/schemas
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
3 months ago
oabrivard bb716b5dc2 feat: add get_last_source_url + remove head_html from ScrapedContent
- Add get_last_source_url() to article_history db module for source rotation
- Remove head_html field from ScrapedContent struct and scrape_url function
- Fix synthesis.rs scrape_single_article_with_llm to pass empty string instead of removed field

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
3 months ago
oabrivard b2dbc3847a feat: add per-article classify/summarize prompt and schema
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
3 months ago
oabrivard 825b793387 feat: drop source_diversity_window and use_llm_for_article_extraction settings
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
3 months ago
oabrivard a2fe3f3310 feat: simplify LlmProvider trait to single call_llm method
Replace the three-method LlmProvider trait (generate_search_pass,
generate_rewrite_pass, supports_web_search) and ProviderCapabilities
with a single call_llm method. Update all three provider implementations
(Gemini, OpenAI, Anthropic) and all callers in synthesis.rs,
source_scraper.rs, and api_keys.rs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
3 months ago
oabrivard f9023cff7e feat: LLM logs viewer page + log button on Home synthesis list
- Add LlmLogs page with collapsible prompts/response sections, call-type
  colored badges, and duration display
- Wire /llm-logs/:jobId route in App.tsx (lazy-loaded)
- Expose job_id in backend SynthesisListItem and frontend SynthesisListItem
  type; update test fixture accordingly
- Add log-icon link next to delete button on each Home synthesis card

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
3 months ago
oabrivard dafec2591b feat: API endpoint for LLM call logs by job_id
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
3 months ago
oabrivard 9fffde8312 feat: log LLM calls with timing at search, classification, and rewrite steps
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
3 months ago
oabrivard b2b0b286c0 feat: create llm_call_log table + DB module
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
3 months ago
oabrivard 55fe828e58 feat: API endpoints for article history listing and provenance
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
3 months ago
oabrivard b9003cde54 feat: instrument pipeline with article tracing at every filtering step
Add source_url field to ScrapedNewsItem and a trace_article helper that
inserts into article_history with full provenance metadata.  Instrument
Phase 1 (empty content, history dedup, source diversity) and Phase 2
(homepage filter, cross-phase dedup, history dedup, empty content) so
every dropped article is recorded with its filter reason.  Replace the
old insert_urls call with per-article trace_article calls for used
articles, preserving dedup semantics via url_hash.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
3 months ago
oabrivard 0e2c69edf7 feat: save job_id on syntheses for provenance lookup
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
3 months ago
oabrivard eba721266f feat: article history entry struct + insert/query/cleanup functions
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
3 months ago
oabrivard d7afd08eaf feat: enrich article_history with tracing metadata + syntheses.job_id 3 months ago
oabrivard 7cbb2853ce feat: Autre fill-up to 75% synthesis target with source diversity enforcement
Accumulates overflow articles from both classification phases and redistributes
them into the Autre category when total articles fall below 75% of the configured
max, respecting per-source diversity limits.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
3 months ago
oabrivard c3e6103ef1 feat: parse_classification_response collects overflow articles
Returns a (result, overflow) tuple so callers can access articles that
could not fit in any category or Autre. Also adds the
SYNTHESIS_MIN_FILL_RATIO constant for the upcoming fill-up logic.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
3 months ago
oabrivard cea723f7d7 test: update E2E and integration tests with article_history_days setting 3 months ago
oabrivard 65eb6004d2 feat: article history filtering in pipeline — cleanup, Phase 1/2 filter, retry, insert
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
3 months ago
oabrivard 0a87b7ed8f feat: add normalize_article_url and hash_article_url utilities
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
3 months ago
oabrivard 5a928aa990 feat: add article_history DB module (check, insert, cleanup)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
3 months ago
oabrivard c271c240a2 feat: add article_history table and article_history_days setting
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
3 months ago
oabrivard 8e06357b47 test: update integration test with LLM scraping settings 3 months ago
oabrivard 8a061c98db feat: LLM-assisted article extraction with Arc provider, concurrency control, and progress
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
3 months ago
oabrivard 357f06e405 feat: LLM-assisted source link extraction with heuristic fallback
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
3 months ago
oabrivard e6e8aa1eeb feat: add LLM prompts and schemas for link and article extraction
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
3 months ago
oabrivard 23f121a58d feat: ScrapedContent url+head_html fields, Arc<dyn LlmProvider>, 3-tuple scrape returns
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
3 months ago
oabrivard e483789d1b feat: add use_llm_for_source_links and use_llm_for_article_extraction settings
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
3 months ago
oabrivard 53ecce84b0 feat: two-phase generation pipeline — personalized sources first, web search fallback
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
3 months ago
oabrivard 51ea032838 feat: add scrape_flat_urls helper and gap-aware search prompt
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
3 months ago
oabrivard d508b5b4ab feat: Autre category support in rewrite schema, final sections, URL restore + remove dead code
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
3 months ago