21 Commits (e2ce401ea69bf7152c6a5cf27a59c8f7e6ccb99e)

Author SHA1 Message Date
oabrivard 1b20d38bbd fix: add TechCrunch source and increase retries for flaky generation test
The live generation test depends on real OpenAI + web scraping.
Adding a second source improves chances of finding articles.
Retries increased from 1 to 2.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
3 months ago
oabrivard 196005a27b feat: multi-theme Phase 1 — settings split, sources/syntheses theme_id, pipeline theme-aware
Remove content settings from settings table (moved to themes).
Add theme_id to sources and syntheses. Pipeline loads content
settings from the selected theme. Generate endpoint requires theme_id.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
3 months ago
oabrivard e24236a069 feat: add max_links_per_source setting (default 8, was hardcoded 15)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
3 months ago
oabrivard d234fa9b24 feat: add is_article LLM check + remove use_llm_for_source_links option
The LLM now determines if scraped content is a real article during
classify (zero extra cost). The separate LLM link extraction option
is removed — heuristic extraction is sufficient.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
3 months ago
oabrivard 0f1b0306e4 feat: add source_extraction_window setting (default 3)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
3 months ago
oabrivard bf07b049f3 feat: add summary length slider to Settings page
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
3 months ago
oabrivard ec798831c3 fix: use broader theme and 30-day window in generation E2E test
'AI Weekly' with 7 days was too narrow — most articles filtered as
too old. 'Intelligence Artificielle' with 30 days gives more results.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
3 months ago
oabrivard 1dad319e5e fix: update E2E tests for Turnstile DOM stability, stale selectors, and pipeline changes
- Replace waitForLoadState('networkidle') with 'domcontentloaded' to
  avoid hangs caused by the Cloudflare Turnstile script
- Add { waitUntil: 'domcontentloaded' } to all page.goto() calls
- Rewrite registration test to use the API directly instead of UI form
  submission, since the Turnstile script causes continuous DOM mutations
  that prevent Playwright from clicking elements
- Fix admin-providers test to select Gemini from the provider dropdown
  when multiple providers are enabled
- Fix sources test to clean up leftover sources before asserting counts
- Update generation-live LLM call type assertion from 'rewrite' to
  'search' to match the current pipeline (classify_summarize is optional)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
3 months ago
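The navigation change in commit 1dad319e5e can be sketched roughly as below. This is an illustration only, not the project's test code: `PageLike` is a hypothetical structural subset of Playwright's `Page`, and `gotoStable` is an invented helper name. It waits for `domcontentloaded` instead of `networkidle`, since a script that keeps issuing requests may never let the network go idle.

```typescript
// Hypothetical subset of Playwright's Page, declared here so the sketch
// compiles without the Playwright package installed.
type LoadState = "load" | "domcontentloaded" | "networkidle";

interface PageLike {
  goto(url: string, options?: { waitUntil?: LoadState }): Promise<unknown>;
  waitForLoadState(state?: LoadState): Promise<void>;
}

// Navigate and wait only for the DOM to be parsed, never for network idle.
async function gotoStable(page: PageLike, url: string): Promise<void> {
  await page.goto(url, { waitUntil: "domcontentloaded" });
  await page.waitForLoadState("domcontentloaded");
}
```

In a real spec the same two calls would be made on the `page` fixture directly; the helper only groups them.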
oabrivard d8ccd779d7 fix: rename duplicate jobId variable in E2E test
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
3 months ago
oabrivard f414ff0f58 feat: add use_brave_search setting
Add use_brave_search boolean field to all settings structs, DB layer,
SQL queries, frontend types, i18n labels, and test fixtures following
the same pattern as use_llm_for_source_links.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
3 months ago
oabrivard 4c6381b09a feat: add batch_size setting for Phase 1 parallelism
Add a user-configurable batch_size setting (default 5, range 1-20)
that controls how many articles are processed in parallel during
Phase 1 scrape+classify. Previously hardcoded to 5.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
3 months ago
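The `batch_size` behaviour described in commit 4c6381b09a (default 5, range 1-20) might look something like the sketch below. The function names are hypothetical, the real pipeline code is not shown in this log, and clamping out-of-range values (rather than rejecting them) is an assumption.

```typescript
// Bounds taken from the commit message; everything else is illustrative.
const DEFAULT_BATCH_SIZE = 5;
const MIN_BATCH_SIZE = 1;
const MAX_BATCH_SIZE = 20;

// Assumption: invalid or missing values fall back to the default,
// and out-of-range values are clamped into [1, 20].
function clampBatchSize(requested?: number): number {
  if (requested === undefined || !Number.isFinite(requested)) {
    return DEFAULT_BATCH_SIZE;
  }
  return Math.min(MAX_BATCH_SIZE, Math.max(MIN_BATCH_SIZE, Math.floor(requested)));
}

// Process items in consecutive batches; items within a batch run in parallel.
async function processInBatches<T, R>(
  items: readonly T[],
  batchSize: number | undefined,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  const size = clampBatchSize(batchSize);
  const results: R[] = [];
  for (let i = 0; i < items.length; i += size) {
    const batch = items.slice(i, i + size);
    results.push(...(await Promise.all(batch.map(worker))));
  }
  return results;
}
```

With a batch size of 5, twelve articles would be handled as three sequential batches of 5, 5, and 2.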
oabrivard fb765d6c8f feat: split model dropdowns — scraping vs websearch in frontend
Replace the single `models` array in `ProviderConfig` and `AdminProvider`
with separate `models_scraping` / `models_websearch` lists. Rename
`ai_model_writing` → `ai_model_websearch` in `UserSettings` and all
references (Settings page, admin Providers page, E2E test, fixtures,
and unit tests). Update i18n label for the second dropdown to
"Modele d'IA (Recherche Web)".

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
3 months ago
oabrivard 48957470ed test: update E2E test for new pipeline (remove deprecated settings)
3 months ago
oabrivard d9982b467c test: verify LLM call logs endpoint returns data after generation
3 months ago
oabrivard f7428191ec test: verify provenance endpoint returns tracing data after generation
3 months ago
oabrivard cea723f7d7 test: update E2E and integration tests with article_history_days setting
3 months ago
oabrivard a7599e512a test: comprehensive E2E synthesis validation (duplicates, links, counts, domains)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
3 months ago
oabrivard 13894a8f50 fix: filter empty scraped articles + restore URLs after rewrite + E2E assertions
- filter_empty_scraped_articles: removes articles with empty scraped content
  (too old, soft 404, scrape failure) before the rewrite pass, preventing
  empty articles in the final synthesis
- restore_scraped_urls: already existed, now has unit tests
- E2E test: added assertions for no Wikipedia URLs, no empty summaries,
  and updated settings payload with new fields (max_articles_per_source,
  source_diversity_window)
- 4 new unit tests for filter_empty + restore_scraped_urls

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
3 months ago
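The empty-article filter from commit 13894a8f50 can be illustrated with a short sketch. The `ScrapedArticle` shape and the camel-cased function name are assumptions made for this example; treating whitespace-only content as empty is also an assumption, not something the commit states.

```typescript
// Hypothetical article shape; the project's real type is not shown here.
interface ScrapedArticle {
  url: string;
  title: string;
  content: string;
}

// Drop articles whose scraped content is empty (for example after a
// soft 404 or a failed scrape) so they never reach the rewrite pass.
function filterEmptyScrapedArticles(
  articles: readonly ScrapedArticle[],
): ScrapedArticle[] {
  return articles.filter((article) => article.content.trim().length > 0);
}
```

Running the filter before the rewrite pass keeps placeholder entries out of the final synthesis.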
oabrivard 69e5f2257a fix: UAT test — ESM compat, correct status codes, idempotent source setup
- Use import.meta.url for ESM-compatible __dirname
- Source creation expects 201, not 200
- Clean up existing sources before adding to avoid unique constraint violation
- Fix E2E docker-compose build context to project root

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
3 months ago
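The ESM-compatible `__dirname` mentioned in commit 69e5f2257a usually follows the pattern sketched below. `dirnameFromModuleUrl` is a hypothetical helper name used for illustration; `fileURLToPath` and `dirname` are standard Node.js functions.

```typescript
import { dirname } from "node:path";
import { fileURLToPath } from "node:url";

// Convert a module's file:// URL into the directory that contains it.
// ES modules have no built-in __dirname, so it is derived from import.meta.url.
function dirnameFromModuleUrl(moduleUrl: string): string {
  return dirname(fileURLToPath(moduleUrl));
}

// Inside an ES module this would typically be used as:
//   const __dirname = dirnameFromModuleUrl(import.meta.url);
```

Passing the URL in as a parameter keeps the helper usable from both ESM and CommonJS callers.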
oabrivard 97cb58ff42 fix: improve type safety and error handling in generation UAT
3 months ago
oabrivard 02017db2e0 test: add live generation UAT with real OpenAI API key
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
3 months ago