35 Commits (master)

Author SHA1 Message Date
oabrivard cd5f5434b2 feat: add rss_url and rss_discovered_at columns to sources
Add nullable rss_url (TEXT) and rss_discovered_at (TIMESTAMPTZ) columns
to the sources table for RSS feed URL caching. Update the Source struct,
all query_as SELECT/RETURNING queries, and add update_source_rss db function.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2 months ago
oabrivard b124d73c2a fix: P1 audit items — CSV export theme filter, theme validation, ownership checks, history enums, i18n
- export_csv now accepts optional theme_id query param and filters accordingly
- Add UpdateThemeRequest::validate() with bounds checking; call it in the update handler
- Verify theme ownership in sources::create when theme_id is provided
- Update STATUS_OPTIONS (add filtered_too_old, filtered_not_article; remove filtered_duplicate) and SOURCE_TYPE_OPTIONS (add brave_search; remove overflow) in ArticleHistory
- Replace hardcoded French strings ('Confirmer', 'Erreur inconnue') with t() calls; add settings.apiKeys.unknownError key to fr.ts

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2 months ago
oabrivard d5d624b896 fix: P0 audit bugs — theme-scoped imports/preferred, creation flow, scheduler timeout, job cleanup
- Bulk/CSV import now passes theme_id through to DB
- Preferred source update scoped by theme_id (no cross-theme reset)
- Theme creation sends sensible defaults from frontend
- Scheduler wraps generation in 15-minute timeout
- Job store cleanup runs every 5 minutes

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2 months ago
oabrivard 384649b2b6 feat: add theme schedules — model, DB, CRUD handler, routes
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
3 months ago
oabrivard e43a4d2180 feat: add preferred sources — prioritized during synthesis generation
Users can mark sources as preferred via star buttons on the theme page.
Preferred sources are processed first in the pipeline (ordered before
non-preferred in waves, shuffled separately then merged).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
3 months ago
oabrivard 196005a27b feat: multi-theme Phase 1 — settings split, sources/syntheses theme_id, pipeline theme-aware
Remove content settings from settings table (moved to themes).
Add theme_id to sources and syntheses. Pipeline loads content
settings from the selected theme. Generate endpoint requires theme_id.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
3 months ago
oabrivard 467ad550a5 feat: add Theme model and DB queries
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
3 months ago
oabrivard 6b84c335d0 feat: improve synthesis list cards with time, all categories, and uniform height
- Add generation time below date in synthesis cards
- Show all categories with article count in parentheses
- Use flex-col layout for uniform card height
- Add sections_summary to SynthesisListItem API response
- Add formatTime utility

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
3 months ago
oabrivard e24236a069 feat: add max_links_per_source setting (default 8, was hardcoded 15)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
3 months ago
oabrivard d234fa9b24 feat: add is_article LLM check + remove use_llm_for_source_links option
The LLM now determines if scraped content is a real article during
classify (zero extra cost). The separate LLM link extraction option
is removed — heuristic extraction is sufficient.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
3 months ago
oabrivard 0f1b0306e4 feat: add source_extraction_window setting (default 3)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
3 months ago
oabrivard c5a56c8fb8 feat: save publication date in article history and show in synthesis
- Add published_date column to article_history table
- Add date field to NewsItem (serialized in synthesis JSONB)
- Pass LLM-extracted date through ArticleTrace to article history
- Display date below article title in SynthesisDetail page

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
3 months ago
oabrivard 1b63afd12a feat: add summary_length setting (1=court, 2=moyen, 3=detaille)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
3 months ago
oabrivard 2036c12b24 refactor: eliminate SettingsResponse struct, serialize UserSettings directly
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
3 months ago
oabrivard a968fdc308 fix: allow brave_search as valid API key provider
Split VALID_PROVIDERS (LLM only) from VALID_API_KEY_PROVIDERS (includes
brave_search) so Brave keys can be stored without allowing brave_search
as an admin LLM provider.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
3 months ago
oabrivard f414ff0f58 feat: add use_brave_search setting
Add use_brave_search boolean field to all settings structs, DB layer,
SQL queries, frontend types, i18n labels, and test fixtures following
the same pattern as use_llm_for_source_links.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
3 months ago
oabrivard 4c6381b09a feat: add batch_size setting for Phase 1 parallelism
Add a user-configurable batch_size setting (default 5, range 1-20)
that controls how many articles are processed in parallel during
Phase 1 scrape+classify. Previously hardcoded to 5.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
3 months ago
oabrivard 8d232c1ade feat: split model selection — scraping vs websearch with GPT-5 models
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
3 months ago
oabrivard 825b793387 feat: drop source_diversity_window and use_llm_for_article_extraction settings
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
3 months ago
oabrivard f9023cff7e feat: LLM logs viewer page + log button on Home synthesis list
- Add LlmLogs page with collapsible prompts/response sections, call-type
  colored badges, and duration display
- Wire /llm-logs/:jobId route in App.tsx (lazy-loaded)
- Expose job_id in backend SynthesisListItem and frontend SynthesisListItem
  type; update test fixture accordingly
- Add log-icon link next to delete button on each Home synthesis card

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
3 months ago
oabrivard b9003cde54 feat: instrument pipeline with article tracing at every filtering step
Add source_url field to ScrapedNewsItem and a trace_article helper that
inserts into article_history with full provenance metadata.  Instrument
Phase 1 (empty content, history dedup, source diversity) and Phase 2
(homepage filter, cross-phase dedup, history dedup, empty content) so
every dropped article is recorded with its filter reason.  Replace the
old insert_urls call with per-article trace_article calls for used
articles, preserving dedup semantics via url_hash.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
3 months ago
oabrivard 0e2c69edf7 feat: save job_id on syntheses for provenance lookup
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
3 months ago
oabrivard c271c240a2 feat: add article_history table and article_history_days setting
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
3 months ago
oabrivard e483789d1b feat: add use_llm_for_source_links and use_llm_for_article_extraction settings
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
3 months ago
oabrivard a31915d3ce feat: add source_diversity_window setting (migration + model + DB + validation tests)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
3 months ago
oabrivard c1ee79bcf6 feat: add max_articles_per_source setting (migration + model + DB)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
3 months ago
oabrivard 9b994e0528 v2: pipeline user model selection, rate limiter, URL filter, original title, null-safe sections
- resolve_provider_and_key() now respects user ai_provider preference
- Dual model resolution: ai_model for search pass, ai_model_writing for rewrite pass
- Per-generation rate limiter with user override support
- Homepage URL filter removes domain-only URLs after search pass
- ScrapedNewsItem gains original_title field populated from page <title>
- SynthesisResponse::try_from handles null sections gracefully (returns empty vec)
- Search prompt warns LLM against returning homepage URLs
- Rewrite prompt instructs LLM to use originalTitle with language preservation rules

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
3 months ago
oabrivard ed6b41fe52 v2: add settings migration, model expansion, DB queries (provider, models, rate limits)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
3 months ago
oabrivard 04819aa926 Simplify code: deduplicate patterns, fix captcha field name bug
Bug fix:
- Fix frontend sending captcha_token instead of turnstile_token in
  login/register requests (would cause 422 errors on auth)

Backend simplifications:
- Deduplicate VALID_PROVIDERS constant (provider.rs is now the single source)
- Extract validate_display_name/validate_models helpers in provider model
- Add From<UserSettings> for SettingsResponse, From<User> for AdminUserResponse
- Consolidate Resend API call pattern into shared send_via_resend()
- Extract do_bulk_import() for sources bulk/CSV import
- Use idiomatic range.contains() for rate limit validation

Frontend simplifications:
- Consolidate file download logic (exportCsv reuses shared fetchFile/triggerDownload)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
3 months ago
oabrivard 1f9f7f39d7 Phase 7: Email sending via Resend + Markdown/PDF export
Backend:
- Synthesis email sending via Resend API with HTML template (inline CSS,
  tables-based for email client compatibility) + plain-text fallback
- XSS prevention via html_escape() on all user content in email templates
- Markdown export: clean format with headers, links, summaries
- PDF export: printpdf with built-in Helvetica fonts, indigo color scheme,
  automatic page breaks, word wrapping
- 3 new endpoints: send-email, export/markdown, export/pdf
- All endpoints enforce ownership checks
- Email validation using email_address crate
- 24 new unit tests, 13 integration tests

Frontend:
- Email section on SynthesisDetail: input pre-filled with user email,
  send button with loading state, success/error feedback
- Export buttons: Markdown + PDF with per-button loading states
- File download via Blob + temporary anchor with Content-Disposition parsing
- 6 new export tests

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
3 months ago
oabrivard aa6f1ba76b Phase 5: Generation pipeline with SSE progress, syntheses CRUD
Backend:
- Full 2-pass generation pipeline: LLM search -> URL scraping -> LLM rewrite
- Async generation with tokio::spawn, JobStore with per-user concurrency limit
- SSE progress streaming via axum::response::Sse + tokio::sync::watch
- Syntheses CRUD: list (paginated), get (ownership check), delete
- Prompt construction ported from original geminiService.ts
- Parallel URL scraping with bounded concurrency (max 10)
- Graceful partial failure handling (some URLs fail -> continue)
- 36 new unit tests, 16 integration tests

Frontend:
- Home dashboard: synthesis card grid, week badges, delete with confirmation
- Generate page: SSE-driven progress bar, step checklist, auto-redirect
- Synthesis detail: section-by-section display, external links, delete
- SSE client helper with auto-reconnect (exponential backoff)
- Date utilities with French locale formatting

Critical fixes applied:
- SSE EventSource now sends credentials (withCredentials: true)
- Gemini error logging sanitized to prevent API key leak in logs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
3 months ago
oabrivard 439e547367 Phase 4: LLM provider abstraction with Gemini, user API key encryption
Backend:
- LlmProvider async trait with generate_search_pass/generate_rewrite_pass
- GeminiProvider: googleSearch grounding (pass 1), structured JSON output (pass 2)
- AES-256-GCM encryption for user API keys at rest (per-key random nonces)
- MasterKey with zeroize-on-drop (no Clone to prevent unzeroized copies)
- User API key endpoints: list (prefix only), create/update, delete, test
- Dynamic category schema builder for structured LLM output
- Provider factory (Gemini implemented, OpenAI/Anthropic stubbed for Phase 6)
- 37 new unit tests (encryption, schema, Gemini serialization, factory)
- 17 integration tests (CRUD, encryption verification, ownership isolation)

Frontend:
- ApiKeyManager component: per-provider key management in Settings
- Password input with show/hide toggle, key prefix display (monospace)
- Test button validates key with minimal LLM call
- Status badges (configured/not configured)
- 11 new tests

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
3 months ago
oabrivard 5abbf9b9ad Phase 3: Admin module with provider/model curation, rate limits, user management
Backend:
- Admin API: CRUD for providers, rate limits, user role management
- Public config endpoint for enabled providers/models
- AdminUser extractor enforces RBAC on all admin endpoints
- Per-provider rate limiter with hot-reload from DB
- Audit logging for all admin mutations
- Seed data: Gemini, OpenAI, Anthropic providers with default models
- Self-demotion prevention on role changes
- 30 integration tests, 27 new unit tests

Frontend:
- Admin layout with sidebar navigation (providers, rate limits, users)
- Provider management: enable/disable, model CRUD, default model selection
- Rate limit configuration with effective rate display
- User management with role badges and promote/demote
- Admin link in navbar/mobile menu (visible only to admins)
- Settings page: dynamic provider/model selection from admin config
- 10 new tests (admin guard, config API)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
3 months ago
oabrivard 2b75dc7049 Finished phase 2 3 months ago
oabrivard 355dbf6a5a Finished phase 1 3 months ago