OpenAI's default output limit (4096 tokens) was too low for structured
synthesis output with multiple categories and articles per category,
causing truncated JSON. Set 16384 for both OpenAI APIs (Responses +
Chat Completions) and Gemini. Anthropic already had 16384.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add max_articles_per_source setting (default 3, range 1-10) with migration,
backend model, DB queries, and frontend number input
- Add limit_articles_per_source filter: spreads articles across categories
(1 per domain per category first), then fills remaining slots up to the limit
- Add dedup_by_url filter: removes duplicate URLs across categories (case-insensitive)
- Pipeline order: parse → filter_homepage → dedup_by_url → limit_per_source → scrape
- 10 new unit tests covering spread, cap enforcement, dedup, and edge cases
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add max_articles_per_source field to UserSettings interface and DEFAULT_SETTINGS,
expose it as a number input on the Settings page, and add the French i18n label.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The adaptive pipeline skipped the scrape+rewrite pass when the LLM's search
results had URLs starting with "http". But LLMs hallucinate plausible URLs
(Wikipedia, corporate sites) that pass the http check but aren't actual source
articles. The scrape pass catches these by fetching each URL and validating
the content exists. Always running the full pipeline ensures URL integrity.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
normalizeUrl now strips smart quotes, zero-width spaces, and other invisible
formatting characters that browsers inject when copy-pasting URLs from rich
text sources. Prevents false "URL invalid" errors for valid URLs.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The LLM was returning only 1 article per category despite the user setting 4.
- Added minItems/maxItems to the category array schema (enforced by OpenAI strict mode)
- Changed prompt from "au maximum N actualites" to "exactement N actualites"
- Schema builder now takes max_items_per_category parameter
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
LLM output occasionally contains \u0000 null bytes (e.g., "annonc\u0000...")
which PostgreSQL rejects in JSONB columns. Added sanitize_json_null_bytes()
that recursively strips null bytes from all string values before DB insert.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The scraper client (build_scraper_client) has a 15s timeout appropriate for web
scraping, but LLM API calls — especially with web search — take 30-60s. LLM
providers now build their own reqwest client with 120s timeout via build_llm_client().
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Use import.meta.url for ESM-compatible __dirname
- Source creation expects 201, not 200
- Clean up existing sources before adding to avoid unique constraint violation
- Fix E2E docker-compose build context to project root
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Bugs fixed:
- resolve_model queried non-existent admin_provider_models table (use JSONB query on admin_providers)
- key_prefix VARCHAR(10) too short for 11-char prefix (migration to VARCHAR(12))
- API key test schema missing additionalProperties: false (OpenAI strict mode)
- CSP missing font-src data: directive (PDF font embedding blocked)
- Magic link URL not logged in test mode (can't verify without real email)
- Rust 1.85 Docker image too old for dependencies (bumped to 1.88)
Tests added to prevent recurrence:
- schema_meets_openai_strict_mode_requirements: validates additionalProperties on all objects
- key_prefix_full_length_stored_in_db: verifies 11-char prefix survives DB round-trip
- generate_pipeline_resolves_model_from_admin_config: exercises full generation pipeline
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add SIGTERM/Ctrl+C signal handling with graceful connection draining
- Close database pool cleanly on shutdown
- Add frontend-builder stage to Dockerfile (node:22-alpine, npm ci + build)
- Move Docker build context to project root so both frontend/ and backend/ are accessible
- Frontend dist/ copied into container at ./static/ for the backend to serve
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Extract auth::create_and_send_magic_link() to deduplicate token rollback logic
- Replace (i32, i32, RateLimiter) tuple with named UserRateLimitEntry struct
- Use DashMap entry API for atomic rate limiter lookup (fixes TOCTOU race)
- Pre-allocate scraper body Vec from Content-Length when available
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- updateRole return type matches backend's {id, role} instead of full AdminUser
- fetchFile error priority aligned with central client (message before error)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Wire hardened scraper client into runtime (SSRF redirect validation was defined but unused)
- Stream scraper body with per-chunk size limit instead of post-download check (DoS/OOM)
- Persist user rate-limit overrides across generation jobs via AppState DashMap
- Roll back magic-link token on email send failure to prevent quota exhaustion
- Fix API error UX: prefer human message over machine error code in frontend
- Unwrap GET /syntheses { items } wrapper in frontend API layer (contract mismatch)
- Bind Postgres to localhost in docker-compose (was exposed on all interfaces)
- Fix CLAUDE.md: runtime queries not compile-time, 10 migrations not 9
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
4 fixes: syntheses list contract alignment (frontend to backend),
admin rate-limits path fix (provider_name not id), SSRF redirect
per-hop validation with attempt.error(), shared test fixtures.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add Playwright E2E testing setup with Docker Compose test environment,
database seed script, auth helpers, and five test scenarios covering
registration, admin providers, settings persistence, sources CRUD,
and settings export/import.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add test-utils.tsx with renderWithProviders (MemoryRouter + I18n + Toast),
mockFetch, and mockFetchRoutes helpers. Create 39 interaction-level tests
across 6 page components covering rendering, form validation, API calls,
delete confirmation flows, SSE progress, and file import/export.
Also add Blob.text() polyfill in test setup for jsdom compatibility.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Extract cookie parsing into a standalone `extract_session_token` function
and add 5 unit tests covering the valid, missing, multi-cookie, whitespace,
and empty-header cases.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Bug fix:
- Per-generation rate limiter was creating a new instance on every check,
making user rate limit overrides non-functional. Fixed by creating the
limiter once at pipeline start and reusing for both passes.
Simplifications:
- Extract spawn_task closure in scrape_articles (deduplicate spawn blocks)
- Use idiomatic if let Ok(...) instead of if let Some(..).ok() in scraper
- Replace manual loop with iterator chain in export_keys handler
- Simplify check_rate_limit to single boolean check
- Simplify handleImport settings merge (spread already provides defaults)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>