ai_synth

Commit Graph

Author	SHA1	Message	Date
oabrivard	e6e8aa1eeb	feat: add LLM prompts and schemas for link and article extraction Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	3 months ago
oabrivard	23f121a58d	feat: ScrapedContent url+head_html fields, Arc<dyn LlmProvider>, 3-tuple scrape returns Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	3 months ago
oabrivard	104b6a0d7b	feat: add classification prompt and schema for article categorization Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	3 months ago
oabrivard	8a18b70aff	fix: set max output tokens to 16384 for all LLM providers OpenAI's default output limit (4096 tokens) was too low for structured synthesis output with multiple categories and articles per category, causing truncated JSON. Set 16384 for both OpenAI APIs (Responses + Chat Completions) and Gemini. Anthropic already had 16384. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	3 months ago
oabrivard	45c9e71589	fix: enforce max_items_per_category in JSON schema and prompt The LLM was returning only 1 article per category despite the user setting 4. - Added minItems/maxItems to the category array schema (enforced by OpenAI strict mode) - Changed prompt from "au maximum N actualites" ("at most N news items") to "exactement N actualites" ("exactly N news items") - Schema builder now takes max_items_per_category parameter Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	3 months ago
oabrivard	3fe667591d	fix: LLM providers use own HTTP client with 120s timeout (was sharing scraper's 15s) The scraper client (build_scraper_client) has a 15s timeout appropriate for web scraping, but LLM API calls — especially with web search — take 30-60s. LLM providers now build their own reqwest client with 120s timeout via build_llm_client(). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	3 months ago
oabrivard	004f08f385	fix: runtime bugs found during first Docker run + integration tests Bugs fixed: - resolve_model queried non-existent admin_provider_models table (use JSONB query on admin_providers) - key_prefix VARCHAR(10) too short for 11-char prefix (migration to VARCHAR(12)) - API key test schema missing additionalProperties: false (OpenAI strict mode) - CSP missing font-src data: directive (PDF font embedding blocked) - Magic link URL not logged in test mode (can't verify without real email) - Rust 1.85 Docker image too old for dependencies (bumped to 1.88) Tests added to prevent recurrence: - schema_meets_openai_strict_mode_requirements: validates additionalProperties on all objects - key_prefix_full_length_stored_in_db: verifies 11-char prefix survives DB round-trip - generate_pipeline_resolves_model_from_admin_config: exercises full generation pipeline Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	3 months ago
oabrivard	631bd43b9f	Phase 6: Multi-provider support with OpenAI and Anthropic Backend: - OpenAiProvider: Responses API with web_search_preview (pass 1), Chat Completions with json_schema structured output (pass 2) - AnthropicProvider: Messages API with web_search tool (pass 1), schema-in-prompt for structured output, code fence stripping (pass 2) - Pipeline adaptation: skip scrape+rewrite when >70% of search URLs are valid - Provider factory updated for all three providers - Error sanitization extended for Anthropic key patterns (sk-ant-) - 44 new unit tests (OpenAI, Anthropic, factory, pipeline heuristic) Frontend: - Provider-specific info text below model selection - Web search support badges (green/gray) - Generate page shows selected provider and model - Warning banner when provider lacks web search - Provider utility module with 10 tests Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	3 months ago
oabrivard	aa6f1ba76b	Phase 5: Generation pipeline with SSE progress, syntheses CRUD Backend: - Full 2-pass generation pipeline: LLM search -> URL scraping -> LLM rewrite - Async generation with tokio::spawn, JobStore with per-user concurrency limit - SSE progress streaming via axum::response::Sse + tokio::sync::watch - Syntheses CRUD: list (paginated), get (ownership check), delete - Prompt construction ported from original geminiService.ts - Parallel URL scraping with bounded concurrency (max 10) - Graceful partial failure handling (some URLs fail -> continue) - 36 new unit tests, 16 integration tests Frontend: - Home dashboard: synthesis card grid, week badges, delete with confirmation - Generate page: SSE-driven progress bar, step checklist, auto-redirect - Synthesis detail: section-by-section display, external links, delete - SSE client helper with auto-reconnect (exponential backoff) - Date utilities with French locale formatting Critical fixes applied: - SSE EventSource now sends credentials (withCredentials: true) - Gemini error logging sanitized to prevent API key leak in logs Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	3 months ago
oabrivard	439e547367	Phase 4: LLM provider abstraction with Gemini, user API key encryption Backend: - LlmProvider async trait with generate_search_pass/generate_rewrite_pass - GeminiProvider: googleSearch grounding (pass 1), structured JSON output (pass 2) - AES-256-GCM encryption for user API keys at rest (per-key random nonces) - MasterKey with zeroize-on-drop (no Clone to prevent unzeroized copies) - User API key endpoints: list (prefix only), create/update, delete, test - Dynamic category schema builder for structured LLM output - Provider factory (Gemini implemented, OpenAI/Anthropic stubbed for Phase 6) - 37 new unit tests (encryption, schema, Gemini serialization, factory) - 17 integration tests (CRUD, encryption verification, ownership isolation) Frontend: - ApiKeyManager component: per-provider key management in Settings - Password input with show/hide toggle, key prefix display (monospace) - Test button validates key with minimal LLM call - Status badges (configured/not configured) - 11 new tests Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	3 months ago

10 Commits (7cbb2853ced0b47e5957a4e7ec1607c03738fce0)
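Commit 23f121a58d switches providers to Arc<dyn LlmProvider> and adds url and head_html to ScrapedContent. The following is a minimal sketch of why a shared trait object works here. It assumes a synchronous version of the trait (the real one is async, per commit 439e547367) and uses a hypothetical StubProvider and make_provider factory in place of the Gemini, OpenAI and Anthropic clients. The 3-tuple returned by the scrape functions is not shown because the commit does not say what it contains.

```rust
use std::sync::Arc;

/// Page record returned by the scraper. Only `url` and `head_html` are named in the
/// commit; any other fields the real struct has are omitted here.
#[derive(Debug, Clone)]
pub struct ScrapedContent {
    pub url: String,
    pub head_html: String,
}

/// Synchronous stand-in for the project's async `LlmProvider` trait. The `Send + Sync`
/// supertraits are what allow one provider to be shared as `Arc<dyn LlmProvider>`.
pub trait LlmProvider: Send + Sync {
    fn name(&self) -> &str;
    fn generate_search_pass(&self, prompt: &str) -> Result<String, String>;
    fn generate_rewrite_pass(&self, prompt: &str, sources: &[ScrapedContent]) -> Result<String, String>;
}

/// Hypothetical stub used in place of the real provider clients.
struct StubProvider {
    name: &'static str,
}

impl LlmProvider for StubProvider {
    fn name(&self) -> &str {
        self.name
    }

    fn generate_search_pass(&self, prompt: &str) -> Result<String, String> {
        Ok(format!("{}: search({})", self.name, prompt))
    }

    fn generate_rewrite_pass(&self, prompt: &str, sources: &[ScrapedContent]) -> Result<String, String> {
        Ok(format!("{}: rewrite({}, {} sources)", self.name, prompt, sources.len()))
    }
}

/// Hypothetical factory: provider id -> shared trait object, or `None` if unknown.
pub fn make_provider(id: &str) -> Option<Arc<dyn LlmProvider>> {
    let name = match id {
        "gemini" => "gemini",
        "openai" => "openai",
        "anthropic" => "anthropic",
        _ => return None,
    };
    // Explicit annotation performs the unsized coercion to the trait object.
    let provider: Arc<dyn LlmProvider> = Arc::new(StubProvider { name });
    Some(provider)
}
```

Because Send + Sync are supertraits, cloning the Arc is enough to hand the same provider to a spawned task; nothing has to be rebuilt per task.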
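Commit 8a18b70aff sets a 16384-token output cap on every provider. Below is a sketch of where that value goes in each request body. It assumes the public parameter names of each API: max_output_tokens (OpenAI Responses), max_completion_tokens (OpenAI Chat Completions, where the older max_tokens is deprecated), generationConfig.maxOutputTokens (Gemini) and max_tokens (Anthropic Messages). The commit does not say which spelling the project uses for Chat Completions, and the Api enum and helper names are hypothetical.

```rust
/// The four generation endpoints touched by the commit (hypothetical enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Api {
    OpenAiResponses,
    OpenAiChatCompletions,
    Gemini,
    Anthropic,
}

/// Output cap applied uniformly after the fix.
pub const MAX_OUTPUT_TOKENS: u32 = 16_384;

/// Request field that caps generated tokens for each API.
pub fn output_limit_field(api: Api) -> &'static str {
    match api {
        Api::OpenAiResponses => "max_output_tokens",
        // `max_tokens` is the older, deprecated spelling on this endpoint.
        Api::OpenAiChatCompletions => "max_completion_tokens",
        // Lives inside the `generationConfig` object of the request body.
        Api::Gemini => "maxOutputTokens",
        Api::Anthropic => "max_tokens",
    }
}

/// JSON fragment to splice into the top level of the request body.
pub fn limit_json(api: Api) -> String {
    match api {
        Api::Gemini => format!(
            "\"generationConfig\":{{\"{}\":{}}}",
            output_limit_field(api),
            MAX_OUTPUT_TOKENS
        ),
        _ => format!("\"{}\":{}", output_limit_field(api), MAX_OUTPUT_TOKENS),
    }
}
```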
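Commit 45c9e71589 makes the schema builder take max_items_per_category and pins each category array with minItems/maxItems. The following is a minimal string-building sketch of that idea. It assumes a hypothetical build_category_schema signature and hypothetical article fields (title, summary, url); the real builder more likely uses a JSON library than format!. Setting both bounds to the same N mirrors the prompt's change from "at most N" to "exactly N".

```rust
/// Per-article object. Field names are hypothetical; strict mode expects every
/// property to be listed in `required` and `additionalProperties` to be false.
const ARTICLE_SCHEMA: &str = r#"{"type":"object","properties":{"title":{"type":"string"},"summary":{"type":"string"},"url":{"type":"string"}},"required":["title","summary","url"],"additionalProperties":false}"#;

/// Builds the top-level schema: one array per category, with `minItems` and
/// `maxItems` both set to `max_items_per_category`.
/// Category names are assumed to need no JSON escaping.
pub fn build_category_schema(categories: &[&str], max_items_per_category: usize) -> String {
    let properties: Vec<String> = categories
        .iter()
        .map(|name| {
            format!(
                "\"{name}\":{{\"type\":\"array\",\"minItems\":{n},\"maxItems\":{n},\"items\":{article}}}",
                name = name,
                n = max_items_per_category,
                article = ARTICLE_SCHEMA
            )
        })
        .collect();
    let required: Vec<String> = categories.iter().map(|name| format!("\"{}\"", name)).collect();
    format!(
        "{{\"type\":\"object\",\"properties\":{{{}}},\"required\":[{}],\"additionalProperties\":false}}",
        properties.join(","),
        required.join(",")
    )
}
```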
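Commit 3fe667591d gives LLM calls their own 120 s client instead of the scraper's 15 s one. With reqwest this is a per-client setting (Client::builder().timeout(...)). The sketch below is a std-only analogue that only illustrates the effect of a per-call deadline, using a helper thread and mpsc::Receiver::recv_timeout. The call_with_timeout helper is hypothetical and is not how reqwest enforces its timeouts.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Timeouts quoted in the commit message.
pub const SCRAPER_TIMEOUT: Duration = Duration::from_secs(15);
pub const LLM_TIMEOUT: Duration = Duration::from_secs(120);

/// Runs `work` on a helper thread and waits at most `timeout` for its result.
/// Returns `None` on timeout; the helper thread keeps running and its late result is dropped.
pub fn call_with_timeout<T, F>(work: F, timeout: Duration) -> Option<T>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // If the caller already gave up, the receiver is gone and `send` fails; that is fine.
        let _ = tx.send(work());
    });
    rx.recv_timeout(timeout).ok()
}
```

With a 15 s budget, a 30 to 60 s web-search call would return None every time, which is the failure the commit describes; 120 s leaves headroom above that range.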
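Commit 004f08f385 adds a test, schema_meets_openai_strict_mode_requirements, that validates additionalProperties on all objects. The following is a sketch of such a recursive check. It assumes a tiny hand-rolled Json enum so that no external crates are needed (the project presumably uses serde_json::Value). It reports the path of every schema node with "type": "object" that lacks "additionalProperties": false; it does not check other strict-mode rules, such as every property appearing in required.

```rust
/// Minimal JSON value, enough to represent a schema without external crates.
#[derive(Debug, Clone, PartialEq)]
pub enum Json {
    Null,
    Bool(bool),
    Num(f64),
    Str(String),
    Arr(Vec<Json>),
    Obj(Vec<(String, Json)>),
}

impl Json {
    /// Looks up a key on an object; `None` for non-objects or missing keys.
    pub fn get(&self, key: &str) -> Option<&Json> {
        match self {
            Json::Obj(fields) => fields.iter().find(|(k, _)| k.as_str() == key).map(|(_, v)| v),
            _ => None,
        }
    }
}

/// Convenience constructors for building schemas by hand.
pub fn obj(fields: Vec<(&str, Json)>) -> Json {
    Json::Obj(fields.into_iter().map(|(k, v)| (k.to_string(), v)).collect())
}

pub fn jstr(s: &str) -> Json {
    Json::Str(s.to_string())
}

/// Collects the path of every `"type": "object"` node missing
/// `"additionalProperties": false`, walking nested objects and arrays.
pub fn strict_mode_violations(node: &Json, path: &str, out: &mut Vec<String>) {
    match node {
        Json::Obj(fields) => {
            let is_object_schema =
                matches!(node.get("type"), Some(Json::Str(t)) if t.as_str() == "object");
            let is_closed = matches!(node.get("additionalProperties"), Some(Json::Bool(false)));
            if is_object_schema && !is_closed {
                out.push(if path.is_empty() { "/".to_string() } else { path.to_string() });
            }
            for (key, child) in fields {
                strict_mode_violations(child, &format!("{}/{}", path, key), out);
            }
        }
        Json::Arr(items) => {
            for (i, child) in items.iter().enumerate() {
                strict_mode_violations(child, &format!("{}/{}", path, i), out);
            }
        }
        _ => {}
    }
}
```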
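Commit 631bd43b9f mentions three small pieces of pipeline logic: the >70% heuristic for skipping scrape+rewrite, code fence stripping for Anthropic's schema-in-prompt output, and redaction of sk-ant- keys in error messages. Minimal sketches of each follow. The function names are hypothetical, ">70%" is read as strictly greater than 70%, and the character set assumed for key bodies during redaction is a guess.

```rust
/// Skip scraping and the rewrite pass only when strictly more than 70% of the
/// search-pass URLs are valid. Integer math avoids float rounding at exactly 70%.
pub fn should_skip_scrape_and_rewrite(valid_urls: usize, total_urls: usize) -> bool {
    total_urls > 0 && valid_urls * 10 > total_urls * 7
}

/// Strips a surrounding Markdown code fence (with or without a language tag)
/// from a model reply. Replies without a leading fence are only trimmed.
pub fn strip_code_fence(reply: &str) -> &str {
    let trimmed = reply.trim();
    match trimmed.strip_prefix("```") {
        Some(rest) => {
            // Drop the rest of the opening fence line (e.g. a `json` tag).
            let body = match rest.find('\n') {
                Some(i) => &rest[i + 1..],
                None => rest,
            };
            let body = body.trim_end();
            body.strip_suffix("```").unwrap_or(body).trim()
        }
        None => trimmed,
    }
}

/// Replaces every `sk-ant-` token with `[REDACTED]` before an error is logged.
/// Assumes key bodies consist of ASCII letters, digits, '-' and '_'.
pub fn redact_anthropic_keys(message: &str) -> String {
    const PREFIX: &str = "sk-ant-";
    let mut out = String::with_capacity(message.len());
    let mut rest = message;
    while let Some(start) = rest.find(PREFIX) {
        out.push_str(&rest[..start]);
        out.push_str("[REDACTED]");
        let after = &rest[start + PREFIX.len()..];
        // The key ends at the first character outside the assumed key alphabet.
        let end = after
            .find(|c: char| !(c.is_ascii_alphanumeric() || c == '-' || c == '_'))
            .unwrap_or(after.len());
        rest = &after[end..];
    }
    out.push_str(rest);
    out
}
```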
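Commit aa6f1ba76b scrapes URLs in parallel with at most 10 in flight and continues when some fail. The project does this with tokio. The sketch below shows the same bounded-concurrency pattern with std::thread::scope and a fixed pool of workers claiming URLs from a shared index, so that it runs without external crates. The scrape_all name and the fetch closure signature are hypothetical.

```rust
use std::sync::Mutex;
use std::thread;

/// Upper bound on simultaneous fetches, as stated in the commit.
pub const MAX_CONCURRENT_SCRAPES: usize = 10;

/// Fetches every URL with at most `limit` calls in flight and keeps each outcome,
/// so one failing URL does not abort the batch. Results are returned in input order.
pub fn scrape_all<F>(urls: &[String], limit: usize, fetch: F) -> Vec<Result<String, String>>
where
    F: Fn(&str) -> Result<String, String> + Sync,
{
    // Index of the next URL a worker should claim.
    let next = Mutex::new(0usize);
    // One slot per URL, filled by whichever worker handled it.
    let slots: Mutex<Vec<Option<Result<String, String>>>> = Mutex::new(vec![None; urls.len()]);
    // Never start more workers than there are URLs.
    let workers = limit.max(1).min(urls.len());

    thread::scope(|s| {
        for _ in 0..workers {
            s.spawn(|| loop {
                let i = {
                    let mut n = next.lock().unwrap();
                    if *n >= urls.len() {
                        break;
                    }
                    *n += 1;
                    *n - 1
                };
                // The index lock is already released, so fetches run concurrently.
                let outcome = fetch(urls[i].as_str());
                slots.lock().unwrap()[i] = Some(outcome);
            });
        }
    });

    slots
        .into_inner()
        .unwrap()
        .into_iter()
        .map(|slot| slot.expect("each index is claimed exactly once"))
        .collect()
}
```

Keeping a Result per URL, rather than returning on the first error, is what lets the pipeline carry on with the pages that did load.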
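Commit 439e547367 describes a MasterKey that is zeroized on drop and deliberately not Clone. Below is a std-only sketch of that pattern, using volatile writes followed by a compiler fence, which is roughly what the zeroize crate does internally. The method names are hypothetical, and the AES-256-GCM encryption itself is not shown.

```rust
use std::sync::atomic::{compiler_fence, Ordering};

/// 256-bit master key for AES-256-GCM. It derives neither `Clone` nor `Copy`,
/// so the bytes cannot be duplicated through `.clone()`.
pub struct MasterKey {
    bytes: [u8; 32],
}

impl MasterKey {
    /// Takes the key material. `[u8; 32]` is `Copy`, so the caller's own
    /// buffer still holds the key and must be wiped separately.
    pub fn from_bytes(bytes: [u8; 32]) -> Self {
        MasterKey { bytes }
    }

    pub fn as_bytes(&self) -> &[u8; 32] {
        &self.bytes
    }

    /// Overwrites the key with zeros using volatile writes, which the
    /// optimizer may not remove as dead stores.
    pub fn wipe(&mut self) {
        for byte in self.bytes.iter_mut() {
            // SAFETY: `byte` is a valid, aligned, exclusive reference into `self.bytes`.
            unsafe { std::ptr::write_volatile(byte, 0) };
        }
        // Prevent the compiler from moving memory accesses across this point.
        compiler_fence(Ordering::SeqCst);
    }
}

impl Drop for MasterKey {
    fn drop(&mut self) {
        self.wipe();
    }
}
```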