diff --git a/docs/audit/00-consolidated-summary.md b/docs/audit/00-consolidated-summary.md new file mode 100644 index 0000000..2927763 --- /dev/null +++ b/docs/audit/00-consolidated-summary.md @@ -0,0 +1,127 @@ +# Code Audit — Consolidated Summary + +**Date**: 2026-03-26 +**Auditors**: Software Architect, Frontend Expert, Rust Expert, Tech Lead, Devil's Advocate + +## Overall Assessment + +The codebase is well-structured with strong fundamentals: clean layer separation, zero `unsafe` code, zero `unwrap()` in production paths, comprehensive security measures (AES-256-GCM encryption, SSRF prevention, anti-enumeration timing), and good test coverage (363+ unit tests, 9 integration test files). The code is a learning project that has evolved significantly through rapid iteration. + +The main issues cluster around **complexity in the synthesis pipeline** and **accumulated duplication** from feature layering. + +--- + +## P0 — Fix Immediately + +These are bugs that can crash or corrupt the application. + +| # | Issue | Source | Files | +|---|-------|--------|-------| +| 1 | **UTF-8 panic**: `&msg[..200]` panics on multi-byte chars | Rust Expert, Devil's Advocate | `synthesis.rs:1313` | +| 2 | **Race condition in job creation**: double-click creates duplicate jobs (DashMap check + insert not atomic) | Devil's Advocate | `synthesis.rs` (job management) | +| 3 | **XSS via `innerHTML`**: renders user-controlled `theme` unsafely | Frontend Expert | `GenerateSynthesis.tsx` | + +--- + +## P1 — Fix Within 30 Days + +| # | Issue | Source | Files | +|---|-------|--------|-------| +| 4 | **Expired sessions never cleaned up**: `sessions::delete_expired()` exists but is never called | Devil's Advocate | startup / scheduled task | +| 5 | **No pipeline timeout**: hung LLM call blocks generation slot indefinitely | Devil's Advocate | `synthesis.rs` | +| 6 | **SSRF bypass via IPv4-mapped IPv6**: `::ffff:127.0.0.1` passes all current checks | Rust Expert, Devil's Advocate | `scraper.rs` | +| 7 | **Missing SSRF check on initial source page fetch** | Rust Expert | `source_scraper.rs` | +| 8 | **Spawned generation tasks swallow panics**: SSE clients hang forever | Rust Expert | `generation handler` | +| 9 | **Missing `onCleanup` for `setTimeout`**: timers fire after component unmount | Frontend Expert | `Home.tsx`, `ArticleHistory.tsx`, `SynthesisDetail.tsx` | + +--- + +## Structural Refactoring — High Priority + +These are the main maintainability issues, confirmed by 3+ auditors. + +### 1. Decompose `synthesis.rs` (1,686 lines) + +**Confirmed by**: Architect, Tech Lead, Rust Expert, Devil's Advocate + +The `run_generation_inner` function is ~650 lines with the scrape+classify batch loop duplicated 3 times (Phase 1, Phase 2 Brave, Phase 2 LLM). Article filtering (homepage, dedup, history, diversity) is also duplicated across paths. + +**Refactoring plan**: +- Extract `process_article_batch()` — shared scrape+classify loop (~150 lines saved) +- Extract `filter_candidate_url()` — shared filtering logic +- Extract `assign_category()` — shared category assignment + overflow logic +- Consider splitting into `synthesis/pipeline.rs`, `synthesis/phases.rs`, `synthesis/helpers.rs` + +**Effort**: Large | **Impact**: Critical + +### 2. Eliminate `SettingsResponse` struct + +**Confirmed by**: Tech Lead + +`SettingsResponse` is identical to `UserSettings` minus `user_id` and `updated_at`. Every new setting requires changes to 4 Rust structs + SQL queries + frontend types. Serialize `UserSettings` directly with `#[serde(skip)]` on internal fields. + +**Effort**: Small | **Impact**: High (reduces per-setting change cost) + +### 3. Decompose `Settings.tsx` (1,025 lines) + +**Confirmed by**: Frontend Expert, Tech Lead + +Split into sub-components: `SettingsGeneral`, `SettingsProvider`, `SettingsAdvanced`, `SettingsBraveSearch`, `SettingsRateLimit`. + +**Effort**: Medium | **Impact**: High + +### 4. Extract shared LLM error mapping + +**Confirmed by**: Tech Lead + +`map_openai_error`, `map_gemini_error`, `map_anthropic_error` are near-identical. Extract to a shared `map_http_error` function. + +**Effort**: Small | **Impact**: Medium + +### 5. Replace `trace_article` 11-parameter function with struct + +**Confirmed by**: Tech Lead, Rust Expert + +```rust +struct ArticleTrace { url, title, source_type, source_url, category, synthesis_id, status, scraped_ok } +``` + +**Effort**: Small | **Impact**: Medium + +--- + +## Medium Priority Improvements + +| # | Issue | Source | +|---|-------|--------| +| 10 | N+1 individual INSERTs for article history — batch with `unnest()` | Rust Expert | +| 11 | 59 `.clone()` calls in pipeline — use `Arc` for shared immutables | Rust Expert | +| 12 | CSS selectors re-parsed every call — use `LazyLock` | Rust Expert | +| 13 | Inconsistent data fetching in frontend (`createResource` vs `onMount` + signals) | Frontend Expert | +| 14 | Reusable `Button` component exists but unused — inline Tailwind duplicated | Frontend Expert | +| 15 | Default settings diverge between frontend and backend | Tech Lead | +| 16 | `AppConfig` carries secrets in cloneable `String` — use `Arc`-wrapped | Rust Expert, Devil's Advocate | +| 17 | In-memory rate limits reset on restart | Devil's Advocate | +| 18 | `SESSION_SECRET` env var validated but never used | Devil's Advocate | + +--- + +## Positive Highlights + +All auditors noted these strengths: +- **Zero `unsafe` code**, zero `unwrap()` in production +- **Strong security posture**: AES-256-GCM with zeroized keys, SSRF prevention (14 tests), anti-enumeration timing, parameterized SQL +- **Clean error handling**: `AppError` enum with proper HTTP mapping, internal errors never leaked +- **Well-designed `LlmProvider` trait**: clean abstraction over 3 providers +- **Comprehensive test suite**: good coverage on scraper, LLM providers, rate limiter, encryption +- **Excellent module documentation**: every module has `//!` doc comments + +--- + +## Detailed Reports + +- [Architect Report](architect-report.md) — SOLID principles, design patterns, architecture +- [Frontend Report](frontend-report.md) — SolidJS patterns, component architecture +- [Rust Expert Report](rust-expert-report.md) — Idiomatic Rust, async, safety +- [Tech Lead Report](tech-lead-report.md) — Complexity, duplication, refactoring plan +- [Devil's Advocate Report](devils-advocate-report.md) — Hidden risks, security, failure modes diff --git a/docs/audit/architect-report.md b/docs/audit/architect-report.md new file mode 100644 index 0000000..893a247 --- /dev/null +++ b/docs/audit/architect-report.md @@ -0,0 +1,237 @@ +# Architecture & SOLID Audit Report + +## Executive Summary + +AI Weekly Synth demonstrates strong foundational architecture with clean layer separation (handlers, services, DB), a well-designed trait abstraction for LLM providers, and robust error handling. However, the synthesis pipeline (`services/synthesis.rs`) has accumulated significant complexity that violates Single Responsibility and makes the code difficult to test, extend, and maintain. The codebase would benefit most from decomposing the pipeline into smaller, composable stages and introducing trait-based dependency injection for external services. + +## SOLID Analysis + +### Single Responsibility + +**Mostly followed in the outer layers; severely violated in the synthesis pipeline.** + +**Good examples:** +- `errors.rs` has a single job: map application errors to HTTP responses. Clean, focused, well-tested. +- `middleware/auth.rs` does exactly one thing: extract authenticated users from session cookies. The `AdminUser` wrapper cleanly composes on top. +- `services/encryption.rs` is a focused module for AES-256-GCM operations with no extraneous concerns. +- `services/auth.rs` cleanly coordinates token creation, email sending, and session management without overstepping. +- Each file in `db/` handles queries for exactly one table (users, sessions, syntheses, etc.). + +**Critical violation: `services/synthesis.rs` (~1250 lines)** + +This single file is responsible for: +1. Job lifecycle management (create, subscribe, cleanup via `JobStore`) +2. Progress event definition and emission +3. Provider and API key resolution +4. Rate limiting coordination +5. Source rotation strategy +6. Parallel scraping orchestration (Phase 1) +7. LLM classification and summarization orchestration (Phase 1) +8. Brave Search integration orchestration (Phase 2) +9. LLM web search fallback orchestration (Phase 2) +10. URL normalization, hashing, and deduplication +11. Article history filtering +12. Source diversity enforcement +13. Section assembly and persistence +14. Article provenance tracing +15. Error sanitization + +The `run_generation_inner` function alone spans ~640 lines. The nearly identical scrape-then-classify batch processing loop is duplicated between Phase 1 (lines 406-551) and the Brave Search path (lines 627-758), with only minor differences in tuple shapes. This duplication is a strong signal that a shared abstraction is missing. + +**Other SRP observations:** +- `handlers/auth.rs` mixes HTTP concern (cookie construction in `build_session_cookie`) with business logic (token verification in `verify_token`). The cookie construction could live in a utility or the auth service. +- `AppState` (in `app_state.rs`) holds both infrastructure (pool, config) and domain state (job_store, rate_limiters). This is acceptable at current scale but should be watched as the application grows. + +### Open/Closed + +**Good for LLM providers; poor for the synthesis pipeline.** + +**Good examples:** +- The `LlmProvider` trait plus `factory::create_provider` follow OCP well. Adding a new provider (e.g., Mistral) requires implementing the trait and adding one match arm in the factory -- no existing code needs modification. +- `AppError` uses enum variants with `thiserror` derives. Adding a new error category is additive. +- The `FromRequestParts` extractors (`AuthUser`, `AdminUser`) compose cleanly -- adding new authorization levels means adding new extractors. + +**Violations:** +- The synthesis pipeline has hardcoded knowledge of all search strategies (LLM search, Brave Search, personalized sources). Adding a new search backend (e.g., Serper, SearXNG) requires modifying the 640-line orchestration function. There is no `SearchBackend` trait or strategy pattern. +- `factory::create_provider` uses a match statement that must be modified for each new provider. A registration-based factory (e.g., `HashMap<&str, fn(String, Client) -> Arc>`) would eliminate this. +- The `ProviderRateLimiter` embeds SQL queries directly (line 173 of `rate_limiter.rs`) rather than going through the `db` module, bypassing the architectural boundary. The comment says "to avoid circular dependency" -- this signals a structural issue. + +### Liskov Substitution + +**Correctly applied where traits are used.** + +- All three `LlmProvider` implementations (`GeminiProvider`, `OpenAiProvider`, `AnthropicProvider`) correctly implement the trait contract. Callers use `Arc` throughout and never downcast. +- The `AnthropicProvider` implementation is noteworthy: it adapts to Anthropic's lack of native JSON schema enforcement by embedding the schema in the system prompt. This is a legitimate behavioral difference that is hidden behind the trait -- exactly what LSP intends. The contract is "return structured JSON matching the schema," not "use native schema enforcement." +- `FromRequestParts` implementations for `AuthUser` and `AdminUser` correctly maintain the contract expectations. + +**No LSP violations detected.** The trait abstractions are lean and appropriately designed. + +### Interface Segregation + +**Traits are lean and focused -- no violations.** + +- `LlmProvider` has only two methods: `provider_id()` and `call_llm()`. This is appropriately minimal. If web search support needed to be optional (e.g., only some providers support grounded search), a separate `WebSearchProvider` trait would be warranted, but currently the pipeline handles this externally. +- The codebase uses free functions in modules (e.g., `db::users::find_by_email`) rather than trait objects for the database layer. This is idiomatic Rust and avoids over-engineering trait hierarchies. + +**Observation:** The absence of traits for services like scraping, email, and encryption is acceptable given the project's scale, but limits testability (see Dependency Inversion section). + +### Dependency Inversion + +**Partially applied -- the weakest SOLID principle in this codebase.** + +**Good:** +- Handlers depend on `AppState` (an abstraction container), not directly on database implementations. +- The synthesis pipeline depends on `Arc`, not on concrete provider types. + +**Significant gaps:** +- **No abstraction over the database layer.** All services and the synthesis pipeline call `db::*` functions directly (e.g., `db::settings::get_or_create_default`, `db::article_history::check_urls_exist`). This means the synthesis pipeline cannot be unit-tested without a live Postgres instance. +- **No abstraction over the scraper.** The synthesis pipeline calls `scraper::scrape_url` and `source_scraper::extract_article_links` as concrete functions. This prevents testing pipeline logic with deterministic, fake scraped content. +- **No abstraction over the email service.** `services/email.rs` is called directly. Testing email-sending behavior requires either a live Resend API key or the hardcoded test bypass. +- **`AppState` is a concrete struct, not a trait.** All handlers are bound to `State`. While this is standard Axum practice, it means handler logic cannot be tested without constructing a full `AppState` with a real database pool. + +The integration tests in `tests/common/mod.rs` confirm this: `TestApp` must create a real Postgres database for every test, run migrations, and tear down afterward. The test helper works well, but the inability to unit-test service logic in isolation is a direct consequence of missing abstractions. + +## Design Patterns + +### Well-Used Patterns + +1. **Strategy Pattern (LLM Providers):** The `LlmProvider` trait + `factory::create_provider` is a textbook Strategy implementation. Providers encapsulate provider-specific API differences (request format, response parsing, error mapping) behind a uniform interface. The factory cleanly maps configuration to concrete strategy. + +2. **Observer Pattern (Progress Streaming):** The `watch` channel pattern for SSE progress streaming is elegant. The `JobStore` manages pub/sub lifecycle, and the `WatchStream` adapter in the handler converts it to an SSE stream. This design correctly handles reconnection (subscribers immediately get the latest state). + +3. **Extractor Pattern (Authentication):** Axum's `FromRequestParts` is used idiomatically for `AuthUser` and `AdminUser`. This is effectively a Decorator pattern -- handlers declare their auth requirements via type parameters, and the framework handles the cross-cutting concern transparently. + +4. **Builder Pattern (Configuration):** `AppConfig::from_env()` + `validate()` is a clean two-phase construction pattern. Validation is separated from loading, allowing the application to fail fast with clear error messages. + +5. **Value Object Pattern:** `MasterKey` is a well-designed value object. It enforces invariants on construction (`from_hex`), prevents accidental cloning (no `Clone` derive), and zeroizes memory on drop. This is security-conscious Rust. + +6. **Repository Pattern (DB layer):** Each `db::*` module acts as a repository, encapsulating SQL queries behind typed function signatures. The separation between the `UserRow` (DB representation) and `User` (domain model) in `db/users.rs` is a good Data Mapper implementation. + +### Missing Patterns + +1. **Pipeline/Chain of Responsibility:** The synthesis pipeline is a sequential series of stages (load settings -> resolve provider -> scrape sources -> classify articles -> web search fallback -> assemble -> save). This is a textbook pipeline pattern that would benefit from explicit stage modeling: + ``` + trait PipelineStage { + async fn execute(&self, ctx: &mut PipelineContext) -> Result<(), AppError>; + } + ``` + Each stage would be independently testable and composable. + +2. **Unit of Work / Transaction:** The synthesis pipeline performs many database writes (article history inserts, LLM call logs, final synthesis save) without a unifying transaction. If the synthesis save fails after article history has been written, the data is inconsistent. A transactional boundary around the save phase would prevent orphaned records. + +3. **Circuit Breaker:** The rate limiter waits up to 60 seconds for rate limit windows to pass, but there is no circuit breaker for provider failures. If a provider returns 5xx errors repeatedly, the pipeline will keep retrying individual articles until the batch is exhausted. A circuit breaker that short-circuits after N consecutive failures would save time and API quota. + +4. **Result Collector / Accumulator:** The batch processing loops manually manage `scraped_articles`, `filled_counts`, `source_counts`, and `seen_urls` as mutable state threaded through the loop. An accumulator pattern would encapsulate this state and its invariants (e.g., "never exceed max_items_per_category") into a dedicated struct with methods like `try_add_article()`. + +### Anti-Patterns + +1. **God Function:** `run_generation_inner` (~640 lines) is the most impactful anti-pattern. It manages too many concerns, has deeply nested control flow, and is untestable in isolation. The function's length makes it difficult to reason about edge cases and encourages bugs during modification. + +2. **Duplicated Logic (DRY Violation):** The scrape-then-classify batch loop appears in near-identical form in three places: + - Phase 1 personalized sources (lines ~406-551) + - Phase 2 Brave Search path (lines ~627-758) + - The response parsing differs slightly, but the core pattern (spawn scrape tasks -> collect results -> spawn classify tasks -> collect and categorize) is the same. + + A shared `process_article_batch()` function accepting a list of `(url, source_identifier)` would eliminate ~150 lines of duplication. + +3. **Primitive Obsession:** Categories are tracked throughout the pipeline using raw strings and index-based keys (`"category_0"`, `"category_autre"`). A `CategoryIndex` newtype or enum would prevent bugs like off-by-one errors and make the intent clearer. + +4. **Implicit Protocol (Stringly Typed):** The `status` field in article history uses raw strings (`"filtered_history"`, `"filtered_diversity"`, `"used"`, `"filtered_homepage"`, etc.). An enum would provide compile-time guarantees and prevent typos. + +5. **Configuration Object Overload:** `UserSettings` has 16 fields that are all passed around as a single struct. The synthesis pipeline destructures individual fields from it in many places (e.g., `settings.max_items_per_category`, `settings.batch_size`, `settings.use_brave_search`). This makes it hard to understand which settings each stage actually depends on. + +## Architecture Assessment + +### Layer Separation + +**Clean three-layer architecture with appropriate boundaries.** + +``` +handlers/ (HTTP) -> services/ (Business Logic) -> db/ (Data Access) +``` + +- **Handlers** correctly limit themselves to request parsing, validation, calling services/DB, and response formatting. No handler contains business logic. +- **Services** implement the business rules. The auth service coordinates between DB, email, and token utilities. The encryption service is self-contained. +- **DB modules** are pure data access with no business logic. SQL queries are parameterized and safe. + +**Minor boundary violations:** +- `ProviderRateLimiter::reload_from_db` in `services/rate_limiter.rs` contains raw SQL queries instead of calling through `db::rate_limits`. The comment acknowledges this is to "avoid circular dependency," suggesting the module boundaries need adjustment. +- `resolve_model` in `services/synthesis.rs` also contains raw SQL (`sqlx::query_scalar` with inline SQL) instead of going through `db::providers`. This bypasses the DB layer abstraction. + +### Coupling + +**Low coupling between modules; high coupling within the synthesis pipeline.** + +- Handler modules are independent of each other. Each handler file imports only from `app_state`, `errors`, `middleware`, relevant `models`, and relevant `services`/`db` modules. +- The LLM provider implementations are fully decoupled -- they share no state and have no knowledge of each other. +- The `db` modules have zero inter-module dependencies (no `db::users` calling `db::sessions`, etc.). + +**High coupling in the synthesis pipeline:** +- `services/synthesis.rs` imports from 9 different modules: `app_state`, `db` (5 submodules), `errors`, `models` (2 submodules), `services` (5 submodules). It is the most connected module in the codebase and acts as a "gravity well" that pulls in all dependencies. +- The `AppState` struct couples all concerns: config, pool, HTTP client, auth rate limiter, provider rate limiter, user rate limiters, and job store. Any change to any of these affects the shared state type. + +### Error Handling + +**Excellent error handling strategy -- one of the strongest aspects of the codebase.** + +- **Single error type:** `AppError` is the only error type in the application. All handlers return `Result`. This is clean and consistent. +- **Automatic conversions:** `From` and `From` provide ergonomic `?` operator usage while ensuring errors are properly categorized. +- **Security-conscious:** Internal errors log the full details server-side but return only "An internal error occurred" to the client. The `sanitize_error_message` function in the synthesis pipeline ensures API keys are never exposed in SSE error events. +- **No `unwrap()` in production paths:** Consistent with the project's stated policy. The few `unwrap()` calls are in test code or behind `expect()` with clear messages (e.g., `"valid minutes value"`). +- **Error variants are semantically meaningful:** `NotFound`, `Unauthorized`, `Forbidden`, `BadRequest`, `Validation`, `Internal`, `RateLimited` -- each maps to a specific HTTP status code with appropriate behavior. + +**Minor observation:** The `Validation` variant accepts a `String`, but a structured error type (e.g., with field name + message) would be more useful for frontends that want to highlight specific form fields. + +### State Management + +**In-memory state is well-designed with appropriate concurrency primitives.** + +- `DashMap` is used for lock-free concurrent access in `RateLimiter`, `ProviderRateLimiter`, `JobStore`, and `user_rate_limiters`. This avoids `Mutex` contention under load. +- `Arc` for progress events is the right choice -- it provides "latest value" semantics so reconnecting SSE clients immediately get caught up. +- The `JobStore` has TTL-based cleanup (1 hour) and post-completion cleanup (5 minutes), preventing memory leaks from abandoned jobs. + +**Potential concern:** The `RateLimiter` in `rate_limiter.rs` stores timestamps in a `Vec` (for the per-key limiter) rather than a `VecDeque`. Old timestamps are evicted via `retain()` which is O(n), whereas `VecDeque::pop_front()` is O(1). The `ProviderRateLimiter` correctly uses `VecDeque`. This inconsistency is minor but worth unifying. + +**State growth risk:** The `user_rate_limiters: DashMap` in `AppState` grows unboundedly as users trigger generation jobs. Entries are only removed when a user's settings change to have no rate limits. A periodic cleanup (e.g., remove entries for users who haven't generated in 24 hours) would prevent slow memory growth. + +## Prioritized Recommendations + +1. **Decompose `services/synthesis.rs` into a multi-stage pipeline.** Extract the monolithic `run_generation_inner` into discrete, testable stages: `SettingsStage`, `SourceScrapingStage`, `ClassificationStage`, `WebSearchStage`, `AssemblyStage`. Each stage should operate on a shared `PipelineContext` struct. This addresses the God Function anti-pattern, the DRY violation in batch processing, and enables unit testing of individual stages. + - Files: `backend/src/services/synthesis.rs` (1250+ lines) + - Impact: Testability, maintainability, extensibility + +2. **Extract a shared `process_article_batch()` function** as an immediate, lower-risk improvement. The near-identical scrape-then-classify loop in Phase 1 and the Brave Search path can be unified into a single function that accepts `Vec<(String, Option)>` (URL + optional source URL) and returns classified `NewsItem`s. This can be done without the full pipeline refactor. + - Files: `backend/src/services/synthesis.rs` (lines ~406-551 and ~627-758) + - Impact: ~150 lines of duplication removed, fewer bugs from divergent copies + +3. **Introduce trait-based abstractions for external services** to enable unit testing. Define traits for `ArticleScraper`, `SearchService`, and `ArticleStore` (history). The synthesis pipeline would depend on these traits, allowing tests to inject fakes. Start with the DB layer trait since it blocks the most test coverage. + - Files: `backend/src/services/synthesis.rs`, `backend/src/services/scraper.rs`, `backend/src/db/article_history.rs` + - Impact: Enables unit testing of synthesis pipeline without Postgres + +4. **Replace stringly-typed status values with enums.** The article history `status` field uses raw strings (`"filtered_history"`, `"filtered_diversity"`, `"used"`, etc.). Define an `ArticleStatus` enum with `Display`/`FromStr` implementations. Similarly, replace the `"category_0"` / `"category_autre"` string keys with a `CategoryKey` newtype. + - Files: `backend/src/services/synthesis.rs`, `backend/src/db/article_history.rs` + - Impact: Type safety, compile-time correctness guarantees + +5. **Move raw SQL out of service modules.** `ProviderRateLimiter::reload_from_db` and `resolve_model` in `synthesis.rs` contain inline SQL queries that bypass the `db/` layer. Move these queries to `db/rate_limits.rs` and `db/providers.rs` respectively. If circular dependencies are the issue, restructure the module hierarchy (e.g., make the rate limiter accept configuration data rather than a pool). + - Files: `backend/src/services/rate_limiter.rs` (lines 171-199), `backend/src/services/synthesis.rs` (lines 1191-1220) + - Impact: Architectural consistency, single source of truth for SQL + +6. **Introduce an article accumulator struct** to encapsulate the mutable tracking state in the synthesis pipeline. Currently, `article_scraped`, `source_counts`, `url_source`, `filled_counts`, and `seen_urls` are managed as independent `HashMap`/`HashSet` variables threaded through loops. A `SynthesisAccumulator` struct with methods like `try_add_article(url, category, source) -> bool` would encapsulate invariants (max per category, max per source, deduplication) and simplify the pipeline code. + - Files: `backend/src/services/synthesis.rs` + - Impact: Reduced cognitive complexity, fewer state-management bugs + +7. **Add a circuit breaker for LLM provider failures.** If a provider returns errors for N consecutive calls, stop sending requests and fail fast. The current behavior retries each article independently, wasting time and API quota when the provider is down. + - Files: `backend/src/services/synthesis.rs` + - Impact: Resilience, user experience during outages + +8. **Unify `Vec` to `VecDeque` in `RateLimiter`.** The per-key `RateLimiter` uses `Vec::retain()` (O(n)) while `ProviderRateLimiter` uses `VecDeque::pop_front()` (O(1)). Switch the per-key limiter to `VecDeque` for consistency and performance under high request volumes. + - Files: `backend/src/services/rate_limiter.rs` (lines 37-77) + - Impact: Minor performance improvement, code consistency + +9. **Add periodic cleanup for `user_rate_limiters` in `AppState`.** The `DashMap` grows unboundedly. Add a background task (similar to `JobStore` TTL cleanup) that removes entries for users inactive for more than 24 hours. + - Files: `backend/src/app_state.rs`, `backend/src/main.rs` + - Impact: Prevents slow memory growth in long-running deployments + +10. **Consider making `UpdateSettingsRequest::validate()` return `Result<(), AppError>` instead of `Result<(), String>`.** Currently, handlers must map the string error: `body.validate().map_err(AppError::Validation)`. Having validate return the correct error type directly would be more ergonomic and consistent with the rest of the codebase. + - Files: `backend/src/models/settings.rs` + - Impact: Minor ergonomic improvement, consistency diff --git a/docs/audit/devils-advocate-report.md b/docs/audit/devils-advocate-report.md new file mode 100644 index 0000000..71d344f --- /dev/null +++ b/docs/audit/devils-advocate-report.md @@ -0,0 +1,309 @@ +# Devil's Advocate Audit Report + +**Date**: 2026-03-26 +**Scope**: Full codebase audit of AI Weekly Synth (Rust/Axum + SolidJS) +**Approach**: Adversarial review -- challenging assumptions, finding hidden risks, questioning decisions + +--- + +## Executive Summary + +AI Weekly Synth is a well-structured project with thoughtful security measures. However, it suffers from several uncomfortable truths: + +1. **The master encryption key is reconstructed from hex on every single API key operation** -- it is never cached. This means the raw key string sits in `AppConfig` (a `Clone`-able struct) for the entire process lifetime, defeating the purpose of `zeroize`. +2. **There is no background cleanup for expired sessions or stale jobs.** The sessions table and article_history table will grow unbounded until someone notices. +3. **The generation pipeline has no cancellation or timeout mechanism.** A hung LLM call can block a user's generation slot indefinitely since the per-user "one active job" check has no upper time bound beyond the 1-hour TTL. +4. **Rate limiting is entirely in-memory and per-process.** A restart resets all rate limit state, and horizontal scaling (multiple instances) would have no shared rate limit enforcement. +5. **The SSRF protection has a documented TOCTOU vulnerability** (DNS rebinding) that is acknowledged but not mitigated. + +--- + +## Architectural Concerns + +### Why Rust for This? + +The core business logic -- calling LLM APIs, scraping HTML, shuffling JSON -- is I/O-bound, not CPU-bound. Rust's primary advantage (zero-cost abstractions, memory safety without GC) adds relatively little value here compared to Python/FastAPI or TypeScript/Express, while significantly increasing development velocity cost. + +**Counter-argument the team would give**: "It's a learning project." Fair. But the CLAUDE.md itself says "idiomatic Rust (learning project)" -- which means the architecture was chosen for developer education, not because it's the right tool. This is honest but means the codebase carries cognitive overhead that a maintainer would not expect. + +**Concrete risk**: Contribution and maintenance will be limited to Rust-proficient developers, which is a small pool. If the original author steps away, someone inheriting this codebase will face a steep ramp-up compared to an equivalent Python project. + +### The Two-Phase Pipeline Is Overengineered + +The synthesis pipeline (`synthesis.rs`, ~1300 lines) implements: +- Phase 1: Source scraping with parallel link extraction, history dedup, batch classification +- Phase 2: Web search fallback (LLM or Brave) with its own scraping + classification loop + +This is the same scrape-classify-summarize logic duplicated twice (lines 429-543 and lines 639-757 are nearly identical patterns). The Brave Search path and the LLM Search path also duplicate filtering logic (homepage filter, cross-phase dedup, history dedup, source diversity -- all repeated in three places). + +**Concrete risk**: A bug fix in one path (e.g., the dedup logic) must be manually replicated in all three paths. The recent commit history confirms this is already happening: `fix: filter empty scraped articles + restore URLs after rewrite` touched multiple places. + +### Single-Process In-Memory State + +`JobStore`, `RateLimiter`, `ProviderRateLimiter`, and `user_rate_limiters` are all in-memory `DashMap` structures. This means: +- **No horizontal scaling**: Two instances would each allow one concurrent job per user. +- **State loss on restart**: All active generation jobs are silently lost. +- **No persistence of rate limit state**: A process restart resets all rate limit windows. + +For a "single-tenant self-hosted" app this is acceptable today, but the architecture does not admit a path to multi-instance deployment without significant rework. + +--- + +## Hidden Technical Debt + +### 1. MasterKey Reconstruction on Every Operation + +Every time an API key is encrypted or decrypted, the code calls `MasterKey::from_hex(&state.config.master_encryption_key)`. This means: +- A new `Vec` is allocated, filled with key material, used, then zeroized on drop. +- But `state.config.master_encryption_key` (the hex string) lives in `AppConfig`, which derives `Clone`. Every clone of `AppState` carries a copy of the raw hex key string in memory. The `zeroize` on `MasterKey::drop` protects one copy while leaving the canonical source in plain memory. + +**File**: `backend/src/config.rs` (line 16), `backend/src/services/encryption.rs` + +**Risk scenario**: A memory dump or core dump of the process would reveal `MASTER_ENCRYPTION_KEY` in cleartext, because `AppConfig::master_encryption_key` is a `String` that is never zeroized. The `MasterKey` wrapper's `zeroize` is security theater in this context. + +### 2. No Session Cleanup Scheduler + +The `sessions::delete_expired` function exists (line 99 of `db/sessions.rs`) but is **never called** by any background task or scheduler. Expired sessions accumulate forever. The auth middleware checks expiration on each request (and deletes that one session), but sessions that simply expire without being used again are never cleaned. + +**File**: `backend/src/db/sessions.rs` (line 98-105) + +**Risk scenario**: After a year of operation, the sessions table could have tens of thousands of expired rows. Each auth check does an index lookup, so performance is not the concern -- the concern is data hygiene and compliance (old session metadata like IP addresses and user agents sitting in the database indefinitely). + +### 3. article_history and llm_call_log Unbounded Growth + +The `article_history` table stores a row for **every article encountered** during generation -- not just used articles, but also filtered, dropped, and deduplicated ones. With 5 categories, 15 links per source, and 10+ sources, a single generation run can produce 100+ rows. The `cleanup_old` function (called at the start of each generation) only deletes entries where `synthesis_id IS NULL`, meaning "used" entries are retained forever. + +The `llm_call_log` table stores full prompts and responses (system_prompt, user_prompt, response_body as TEXT). A single generation run with 20 LLM calls storing full prompts and responses could easily be 500KB+ per run. + +**Files**: `backend/migrations/20260324000015_add_article_history.sql`, `backend/migrations/20260324000017_create_llm_call_log.sql` + +**Risk scenario**: A user generating weekly syntheses for a year produces ~52 runs x 100 history rows = 5,200+ article_history rows and ~52 x 20 = 1,040 llm_call_log entries (with full prompt text). For the LLM logs, this could be 25MB+ per user per year. There is no size-based cleanup or archival strategy. + +### 4. LLM API Changes Will Break Silently + +The LLM provider implementations (`gemini.rs`, `openai.rs`, `anthropic.rs`) parse JSON responses from vendor APIs. When these APIs change their response format (which happens regularly), the failure mode is a parse error that surfaces as a generic "Generation failed" error to the user. + +There is no version pinning on the API endpoints (e.g., Gemini uses `v1beta`, OpenAI uses `v1`), and no structured error handling that distinguishes "API changed" from "API key invalid" from "model deprecated." + +**Risk scenario**: Google deprecates a Gemini model identifier. The user's settings still reference it. Generation fails with an opaque error. The user has no way to diagnose the problem from the error message. + +--- + +## Security Assessment + +### Encryption: Sound Implementation, Flawed Key Lifecycle + +The AES-256-GCM implementation itself is textbook correct: +- Random 12-byte nonces via OS randomness +- Proper use of the `aes_gcm` crate +- `zeroize` on `MasterKey` drop +- No nonce reuse (each encryption generates a fresh nonce) + +**But the key lifecycle is the vulnerability**: +- `MASTER_ENCRYPTION_KEY` is stored as a plain `String` in `AppConfig` +- `AppConfig` derives `Clone`, so the key string is duplicated across every `AppState` clone +- The key is passed through environment variables, which may be visible in `/proc//environ` on Linux +- There is no key rotation mechanism. If the key is compromised, every user's API keys must be re-encrypted with a new key -- and there is no tooling for this. + +**File**: `backend/src/config.rs` (line 16), `backend/src/services/encryption.rs` + +### SSRF: Good but Incomplete + +The SSRF protection is well-designed: +- DNS resolution check before connection +- Private IP detection for both IPv4 and IPv6 +- Redirect policy with per-hop IP validation +- Scheme validation (http/https only) + +**Gaps**: +1. **DNS Rebinding (documented)**: The comment on line 62 of `scraper.rs` acknowledges TOCTOU between DNS check and actual TCP connection. An attacker controlling DNS could return a public IP for the check, then rebind to `127.0.0.1` for the actual connection. +2. **IPv4-mapped IPv6 not checked**: Addresses like `::ffff:127.0.0.1` or `::ffff:10.0.0.1` are not explicitly blocked. Depending on the network stack, these could bypass the IPv4 private checks. +3. **Cloud metadata endpoints**: `169.254.169.254` is blocked (link-local), but cloud providers often expose metadata at other addresses (e.g., `fd00:ec2::254` on AWS). The IPv6 ULA range (`fc00::/7`) is blocked, which covers some of these, but not all. + +**File**: `backend/src/services/scraper.rs` (lines 281-316) + +### Session Security: Solid Foundation, Missing Features + +Strengths: +- Session tokens are 32 bytes of OS randomness (256 bits of entropy) +- Only SHA-256 hashes stored in DB (database leak doesn't compromise sessions) +- HttpOnly, SameSite=Lax cookies +- CSRF via `X-Requested-With` header check + +**Gaps**: +1. **No session sliding expiration**: The `last_active_at` column exists but is never updated. Sessions expire at a fixed time regardless of activity. This is a design choice, but it means active users get logged out after 30 days. +2. **No concurrent session limits**: A user can have unlimited active sessions. A stolen session token remains valid for 30 days with no mechanism to detect anomalous session usage. +3. **No session revocation UI**: Users cannot see or revoke their active sessions. +4. **Magic link tokens in GET parameters**: The `GET /api/v1/auth/verify?token=...` endpoint puts the magic link token in the URL query string. This token ends up in browser history, server access logs, and potentially Referer headers. The POST endpoint exists as an alternative, but the email link uses GET. + +**File**: `backend/src/middleware/auth.rs`, `backend/src/handlers/auth.rs` (line 244) + +### Error Message Sanitization: Allowlist Would Be Safer + +The `sanitize_error_message` function (line 1291 of `synthesis.rs`) uses a blocklist approach -- checking for known sensitive patterns. This is inherently fragile: +- What if an error contains `connection refused to internal-host.corp.net:5432`? It would not match any blocklist pattern and would be passed through. +- The truncation at 200 chars still allows significant information leakage. + +A safer approach would be an allowlist of known-safe error patterns, defaulting to a generic message for anything unrecognized. + +--- + +## Operational Risks + +### What Will Wake You at 3am + +#### 1. Database Fills Up (No Monitoring or Alerts) + +The Docker Compose configuration does not set any PostgreSQL `max_connections`, `shared_buffers`, or storage limits. The `postgres_data` volume grows without bound. The `shm_size: 128mb` is set, but there are no alerts on disk usage. + +**Risk scenario**: The `llm_call_log` table (storing full LLM prompts and responses as TEXT) grows past available disk. PostgreSQL starts rejecting writes. All generation jobs fail. The error appears as a generic "Internal error" to users. + +**File**: `docker-compose.yml` + +#### 2. Hung Generation Jobs Block Users + +If an LLM API call hangs (the `reqwest` timeout is 120 seconds, but the overall pipeline has no timeout), the user's generation slot is occupied. The `has_active_job` check prevents starting a new one. The only escape is the 1-hour job TTL, but during that hour the user is stuck. + +There is no admin endpoint to cancel a stuck job, and no watchdog to kill jobs that exceed a reasonable duration. + +**File**: `backend/src/services/synthesis.rs` (line 88, `JOB_TTL`) + +#### 3. Rate Limit Bypass on Restart + +All rate limiting state is in-memory. A server restart (or deployment) resets all rate limit counters. An attacker who discovers this can time their abuse to coincide with deployments. + +The `ProviderRateLimiter::reload_from_db` also resets all timestamps (line 182-191), meaning an admin updating rate limits for any provider resets the sliding window for that provider. + +**File**: `backend/src/services/rate_limiter.rs` (lines 182-191) + +#### 4. No Database Backups Configured + +The Docker Compose configuration provides no backup mechanism for the PostgreSQL volume. There is no `pg_dump` cron job, no WAL archiving, no replication. + +**Risk scenario**: The Docker volume is lost due to host failure. All user data, syntheses, encrypted API keys, and settings are permanently gone. + +--- + +## UX Failure Modes + +### 1. Generation Takes 10+ Minutes + +With 10 sources, 15 links each, batch size 5, and LLM classification per article, the pipeline can make 30+ LLM calls (with rate limiting waits between batches). At 5-10 seconds per LLM call plus rate limit waits, a generation can easily take 5-15 minutes. + +The SSE progress stream provides percentage updates, but the percentages are approximate (hardcoded ranges: 25-65% for Phase 1 processing, 70-90% for Phase 2). A user seeing "Articles 3-5/87..." at 27% for several minutes may assume it is stuck. + +There is no estimated time remaining, no cancel button functionality, and if the browser tab is closed and reopened, the SSE reconnection has only 3 retries with exponential backoff (1s, 2s, 4s) before giving up permanently. + +### 2. All Sources Down + +If all user sources return errors during Phase 1, the pipeline silently moves to Phase 2 (web search fallback). If the user has no web search configured (no Brave key, LLM search disabled), the pipeline reaches the "no valid articles found" error at line 848. The error message ("Aucun article valide trouve. Verifiez vos sources et categories.") does not distinguish between "your sources are all down" and "we found articles but they were all filtered out." + +### 3. Error Messages Are in French Only + +All user-facing error messages in the backend are hardcoded in French. The CLAUDE.md mentions "i18n-ready" with French translations in the frontend, but the backend error messages bypass i18n entirely. A non-French-speaking user would see opaque French error messages. + +**Examples** from `synthesis.rs`: "Limite de requetes atteinte", "Aucun article valide trouve", "Erreur d'authentification avec le fournisseur IA." + +--- + +## Edge Cases & Race Conditions + +### 1. Race Condition in Job Creation + +The `create_job` method (line 107 of `synthesis.rs`) iterates all entries in the `DashMap` to check if the user already has an active job, then inserts a new entry. This is not atomic. Two concurrent requests could both pass the check and create two jobs for the same user. The `trigger_generate` handler has a separate `has_active_job` check before `create_job`, making it doubly racy -- two requests could both pass `has_active_job`, then both call `create_job`. + +**Risk scenario**: User double-clicks the "Generate" button rapidly. Two generation jobs start simultaneously, consuming double the LLM API quota and potentially producing duplicate syntheses. + +**File**: `backend/src/handlers/generation.rs` (lines 43-62) + +### 2. Settings Race with Generation + +A user can update their settings while a generation is in progress. The generation pipeline reads settings once at the start (line 251). If the user changes their provider or model mid-generation, the change does not affect the running job (which is correct), but the user might be confused about which settings were used. + +More problematically, if the user deletes their API key while a generation is in progress, the generation will fail at the next LLM call with a decryption or not-found error. + +### 3. URL Normalization Edge Cases + +The `normalize_article_url` function (line 1069) strips tracking parameters and trailing slashes but does not handle: +- **URL encoding differences**: `https://example.com/caf%C3%A9` vs `https://example.com/cafe` would be treated as different articles. +- **www vs non-www**: `https://www.example.com/article` vs `https://example.com/article` are treated as different URLs. +- **Port normalization**: `https://example.com:443/article` vs `https://example.com/article`. +- **Case sensitivity in paths**: While the final `.to_lowercase()` handles this, some servers treat paths as case-sensitive. + +**File**: `backend/src/services/synthesis.rs` (lines 1069-1107) + +### 4. `sanitize_error_message` Truncation is Byte-Unsafe + +Line 1313: `&msg[..200]` slices at byte position 200. If the message contains multi-byte UTF-8 characters (common in French: accented characters), this will panic at runtime if byte 200 falls in the middle of a multi-byte character. + +**File**: `backend/src/services/synthesis.rs` (line 1313) + +**Risk scenario**: An LLM returns an error message containing accented characters. The error sanitization panics, crashing the generation task. The job is left in a "progress" state forever (until TTL cleanup). + +--- + +## The Uncomfortable Questions + +### 1. What Happens When a User's API Key Is Compromised? + +If a user's LLM API key is compromised (e.g., database breach, even though keys are encrypted), there is no mechanism to: +- Notify the user +- Force re-encryption with a new master key +- Audit which keys were decrypted and when +- Rotate the master encryption key without downtime + +### 2. Is the Per-Article LLM Classification Sustainable? + +Each article in the pipeline requires an individual LLM call for classification and summarization. With 20 categories x 4 items = 80 target articles, and typical scrape success rates around 50%, the pipeline may attempt 160+ LLM calls per generation. At $0.01-0.10 per call, a single generation could cost $1.60-$16.00. Is the user aware of this cost? + +### 3. Why Is `SESSION_SECRET` Required but Never Used? + +The `SESSION_SECRET` environment variable is validated at startup (must be 64+ characters) but is never referenced in the actual session creation or verification code. Sessions use cryptographically random tokens with SHA-256 hashing -- the session secret is not used as an HMAC key or for any other purpose. It appears to be a vestigial requirement. + +**File**: `backend/src/config.rs` (line 15), search for `session_secret` usage outside config shows no references in auth or session code. + +### 4. What If the Article History Uniqueness Assumption Is Wrong? + +The migration `20260324000016_enrich_article_history.sql` drops the unique index on `(user_id, url_hash)` and replaces it with a non-unique index. This means the same URL can appear multiple times in the history table (e.g., once as "filtered_history", once as "used"). The `check_urls_exist` function returns a `HashSet` of matching hashes, so duplicates don't cause functional issues, but the table grows faster than expected and the ON CONFLICT DO NOTHING in `insert_urls` (which still exists) will never trigger since there is no unique constraint. + +### 5. Is the "Single-Tenant" Assumption Actually Enforced? + +The CLAUDE.md says "Single-tenant self-hosted (one instance per deployment)" but nothing in the code enforces this. Multiple users can register, each with their own settings and API keys. The rate limiting is global per-provider (not per-user by default). If two users generate simultaneously using the same LLM provider, they share the same rate limit bucket -- one user's generation can starve the other. + +--- + +## Prioritized Risk Mitigation + +### P0 -- Fix Before Next Release + +1. **UTF-8 truncation panic in `sanitize_error_message`** (line 1313 of `synthesis.rs`): Replace `&msg[..200]` with a char-boundary-safe truncation. This is a runtime crash waiting to happen. + +2. **Race condition in job creation**: Make `has_active_job` + `create_job` atomic by using a single `DashMap` operation or a mutex. Otherwise double-generation is possible on double-click. + +### P1 -- Fix Within 30 Days + +3. **Add expired session cleanup**: Either a background tokio task that runs `sessions::delete_expired` periodically (e.g., every hour), or a database-level scheduled function. The function exists but is never called. + +4. **Add generation pipeline timeout**: Wrap the `run_generation_inner` call with `tokio::time::timeout(Duration::from_secs(600), ...)` to ensure no job runs indefinitely. + +5. **Add database backup strategy**: At minimum, add a `pg_dump` cron job or document a backup procedure. The current setup has zero data redundancy. + +### P2 -- Fix Within 90 Days + +6. **Stop storing master_encryption_key as a Clone-able String**: Either store it in an `Arc` in `AppState` (created once at startup) instead of reconstructing it per-operation, or at minimum stop deriving `Clone` on `AppConfig` and use `Arc` instead. + +7. **Add database growth monitoring**: Track table sizes for `article_history`, `llm_call_log`, and `syntheses`. Add cleanup policies or admin-facing growth dashboards. + +8. **Implement admin job management**: Add an admin endpoint to list active jobs and cancel stuck ones. + +9. **Address IPv4-mapped IPv6 in SSRF checks**: Add explicit checks for `::ffff:` prefixed private IPv4 addresses. + +### P3 -- Address Eventually + +10. **Deduplicate pipeline logic**: Extract the scrape-classify-summarize-filter pattern into a shared function called by both Phase 1, Phase 2 Brave, and Phase 2 LLM paths. + +11. **Add master key rotation tooling**: A CLI command that decrypts all keys with the old master key and re-encrypts with a new one. + +12. **Evaluate whether `SESSION_SECRET` is actually needed**: If it serves no purpose, remove it to reduce configuration burden. If it was intended for HMAC-signed sessions, either implement that or document why the current approach is preferred. + +13. **Add user-facing cost estimation**: Before starting generation, estimate the number of LLM calls and display an approximate cost based on the selected provider's pricing. diff --git a/docs/audit/frontend-report.md b/docs/audit/frontend-report.md new file mode 100644 index 0000000..dcf4489 --- /dev/null +++ b/docs/audit/frontend-report.md @@ -0,0 +1,279 @@ +# Frontend Audit Report (SolidJS) + +**Date:** 2026-03-26 +**Scope:** `frontend/src/` -- SolidJS + Tailwind CSS v4 application +**Files reviewed:** 30+ files across pages, components, API modules, utils, i18n, and types + +--- + +## Executive Summary + +The frontend is well-structured for a learning project of this scope. It demonstrates solid understanding of SolidJS fundamentals: lazy loading, proper `Show`/`For` usage, context-based auth, typed i18n, and a clean API client layer. The codebase is consistent in style and convention. + +That said, several structural issues deserve attention. The Settings page has grown to 1026 lines and mixes concerns that should be decomposed. Some `setTimeout` calls lack `onCleanup` guards, creating potential memory leaks. A handful of SolidJS-specific anti-patterns (destructured reactive props, `children: any`) reduce type safety and could cause subtle bugs. The i18n system, while type-safe at the key level, has an XSS surface via `innerHTML` usage in one location. None of these are critical in a single-tenant self-hosted deployment, but they represent the most impactful areas for improvement. + +**Overall quality: Good.** The top priorities are decomposing the Settings page, adding missing `onCleanup` guards, and fixing the `innerHTML` usage. + +--- + +## SolidJS Patterns + +### Reactive Primitives + +**Signals and derived state -- well done.** The AuthContext correctly models derived signals (`isAuthenticated`, `isAdmin`) as plain functions over `user()`, avoiding redundant signals. The `createResource` usage in ApiKeyManager and Settings for provider/key data is appropriate. + +**`createEffect` usage -- mostly correct, one concern.** Effects in GenerateSynthesis for auto-redirect and error handling are clean. However, the effect in Settings.tsx (line 81-93) that checks provider availability reads both `providers()` and `settings()`, which means it will re-fire whenever either signal changes -- including when the user changes an unrelated setting field. This is harmless (idempotent) but wasteful. Using `on()` to track only `providers` would be more precise: + +```ts +// Current: re-fires on any settings change +createEffect(() => { + const providerList = providers(); + const currentSettings = settings(); // unintended dependency + ... +}); + +// Better: only fires when providers load +createEffect(on(() => providers(), (providerList) => { ... })); +``` + +**`createResource` vs `onMount` inconsistency.** Some data fetching uses `createResource` (ApiKeyManager, Settings providers), while most pages use `onMount` + manual signal (Home, Sources, LlmLogs, GenerateSynthesis). This is not wrong, but `createResource` would provide free loading/error states and refetch capabilities. The inconsistency suggests no team-wide convention was established. + +### Component Composition + +**`Show` and `For` -- correctly used throughout.** The callback form of `Show` (e.g., `{(currentUser) => ...}`) is used in Navbar and MobileMenu, which is the correct SolidJS pattern for narrowing nullable values. `For` is used for all list rendering, never `.map()`. + +**Step status rendering in GenerateSynthesis -- could use `Switch/Match`.** Lines 335-357 use three consecutive `` blocks for the three possible step statuses. This is semantically an exclusive match and would be clearer as: + +```tsx + + + + + + + + + + + +``` + +The current approach renders correctly because only one condition is true at a time, but `Switch/Match` makes the mutual exclusion explicit and avoids evaluating all three conditions. + +### Performance Concerns + +**`completedSteps()` in GenerateSynthesis is computationally expensive per call.** This derived function (lines 126-160) iterates over all events and performs multiple `findIndex` calls on every render triggered by new SSE events. For the typical 4-step pipeline this is negligible, but the algorithmic complexity is O(steps * events). If the event stream grew large, this would become visible. Consider memoizing with `createMemo`. + +**`For` with reactive lookup in ApiKeyManager.** The `getKeyForProvider` function (line 33) scans the `apiKeys()` resource on every call. Since it is called inside the `For` loop's render function, it re-evaluates whenever `apiKeys` changes, which is correct. However, the returned value is not a signal -- it is a plain value computed from reactive data accessed during render. This works in SolidJS because `For` tracks the parent signal, but it means the entire provider list re-renders when any key changes. For 3 providers this is negligible. + +**No unnecessary re-renders detected.** The project correctly avoids the React-style anti-pattern of destructuring signals at the component top level. Signal reads are deferred to JSX expressions (e.g., `{settings().theme}` rather than `const theme = settings().theme` outside JSX). + +--- + +## Component Architecture + +### Oversized Components + +**Settings.tsx (1026 lines) is the primary concern.** This single component handles: +- Settings form (theme, age, items, articles, batch size, history days) +- Provider selection with auto-detection and warning +- Brave Search key management (save, test, delete, toggle) +- Search agent behavior textarea +- Category list with add/remove +- Rate limit configuration +- JSON export/import with optional API key inclusion +- API key manager delegation + +This should be decomposed into at least 5 sub-components: +1. `SettingsForm` -- basic numeric/text fields +2. `ProviderSelector` -- provider dropdown, model dropdowns, web search badge +3. `BraveSearchSection` -- key management + toggle (this is essentially a duplicate of ProviderKeyCard logic) +4. `CategoryManager` -- category list CRUD +5. `RateLimitSection` -- rate limit inputs and reset + +The Brave Search key management (lines 57-63, 134-179, 582-678) duplicates the same save/test/delete pattern already implemented in `ProviderKeyCard` within `ApiKeyManager.tsx`. This is a clear DRY violation -- Brave Search should be treated as another provider key card. + +**Other pages are appropriately sized.** Home (240 lines), GenerateSynthesis (404 lines), Sources (~300 lines), and LlmLogs (159 lines) are all reasonable for their complexity. + +### State Management + +**Auth context is clean and minimal.** It provides the right abstraction surface (`user`, `loading`, `isAuthenticated`, `isAdmin`, `logout`, `refreshUser`). The `refreshUser` function is available but appears unused in the current codebase -- verify if this is needed. + +**Toast context is well-implemented.** The module-level `activeTimers` Map for managing auto-dismiss timers is pragmatic. The `aria-live="polite"` on the toast container is a good accessibility touch. + +**No global state beyond auth and toast.** Settings and other data are fetched per-page via `onMount`. This is appropriate for the application's complexity level -- introducing a global store would be premature. + +**The `deleteTimers` signal in Home.tsx stores `setTimeout` handles in a reactive signal.** This works but is unusual. The timers are stored so they can be cleared on the second (confirming) click. However, these timers are never cleaned up on component unmount -- if the user navigates away while a delete confirmation is pending, the timeout callback will still fire and call `setDeletingId` on an unmounted component. This should use `onCleanup`. + +### Reusability + +**`Button` component exists but is rarely used.** A well-typed `Button` component with variant support (`primary`, `secondary`, `danger`), loading state, and icon slot lives in `components/ui/Button.tsx`. However, across the pages audited, buttons are overwhelmingly created with inline Tailwind classes rather than using this component. This leads to significant class string duplication. For example, the indigo primary button pattern appears in at least 12 places with minor variations. + +**`LoadingSpinner` is consistently reused** across all pages -- good. + +**`ProviderKeyCard` is a good reusable pattern** but its logic is duplicated for Brave Search in Settings.tsx as noted above. + +**No shared form input components.** Every input field repeats the full Tailwind class string: `"shadow-sm focus:ring-indigo-500 focus:border-indigo-500 block w-full sm:text-sm border-gray-300 rounded-md py-2 px-3 border"`. A `TextInput`, `NumberInput`, and `Select` component would reduce duplication and ensure visual consistency. + +--- + +## Type Safety + +**Overall: strong.** The codebase makes effective use of TypeScript across all layers. + +**Strengths:** +- All API request/response types are defined in `types.ts` with proper interfaces +- The `isApiError` type guard is used consistently for error handling +- The i18n system uses `TranslationKey` (derived from `keyof typeof fr`) to make translation keys type-checked at compile time +- `satisfies` is used in the API client for throw expressions (line 60 of client.ts) and API calls (syntheses.ts line 86) +- Props interfaces are defined for all components + +**Issues:** + +1. **`children: any` in Layout and AdminLayout.** Both layout components type their children prop as `any` instead of using `ParentComponent` or `JSX.Element`. This loses type safety on what can be passed as children. The fix is trivial: use `ParentComponent` as done in App.tsx's `ProtectedLayout`. + +2. **Non-null assertion in ApiKeyManager.** Line 249 uses `props.apiKey!.key_prefix` inside a `` block. While logically safe (the Show guard ensures the value exists), the compiler cannot verify this. Using `Show`'s callback form (`{(key) => key().key_prefix}`) would eliminate the assertion. + +3. **`ApiClient.request` returns `undefined as T` for 204 responses (line 76 of client.ts).** This is a type lie -- the caller may not expect `undefined`. The methods that return `void` (like `delete`) work correctly because `undefined` satisfies `void`, but if a caller expected a non-void return from a 204, it would fail at runtime. Consider constraining 204-returning methods to `Promise` at the call site. + +4. **`isApiError` type guard is loose.** It checks for the presence of `status` and `message` properties but does not verify their types. Any object with these two keys would pass. Adding `typeof error.status === 'number'` would tighten it. + +5. **`DEFAULT_SETTINGS` contains hardcoded French text** (category names, search behavior). This is a content/i18n boundary violation -- if the app were ever localized, these defaults would need to come from the i18n system. + +--- + +## Code Organization + +### File Structure -- Good + +The project follows a clear convention: +- `pages/` for route-level components +- `components/` for shared UI +- `api/` for HTTP communication +- `utils/` for pure helpers +- `contexts/` for global state +- `i18n/` for translations +- `types.ts` as the single type definition file + +This separation is clean and navigable. + +### Import Patterns -- Consistent + +All imports use the `~/` path alias (configured in vite.config.ts), avoiding fragile relative paths. Imports are grouped logically (solid-js, router, icons, local modules) though not enforced by a linter rule. + +### Separation of Concerns -- Mostly Good + +**API layer is well-isolated.** The `client.ts` singleton handles all HTTP concerns (CSRF headers, credentials, 401 redirect, error parsing). Individual API modules (`auth.ts`, `settings.ts`, etc.) are thin wrappers that only map routes to typed calls. This is excellent. + +**Business logic leaks into pages.** The Settings page contains import/export logic (JSON serialization, file download, file parsing), Brave Search key management, and category validation. These should be extracted into utility functions or a settings service module. + +**The `syntheses.ts` API module contains `triggerDownload` and `fetchFile` utility functions** that are not synthesis-specific. These should live in a shared `utils/download.ts` module so other features (e.g., CSV export in Sources) could reuse them. + +### Missing Error Boundaries Per-Route + +The application has a single top-level `ErrorBoundary` wrapping the entire router. An error in any page will replace the entire app with the error screen. Consider wrapping each route's content with a more localized error boundary that preserves the navigation bar. + +--- + +## UI/UX Patterns + +### Error Handling -- Consistent But Could Be Unified + +Every page implements its own error display pattern with slight variations: +- Home: red banner with `border border-red-200` +- GenerateSynthesis: red banner with `border-l-4 border-red-400` +- LlmLogs: red box with `border border-red-200` + +A shared `ErrorAlert` component would standardize these. + +Error handling across the codebase consistently uses the `isApiError` type guard to surface server messages or fall back to i18n strings. This is a good pattern. + +**`console.error` usage in Sources.tsx** (lines 103, 174) and ArticleHistory.tsx (line 104) should be replaced with user-visible error feedback (toast or inline message). Errors silently logged to the console are invisible to the user. + +### Loading States -- Good + +All pages that fetch data show `` during loading via `}>`. The spinner supports `fullPage` and `size` variants. + +### Form Handling + +**No form validation library.** Settings and Sources pages implement validation inline. This is acceptable at current scale but becomes harder to maintain as forms grow. The Settings page in particular has no client-side validation -- invalid values (e.g., negative `max_age_days`) are sent to the server. + +**Controlled inputs use `value` + `onInput`.** This is the correct SolidJS pattern (not `onChange` which fires on blur in some browsers). Numeric inputs use `parseInt(...) || defaultValue` for fallback, which is pragmatic but means a user typing "0" gets the default instead. + +**The import flow in Settings does not auto-save.** After importing a JSON file, the user sees a success message saying "N'oubliez pas d'enregistrer" (don't forget to save). This is a potential UX pitfall -- users may import and navigate away, losing changes. + +### Accessibility + +**Good:** +- `aria-label` on icon-only buttons (mobile menu toggle, show/hide key) +- `aria-live="polite"` on toast container +- `aria-hidden="true"` on decorative icons +- `role="alert"` on toast items +- Semantic HTML (`