ai_synth/docs/audit/v2-architect-report.md

# AI Weekly Synth -- Architecture Audit Report (v2)

**Date**: 2026-03-27
**Scope**: Full backend codebase (Rust/Axum), key frontend architecture observations
**Auditor**: Software Architect (automated)

---

## Executive Summary

AI Weekly Synth is a well-structured Rust/Axum application that has grown substantially from its initial design. The codebase demonstrates strong fundamentals: consistent error handling, good security practices, clean layer separation between handlers/services/db, and idiomatic use of Axum extractors. Unit test coverage is solid across models, services, and middleware.

However, the growth -- particularly the addition of multi-theme synthesis, scheduled generation, Brave Search, windowed source extraction, and article history tracing -- has introduced several architectural tensions. The synthesis pipeline (`synthesis.rs`) has become a 1500+ line monolith carrying at least five distinct responsibilities. The scheduler bypasses the job store abstraction. And several cross-cutting concerns (provider resolution, rate limiting, history tracking) are tightly coupled to concrete implementations, making the system harder to test, extend, and reason about.

This report organizes findings by SOLID principles, design patterns, architecture, and dependency management, then closes with prioritized recommendations.

---

## 1. SOLID Principles

### 1.1 Single Responsibility Principle (SRP)

**Critical: `synthesis.rs` is a God Module**

At 1500+ lines, `services/synthesis.rs` carries at least six distinct responsibilities:

| Responsibility | Lines (approx.) | Should be |
|---|---|---|
| Job store (in-memory concurrent map) | 1-193 | Own module `services/job_store.rs` |
| Progress event types + emission | 37-71, 1063-1071 | Own module or part of job store |
| Pipeline orchestration (phases 1 + 2 + save) | 200-1038 | `services/pipeline.rs` or `services/generation/mod.rs` |
| Article scraping + classification logic | 471-616, 700-880 | `services/article_processor.rs` |
| URL filtering, normalization, hashing | 1255-1339 | `services/url_utils.rs` |
| Provider/model/key resolution | 1342-1446 | `services/provider_resolver.rs` |

The `run_generation_inner` function alone is ~840 lines. It manages five `HashMap`/`HashSet` tracking structures, two nested loop levels (waves and batches), two separate pipeline phases (personalized sources and web search fallback), and three code paths in Phase 2 (Brave Search, LLM search, skip). This makes the function extremely difficult to test in isolation, review for correctness, or extend with new pipeline stages.

**Moderate: `scheduler.rs` duplicates pipeline invocation logic**

The scheduler constructs its own `watch::channel` and `AtomicBool`, calls `run_generation_inner` directly, and handles email sending inline. It bypasses the `JobStore` entirely, which means:
- Scheduled jobs are invisible to the SSE progress API
- The one-job-per-user guard does not apply (it only checks `job_store.has_active_job`)
- Email sending logic (fetch synthesis, iterate recipients, call `email::send_synthesis_email`) is duplicated -- the handler version is in `syntheses.rs` handler

**Moderate: `AppState` accumulates responsibilities**

`AppState` holds configuration, database pool, HTTP client, auth rate limiter, provider rate limiter, per-user rate limiters, and the job store. While `Clone`-cheap (all Arc-based), it acts as a service locator, making it unclear which components a given handler actually depends on. With 8 fields, this is approaching the point where injecting specific dependencies would improve clarity.

**Minor: Handler-level response types**

Some response types are defined in handlers (`AdminUserResponse` in `handlers/admin.rs`, `GenerateResponse` in `handlers/generation.rs`, `ListResponse` in `handlers/syntheses.rs`) while others are in models. This inconsistency is minor but creates ambiguity about where to look for types.

### 1.2 Open/Closed Principle (OCP)

**Well-applied: LLM Provider abstraction**

The `LlmProvider` trait + factory pattern is the cleanest abstraction in the codebase. Adding a new provider (e.g., Mistral) requires:
1. A new module implementing `LlmProvider`
2. A new match arm in `factory.rs`
3. No changes to the pipeline

This is textbook OCP.

**Violation: Pipeline Phase 2 branching**

Phase 2 of the pipeline has a hard-coded `if settings.use_brave_search { ... } else { ... }` branch that selects between two entirely different code paths (Brave Search vs. LLM web search). Each path contains ~150 lines of nearly identical scrape+classify logic. Adding a third search strategy (e.g., Bing, Perplexity, SearXNG) would require another `else if` branch with the same duplicated scrape/classify logic.

**Violation: Provider resolution fallback defaults**

`resolve_model` contains hard-coded fallback model names (`"gemini-2.5-pro"`, `"gpt-4o"`, `"claude-sonnet-4-20250514"`). These will silently become stale as providers release new models. The fallback chain should be configurable or fail loudly.

### 1.3 Liskov Substitution Principle (LSP)

**Generally well-respected.** The `LlmProvider` trait implementations are fully substitutable. The `MockLlmProvider` correctly implements the same interface and is used in tests via the `provider_override` parameter.

**Minor concern**: The mock provider identifies call types by inspecting the system prompt content (`sys_lower.contains("classer")`, `sys_lower.contains("precis")`). This couples the mock to the French-language prompt wording, making it fragile if prompts change.

### 1.4 Interface Segregation Principle (ISP)

**Well-applied: Axum extractors**

`AuthUser` and `AdminUser` are clean, focused extractors. Handlers declare exactly the auth level they need. The `AdminUser` wrapper pattern (newtype over `AuthUser`) is idiomatic and minimal.

**Opportunity: `LlmProvider` trait could be narrower**

The current trait has two methods: `provider_id()` and `call_llm()`. If the codebase later needs streaming, embedding, or tool-calling capabilities, the trait should be split rather than extended, per ISP.

### 1.5 Dependency Inversion Principle (DIP)

**Critical: Pipeline depends on concrete implementations**

`run_generation_inner` directly calls:
- `db::settings::get_or_create_default` (concrete DB queries)
- `db::themes::get_by_id` (concrete DB queries)
- `db::sources::list_for_user` (concrete DB queries)
- `db::article_history::check_urls_exist` (concrete DB queries)
- `db::article_history::batch_insert_entries` (concrete DB queries)
- `crate::services::scraper::scrape_url` (concrete HTTP scraping)
- `source_scraper::extract_article_links` (concrete link extraction)
- `crate::services::brave_search::search` (concrete Brave API)
- `encryption::decrypt` / `encryption::MasterKey::from_hex` (concrete crypto)

None of these are injected as traits. The `provider_override` parameter for `LlmProvider` is the only dependency that can be swapped -- and it was added specifically for testing. This makes the pipeline untestable without a live Postgres database and network access.

**Moderate: `ProviderRateLimiter` embeds its own SQL**

The `ProviderRateLimiter::reload_from_db` method contains raw `sqlx::query_as` calls rather than going through the `db::rate_limits` module. The comment says "to avoid circular dependency," but this violates the layer boundary and duplicates the DB schema knowledge.

---

## 2. Design Patterns

### 2.1 Well-Applied Patterns

| Pattern | Where | Assessment |
|---|---|---|
| **Factory Method** | `llm/factory.rs` | Clean, tested, extensible |
| **Strategy** | `LlmProvider` trait | Proper polymorphism via `Arc<dyn LlmProvider>` |
| **Observer** | `watch::channel` for SSE | Elegant use of tokio primitives; late subscribers get latest state |
| **Repository** | `db/` modules | Clean separation of SQL from business logic |
| **Extractor** | `AuthUser`, `AdminUser` | Idiomatic Axum; composable auth |
| **Builder** | `AppState::new`, `build_scraper_client` | Consistent construction patterns |
| **Newtype** | `AdminUser(AuthUser)`, `MasterKey` | Type safety for authorization and crypto |

### 2.2 Missing or Needed Patterns

**Pipeline / Chain of Responsibility**

The synthesis generation is conceptually a pipeline with discrete stages:
1. Load settings + theme + sources
2. Phase 1: Source extraction (windowed)
3. Phase 1: Scrape + classify (batched)
4. Phase 2: Web search fallback (Brave or LLM)
5. Phase 2: Scrape + classify fallback results
6. Assemble sections + save

Each stage could be a separate struct implementing a `PipelineStage` trait, with shared context passed through. This would make the pipeline testable per-stage, enable adding/removing stages declaratively, and reduce `run_generation_inner` from 840 lines to ~50.

**Unit of Work / Transaction Manager**

Article history tracing uses a manual `pending_traces` buffer that is flushed at "logical boundaries." This ad-hoc batching is scattered across 7 locations in the pipeline. A dedicated `TraceBatcher` struct could encapsulate the buffer, auto-flush thresholds, and error handling.

**Event Sourcing (lightweight)**

The `ProgressEvent` enum is close to an event-sourced model but is currently fire-and-forget via `watch::channel` (which only retains the latest value). If the system needs progress history for debugging or UI replay, the events should be collected in a log alongside the watch channel.

### 2.3 Anti-Patterns

**Copy-Paste Programming (Critical)**

The scrape+classify logic appears nearly identically in three places:
1. Phase 1 source processing (lines ~470-616)
2. Phase 2 Brave Search processing (lines ~700-880)
3. Phase 2 LLM search validation (lines ~936-960, simpler variant)

Each instance: spawns a `JoinSet` for scraping, collects results, checks rate limits, spawns a `JoinSet` for classification, parses LLM responses, checks `is_article`, extracts dates, assigns categories, updates tracking maps. The only differences are: which `source_type` string is recorded and whether `source_url` is `Some`.

A single `scrape_and_classify_batch` function parameterized by source type would eliminate ~300 lines of duplication.

**Primitive Obsession**

The pipeline uses six `HashMap`/`HashSet` variables (`article_scraped`, `source_counts`, `url_source`, `filled_counts`, `seen_urls`, and the `pending_traces` vec) as raw tracking state. These represent a coherent concept -- "pipeline context" or "generation state" -- and should be bundled into a struct:

```rust
struct GenerationContext {
    articles_by_category: HashMap<String, Vec<NewsItem>>,
    source_domain_counts: HashMap<String, usize>,
    url_to_source: HashMap<String, String>,
    category_fill_counts: HashMap<String, usize>,
    seen_urls: HashSet<String>,
    pending_traces: Vec<ArticleHistoryEntry>,
}
```

**Magic Strings**

Category keys like `"category_0"`, `"category_autre"`, `"category_no_date"`, and status strings like `"filtered_history"`, `"filtered_diversity"`, `"filtered_not_article"`, `"filtered_too_old"`, `"filtered_empty"`, `"filtered_homepage"`, `"filtered_cross_phase_dedup"`, `"used"` are scattered as string literals. These should be constants or an enum.

**Stringly-Typed Configuration**

Several settings use strings where enums would be safer:
- `settings.ai_provider` (could be `enum Provider { Gemini, OpenAi, Anthropic }`)
- `settings.search_agent_behavior` (free-form, but could at least validate non-HTML)
- `synthesis.status` (always `"completed"` in the codebase, but stored as `String`)

---

## 3. Architecture

### 3.1 Layer Separation

The codebase follows a three-layer architecture:

```
handlers/ (HTTP layer) --> services/ (business logic) --> db/ (data access)
                            |
                          models/ (shared types)
```

**Assessment: Good but with leaks.**

- Handlers are thin and focused. They validate input, call services/db, and format responses. This is excellent.
- The `db/` layer is clean -- pure SQL queries returning typed results. No business logic leaks into SQL.
- The `services/` layer is where responsibilities blur. `synthesis.rs` calls `db::` modules directly (bypassing any service abstraction), constructs its own SQL in `resolve_model`, and embeds scraping/classification logic that could be separate services.

**Concern: The scheduler sits at the service layer but orchestrates at the handler level**

The scheduler calls `synthesis::run_generation_inner` (a service) but also does email sending (another service), DB fetching (data layer), and schedule marking (data layer) all inline. It should either be a handler (if it needs to compose services) or delegate to a higher-level "generation + notification" service.

### 3.2 Error Handling

**Strengths:**
- Unified `AppError` enum with `IntoResponse` -- all errors produce consistent JSON
- Internal errors log full details but return generic messages to clients (security-conscious)
- `From<sqlx::Error>` and `From<anyhow::Error>` conversions are clean
- Error sanitization in `sanitize_error_message` prevents API key leakage

**Weaknesses:**
- Errors are silently swallowed in multiple places via `.ok()`:
  - `db::article_history::batch_insert_entries(...).await.ok()` (7 occurrences) -- if tracing fails, there is no indication
  - `db::llm_call_log::truncate_old(...).await.ok()` -- cleanup failure is invisible
  - `db::schedules::mark_run(...).await.ok()` -- if this fails, the schedule may fire again next minute
- `unwrap_or_default()` on `serde_json::from_value` calls silently drops malformed data (e.g., `theme.categories` deserialization). A warning log would be more appropriate.

### 3.3 State Management

**In-memory state:**
- `JobStore` (DashMap-based) -- well-designed with TTL cleanup, user locking, and cancellation support
- `RateLimiter` / `ProviderRateLimiter` -- properly `Arc`-wrapped for `Clone`-cheap sharing
- `user_rate_limiters: DashMap<Uuid, UserRateLimitEntry>` -- handles setting changes atomically

**Concern: No persistence for job state**

If the server restarts during a generation, the in-memory job is lost with no way to recover. For a self-hosted single-instance app this may be acceptable, but if resilience is a goal, the job state should be backed by the database.

**Concern: Scheduled job email state is fire-and-forget**

The scheduler sends emails to up to 3 recipients per schedule. If one email fails, the others still send, but there is no retry or notification mechanism. `mark_run` is called unconditionally after a successful generation, even if all emails failed.

### 3.4 Concurrency Model

**Strengths:**
- Proper use of `tokio::task::JoinSet` for parallel scraping and classification
- `DashMap` + `DashSet` for lock-free concurrent access to shared state
- `AtomicBool` for cooperative cancellation (avoids mutex overhead)
- `watch::channel` for fan-out progress notifications

**Weakness: Global rate limiter shared across scheduled + manual jobs**

The `ProviderRateLimiter` is global. A scheduled job and a manual job for different users hitting the same provider share the same bucket. Under load, scheduled jobs could starve manual users (or vice versa). The architecture should consider per-user-or-per-job rate tracking for fairness.

### 3.5 Security Architecture

**Strengths:**
- AES-256-GCM encryption for API keys at rest with per-key nonces
- SSRF prevention in both `scraper.rs` and `source_scraper.rs` (IP allowlist checking, redirect validation)
- CSRF protection via `X-Requested-With` header check on all mutating API endpoints
- Session cookies are `HttpOnly`, `SameSite=Lax`, optionally `Secure`
- Anti-enumeration in auth (same response for existent/non-existent emails, timing attack mitigation)
- Error sanitization prevents API key leakage in SSE error events
- CSP, X-Frame-Options, HSTS, Referrer-Policy headers

**Concern: Gemini API key in URL**

The `GeminiProvider` constructs the API URL as `...?key={api_key}`. While the error handler carefully avoids logging the full URL, the key is still in the URL query string. This means:
- It may appear in HTTP access logs on intermediary proxies
- `reqwest` may include it in error messages despite the `kind`-only logging
- If tracing is set to DEBUG level, the URL may be logged by tower-http's `TraceLayer`

This is a known Gemini API design limitation, but the risk should be documented.

---

## 4. Dependency Management and Testability

### 4.1 Test Architecture

**Strengths:**
- Unit tests for all model validation logic (settings, theme, schedule, source, synthesis)
- Unit tests for error handling, rate limiting, URL normalization, link extraction
- Mock LLM provider enables end-to-end pipeline testing without real API calls
- Factory tests verify correct provider instantiation

**Weaknesses:**
- The core pipeline (`run_generation_inner`) cannot be unit-tested. It requires:
  - A live `PgPool` (for all `db::` calls)
  - A real `AppState` (for config, rate limiters, job store)
  - Network access (for scraping via `http_client`)
  - Only the LLM provider can be mocked (via `provider_override`)

- No integration tests for the scheduler
- No tests for the Brave Search integration path
- No tests for Phase 2 (web search fallback) at all

### 4.2 Dependency Injection

The codebase uses Axum's `State(AppState)` extractor as its sole DI mechanism. This works well for handlers but breaks down for services:

- Services receive `&AppState` directly, gaining access to everything
- There is no trait boundary between the pipeline and its dependencies (db, scraper, search)
- The `provider_override: Option<Arc<dyn LlmProvider>>` parameter proves the value of DI -- it is the only seam that enables testing

**Recommendation**: Introduce a `PipelineDeps` trait (or struct of trait objects) that the pipeline receives, encapsulating:
- Database access (settings, sources, themes, article history)
- Scraping (source page scraping, article scraping)
- Search (Brave, LLM web search)
- Rate limiting
- Key resolution

This would allow the entire pipeline to be tested with in-memory fakes.

### 4.3 Module Coupling

The dependency graph is mostly clean:

```
handlers --> services --> db
                |          |
              models <-----+
                |
              errors
```

Exceptions:
- `synthesis.rs` calls `db::` directly (bypasses service layer for settings, themes, sources, history, llm_call_log, syntheses)
- `synthesis.rs` calls `crate::services::prompts`, `crate::services::llm::schema`, `crate::services::scraper`, `crate::services::source_scraper`, `crate::services::brave_search`, `crate::services::encryption` -- essentially importing the entire services layer
- `rate_limiter.rs` contains its own SQL queries
- `handlers/syntheses.rs::list` constructs a `Synthesis` model manually from the `SynthesisWithThemeName` join row, duplicating field mapping

---

## 5. Specific Code-Level Findings

### 5.1 The `#[allow(clippy::too_many_arguments)]` Smell

Two functions suppress this lint:
- `build_search_prompt` (9 parameters)
- `log_llm_call` (10 parameters)

Both are symptoms of missing context objects. `build_search_prompt` should take a `SearchPromptConfig` struct. `log_llm_call` should take a `LlmCallRecord` struct.

### 5.2 `run_generation_inner` Parameter List

The function takes 7 parameters: `job_id`, `state`, `user_id`, `theme_id`, `tx`, `provider_override`, `cancelled`. The first four are "what to generate," the last three are "infrastructure." A `GenerationJob` struct and a `PipelineInfra` struct would make intent clearer.

### 5.3 Inconsistent `serde_json::Value` vs. Typed Models

`theme.categories` is stored as `serde_json::Value` and deserialized inline with `serde_json::from_value(...).unwrap_or_default()`. This pattern appears at least 5 times across the codebase (themes, schedules, syntheses). Consider using `sqlx::types::Json<Vec<String>>` for typed extraction at the query level.

### 5.4 French/English Mixing

User-facing messages are consistently in French (good for i18n consistency), but code-level strings mix languages:
- Status strings: `"filtered_history"` (English), `"category_autre"` (French), `"Articles sans date"` (French)
- Log messages: English
- Error messages to users: French

The recommendation is to keep all internal identifiers in English and use the i18n layer for user-facing strings.

---

## 6. Prioritized Recommendations

### P0 -- High Impact, Do First

1. **Extract `GenerationContext` struct** from the 6 tracking variables in `run_generation_inner`. This is a safe refactor that immediately improves readability and reduces parameter passing.

2. **Extract `scrape_and_classify_batch` function** to eliminate the 300-line duplication between Phase 1, Brave Search, and LLM search paths. Parameterize by `source_type: &str` and `source_url: Option<&str>`.

3. **Move `JobStore` to its own module** (`services/job_store.rs`). It is already self-contained with no dependencies on synthesis logic. This reduces `synthesis.rs` by ~190 lines.

### P1 -- Important, Do Soon

4. **Introduce `TraceBatcher`** struct to encapsulate the pending traces buffer, batch insert calls, and error handling. Replace the 7 manual flush sites with `batcher.flush()`.

5. **Make the scheduler use `JobStore`** for scheduled jobs. This provides visibility into scheduled generation progress and prevents race conditions between scheduled and manual jobs. Add email sending as a post-completion hook rather than inline in the scheduler.

6. **Replace magic strings with constants or enums** for article statuses (`"filtered_history"`, etc.), category keys (`"category_0"`, `"category_autre"`, `"category_no_date"`), and synthesis status.

7. **Add structured logging to `.ok()` calls**. Replace `batch_insert_entries(...).await.ok()` with `if let Err(e) = batch_insert_entries(...).await { tracing::warn!(...) }`.

### P2 -- Improve Quality

8. **Split Phase 2 search strategies** into a `SearchStrategy` trait with `BraveSearchStrategy` and `LlmSearchStrategy` implementations. The pipeline would call `strategy.search(query, max_results)` without knowing which backend is used.

9. **Extract `provider_resolver.rs`** for the provider/key/model resolution logic (~100 lines currently in `synthesis.rs`).

10. **Introduce `PipelineDeps` trait or struct** to enable full pipeline testing without Postgres/network. Start with the article history check as the first dependency to extract, since it is the most frequently called.

11. **Remove inline SQL from `rate_limiter.rs`** and `synthesis.rs::resolve_model`. Route all queries through `db/` modules.

### P3 -- Nice to Have

12. **Type the `categories` field** as `sqlx::types::Json<Vec<String>>` instead of `serde_json::Value` to eliminate runtime deserialization.

13. **Consolidate response types** -- either all in `models/` or all in `handlers/`, with a consistent convention.

14. **Add a `SearchPromptConfig` struct** to replace the 9-parameter `build_search_prompt` function.

15. **Document the TOCTOU risk** in the Gemini API key URL pattern and consider using the `x-goog-api-key` header instead (if supported by the Gemini API version in use).

---

## 7. What the Codebase Does Well

It is important to acknowledge the strengths that should be preserved during refactoring:

- **Error handling discipline**: The `AppError` enum is consistently used everywhere. No panics in production code. Internal details are never leaked.
- **Security-first design**: SSRF prevention, encrypted secrets, CSRF protection, anti-enumeration, session management -- all implemented correctly.
- **Idiomatic Axum usage**: Extractors, state management, middleware composition, SSE streaming -- all follow framework conventions.
- **Test coverage on leaf components**: Models, utils, and isolated services have thorough unit tests with boundary cases.
- **Documentation**: Module-level doc comments, function-level doc comments, and inline comments explaining non-obvious decisions (e.g., the TOCTOU note in `scraper.rs`).
- **Operational features**: Graceful shutdown, session cleanup, job TTL, rate limit hot-reload, cooperative cancellation -- these show production-mindedness.

---

## Appendix: File Size Summary

| File | Lines | Assessment |
|---|---|---|
| `services/synthesis.rs` | ~1550 | Critical -- needs decomposition |
| `services/scraper.rs` | ~400 | Acceptable |
| `services/rate_limiter.rs` | ~470 | Acceptable (includes thorough tests) |
| `services/prompts.rs` | ~370 | Acceptable (includes thorough tests) |
| `handlers/auth.rs` | ~380 | Acceptable |
| `handlers/sources.rs` | ~280 | Good |
| `handlers/admin.rs` | ~440 | Acceptable |
| `handlers/syntheses.rs` | ~240 | Good |
| `handlers/generation.rs` | ~180 | Good |
| `models/settings.rs` | ~260 | Good (includes thorough tests) |
| `models/synthesis.rs` | ~415 | Acceptable (includes thorough tests) |
| `errors.rs` | ~173 | Good |
| `app_state.rs` | ~82 | Good |
| `router.rs` | ~178 | Good |
| `scheduler.rs` | ~93 | Good size, but needs architectural changes |