ai_synth/docs/audit/test-coverage-gaps.md

# Test Coverage Gap Analysis

**Date:** 2026-03-27
**Scope:** Backend integration tests (`backend/tests/`), backend unit tests (`backend/src/`), E2E tests (`e2e/tests/`)

---

## Summary

| Priority | Count |
|----------|-------|
| Critical | 2     |
| High     | 5     |
| Medium   | 5     |
| Low      | 3     |

---

## Critical Gaps

### GAP-01 — No integration tests for the Themes API (`/api/v1/themes`)

**Description:**
There is no `backend/tests/api_themes_test.rs` file. The themes handlers expose four endpoints (`GET /api/v1/themes`, `POST /api/v1/themes`, `PUT /api/v1/themes/:id`, `DELETE /api/v1/themes/:id`) with non-trivial logic: input validation via `CreateThemeRequest::validate()`, partial updates (`UpdateThemeRequest` with all-optional fields), ownership isolation (other users must get 404, not 403), and a cascade constraint (deleting a theme that still has sources assigned). None of these code paths are exercised by any integration test. The pipeline tests create a theme as setup scaffolding but do not test the themes API surface itself.

**Priority:** Critical

**What to add:**
Create `backend/tests/api_themes_test.rs` covering:
- `GET /themes` — unauthenticated returns 401; returns empty list for new user; returns created theme
- `POST /themes` — creates theme with minimal required fields; applies defaults for `max_items_per_category`, `max_age_days`, `summary_length`; validates empty name (422); validates empty categories list (422); validates `max_items_per_category` out of range (422); validates `summary_length` out of range (422); validates >20 categories (422)
- `PUT /themes/:id` — partial update (name only); partial update (categories only); returns 404 for non-existent id; returns 404 (not 403) when another user's theme id is used
- `DELETE /themes/:id` — returns 204 on success; returns 404 for non-existent id; returns 404 (not 403) for another user's theme id
- Ownership isolation — user A's themes are not visible to user B

---

### GAP-02 — No integration tests for the Article History API (`/api/v1/article-history`, `/api/v1/syntheses/:id/provenance`)

**Description:**
Three endpoints exist at the router level (`GET /article-history`, `DELETE /article-history`, `GET /syntheses/:id/provenance`) with no integration tests whatsoever. The pipeline test verifies that rows are written to `article_history` via a raw SQL check, but nothing tests the HTTP endpoints. The `provenance` endpoint in particular is essential for users auditing what was fetched per synthesis; a spurious 404 caused by a bad join or wrong user scope would currently go undetected.

**Priority:** Critical

**What to add:**
Create `backend/tests/api_article_history_test.rs` covering:
- `GET /article-history` — unauthenticated returns 401; returns empty list for new user; returns entries after a pipeline run; user isolation (user A's history not visible to user B)
- `DELETE /article-history` — unauthenticated returns 401; clears only the requesting user's history; returns 204
- `GET /syntheses/:id/provenance` — unauthenticated returns 401; returns correct trace entries for a known synthesis; returns 404 for non-existent synthesis id; returns 404 (not 403) for another user's synthesis

---

## High Gaps

### GAP-03 — No pipeline integration test for the Brave Search code path (`use_brave_search: true`)

**Description:**
The synthesis pipeline has two distinct code branches at line 603 of `synthesis.rs` based on `settings.use_brave_search`. All three pipeline integration tests in `pipeline_test.rs` set `"use_brave_search": false`. The entire Brave Search path — `resolve_brave_key`, `crate::services::brave_search::search`, `filter_phase2_url`, article classification via Brave results, and `filtered_diversity` traces for brave URLs — is never exercised by any integration test. The `brave_search` unit test only checks the `freshness_from_days` mapping helper.

**Priority:** High

**What to add:**
Add a pipeline test in `pipeline_test.rs` (or a new `api_brave_search_pipeline_test.rs`) that:
- Creates a user with `use_brave_search: true` and a stored `brave_search` API key
- Injects a mock LLM that returns article URLs plus a mock HTTP server for the Brave API response
- Asserts that articles from the Brave path appear in the synthesis sections
- Asserts `source_type = "brave_search"` in `article_history`

Also add to `api_keys_test.rs`:
- `POST /user/api-keys` with `provider_name: "brave_search"` succeeds and stores the key
- `POST /user/api-keys/brave_search/test` exercises the separate Brave key test branch (currently only LLM providers are tested via the test endpoint)

---

### GAP-04 — `assign_category` function has no unit tests

**Description:**
`assign_category` (line 1148 of `synthesis.rs`) is a pure function responsible for matching the LLM's output key (e.g. `"category_0"`) to the user's configured category list, extracting title and summary, and returning `None` when the key is not found. It is called on every article during both the source-scrape pass and the Gemini grounding pass. The synthesis unit test module (`mod tests`) covers `parse_llm_output`, `rotate_sources`, `normalize_article_url`, and `sanitize_error_message`, but `assign_category` is absent despite being equally pure and testable.

**Priority:** High

**What to add:**
Add unit tests inside `synthesis.rs` `mod tests`:
- Matching key in a single-category map returns correct `(key, name, title, summary)` tuple
- Matching key when multiple categories exist picks the correct one
- Non-matching key returns `None`
- Item with empty title or summary is handled gracefully
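
The four cases above could be sketched roughly as follows. The real signature of `assign_category` is not shown in this audit, so the stand-in function, the `LlmItem` struct, and the use of a `HashMap` from key to category name are all assumptions made for illustration. Only the `(key, name, title, summary)` return shape and the `None` result for an unknown key come from the description.

```rust
use std::collections::HashMap;

/// Hypothetical item shape; the real type in `synthesis.rs` may differ.
#[derive(Debug, Clone, Default)]
pub struct LlmItem {
    pub title: String,
    pub summary: String,
}

/// Stand-in for `assign_category` with an assumed signature.
/// `categories` maps an LLM output key (e.g. "category_0") to a category name.
pub fn assign_category(
    key: &str,
    item: &LlmItem,
    categories: &HashMap<String, String>,
) -> Option<(String, String, String, String)> {
    let name = categories.get(key)?;
    Some((
        key.to_string(),
        name.clone(),
        item.title.clone(),
        item.summary.clone(),
    ))
}

#[cfg(test)]
mod tests {
    use super::*;

    fn categories(pairs: &[(&str, &str)]) -> HashMap<String, String> {
        pairs
            .iter()
            .map(|(k, v)| (k.to_string(), v.to_string()))
            .collect()
    }

    fn item(title: &str, summary: &str) -> LlmItem {
        LlmItem { title: title.to_string(), summary: summary.to_string() }
    }

    #[test]
    fn matching_key_in_single_category_map() {
        let cats = categories(&[("category_0", "AI")]);
        let got = assign_category("category_0", &item("Title", "Summary"), &cats);
        let expected = (
            "category_0".to_string(),
            "AI".to_string(),
            "Title".to_string(),
            "Summary".to_string(),
        );
        assert_eq!(got, Some(expected));
    }

    #[test]
    fn picks_correct_category_among_several() {
        let cats = categories(&[("category_0", "AI"), ("category_1", "Security")]);
        let got = assign_category("category_1", &item("T", "S"), &cats).unwrap();
        assert_eq!(got.1, "Security");
    }

    #[test]
    fn non_matching_key_returns_none() {
        let cats = categories(&[("category_0", "AI")]);
        assert!(assign_category("category_9", &item("T", "S"), &cats).is_none());
    }

    #[test]
    fn empty_title_and_summary_do_not_panic() {
        // "Gracefully" is interpreted here as "returns empty strings"; the
        // real function may instead choose to return `None`.
        let cats = categories(&[("category_0", "AI")]);
        let got = assign_category("category_0", &LlmItem::default(), &cats).unwrap();
        assert!(got.2.is_empty() && got.3.is_empty());
    }
}
```

The last test pins down whichever behavior the real function has for empty fields, so the expected value should be adjusted to match it rather than the stand-in.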

---

### GAP-05 — Source diversity (`max_articles_per_source`) enforcement has no dedicated test

**Description:**
`max_articles_per_source` is implemented in the pipeline (the `filtered_diversity` status in `synthesis.rs` at line 403 and line 1214) and is a live setting that ships to users, but there is no integration or unit test verifying that articles from the same domain get capped. The `pipeline_test.rs` tests configure `max_articles_per_source` values but never assert that the diversity cap was actually triggered or that the trace status `"filtered_diversity"` appears.

**Priority:** High

**What to add:**
In `pipeline_test.rs` add a test that:
- Sets `max_articles_per_source: 1`
- Injects a mock LLM that returns two URLs from the same domain (e.g. `https://blog.example.com/a` and `https://blog.example.com/b`) and a third from a different domain
- Asserts only one article from `blog.example.com` appears in the final sections
- Asserts that an `article_history` row with `status = "filtered_diversity"` exists for the dropped URL
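
To make the expected outcome concrete, here is a minimal sketch of a per-domain cap and the assertion shape the test would use. It is not the pipeline's implementation: `apply_diversity_cap`, `host_of`, and the `Status` enum are hypothetical names, and the host extraction is a simplified string split rather than a full URL parser.

```rust
use std::collections::HashMap;

/// Hypothetical trace statuses; only "filtered_diversity" is named in the audit.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Status {
    Used,
    FilteredDiversity,
}

/// Extracts the host from an http(s) URL using plain string operations.
/// Simplified on purpose: it does not handle IPv6 literals.
fn host_of(url: &str) -> Option<&str> {
    let (_, rest) = url.split_once("://")?;
    let authority = rest
        .split(|c: char| c == '/' || c == '?' || c == '#')
        .next()?;
    let host = authority.rsplit('@').next()?; // drop any userinfo
    let host = host.split(':').next()?; // drop any port
    if host.is_empty() {
        None
    } else {
        Some(host)
    }
}

/// Keeps at most `max_per_source` URLs per host, in input order, and marks
/// the rest as filtered.
pub fn apply_diversity_cap<'a>(
    urls: &[&'a str],
    max_per_source: usize,
) -> Vec<(&'a str, Status)> {
    let mut seen: HashMap<&'a str, usize> = HashMap::new();
    urls.iter()
        .map(|&url| {
            // Fall back to the whole URL when no host can be extracted.
            let key = host_of(url).unwrap_or(url);
            let count = seen.entry(key).or_insert(0);
            if *count < max_per_source {
                *count += 1;
                (url, Status::Used)
            } else {
                (url, Status::FilteredDiversity)
            }
        })
        .collect()
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn caps_articles_from_the_same_domain() {
        let urls = [
            "https://blog.example.com/a",
            "https://blog.example.com/b",
            "https://other.example.org/c",
        ];
        let result = apply_diversity_cap(&urls, 1);
        assert_eq!(result[0].1, Status::Used);
        assert_eq!(result[1].1, Status::FilteredDiversity);
        assert_eq!(result[2].1, Status::Used);
    }
}
```

In the real integration test the same three assertions would be made against the synthesis sections and the `article_history` rows instead of an in-memory vector.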

---

### GAP-06 — No E2E spec for the Themes management UI

**Description:**
The `e2e/tests/` directory contains dedicated specs for sources, settings, admin providers, and generation. There is no `themes.spec.ts`. The `generation-live.spec.ts` creates a theme via direct API call as setup scaffolding but never tests the Themes page UI: creating a theme through the form, editing it, deleting it, or checking that validation errors surface correctly. This is particularly risky because themes are the primary user configuration surface for the synthesis pipeline.

**Priority:** High

**What to add:**
Create `e2e/tests/themes.spec.ts` covering:
- Create a theme via the UI form and verify it appears in the list
- Edit a theme name and save; verify the updated name persists after page reload
- Delete a theme; verify it disappears from the list
- Submit the form with an empty name; verify a validation error is shown

---

### GAP-07 — Article history deduplication behavior has no integration test

**Description:**
The history deduplication mechanism (checking `url_hash` in `article_history` before including a URL) is a core pipeline behavior that prevents the same article from appearing in multiple consecutive weekly syntheses. While `pipeline_test.rs` verifies rows are written, no test runs the pipeline twice and asserts that the second run skips URLs already seen (producing `article_history` rows with `status = "filtered_already_seen"` or similar). The `article_history_days: 0` setting (which disables deduplication entirely) is also untested in the pipeline.

**Priority:** High

**What to add:**
In `pipeline_test.rs` add a test that:
- Runs the mock pipeline once, recording which URLs were `"used"`
- Runs the pipeline a second time with the same source URLs
- Asserts the second run does not re-include URLs already present in `article_history` for that user
- Optionally: a variant with `article_history_days: 0` confirming re-inclusion is allowed
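
The two-run structure of such a test might look like the sketch below. It models the `article_history` table as an in-memory set and compares raw URLs rather than `url_hash` values, and it ignores the time window entirely; `History` and `run_once` are hypothetical stand-ins, not pipeline functions.

```rust
use std::collections::HashSet;

/// In-memory stand-in for one user's `article_history` rows (hypothetical).
#[derive(Default)]
pub struct History {
    seen: HashSet<String>,
}

/// Returns the URLs a run would include, then records every candidate as seen.
/// `article_history_days == 0` is modelled as "deduplication disabled".
pub fn run_once(
    history: &mut History,
    candidates: &[&str],
    article_history_days: u32,
) -> Vec<String> {
    let mut used = Vec::new();
    for &url in candidates {
        let already_seen = article_history_days > 0 && history.seen.contains(url);
        if !already_seen {
            used.push(url.to_string());
        }
        history.seen.insert(url.to_string());
    }
    used
}

#[cfg(test)]
mod tests {
    use super::*;

    const URLS: [&str; 2] = ["https://a.example/1", "https://a.example/2"];

    #[test]
    fn second_run_skips_urls_already_in_history() {
        let mut history = History::default();
        let first = run_once(&mut history, &URLS, 30);
        let second = run_once(&mut history, &URLS, 30);
        assert_eq!(first.len(), 2);
        assert!(second.is_empty());
    }

    #[test]
    fn zero_history_days_allows_reinclusion() {
        let mut history = History::default();
        run_once(&mut history, &URLS, 0);
        let second = run_once(&mut history, &URLS, 0);
        assert_eq!(second.len(), 2);
    }
}
```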

---

## Medium Gaps

### GAP-08 — No unit tests for `CreateThemeRequest::validate()`

**Description:**
The `validate()` method on `CreateThemeRequest` contains seven distinct validation rules (empty name, name >200 chars, empty theme, empty categories, >20 categories, empty category element, range checks on `max_items_per_category`/`max_age_days`/`summary_length`). None of these rules are tested by unit tests in `backend/src/models/theme.rs`, which has no `#[cfg(test)]` block at all. The integration tests that will be added in GAP-01 will exercise this indirectly via HTTP, but pure unit tests would be faster and more granular.

**Priority:** Medium

**What to add:**
Add `#[cfg(test)] mod tests` in `backend/src/models/theme.rs` with unit tests for each validation rule, mirroring the style of `backend/src/models/settings.rs`'s existing `mod tests`.
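
A table-driven layout keeps one test case per rule without much repetition. The sketch below is an assumption-heavy illustration: the struct is a hypothetical mirror of `CreateThemeRequest` with plain (non-optional) fields, the error type is a `String`, and the numeric ranges are placeholders. Only the 200-character name limit and the 20-category limit are taken from the description above.

```rust
/// Hypothetical mirror of `CreateThemeRequest`; field types are assumptions.
#[derive(Debug, Clone)]
pub struct CreateThemeRequest {
    pub name: String,
    pub theme: String,
    pub categories: Vec<String>,
    pub max_items_per_category: u32,
    pub max_age_days: u32,
    pub summary_length: u32,
}

impl CreateThemeRequest {
    pub fn validate(&self) -> Result<(), String> {
        if self.name.trim().is_empty() {
            return Err("name must not be empty".to_string());
        }
        if self.name.chars().count() > 200 {
            return Err("name must be at most 200 characters".to_string());
        }
        if self.theme.trim().is_empty() {
            return Err("theme must not be empty".to_string());
        }
        if self.categories.is_empty() {
            return Err("at least one category is required".to_string());
        }
        if self.categories.len() > 20 {
            return Err("at most 20 categories are allowed".to_string());
        }
        if self.categories.iter().any(|c| c.trim().is_empty()) {
            return Err("categories must not contain empty entries".to_string());
        }
        // Placeholder ranges: replace with the bounds used in `theme.rs`.
        if !(1..=20).contains(&self.max_items_per_category)
            || !(1..=365).contains(&self.max_age_days)
            || !(1..=10).contains(&self.summary_length)
        {
            return Err("numeric setting out of range".to_string());
        }
        Ok(())
    }
}

/// A request that passes every rule; each test case overrides one field.
pub fn valid_request() -> CreateThemeRequest {
    CreateThemeRequest {
        name: "Weekly AI".to_string(),
        theme: "Artificial intelligence news".to_string(),
        categories: vec!["Research".to_string()],
        max_items_per_category: 5,
        max_age_days: 7,
        summary_length: 3,
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn accepts_valid_request() {
        assert!(valid_request().validate().is_ok());
    }

    #[test]
    fn rejects_each_rule_violation() {
        let cases = vec![
            ("empty name", CreateThemeRequest { name: String::new(), ..valid_request() }),
            ("name over 200 chars", CreateThemeRequest { name: "x".repeat(201), ..valid_request() }),
            ("empty theme", CreateThemeRequest { theme: String::new(), ..valid_request() }),
            ("empty categories", CreateThemeRequest { categories: vec![], ..valid_request() }),
            ("over 20 categories", CreateThemeRequest { categories: vec!["c".to_string(); 21], ..valid_request() }),
            ("empty category element", CreateThemeRequest { categories: vec![String::new()], ..valid_request() }),
            ("max_items_per_category = 0", CreateThemeRequest { max_items_per_category: 0, ..valid_request() }),
        ];
        for (label, request) in cases {
            assert!(request.validate().is_err(), "{} should be rejected", label);
        }
    }
}
```

Boundary cases (a name of exactly 200 characters, exactly 20 categories) are worth adding as accepted inputs once the real limits are confirmed.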

---

### GAP-09 — `api_keys_test.rs` only tests `gemini` and `openai` provider keys

**Description:**
The `api_keys_test.rs` tests use `provider_name: "gemini"` for almost every test and `"openai"` in one test. The `"anthropic"` provider and the special-cased `"brave_search"` provider are never used. The `brave_search` provider has distinct handler logic (lines 124–144 of `api_keys.rs`) that routes to `brave_search::test_api_key` instead of the LLM factory — this branch is entirely untested.

**Priority:** Medium

**What to add:**
Add to `api_keys_test.rs`:
- Store and retrieve an `anthropic` API key
- Store a `brave_search` API key and call `POST /user/api-keys/brave_search/test` — verify the response shape is `{ success: bool, message: string }`
- Verify `POST /user/api-keys/invalid_provider/test` returns 404 when no key exists

---

### GAP-10 — No tests for the `source_scraper` service against real network-style responses

**Description:**
`source_scraper.rs` has unit tests that cover `extract_links_from_html` and related HTML parsing. However, there are no tests for the `scrape_source` function path that sends real HTTP requests to extract article links, respects `max_links_per_source`, or handles redirect chains. These are exercised only incidentally by E2E tests that hit real URLs. Using `wiremock` or a similar in-process HTTP server would isolate these tests from external dependencies.

**Priority:** Medium

**What to add:**
Add integration-style tests in `source_scraper.rs` or a new test file using `wiremock` or `axum`'s test utilities to stand up a local HTTP server, covering:
- Fetch and parse links from a mock HTML page respecting `max_links_per_source`
- SSRF blocked URL returns `AppError::BadRequest`
- Redirect to a private IP is blocked by the SSRF middleware

---

### GAP-11 — No E2E test for the Article History / Provenance UI

**Description:**
No E2E spec exercises the article history list page or the per-synthesis provenance view. These are both user-facing features for auditing synthesis quality. Their absence from E2E means a broken route or rendering error could go undetected before release.

**Priority:** Medium

**What to add:**
Add cases to a new or existing E2E spec:
- Navigate to the article history page; verify it loads without error
- After triggering a synthesis (can reuse `generation-live.spec.ts` setup), open the synthesis detail and verify the provenance section renders at least one entry

---

### GAP-12 — `export.rs` unit tests do not cover PDF export shape

**Description:**
`backend/src/services/export.rs` has a `#[cfg(test)]` module. The integration tests in `api_export_test.rs` cover HTTP status codes for Markdown and PDF endpoints (auth, not-found, valid). However, neither the unit tests nor the integration tests assert the actual structure of the PDF output (e.g. that the PDF bytes start with `%PDF-`). A regression that produces zero-length or malformed PDF bytes would not be caught.

**Priority:** Medium

**What to add:**
In `api_export_test.rs` add a test that:
- Requests `GET /syntheses/:id/export/pdf` for a valid synthesis
- Reads the response body bytes
- Asserts the body is non-empty and starts with the 5-byte header `%PDF-` (the PDF file signature)
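
The byte check itself is small enough to live in a shared test helper. The sketch below assumes the response body has already been read into a byte slice; how that is done depends on the HTTP test client in use, and `looks_like_pdf` is a hypothetical helper name. It checks for the `%PDF-` header mentioned in the description.

```rust
/// Returns true when `body` begins with the PDF file header `%PDF-`.
/// An empty or truncated body returns false.
pub fn looks_like_pdf(body: &[u8]) -> bool {
    body.starts_with(b"%PDF-")
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn accepts_pdf_header() {
        assert!(looks_like_pdf(b"%PDF-1.7\n..."));
    }

    #[test]
    fn rejects_empty_and_non_pdf_bodies() {
        assert!(!looks_like_pdf(b""));
        assert!(!looks_like_pdf(b"<html>error</html>"));
    }
}
```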

---

## Low Gaps

### GAP-13 — No test for `session` expiry / stale cookie rejection

**Description:**
`api_auth_test.rs` covers logout and valid session use, but no test deletes a session row directly from the database and then confirms the next authenticated request returns 401. The session validation path in the auth middleware (`middleware/auth.rs`) is only tested via unit tests for token extraction, not via DB-backed expiry.

**Priority:** Low

**What to add:**
Add an integration test that:
- Creates an authenticated user and obtains a session token
- Deletes the session row directly via `sqlx::query`
- Asserts the next `GET /api/v1/settings` with the old session token returns 401

---

### GAP-14 — Generation endpoint does not have a test for missing LLM API key

**Description:**
If a user triggers generation but has no API key stored for their selected provider, the pipeline will fail during `resolve_provider_and_key`. No test covers this specific error path. The 202 response is returned before the async pipeline fails, so the SSE `error` event is the user-visible signal — but nothing tests that it emits the right error type.

**Priority:** Low

**What to add:**
Add a `pipeline_test.rs` test that:
- Creates a user with no API keys stored
- Runs the pipeline directly (not via HTTP) with a mock that requires a key
- Asserts the pipeline returns an error with a sanitized message (not leaking internal details)

---

### GAP-15 — `prompts.rs` unit tests do not cover `use_brave_search` or `search_agent_behavior` prompt injection

**Description:**
`backend/src/services/prompts.rs` has a `#[cfg(test)]` module. The existing tests check basic prompt construction. However, the `search_agent_behavior` field (injected as a custom instruction into the prompt) and the Brave Search-specific prompt variant are not tested. A whitespace error or missing format string in the `search_agent_behavior` injection would go undetected.

**Priority:** Low

**What to add:**
Add unit tests to `prompts.rs` `mod tests`:
- When `search_agent_behavior` is non-empty, the returned prompt string contains the custom instruction verbatim
- When `search_agent_behavior` is empty, the prompt does not contain a spurious empty line or placeholder
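
Both cases can be expressed as plain string assertions. The sketch below uses a hypothetical `build_prompt` function and an invented "Additional instructions" label purely to show the assertion shape; the real builder in `prompts.rs` and its wording are not shown in this audit.

```rust
/// Hypothetical prompt builder; the real function in `prompts.rs` may differ.
/// Appends the custom instruction only when it is non-blank.
pub fn build_prompt(base: &str, search_agent_behavior: &str) -> String {
    let custom = search_agent_behavior.trim();
    if custom.is_empty() {
        base.to_string()
    } else {
        format!("{}\n\nAdditional instructions:\n{}", base, custom)
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn non_empty_behavior_appears_verbatim() {
        let prompt = build_prompt("Base prompt.", "Prefer primary sources.");
        assert!(prompt.contains("Prefer primary sources."));
    }

    #[test]
    fn empty_behavior_leaves_no_placeholder_or_blank_block() {
        let prompt = build_prompt("Base prompt.", "");
        assert_eq!(prompt, "Base prompt.");
        assert!(!prompt.contains("Additional instructions"));
        assert!(!prompt.ends_with('\n'));
    }
}
```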

---

## Coverage Matrix

| Area | Integration Test | Unit Test | E2E Test |
|------|-----------------|-----------|----------|
| Auth (register, login, verify, logout) | Yes | Partial | No |
| Settings CRUD | Yes | Yes | Yes |
| Sources CRUD + CSV | Yes | Yes | Yes |
| **Themes CRUD** | **NO** | **NO** | **NO** |
| API Keys CRUD + test | Partial (gemini/openai only) | Yes | No |
| Syntheses CRUD + pagination | Yes | Yes | No |
| Generation trigger + SSE | Yes (202 + conflict) | Yes (JobStore) | Yes |
| Pipeline (mock LLM, source scrape path) | Yes (3 tests) | Yes | No |
| **Pipeline (Brave Search path)** | **NO** | Partial | No |
| **Pipeline (max_articles_per_source cap)** | **NO** | No | No |
| **Pipeline (article history dedup)** | **NO** | No | No |
| Admin providers + rate limits | Yes | Yes | Yes |
| Admin user management + roles | Yes | Yes | No |
| Export (Markdown, PDF, email) | Yes | Yes | Yes (Markdown only) |
| **Article History endpoints** | **NO** | No | **NO** |
| **Synthesis Provenance endpoint** | **NO** | No | **NO** |
| CSRF middleware | Yes | Yes | No |
| SSRF prevention | No | Yes | No |
| Encryption at rest | Yes | Yes | No |
| **Brave Search key test endpoint** | **NO** | No | No |