# AI Weekly Synth -- Test Coverage Audit Report (v2)
**Date:** 2026-03-27

**Auditor:** QA Engineer (automated analysis)

---
## 1. Test Inventory
### 1.1 Backend Unit Tests (`cargo test --lib`)
**Total: 358 tests** -- all passing.
| Source file | # tests | Coverage area |
|---|---|---|
| `services/scraper.rs` | 74 | SSRF IP checks, soft-404, redirect, HTML parsing |
| `services/synthesis.rs` | 36 | Pipeline logic, schema building, category overflow |
| `services/llm/anthropic.rs` | 20 | Response parsing, error handling |
| `services/prompts.rs` | 18 | Prompt template generation |
| `services/csv.rs` | 18 | CSV parsing, serialisation |
| `models/synthesis.rs` | 16 | Model validation, serialisation |
| `services/rate_limiter.rs` | 15 | Token bucket, concurrency |
| `services/llm/openai.rs` | 13 | Response parsing, error handling |
| `models/source.rs` | 12 | URL / title validation |
| `models/settings.rs` | 12 | Settings validation, defaults |
| `services/export.rs` | 12 | Markdown / PDF rendering |
| `services/llm/gemini.rs` | 10 | Response parsing, error handling |
| `models/provider.rs` | 10 | Provider / model validation |
| `services/email.rs` | 9 | Email rendering, bypass mode |
| `services/encryption.rs` | 8 | AES-256-GCM encrypt/decrypt |
| `services/source_scraper.rs` | 8 | Link extraction, is_article filter |
| `services/llm/schema.rs` | 8 | JSON schema generation |
| `util/token.rs` | 8 | Token generation, hashing |
| `models/api_key.rs` | 8 | API key validation |
| `middleware/csrf.rs` | 7 | CSRF header check |
| `models/rate_limit.rs` | 6 | Rate limit model validation |
| `config.rs` | 6 | Config parsing |
| `middleware/auth.rs` | 5 | Session extraction |
| `services/llm/factory.rs` | 5 | Provider factory |
| `handlers/admin.rs` | 4 | Admin handler validation |
| `services/brave_search.rs` | 1 | Brave search (minimal) |
| `services/llm/mock.rs` | 0 | Mock provider (no assertions) |
| `errors.rs` | 0 | Error types (no unit tests) |
### 1.2 Backend Integration Tests (`backend/tests/`)
**Total: 183 tests** across 17 files (requires Postgres).
| File | # tests | Coverage area |
|---|---|---|
| `api_sources_test.rs` | 36 | Sources CRUD, validation, CSV, bulk import, max limit |
| `api_admin_test.rs` | 30 | Provider CRUD, rate limits, user mgmt, audit log, config |
| `api_keys_test.rs` | 18 | API key CRUD, encryption, ownership, test endpoint |
| `api_syntheses_test.rs` | 17 | Synthesis CRUD, pagination, ownership, generation trigger |
| `api_auth_test.rs` | 16 | Register, login, verify, logout, session |
| `api_export_test.rs` | 13 | Email send, Markdown export, PDF export |
| `api_themes_test.rs` | 10 | Theme CRUD, validation, ownership |
| `api_schedules_test.rs` | 9 | Schedule CRUD, validation, ownership |
| `api_settings_test.rs` | 7 | Settings CRUD, defaults, boundary, isolation |
| `pipeline_test.rs` | 6 | Phase 1 extraction, Phase 2 search, overflow, diversity, dedup, preferred |
| `api_article_history_test.rs` | 4 | History list, clear, provenance |
| `api_csrf_test.rs` | 4 | CSRF header enforcement |
| `api_stop_generation_test.rs` | 4 | Stop job, ownership, 404 |
| `api_llm_logs_test.rs` | 3 | LLM logs auth, 404, happy path |
| `api_sources_preferred_test.rs` | 3 | Preferred sources set/clear/auth |
| `minimal_test.rs` | 2 | Infrastructure sanity (oneshot) |
| `api_health_test.rs` | 1 | Health check |
### 1.3 Frontend Unit Tests (`vitest`)
**Total: 141 tests** (131 passing, 10 failing) across 18 files.
| File | # tests | Coverage area |
|---|---|---|
| `sources-utils.test.ts` | 20 | Source utilities |
| `provider-info.test.ts` | 11 | Provider info helpers |
| `api-keys.test.ts` | 11 | API keys client |
| `synthesis-utils.test.ts` | 11 | Synthesis utilities |
| `sse.test.ts` | 11 | SSE client |
| `settings-validation.test.ts` | 3 | Settings validation |
| `i18n.test.ts` | 9 | i18n translations |
| `api-client.test.ts` | 7 | API client |
| `config-api.test.ts` | 7 | Config API |
| `pages/settings.test.tsx` | 10 | Settings page |
| `pages/sources.test.tsx` | 8 | Sources page |
| `pages/home.test.tsx` | 7 | Home page |
| `pages/generate.test.tsx` | 6 | Generate page |
| `synthesis-export.test.ts` | 6 | Export utilities |
| `pages/login.test.tsx` | 4 | Login page |
| `pages/register.test.tsx` | 4 | Register page |
| `auth-context.test.tsx` | 3 | Auth context |
| `admin-route-guard.test.tsx` | 3 | Admin route guard |
### 1.4 E2E Tests (Playwright)
**Total: 7 tests** across 7 files.
| File | Coverage area |
|---|---|
| `registration.spec.ts` | Full magic link registration flow |
| `settings.spec.ts` | Settings persistence across reloads |
| `settings-export.spec.ts` | Settings export/import roundtrip |
| `sources.spec.ts` | Source CRUD + preferred sources via API |
| `themes.spec.ts` | Theme CRUD + schedule CRUD via API |
| `admin-providers.spec.ts` | Admin provider management, settings dropdown |
| `generation-live.spec.ts` | Full pipeline with real OpenAI key (gated) |
---
## 2. Feature Coverage Matrix
| Feature | Unit Tests | Integration Tests | E2E Tests | Coverage |
|---|---|---|---|---|
| **Auth: register** | - | 4 tests | 1 test | GOOD |
| **Auth: login** | - | 3 tests | - | GOOD |
| **Auth: magic link verify** | - | 3 tests | 1 test | GOOD |
| **Auth: /me** | - | 3 tests | - | GOOD |
| **Auth: logout** | - | 3 tests | - | GOOD |
| **Auth: session expiry** | 5 (middleware) | - | - | PARTIAL |
| **CSRF protection** | 7 (middleware) | 4 tests | - | GOOD |
| **Settings CRUD** | 12 (model) | 7 tests | 2 tests | GOOD |
| **Sources CRUD** | 12 (model) | 36 tests | 1 test | GOOD |
| **Sources: CSV import/export** | 18 (csv) | 6 tests | - | GOOD |
| **Sources: preferred** | - | 3 tests | 1 test | GOOD |
| **Sources: max limit** | - | 2 tests | - | GOOD |
| **Themes CRUD** | - | 10 tests | 1 test | GOOD |
| **Schedules CRUD** | - | 9 tests | 1 test | GOOD |
| **Scheduled execution** | 0 | 0 | 0 | **NONE** |
| **Syntheses CRUD** | 16 (model) | 11 tests | - | GOOD |
| **Syntheses: pagination** | - | 2 tests | - | GOOD |
| **Syntheses: ownership isolation** | - | 2 tests | - | GOOD |
| **Generation: trigger** | - | 4 tests | 1 test (gated) | GOOD |
| **Generation: SSE progress** | 11 (sse client) | 0 | 1 test (gated) | PARTIAL |
| **Generation: stop** | - | 4 tests | 0 | GOOD |
| **Pipeline: Phase 1 (scrape)** | 74+8 (scraper) | 1 test | 1 test (gated) | GOOD |
| **Pipeline: Phase 2 (search)** | - | 1 test | 1 test (gated) | GOOD |
| **Pipeline: category overflow** | 36 (synthesis) | 1 test | - | GOOD |
| **Pipeline: is_article filter** | 8 (source_scraper) | 0 | 0 | PARTIAL |
| **Pipeline: summary_length** | 18 (prompts) | 0 | 0 | PARTIAL |
| **Pipeline: date extraction** | 0 | 0 | 0 | **NONE** |
| **Pipeline: article history dedup** | - | 1 test | - | GOOD |
| **Pipeline: source diversity cap** | - | 1 test | 1 test (gated) | GOOD |
| **Pipeline: preferred ordering** | - | 1 test | - | GOOD |
| **Pipeline: Brave Search** | 1 (minimal) | 0 | 0 | **WEAK** |
| **API keys CRUD** | 8 (model) | 18 tests | - | GOOD |
| **API keys: encryption at rest** | 8 (encryption) | 1 test | - | GOOD |
| **API keys: test endpoint** | - | 2 tests | - | GOOD |
| **Admin: provider CRUD** | 10 (model) + 4 (handler) | 9 tests | 1 test | GOOD |
| **Admin: rate limits** | 6 (model) + 15 (limiter) | 4 tests | - | GOOD |
| **Admin: user management** | - | 6 tests | - | GOOD |
| **Admin: audit log** | - | 2 tests | - | GOOD |
| **Config: providers** | - | 3 tests | - | GOOD |
| **Export: Markdown** | 12 (export) | 4 tests | - | GOOD |
| **Export: PDF** | 12 (export) | 4 tests | - | GOOD |
| **Export: Email** | 9 (email) | 5 tests | - | GOOD |
| **SSRF protection** | 74 (scraper) | 0 | 0 | PARTIAL |
| **LLM call logging** | - | 3 tests | 1 test (gated) | GOOD |
| **LLM providers (Gemini)** | 10 | - | - | GOOD |
| **LLM providers (OpenAI)** | 13 | - | - | GOOD |
| **LLM providers (Anthropic)** | 20 | - | - | GOOD |
| **Rate limiting** | 15 | 0 | 0 | PARTIAL |
| **Turnstile captcha** | bypass only | bypass only | bypass only | PARTIAL |
---
## 3. Coverage Gaps and Recommendations
### GAP-01: Scheduled Execution (scheduler.rs) -- No tests
**Priority: Critical**
The `run_scheduled_jobs()` function in `services/scheduler.rs` has zero unit tests and zero integration tests. This is a critical autonomous process that triggers generation and sends emails without user interaction.
**Tests to write:**
1. **Unit test** (`scheduler.rs`): `run_scheduled_jobs` triggers generation for themes whose schedule matches the current day+time.
2. **Unit test** (`scheduler.rs`): `run_scheduled_jobs` does NOT trigger generation for disabled schedules.
3. **Unit test** (`scheduler.rs`): `run_scheduled_jobs` does NOT trigger generation when the current day is not in the schedule's `days` list.
4. **Integration test** (`api_schedules_test.rs`): Create a schedule set to "now", verify the scheduler picks it up and a synthesis is created (or at least attempted with a mock provider).
5. **Integration test**: Verify that after `run_scheduled_jobs` executes, the schedule's `last_run_at` timestamp is updated.
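
Unit tests 1 to 3 become straightforward if the due-check is a pure function that receives the current day and time as arguments. The following is a minimal sketch under assumptions, not the actual contents of `scheduler.rs`: `Weekday`, `Schedule`, the `enabled` and `minute_of_day` fields, and `is_due` are hypothetical names chosen for illustration (only the `days` list is named in this report).

```rust
/// Day of the week. Hypothetical type for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Weekday { Mon, Tue, Wed, Thu, Fri, Sat, Sun }

/// Simplified, assumed view of a schedule row.
#[derive(Debug, Clone)]
pub struct Schedule {
    pub enabled: bool,
    pub days: Vec<Weekday>,
    /// Minutes since midnight at which the job should fire (assumed field).
    pub minute_of_day: u16,
}

/// Returns true when an enabled schedule matches the given day and minute.
pub fn is_due(schedule: &Schedule, today: Weekday, now_minute_of_day: u16) -> bool {
    schedule.enabled
        && schedule.days.contains(&today)
        && schedule.minute_of_day == now_minute_of_day
}

#[cfg(test)]
mod tests {
    use super::*;

    fn mon_thu_at_nine() -> Schedule {
        Schedule { enabled: true, days: vec![Weekday::Mon, Weekday::Thu], minute_of_day: 9 * 60 }
    }

    #[test]
    fn fires_when_day_and_time_match() {
        assert!(is_due(&mon_thu_at_nine(), Weekday::Mon, 9 * 60));
    }

    #[test]
    fn skips_disabled_schedule() {
        let schedule = Schedule { enabled: false, ..mon_thu_at_nine() };
        assert!(!is_due(&schedule, Weekday::Mon, 9 * 60));
    }

    #[test]
    fn skips_day_not_in_list() {
        assert!(!is_due(&mon_thu_at_nine(), Weekday::Tue, 9 * 60));
    }
}
```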
---
### GAP-02: SSE Progress Stream -- No integration test
**Priority: High**
The SSE progress endpoint (`GET /api/v1/syntheses/generate/:job_id/progress`) is only tested in the E2E suite (gated behind a real API key). There is no integration test that verifies the SSE connection, event format, or error propagation.
**Tests to write:**
1. **Integration test** (`api_syntheses_test.rs`): Connect to SSE endpoint for a running job and verify the stream sends well-formed events (`progress`, `complete`, or `error`).
2. **Integration test**: SSE endpoint returns 404 for a non-existent job_id.
3. **Integration test**: SSE endpoint returns 401 without auth.
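
The event-format assertion in test 1 needs some way to split the response body into events. Below is a minimal sketch of such a helper using only the standard library. `SseEvent`, `parse_sse`, `field_value`, and `is_known_event` are hypothetical names, and the payloads in the test are invented; only the event names `progress`, `complete`, and `error` come from this report.

```rust
#[derive(Debug, PartialEq, Eq)]
pub struct SseEvent {
    pub event: String,
    pub data: String,
}

/// Strips the field name, the colon, and at most one following space.
fn field_value<'a>(line: &'a str, field: &str) -> Option<&'a str> {
    let rest = line.strip_prefix(field)?.strip_prefix(':')?;
    Some(rest.strip_prefix(' ').unwrap_or(rest))
}

/// Parses a buffered SSE body. Frames end at a blank line; several `data:`
/// lines in one frame are joined with '\n'.
pub fn parse_sse(body: &str) -> Vec<SseEvent> {
    let mut events = Vec::new();
    let mut name = String::new();
    let mut data: Vec<&str> = Vec::new();
    for line in body.lines() {
        if line.is_empty() {
            // A blank line ends the frame; frames without data are dropped.
            if !data.is_empty() {
                let event = if name.is_empty() { "message".to_string() } else { name.clone() };
                events.push(SseEvent { event, data: data.join("\n") });
            }
            name.clear();
            data.clear();
        } else if let Some(value) = field_value(line, "event") {
            name = value.to_string();
        } else if let Some(value) = field_value(line, "data") {
            data.push(value);
        }
        // Comment lines (leading ':') and other fields are ignored in this sketch.
    }
    events
}

/// Event names this report expects on the progress stream.
pub fn is_known_event(name: &str) -> bool {
    matches!(name, "progress" | "complete" | "error")
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn stream_contains_only_well_formed_known_events() {
        let body = "event: progress\ndata: {\"step\":1}\n\nevent: complete\ndata: done\n\n";
        let events = parse_sse(body);
        assert_eq!(events.len(), 2);
        assert!(events.iter().all(|e| is_known_event(&e.event)));
    }
}
```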
---
### GAP-03: Brave Search Pipeline -- Minimal coverage
**Priority: High**
`services/brave_search.rs` has only 1 unit test. A comment in `pipeline_test.rs` explicitly marks the Brave Search path in the pipeline (`use_brave_search: true`) as untestable because it requires a real API key. As a result, the entire search-via-Brave code path is unverified in integration tests.
**Tests to write:**
1. **Unit test** (`brave_search.rs`): Parse a valid Brave Search API response and extract URLs.
2. **Unit test** (`brave_search.rs`): Handle Brave API error responses (429 rate limit, 401 invalid key).
3. **Unit test** (`brave_search.rs`): Handle malformed Brave API JSON response gracefully.
4. **Integration test** (`pipeline_test.rs`): Use wiremock to mock the Brave API endpoint and run the pipeline with `use_brave_search: true`, verifying that Brave results feed into the pipeline.
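
Test 2 could target a small status-mapping function that needs no network access. This is a sketch under assumptions: `BraveError` and `classify_status` are hypothetical names and do not describe the existing API of `brave_search.rs`.

```rust
/// Hypothetical error type for failures reported by the Brave API.
#[derive(Debug, PartialEq, Eq)]
pub enum BraveError {
    /// HTTP 401: the configured key was rejected.
    InvalidKey,
    /// HTTP 429: the caller should back off.
    RateLimited,
    /// Any other non-success status, kept for logging.
    Upstream(u16),
}

/// Maps an HTTP status code to a typed result before the body is parsed.
pub fn classify_status(status: u16) -> Result<(), BraveError> {
    match status {
        200..=299 => Ok(()),
        401 => Err(BraveError::InvalidKey),
        429 => Err(BraveError::RateLimited),
        other => Err(BraveError::Upstream(other)),
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn maps_rate_limit_and_invalid_key() {
        assert_eq!(classify_status(429), Err(BraveError::RateLimited));
        assert_eq!(classify_status(401), Err(BraveError::InvalidKey));
    }

    #[test]
    fn passes_success_and_keeps_other_codes() {
        assert_eq!(classify_status(200), Ok(()));
        assert_eq!(classify_status(503), Err(BraveError::Upstream(503)));
    }
}
```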
---
### GAP-04: Date Extraction / max_age_days Filtering -- No tests
**Priority: High**
The `max_age_days` field on themes is used to filter out old articles, but no test (unit or integration) verifies that articles older than `max_age_days` are excluded from the synthesis. The prompt includes date extraction instructions, and those are likewise unverified.
**Tests to write:**
1. **Unit test** (`synthesis.rs`): Articles with a `published_at` date older than `max_age_days` are excluded.
2. **Integration test** (`pipeline_test.rs`): Set `max_age_days: 1` and provide wiremock articles with old dates, verify they are filtered out.
3. **Unit test** (`prompts.rs`): Verify the date extraction instruction appears in the prompt when `max_age_days > 0`.
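
Unit test 1 is easiest to write when the age check takes the current time as a parameter instead of reading the system clock. A minimal sketch, assuming that `max_age_days == 0` disables the filter and that articles dated in the future are kept; `is_within_max_age` is a hypothetical name, and the real code may represent `published_at` differently.

```rust
use std::time::{Duration, SystemTime};

const SECONDS_PER_DAY: u64 = 86_400;

/// Returns true if the article is recent enough to keep.
/// Assumption: `max_age_days == 0` means "no age filter".
pub fn is_within_max_age(published_at: SystemTime, now: SystemTime, max_age_days: u32) -> bool {
    if max_age_days == 0 {
        return true;
    }
    let max_age = Duration::from_secs(u64::from(max_age_days) * SECONDS_PER_DAY);
    match now.duration_since(published_at) {
        Ok(age) => age <= max_age,
        // `published_at` is later than `now`; kept in this sketch.
        Err(_) => true,
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    fn fixed_now() -> SystemTime {
        SystemTime::UNIX_EPOCH + Duration::from_secs(100 * SECONDS_PER_DAY)
    }

    #[test]
    fn excludes_articles_older_than_max_age() {
        let two_days_old = fixed_now() - Duration::from_secs(2 * SECONDS_PER_DAY);
        assert!(!is_within_max_age(two_days_old, fixed_now(), 1));
    }

    #[test]
    fn keeps_recent_articles() {
        let one_hour_old = fixed_now() - Duration::from_secs(3_600);
        assert!(is_within_max_age(one_hour_old, fixed_now(), 1));
    }
}
```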
---
### GAP-05: Rate Limiting -- No integration test
**Priority: High**
The rate limiter has 15 unit tests but zero integration tests. There is no test that verifies the rate limiter actually blocks LLM calls when the limit is exceeded in a real pipeline run, nor any test that the admin-configured rate limits are loaded and applied.
**Tests to write:**
1. **Integration test**: Configure a very low rate limit (e.g., `max_requests: 1, time_window_seconds: 60`), trigger generation with multiple sources, and verify the rate limiter introduces delays (or that the pipeline logs rate-limit waits).
2. **Integration test**: Verify that user-level rate limit overrides (from settings `rate_limit_max_requests`) are applied when set.
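
A deterministic way to assert on rate-limit waits is to inject the clock rather than sleep in the test. The sketch below uses a fixed-window counter instead of the token bucket that `rate_limiter.rs` is listed as implementing, purely to keep the example short. `WindowLimiter` and `acquire` are hypothetical names, while `max_requests` and `time_window_seconds` mirror the fields mentioned above.

```rust
/// Fixed-window limiter with an injected clock (seconds as `u64`).
/// This is a simplification for illustration, not a token bucket.
pub struct WindowLimiter {
    max_requests: u32,
    time_window_seconds: u64,
    window_start: u64,
    used: u32,
}

impl WindowLimiter {
    pub fn new(max_requests: u32, time_window_seconds: u64) -> Self {
        Self { max_requests, time_window_seconds, window_start: 0, used: 0 }
    }

    /// Returns 0 if the call may proceed at `now`, otherwise the number of
    /// seconds the caller has to wait until the current window ends.
    pub fn acquire(&mut self, now: u64) -> u64 {
        if now >= self.window_start + self.time_window_seconds {
            // The previous window has elapsed, so start a fresh one.
            self.window_start = now;
            self.used = 0;
        }
        if self.used < self.max_requests {
            self.used += 1;
            0
        } else {
            self.window_start + self.time_window_seconds - now
        }
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn second_call_in_window_has_to_wait() {
        let mut limiter = WindowLimiter::new(1, 60);
        assert_eq!(limiter.acquire(1_000), 0);
        assert_eq!(limiter.acquire(1_010), 50);
        assert_eq!(limiter.acquire(1_060), 0);
    }
}
```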
---
### GAP-06: SSRF Protection -- No integration test
**Priority: Medium**
`scraper.rs` has 74 unit tests, including excellent `is_private_ip` coverage, but there is no integration test that verifies the full `check_ssrf` function actually blocks a request to a private IP through the scraper pipeline. The existing integration tests bypass the SSRF check with `SKIP_SSRF_CHECK=1`.
**Tests to write:**
1. **Integration test** (`pipeline_test.rs`): Add a source with a URL pointing to `127.0.0.1` (without `SKIP_SSRF_CHECK`), verify the scraper rejects it and the pipeline continues with other sources.
2. **Integration test**: Verify that redirect to a private IP is blocked (source URL redirects to `http://192.168.1.1`).
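
For reference, the kind of address check these integration tests would exercise can be sketched with the standard library alone. `is_blocked_ip` and `is_blocked_v4` are hypothetical names; this is not the project's `is_private_ip` or `check_ssrf`, and the set of blocked ranges here is an assumption.

```rust
use std::net::{IpAddr, Ipv4Addr};

fn is_blocked_v4(ip: Ipv4Addr) -> bool {
    ip.is_private()
        || ip.is_loopback()
        || ip.is_link_local()
        || ip.is_unspecified()
        || ip.is_broadcast()
}

/// Returns true for addresses a scraper should refuse to connect to.
pub fn is_blocked_ip(ip: IpAddr) -> bool {
    match ip {
        IpAddr::V4(v4) => is_blocked_v4(v4),
        IpAddr::V6(v6) => match v6.to_ipv4_mapped() {
            // ::ffff:a.b.c.d is judged by its embedded IPv4 address.
            Some(v4) => is_blocked_v4(v4),
            None => {
                let first = v6.segments()[0];
                v6.is_loopback()
                    || v6.is_unspecified()
                    || (first & 0xfe00) == 0xfc00 // unique local, fc00::/7
                    || (first & 0xffc0) == 0xfe80 // link-local, fe80::/10
            }
        },
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn blocks_loopback_and_private_ranges() {
        for addr in ["127.0.0.1", "192.168.1.1", "10.0.0.1", "169.254.169.254", "::1", "::ffff:10.0.0.1"] {
            assert!(is_blocked_ip(addr.parse().unwrap()), "{addr} should be blocked");
        }
    }

    #[test]
    fn allows_public_addresses() {
        assert!(!is_blocked_ip("93.184.216.34".parse().unwrap()));
    }
}
```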
---
### GAP-07: Pipeline is_article Filter -- No end-to-end verification
**Priority: Medium**
The `is_article` heuristic in `source_scraper.rs` has 8 unit tests, but there is no integration test that verifies non-article pages (e.g., category index pages, about pages) are actually filtered out during a pipeline run.
**Tests to write:**
1. **Integration test** (`pipeline_test.rs`): Set up wiremock with a source page linking to both article pages and non-article pages (e.g., `/about`, `/contact`, `/category`), verify only articles make it into the synthesis.
---
### GAP-08: Pipeline summary_length -- No integration test
**Priority: Medium**
The `summary_length` field on themes controls the number of sentences in generated summaries. The prompts unit tests verify the instruction appears in the prompt, but no test verifies the LLM response is actually constrained to the requested length.
**Tests to write:**
1. **Integration test** (`pipeline_test.rs`): Use MockLlmProvider with `summary_length: 1` and verify the mock generates 1-sentence summaries (or verify the prompt includes the correct `summary_length` instruction).
---
### GAP-09: Settings Validation Rejections -- No negative tests
**Priority: Medium**
The settings integration tests cover boundary values but do not test rejection of out-of-range values. There are no tests for `max_articles_per_source: 0`, `batch_size: -1`, `max_links_per_source: 999`, etc.
**Tests to write:**
1. **Integration test** (`api_settings_test.rs`): `PUT /settings` with `max_articles_per_source: 0` returns 422.
2. **Integration test** (`api_settings_test.rs`): `PUT /settings` with `batch_size: 0` returns 422.
3. **Integration test** (`api_settings_test.rs`): `PUT /settings` with `max_links_per_source: 999` returns 422.
4. **Integration test** (`api_settings_test.rs`): `PUT /settings` with `article_history_days: -1` returns 422.
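
The four rejections above all follow the same range-check pattern, which a handler could translate into a 422 response. A minimal sketch, with loud assumptions: `SettingsUpdate`, `FieldError`, `check_range`, and `validate` are hypothetical names, and every numeric bound is a placeholder chosen only so that the four values listed above fall outside it. The project's real limits are not stated in this report.

```rust
#[derive(Debug, PartialEq, Eq)]
pub struct FieldError {
    pub field: &'static str,
    pub message: String,
}

/// Assumed subset of the settings payload.
#[derive(Debug, Clone)]
pub struct SettingsUpdate {
    pub max_articles_per_source: i64,
    pub batch_size: i64,
    pub max_links_per_source: i64,
    pub article_history_days: i64,
}

fn check_range(field: &'static str, value: i64, min: i64, max: i64, errors: &mut Vec<FieldError>) {
    if !(min..=max).contains(&value) {
        errors.push(FieldError {
            field,
            message: format!("must be between {min} and {max}, got {value}"),
        });
    }
}

/// Collects every out-of-range field instead of stopping at the first one.
pub fn validate(s: &SettingsUpdate) -> Result<(), Vec<FieldError>> {
    let mut errors = Vec::new();
    // All bounds below are placeholders, not the project's real limits.
    check_range("max_articles_per_source", s.max_articles_per_source, 1, 50, &mut errors);
    check_range("batch_size", s.batch_size, 1, 20, &mut errors);
    check_range("max_links_per_source", s.max_links_per_source, 1, 100, &mut errors);
    check_range("article_history_days", s.article_history_days, 0, 365, &mut errors);
    if errors.is_empty() { Ok(()) } else { Err(errors) }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn rejects_zero_max_articles_per_source() {
        let update = SettingsUpdate {
            max_articles_per_source: 0,
            batch_size: 5,
            max_links_per_source: 20,
            article_history_days: 30,
        };
        let errors = validate(&update).unwrap_err();
        assert_eq!(errors.len(), 1);
        assert_eq!(errors[0].field, "max_articles_per_source");
    }
}
```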
---
### GAP-10: Synthesis Export Content with Special Characters -- No test
**Priority: Low**
Export tests verify structure and content-type but do not test synthesis content with special characters (accented characters, HTML entities, emoji, very long URLs). There is no test for PDF generation with such edge cases.
**Tests to write:**
1. **Integration test** (`api_export_test.rs`): Insert a synthesis with UTF-8 characters, URLs with query strings, and long summaries. Export as Markdown and verify content integrity.
2. **Integration test** (`api_export_test.rs`): Same synthesis exported as PDF. Verify PDF magic bytes and non-empty content.
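
Both checks reduce to small assertions on the response body. The helpers below are a sketch with hypothetical names (`looks_like_pdf`, `missing_fragments`), and the sample Markdown in the test is invented. The one fixed fact is that PDF files begin with the bytes `%PDF-`.

```rust
/// Returns true if the bytes start with the PDF header marker and more
/// content follows it.
pub fn looks_like_pdf(bytes: &[u8]) -> bool {
    const MAGIC: &[u8] = b"%PDF-";
    bytes.len() > MAGIC.len() && bytes.starts_with(MAGIC)
}

/// Returns the expected fragments that do not appear in the exported Markdown.
pub fn missing_fragments<'a>(markdown: &str, expected: &[&'a str]) -> Vec<&'a str> {
    expected
        .iter()
        .copied()
        .filter(|fragment| !markdown.contains(*fragment))
        .collect()
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn pdf_export_has_magic_bytes_and_content() {
        assert!(looks_like_pdf(b"%PDF-1.7\n1 0 obj"));
        assert!(!looks_like_pdf(b"%PDF-"));
    }

    #[test]
    fn markdown_export_preserves_special_characters() {
        let markdown = "## Résumé 🚀\n[link](https://example.com/a?x=1&y=2)";
        let missing = missing_fragments(markdown, &["Résumé", "🚀", "x=1&y=2"]);
        assert!(missing.is_empty(), "missing fragments: {missing:?}");
    }
}
```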
---
### GAP-11: Concurrent Generation -- No test
**Priority: Low**
The `generate_twice_returns_error_for_second` test verifies that the same user cannot run two jobs at once. However, there is no test for two *different* users generating simultaneously, which is the expected multi-tenant behavior.
**Tests to write:**
1. **Integration test** (`api_syntheses_test.rs`): Two users trigger generation simultaneously. Both should get 202. Verify they do not interfere with each other.
---
### GAP-12: Frontend -- Failing tests
**Priority: Medium**
10 of 141 frontend unit tests are currently failing. These failures need investigation and resolution before any new coverage work.
**Action:**
1. Run `npx vitest run` and capture the 10 failing test names.
2. Investigate root cause (likely stale mocks or component changes).
3. Fix or update the failing tests.
---
### GAP-13: Article History Ownership Isolation -- No test
**Priority: Medium**
The article history endpoint tests cover auth and empty-state, but there is no test verifying that User B cannot see User A's article history.
**Tests to write:**
1. **Integration test** (`api_article_history_test.rs`): User A generates a synthesis (creating history entries). User B calls `GET /article-history` and sees empty results.
2. **Integration test** (`api_article_history_test.rs`): User B calls `DELETE /article-history` and it does not affect User A's history entries.
---
### GAP-14: Provenance Ownership Isolation -- No test
**Priority: Medium**
The provenance endpoint test only covers the 404 case. There is no test verifying that User B cannot access User A's synthesis provenance.
**Tests to write:**
1. **Integration test** (`api_article_history_test.rs`): User A has a synthesis with provenance. User B calls `GET /syntheses/:id/provenance` on User A's synthesis and gets 404.
---
### GAP-15: E2E Coverage of Core Flows -- Missing
**Priority: Low**
Several core user journeys are only covered by integration tests, not E2E tests:
- Login flow (only registration has an E2E test)
- API key management (add, test, delete)
- Synthesis list/detail view
- Synthesis deletion
- Markdown/PDF export download
- Email send from synthesis view
**Tests to write (selected):**
1. **E2E test**: Login via magic link, verify session persists.
2. **E2E test**: Add an API key, verify it appears masked in the list, test it, delete it.
3. **E2E test**: View synthesis detail page, click export Markdown, verify download.
---
## 4. Summary
### Overall Assessment
| Metric | Value |
|---|---|
| Backend unit tests | 358 (all passing) |
| Backend integration tests | 183 |
| Frontend unit tests | 141 (10 failing) |
| E2E tests | 7 |
| **Total test count** | **689** |
| Critical gaps | 1 (scheduled execution) |
| High-priority gaps | 4 (SSE stream, Brave Search, date filtering, rate limiting integration) |
| Medium-priority gaps | 7 |
| Low-priority gaps | 3 |
### Strengths
- **SSRF protection** has the best unit test coverage in the project (part of the 74 tests in `scraper.rs`), covering all private IP ranges, IPv4-mapped IPv6, and redirect blocking.
- **Sources CRUD** is the most thoroughly tested API endpoint (36 integration tests) including CSV import/export, bulk import, max limits, and boundary values.
- **Admin module** has comprehensive access control tests (access denied for non-admin, non-authenticated, each endpoint).
- **Ownership isolation** is consistently tested across most user-scoped endpoints (syntheses, sources, API keys, themes, schedules, stop generation); article history and provenance are the exceptions (GAP-13, GAP-14).
- **Pipeline tests** use wiremock and MockLlmProvider to test the full generation flow without real API keys, covering scraping, classification, overflow, diversity, dedup, and preferred ordering.
- **Encryption verification** directly queries the database to confirm API keys are not stored in plaintext.
- **Audit logging** is verified by querying the audit_log table after admin actions.
- **E2E generation test** exercises the complete pipeline with a real OpenAI key including provenance and LLM log verification.
### Weaknesses
- **Scheduled execution** has zero test coverage -- a critical autonomous process.
- **Brave Search** pipeline path is untested beyond a single unit test.
- **Date extraction/age filtering** has no tests at any level.
- **Rate limiting** is well unit-tested but has no integration verification.
- **SSE progress stream** has no integration test (only gated E2E).
- **Frontend** has 10 failing tests that need immediate attention.
- **Settings validation** lacks negative boundary tests (rejection of invalid values).