AI Weekly Synth -- Test Coverage Audit Report (v2)

Date: 2026-03-27
Auditor: QA Engineer (automated analysis)


1. Test Inventory

1.1 Backend Unit Tests (cargo test --lib)

Total: 358 tests -- all passing.

| Source file | # tests | Coverage area |
|---|---|---|
| services/scraper.rs | 74 | SSRF IP checks, soft-404, redirect, HTML parsing |
| services/synthesis.rs | 36 | Pipeline logic, schema building, category overflow |
| services/llm/anthropic.rs | 20 | Response parsing, error handling |
| services/prompts.rs | 18 | Prompt template generation |
| services/csv.rs | 18 | CSV parsing, serialisation |
| models/synthesis.rs | 16 | Model validation, serialisation |
| services/rate_limiter.rs | 15 | Token bucket, concurrency |
| services/llm/openai.rs | 13 | Response parsing, error handling |
| models/source.rs | 12 | URL / title validation |
| models/settings.rs | 12 | Settings validation, defaults |
| services/export.rs | 12 | Markdown / PDF rendering |
| services/llm/gemini.rs | 10 | Response parsing, error handling |
| models/provider.rs | 10 | Provider / model validation |
| services/email.rs | 9 | Email rendering, bypass mode |
| services/encryption.rs | 8 | AES-256-GCM encrypt/decrypt |
| services/source_scraper.rs | 8 | Link extraction, is_article filter |
| services/llm/schema.rs | 8 | JSON schema generation |
| util/token.rs | 8 | Token generation, hashing |
| models/api_key.rs | 8 | API key validation |
| middleware/csrf.rs | 7 | CSRF header check |
| models/rate_limit.rs | 6 | Rate limit model validation |
| config.rs | 6 | Config parsing |
| middleware/auth.rs | 5 | Session extraction |
| services/llm/factory.rs | 5 | Provider factory |
| handlers/admin.rs | 4 | Admin handler validation |
| services/brave_search.rs | 1 | Brave search (minimal) |
| services/llm/mock.rs | 0 | Mock provider (no assertions) |
| errors.rs | 0 | Error types (no unit tests) |

1.2 Backend Integration Tests (backend/tests/)

Total: 183 tests across 17 files (requires Postgres).

| File | # tests | Coverage area |
|---|---|---|
| api_sources_test.rs | 36 | Sources CRUD, validation, CSV, bulk import, max limit |
| api_admin_test.rs | 30 | Provider CRUD, rate limits, user mgmt, audit log, config |
| api_keys_test.rs | 18 | API key CRUD, encryption, ownership, test endpoint |
| api_syntheses_test.rs | 17 | Synthesis CRUD, pagination, ownership, generation trigger |
| api_auth_test.rs | 16 | Register, login, verify, logout, session |
| api_export_test.rs | 13 | Email send, Markdown export, PDF export |
| api_themes_test.rs | 10 | Theme CRUD, validation, ownership |
| api_schedules_test.rs | 9 | Schedule CRUD, validation, ownership |
| api_settings_test.rs | 7 | Settings CRUD, defaults, boundary, isolation |
| pipeline_test.rs | 6 | Phase 1 extraction, Phase 2 search, overflow, diversity, dedup, preferred |
| api_article_history_test.rs | 4 | History list, clear, provenance |
| api_csrf_test.rs | 4 | CSRF header enforcement |
| api_stop_generation_test.rs | 4 | Stop job, ownership, 404 |
| api_llm_logs_test.rs | 3 | LLM logs auth, 404, happy path |
| api_sources_preferred_test.rs | 3 | Preferred sources set/clear/auth |
| minimal_test.rs | 2 | Infrastructure sanity (oneshot) |
| api_health_test.rs | 1 | Health check |

1.3 Frontend Unit Tests (vitest)

Total: 141 tests (131 passing, 10 failing) across 18 files.

| File | # tests | Coverage area |
|---|---|---|
| sources-utils.test.ts | 20 | Source utilities |
| provider-info.test.ts | 11 | Provider info helpers |
| api-keys.test.ts | 11 | API keys client |
| synthesis-utils.test.ts | 11 | Synthesis utilities |
| sse.test.ts | 11 | SSE client |
| settings-validation.test.ts | 3 | Settings validation |
| i18n.test.ts | 9 | i18n translations |
| api-client.test.ts | 7 | API client |
| config-api.test.ts | 7 | Config API |
| pages/settings.test.tsx | 10 | Settings page |
| pages/sources.test.tsx | 8 | Sources page |
| pages/home.test.tsx | 7 | Home page |
| pages/generate.test.tsx | 6 | Generate page |
| synthesis-export.test.ts | 6 | Export utilities |
| pages/login.test.tsx | 4 | Login page |
| pages/register.test.tsx | 4 | Register page |
| auth-context.test.tsx | 3 | Auth context |
| admin-route-guard.test.tsx | 3 | Admin route guard |

1.4 E2E Tests (Playwright)

Total: 7 tests across 7 files.

| File | Coverage area |
|---|---|
| registration.spec.ts | Full magic link registration flow |
| settings.spec.ts | Settings persistence across reloads |
| settings-export.spec.ts | Settings export/import roundtrip |
| sources.spec.ts | Source CRUD + preferred sources via API |
| themes.spec.ts | Theme CRUD + schedule CRUD via API |
| admin-providers.spec.ts | Admin provider management, settings dropdown |
| generation-live.spec.ts | Full pipeline with real OpenAI key (gated) |

2. Feature Coverage Matrix

| Feature | Unit Tests | Integration Tests | E2E Tests | Coverage |
|---|---|---|---|---|
| Auth: register | - | 4 tests | 1 test | GOOD |
| Auth: login | - | 3 tests | - | GOOD |
| Auth: magic link verify | - | 3 tests | 1 test | GOOD |
| Auth: /me | - | 3 tests | - | GOOD |
| Auth: logout | - | 3 tests | - | GOOD |
| Auth: session expiry | 5 (middleware) | - | - | PARTIAL |
| CSRF protection | 7 (middleware) | 4 tests | - | GOOD |
| Settings CRUD | 12 (model) | 7 tests | 2 tests | GOOD |
| Sources CRUD | 12 (model) | 36 tests | 1 test | GOOD |
| Sources: CSV import/export | 18 (csv) | 6 tests | - | GOOD |
| Sources: preferred | - | 3 tests | 1 test | GOOD |
| Sources: max limit | - | 2 tests | - | GOOD |
| Themes CRUD | - | 10 tests | 1 test | GOOD |
| Schedules CRUD | - | 9 tests | 1 test | GOOD |
| Scheduled execution | 0 | 0 | 0 | NONE |
| Syntheses CRUD | 16 (model) | 11 tests | - | GOOD |
| Syntheses: pagination | - | 2 tests | - | GOOD |
| Syntheses: ownership isolation | - | 2 tests | - | GOOD |
| Generation: trigger | - | 4 tests | 1 test (gated) | GOOD |
| Generation: SSE progress | 11 (sse client) | 0 | 1 test (gated) | PARTIAL |
| Generation: stop | - | 4 tests | 0 | GOOD |
| Pipeline: Phase 1 (scrape) | 74+8 (scraper) | 1 test | 1 test (gated) | GOOD |
| Pipeline: Phase 2 (search) | - | 1 test | 1 test (gated) | GOOD |
| Pipeline: category overflow | 36 (synthesis) | 1 test | - | GOOD |
| Pipeline: is_article filter | 8 (source_scraper) | 0 | 0 | PARTIAL |
| Pipeline: summary_length | 18 (prompts) | 0 | 0 | PARTIAL |
| Pipeline: date extraction | 0 | 0 | 0 | NONE |
| Pipeline: article history dedup | - | 1 test | - | GOOD |
| Pipeline: source diversity cap | - | 1 test | 1 test (gated) | GOOD |
| Pipeline: preferred ordering | - | 1 test | - | GOOD |
| Pipeline: Brave Search | 1 (minimal) | 0 | 0 | WEAK |
| API keys CRUD | 8 (model) | 18 tests | - | GOOD |
| API keys: encryption at rest | 8 (encryption) | 1 test | - | GOOD |
| API keys: test endpoint | - | 2 tests | - | GOOD |
| Admin: provider CRUD | 10 (model) + 4 (handler) | 9 tests | 1 test | GOOD |
| Admin: rate limits | 6 (model) + 15 (limiter) | 4 tests | - | GOOD |
| Admin: user management | - | 6 tests | - | GOOD |
| Admin: audit log | - | 2 tests | - | GOOD |
| Config: providers | - | 3 tests | - | GOOD |
| Export: Markdown | 12 (export) | 4 tests | - | GOOD |
| Export: PDF | 12 (export) | 4 tests | - | GOOD |
| Export: Email | 9 (email) | 5 tests | - | GOOD |
| SSRF protection | 74 (scraper) | 0 | 0 | PARTIAL |
| LLM call logging | - | 3 tests | 1 test (gated) | GOOD |
| LLM providers (Gemini) | 10 | - | - | GOOD |
| LLM providers (OpenAI) | 13 | - | - | GOOD |
| LLM providers (Anthropic) | 20 | - | - | GOOD |
| Rate limiting | 15 | 0 | 0 | PARTIAL |
| Turnstile captcha | bypass only | bypass only | bypass only | PARTIAL |

3. Coverage Gaps and Recommendations

GAP-01: Scheduled Execution (scheduler.rs) -- No tests

Priority: Critical

The run_scheduled_jobs() function in services/scheduler.rs has zero unit tests and zero integration tests. This is a critical autonomous process that triggers generation and sends emails without user interaction.

Tests to write:

  1. Unit test (scheduler.rs): run_scheduled_jobs triggers generation for themes whose schedule matches the current day+time.
  2. Unit test (scheduler.rs): run_scheduled_jobs does NOT trigger generation for disabled schedules.
  3. Unit test (scheduler.rs): run_scheduled_jobs does NOT trigger generation when the current day is not in the schedule's days list.
  4. Integration test (api_schedules_test.rs): Create a schedule set to "now", verify the scheduler picks it up and a synthesis is created (or at least attempted with a mock provider).
  5. Integration test: Verify that after run_scheduled_jobs executes, the schedule's last_run_at timestamp is updated.
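The first three unit tests are easiest to write if the "is this schedule due?" decision is a pure function, separate from the clock and the database. The following is a minimal sketch under that assumption; `Weekday`, `Schedule`, `minute_of_day`, and `is_due` are hypothetical names for illustration and are not taken from services/scheduler.rs.

```rust
// Hypothetical types for illustration; the real model in services/scheduler.rs may differ.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Weekday { Mon, Tue, Wed, Thu, Fri, Sat, Sun }

pub struct Schedule {
    pub enabled: bool,
    pub days: Vec<Weekday>,
    /// Minutes after midnight at which the job should fire.
    pub minute_of_day: u16,
}

/// Pure due-check with no clock or database access, so tests can pass fixed inputs.
pub fn is_due(schedule: &Schedule, today: Weekday, now_minute: u16) -> bool {
    schedule.enabled
        && schedule.days.contains(&today)
        && schedule.minute_of_day == now_minute
}

#[test]
fn disabled_schedule_is_never_due() {
    let s = Schedule { enabled: false, days: vec![Weekday::Mon], minute_of_day: 540 };
    assert!(!is_due(&s, Weekday::Mon, 540));
}

#[test]
fn schedule_is_skipped_on_other_days() {
    let s = Schedule { enabled: true, days: vec![Weekday::Mon], minute_of_day: 540 };
    assert!(!is_due(&s, Weekday::Tue, 540));
}
```

With a function of this shape, run_scheduled_jobs would only need to supply the current day and time, which keeps the remaining integration tests focused on the database and generation side effects.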

GAP-02: SSE Progress Stream -- No integration test

Priority: High

The SSE progress endpoint (GET /api/v1/syntheses/generate/:job_id/progress) is only tested in the E2E suite (gated behind a real API key). There is no integration test that verifies the SSE connection, event format, or error propagation.

Tests to write:

  1. Integration test (api_syntheses_test.rs): Connect to SSE endpoint for a running job and verify the stream sends well-formed events (progress, complete, or error).
  2. Integration test: SSE endpoint returns 404 for a non-existent job_id.
  3. Integration test: SSE endpoint returns 401 without auth.
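For the "well-formed events" check in test 1, an integration test needs some way to split the response body into frames and inspect each one. The sketch below parses a single Server-Sent Events frame using only the standard library. The event names progress, complete, and error come from the list above; `SseEvent`, `parse_sse_frame`, `is_known_event`, and the JSON payload in the test are assumptions for illustration.

```rust
/// One parsed Server-Sent Events frame.
#[derive(Debug, PartialEq, Eq)]
pub struct SseEvent {
    pub event: String,
    pub data: String,
}

/// Strips the single optional space that may follow the colon in an SSE field.
fn field_value(rest: &str) -> &str {
    rest.strip_prefix(' ').unwrap_or(rest)
}

/// Parses one SSE frame (the text between two blank lines).
/// Returns None when the frame carries no `data:` field, e.g. a keep-alive comment.
pub fn parse_sse_frame(frame: &str) -> Option<SseEvent> {
    let mut event = String::from("message"); // default event type in SSE
    let mut data: Vec<&str> = Vec::new();
    for line in frame.lines() {
        if let Some(rest) = line.strip_prefix("event:") {
            event = field_value(rest).to_string();
        } else if let Some(rest) = line.strip_prefix("data:") {
            data.push(field_value(rest));
        }
    }
    if data.is_empty() {
        return None;
    }
    // Multiple data lines are joined with newlines.
    Some(SseEvent { event, data: data.join("\n") })
}

/// True for the event names this report expects on the progress stream.
pub fn is_known_event(e: &SseEvent) -> bool {
    matches!(e.event.as_str(), "progress" | "complete" | "error")
}

#[test]
fn progress_frame_is_well_formed() {
    // The payload shape is a placeholder, not the real event schema.
    let e = parse_sse_frame("event: progress\ndata: {\"step\":1}").unwrap();
    assert!(is_known_event(&e));
}
```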

GAP-03: Brave Search Pipeline -- Minimal coverage

Priority: High

services/brave_search.rs has only 1 unit test. The Brave Search path in the pipeline (use_brave_search: true) is explicitly commented as untestable in pipeline_test.rs because it requires a real API key. As a result, the entire search-via-Brave code path is unverified in integration tests.

Tests to write:

  1. Unit test (brave_search.rs): Parse a valid Brave Search API response and extract URLs.
  2. Unit test (brave_search.rs): Handle Brave API error responses (429 rate limit, 401 invalid key).
  3. Unit test (brave_search.rs): Handle malformed Brave API JSON response gracefully.
  4. Integration test (pipeline_test.rs): Use wiremock to mock the Brave API endpoint and run the pipeline with use_brave_search: true, verifying that Brave results feed into the pipeline.
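Test 2 becomes straightforward if the mapping from HTTP status to error kind is isolated from the HTTP client. The sketch below shows one possible shape; `BraveError` and `classify_status` are hypothetical names, and the real error type in brave_search.rs may look different.

```rust
/// Hypothetical error kinds for the Brave Search client.
#[derive(Debug, PartialEq, Eq)]
pub enum BraveError {
    /// HTTP 429: the API rate limit was hit.
    RateLimited,
    /// HTTP 401: the API key was rejected.
    InvalidKey,
    /// Any other non-success status.
    Unexpected(u16),
}

/// Maps an HTTP status code to Ok or a typed error, without touching the network.
pub fn classify_status(status: u16) -> Result<(), BraveError> {
    match status {
        200..=299 => Ok(()),
        401 => Err(BraveError::InvalidKey),
        429 => Err(BraveError::RateLimited),
        other => Err(BraveError::Unexpected(other)),
    }
}

#[test]
fn rate_limit_and_invalid_key_are_distinguished() {
    assert_eq!(classify_status(429), Err(BraveError::RateLimited));
    assert_eq!(classify_status(401), Err(BraveError::InvalidKey));
}
```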

GAP-04: Date Extraction / max_age_days Filtering -- No tests

Priority: High

The max_age_days field on themes is used to filter out old articles. However, no unit or integration test verifies that articles older than max_age_days are excluded from the synthesis. The prompt includes date extraction instructions, but nothing validates that article age filtering actually works.

Tests to write:

  1. Unit test (synthesis.rs): Articles with a published_at date older than max_age_days are excluded.
  2. Integration test (pipeline_test.rs): Set max_age_days: 1 and provide wiremock articles with old dates, verify they are filtered out.
  3. Unit test (prompts.rs): Verify the date extraction instruction appears in the prompt when max_age_days > 0.
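As a rough illustration of what unit test 1 would exercise, the sketch below filters articles by age with an injected "now" so the test is deterministic. `Article`, `published_at` as an `Option<SystemTime>`, and `filter_by_age` are assumptions. Treating max_age_days of 0 as "filter disabled" and keeping undated articles are also assumptions that should be checked against synthesis.rs.

```rust
use std::time::{Duration, SystemTime};

/// Hypothetical article record; the real struct likely carries more fields.
pub struct Article {
    pub url: String,
    pub published_at: Option<SystemTime>,
}

/// Keeps articles no older than `max_age_days` relative to `now`.
/// Assumptions: 0 disables the filter; undated and future-dated articles are kept.
pub fn filter_by_age(articles: Vec<Article>, max_age_days: u64, now: SystemTime) -> Vec<Article> {
    if max_age_days == 0 {
        return articles;
    }
    let max_age = Duration::from_secs(max_age_days * 24 * 60 * 60);
    articles
        .into_iter()
        .filter(|a| match a.published_at {
            // duration_since fails when the date is in the future; keep those.
            Some(ts) => now.duration_since(ts).map(|age| age <= max_age).unwrap_or(true),
            None => true,
        })
        .collect()
}

#[test]
fn old_articles_are_excluded() {
    let now = SystemTime::UNIX_EPOCH + Duration::from_secs(100 * 86_400);
    let old = Article {
        url: "old".into(),
        published_at: Some(now - Duration::from_secs(3 * 86_400)),
    };
    assert!(filter_by_age(vec![old], 1, now).is_empty());
}
```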

GAP-05: Rate Limiting -- No integration test

Priority: High

The rate limiter has 15 unit tests but zero integration tests. There is no test that verifies the rate limiter actually blocks LLM calls when the limit is exceeded in a real pipeline run, nor any test that the admin-configured rate limits are loaded and applied.

Tests to write:

  1. Integration test: Configure a very low rate limit (e.g., max_requests: 1, time_window_seconds: 60), trigger generation with multiple sources, and verify the rate limiter introduces delays (or that the pipeline logs rate-limit waits).
  2. Integration test: Verify that user-level rate limit overrides (from settings rate_limit_max_requests) are applied when set.
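To assert that "the rate limiter introduces delays" without sleeping in tests, the limiter can take the current time as a parameter. The sketch below uses a simple fixed-window counter for brevity; it is not the token bucket that services/rate_limiter.rs implements, and `WindowLimiter` and `try_acquire` are hypothetical names. The parameters mirror max_requests and time_window_seconds from the list above.

```rust
use std::time::{Duration, Instant};

/// Fixed-window limiter used here only as a stand-in for the real token bucket.
pub struct WindowLimiter {
    max_requests: u32,
    window: Duration,
    window_start: Instant,
    used: u32,
}

impl WindowLimiter {
    pub fn new(max_requests: u32, time_window_seconds: u64, now: Instant) -> Self {
        Self {
            max_requests,
            window: Duration::from_secs(time_window_seconds),
            window_start: now,
            used: 0,
        }
    }

    /// Ok(()) if the call may proceed; Err(wait) with the time left in the window otherwise.
    pub fn try_acquire(&mut self, now: Instant) -> Result<(), Duration> {
        let elapsed = now.duration_since(self.window_start);
        if elapsed >= self.window {
            // A new window begins: reset the counter.
            self.window_start = now;
            self.used = 0;
        }
        if self.used < self.max_requests {
            self.used += 1;
            Ok(())
        } else {
            Err(self.window - now.duration_since(self.window_start))
        }
    }
}

#[test]
fn second_call_in_window_must_wait() {
    let t0 = Instant::now();
    let mut limiter = WindowLimiter::new(1, 60, t0);
    assert!(limiter.try_acquire(t0).is_ok());
    assert!(limiter.try_acquire(t0 + Duration::from_secs(10)).is_err());
}
```

An integration test could then count how many LLM calls received an Err(wait) during a pipeline run instead of measuring wall-clock time.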

GAP-06: SSRF Protection -- No integration test

Priority: Medium

scraper.rs has 74 unit tests, including excellent is_private_ip coverage, but there is no integration test that verifies the full check_ssrf function actually blocks a request to a private IP through the scraper pipeline. The integration tests bypass the SSRF check by setting SKIP_SSRF_CHECK=1.

Tests to write:

  1. Integration test (pipeline_test.rs): Add a source with a URL pointing to 127.0.0.1 (without SKIP_SSRF_CHECK), verify the scraper rejects it and the pipeline continues with other sources.
  2. Integration test: Verify that redirect to a private IP is blocked (source URL redirects to http://192.168.1.1).
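Both tests rely on the same classification step: given the IP a URL (or redirect target) resolves to, decide whether it may be fetched. The sketch below shows that step using only standard-library address predicates. `is_blocked_ip` is a hypothetical stand-in, not the project's is_private_ip or check_ssrf, and it is deliberately not exhaustive (for example, IPv6 unique-local ranges are not covered).

```rust
use std::net::{IpAddr, Ipv4Addr};

/// Returns true for addresses a scraper should refuse to contact.
/// Illustrative only: a production check would cover more ranges.
pub fn is_blocked_ip(ip: IpAddr) -> bool {
    match ip {
        IpAddr::V4(v4) => is_blocked_v4(v4),
        IpAddr::V6(v6) => match v6.to_ipv4_mapped() {
            // ::ffff:a.b.c.d must be judged by its embedded IPv4 address.
            Some(v4) => is_blocked_v4(v4),
            None => v6.is_loopback() || v6.is_unspecified(),
        },
    }
}

fn is_blocked_v4(ip: Ipv4Addr) -> bool {
    ip.is_private() || ip.is_loopback() || ip.is_link_local() || ip.is_unspecified()
}

#[test]
fn loopback_and_redirect_target_are_blocked() {
    assert!(is_blocked_ip("127.0.0.1".parse().unwrap()));
    assert!(is_blocked_ip("192.168.1.1".parse().unwrap()));
}
```

In the integration tests, the same assertion would be made indirectly: the source pointing at 127.0.0.1 yields no articles, while the other sources in the run still do.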

GAP-07: Pipeline is_article Filter -- No end-to-end verification

Priority: Medium

The is_article heuristic in source_scraper.rs has 8 unit tests, but there is no integration test that verifies non-article pages (e.g., category index pages, about pages) are actually filtered out during a pipeline run.

Tests to write:

  1. Integration test (pipeline_test.rs): Set up wiremock with a source page linking to both article pages and non-article pages (e.g., /about, /contact, /category), verify only articles make it into the synthesis.

GAP-08: Pipeline summary_length -- No integration test

Priority: Medium

The summary_length field on themes controls the number of sentences in generated summaries. The prompts unit tests verify the instruction appears in the prompt, but no test verifies the LLM response is actually constrained to the requested length.

Tests to write:

  1. Integration test (pipeline_test.rs): Use MockLlmProvider with summary_length: 1 and verify the mock generates 1-sentence summaries (or verify the prompt includes the correct summary_length instruction).

GAP-09: Settings Validation Rejections -- No negative tests

Priority: Medium

The settings integration tests cover boundary values but do not test rejection of out-of-range values. There are no tests for max_articles_per_source: 0, batch_size: -1, max_links_per_source: 999, etc.

Tests to write:

  1. Integration test (api_settings_test.rs): PUT /settings with max_articles_per_source: 0 returns 422.
  2. Integration test (api_settings_test.rs): PUT /settings with batch_size: 0 returns 422.
  3. Integration test (api_settings_test.rs): PUT /settings with max_links_per_source: 999 returns 422.
  4. Integration test (api_settings_test.rs): PUT /settings with article_history_days: -1 returns 422.
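The four rejections above can be pictured as one range check per field. The sketch below is a minimal illustration; `SettingsUpdate`, `FieldError`, and `validate` are hypothetical, and every numeric bound is a placeholder chosen only so that the four listed payloads fail. The real limits live in models/settings.rs and should be used instead.

```rust
#[derive(Debug, PartialEq, Eq)]
pub struct FieldError {
    pub field: &'static str,
    pub message: String,
}

/// Hypothetical subset of the settings payload.
pub struct SettingsUpdate {
    pub max_articles_per_source: i64,
    pub batch_size: i64,
    pub max_links_per_source: i64,
    pub article_history_days: i64,
}

fn check(field: &'static str, value: i64, min: i64, max: i64, errors: &mut Vec<FieldError>) {
    if value < min || value > max {
        errors.push(FieldError { field, message: format!("must be between {min} and {max}") });
    }
}

/// Collects all out-of-range fields; a handler would map Err to HTTP 422.
pub fn validate(s: &SettingsUpdate) -> Result<(), Vec<FieldError>> {
    let mut errors = Vec::new();
    // All bounds below are placeholders, not the project's real limits.
    check("max_articles_per_source", s.max_articles_per_source, 1, 50, &mut errors);
    check("batch_size", s.batch_size, 1, 100, &mut errors);
    check("max_links_per_source", s.max_links_per_source, 1, 100, &mut errors);
    check("article_history_days", s.article_history_days, 0, 365, &mut errors);
    if errors.is_empty() { Ok(()) } else { Err(errors) }
}

#[test]
fn zero_batch_size_is_rejected() {
    let s = SettingsUpdate {
        max_articles_per_source: 5,
        batch_size: 0,
        max_links_per_source: 20,
        article_history_days: 30,
    };
    let errors = validate(&s).unwrap_err();
    assert_eq!(errors[0].field, "batch_size");
}
```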

GAP-10: Synthesis Export Content with Special Characters -- No test

Priority: Low

Export tests verify structure and content-type but do not test synthesis content with special characters (accented characters, HTML entities, emoji, very long URLs). There is no test for PDF generation with such edge cases.

Tests to write:

  1. Integration test (api_export_test.rs): Insert a synthesis with UTF-8 characters, URLs with query strings, and long summaries. Export as Markdown and verify content integrity.
  2. Integration test (api_export_test.rs): Same synthesis exported as PDF. Verify PDF magic bytes and non-empty content.
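The two assertions can be kept small. A PDF file begins with the bytes `%PDF-`, so the magic-byte check is a prefix comparison, and Markdown integrity can be checked by looking for each special string verbatim. The helper names below are hypothetical, as is the sample content.

```rust
/// True when the body starts with the PDF header and has content after it.
pub fn looks_like_pdf(body: &[u8]) -> bool {
    const MAGIC: &[u8] = b"%PDF-";
    body.starts_with(MAGIC) && body.len() > MAGIC.len()
}

/// True when every expected fragment survived the export unchanged.
pub fn contains_all(markdown: &str, expected: &[&str]) -> bool {
    expected.iter().all(|fragment| markdown.contains(fragment))
}

#[test]
fn markdown_keeps_special_characters() {
    // Sample export body; the real test would fetch it from the export endpoint.
    let md = "## Résumé du café 🚀\n[link](https://example.com/a?x=1&y=2)";
    assert!(contains_all(md, &["café", "🚀", "https://example.com/a?x=1&y=2"]));
}

#[test]
fn html_error_page_is_not_a_pdf() {
    assert!(!looks_like_pdf(b"<html>error</html>"));
}
```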

GAP-11: Concurrent Generation -- No test

Priority: Low

The generate_twice_returns_error_for_second test verifies that the same user cannot run two jobs at once, but no test covers two different users generating simultaneously, which is the expected multi-tenant behavior.

Tests to write:

  1. Integration test (api_syntheses_test.rs): Two users trigger generation simultaneously. Both should get 202. Verify they do not interfere with each other.
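The property under test is that the "one running job per user" guard is keyed by user, not global. The sketch below models that guard with a mutex-protected set and starts jobs from separate threads; `JobRegistry`, `try_start`, `finish`, and `start_concurrently` are hypothetical names, and the real job tracking may be structured differently.

```rust
use std::collections::HashSet;
use std::sync::{Arc, Mutex};
use std::thread;

/// Tracks which users currently have a generation job running.
#[derive(Default)]
pub struct JobRegistry {
    running: Mutex<HashSet<u64>>,
}

impl JobRegistry {
    /// True if the job was accepted (the handler would answer 202);
    /// false if this user already has a job running.
    pub fn try_start(&self, user_id: u64) -> bool {
        self.running.lock().unwrap().insert(user_id)
    }

    pub fn finish(&self, user_id: u64) {
        self.running.lock().unwrap().remove(&user_id);
    }
}

/// Starts one job per user from separate threads and returns each outcome in order.
pub fn start_concurrently(registry: Arc<JobRegistry>, user_ids: &[u64]) -> Vec<bool> {
    let handles: Vec<_> = user_ids
        .iter()
        .map(|&id| {
            let registry = Arc::clone(&registry);
            thread::spawn(move || registry.try_start(id))
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}

#[test]
fn two_users_can_generate_at_once() {
    let registry = Arc::new(JobRegistry::default());
    assert_eq!(start_concurrently(registry, &[1, 2]), vec![true, true]);
}
```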

GAP-12: Frontend -- Failing tests

Priority: Medium

10 of 141 frontend unit tests are currently failing. These failures need investigation and resolution before any new coverage work.

Action:

  1. Run npx vitest run and capture the 10 failing test names.
  2. Investigate root cause (likely stale mocks or component changes).
  3. Fix or update the failing tests.

GAP-13: Article History Ownership Isolation -- No test

Priority: Medium

The article history endpoint tests cover auth and empty-state, but there is no test verifying that User B cannot see User A's article history.

Tests to write:

  1. Integration test (api_article_history_test.rs): User A generates a synthesis (creating history entries). User B calls GET /article-history and sees empty results.
  2. Integration test (api_article_history_test.rs): User B calls DELETE /article-history and it does not affect User A's history entries.
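Both tests check that every read and delete is scoped by the caller's user ID. The in-memory sketch below states that invariant in miniature; `HistoryStore` and its methods are hypothetical, and the real implementation presumably enforces the same rule through its Postgres queries.

```rust
/// Hypothetical history row.
pub struct HistoryEntry {
    pub user_id: u64,
    pub url: String,
}

/// In-memory stand-in for the article history table.
#[derive(Default)]
pub struct HistoryStore {
    entries: Vec<HistoryEntry>,
}

impl HistoryStore {
    pub fn record(&mut self, user_id: u64, url: &str) {
        self.entries.push(HistoryEntry { user_id, url: url.to_string() });
    }

    /// Lists only the caller's entries (the GET /article-history case).
    pub fn list_for(&self, user_id: u64) -> Vec<&str> {
        self.entries
            .iter()
            .filter(|e| e.user_id == user_id)
            .map(|e| e.url.as_str())
            .collect()
    }

    /// Deletes only the caller's entries (the DELETE /article-history case).
    pub fn clear_for(&mut self, user_id: u64) {
        self.entries.retain(|e| e.user_id != user_id);
    }
}

#[test]
fn user_b_cannot_see_or_clear_user_a_history() {
    let mut store = HistoryStore::default();
    store.record(1, "https://example.com/a");
    assert!(store.list_for(2).is_empty());
    store.clear_for(2);
    assert_eq!(store.list_for(1).len(), 1);
}
```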

GAP-14: Provenance Ownership Isolation -- No test

Priority: Medium

The provenance endpoint test only covers the 404 case. There is no test verifying that User B cannot access User A's synthesis provenance.

Tests to write:

  1. Integration test (api_article_history_test.rs): User A has a synthesis with provenance. User B calls GET /syntheses/:id/provenance on User A's synthesis and gets 404.

GAP-15: E2E Coverage of Core Flows -- Missing

Priority: Low

Several core user journeys are only covered by integration tests, not E2E tests:

  • Login flow (only registration has an E2E test)
  • API key management (add, test, delete)
  • Synthesis list/detail view
  • Synthesis deletion
  • Markdown/PDF export download
  • Email send from synthesis view

Tests to write (selected):

  1. E2E test: Login via magic link, verify session persists.
  2. E2E test: Add an API key, verify it appears masked in the list, test it, delete it.
  3. E2E test: View synthesis detail page, click export Markdown, verify download.

4. Summary

Overall Assessment

| Metric | Value |
|---|---|
| Backend unit tests | 358 (all passing) |
| Backend integration tests | 183 |
| Frontend unit tests | 141 (10 failing) |
| E2E tests | 7 |
| Total test count | 689 |
| Critical gaps | 1 (scheduled execution) |
| High-priority gaps | 4 (SSE stream, Brave Search, date filtering, rate limiting integration) |
| Medium-priority gaps | 7 |
| Low-priority gaps | 3 |

Strengths

  • SSRF protection lives in the most heavily unit-tested file in the project (scraper.rs, 74 tests), with coverage of all private IP ranges, IPv4-mapped IPv6, and redirect blocking.
  • Sources CRUD is the most thoroughly tested API endpoint (36 integration tests) including CSV import/export, bulk import, max limits, and boundary values.
  • Admin module has comprehensive access control tests (access denied for non-admin, non-authenticated, each endpoint).
  • Ownership isolation is consistently tested across most user-scoped endpoints (syntheses, sources, API keys, themes, schedules, stop generation); article history and provenance are the exceptions (GAP-13, GAP-14).
  • Pipeline tests use wiremock and MockLlmProvider to test the full generation flow without real API keys, covering scraping, classification, overflow, diversity, dedup, and preferred ordering.
  • Encryption verification directly queries the database to confirm API keys are not stored in plaintext.
  • Audit logging is verified by querying the audit_log table after admin actions.
  • E2E generation test exercises the complete pipeline with a real OpenAI key including provenance and LLM log verification.

Weaknesses

  • Scheduled execution has zero test coverage -- a critical autonomous process.
  • Brave Search pipeline path is untested beyond a single unit test.
  • Date extraction/age filtering has no tests at any level.
  • Rate limiting is well unit-tested but has no integration verification.
  • SSE progress stream has no integration test (only gated E2E).
  • Frontend has 10 failing tests that need immediate attention.
  • Settings validation lacks negative boundary tests (rejection of invalid values).