AI Weekly Synth -- Test Coverage Audit Report (v2)

Date: 2026-03-27

Auditor: QA Engineer (automated analysis)

1. Test Inventory

1.1 Backend Unit Tests (`cargo test --lib`)

Total: 358 tests -- all passing.

Source file	# tests	Coverage area
`services/scraper.rs`	74	SSRF IP checks, soft-404, redirect, HTML parsing
`services/synthesis.rs`	36	Pipeline logic, schema building, category overflow
`services/llm/anthropic.rs`	20	Response parsing, error handling
`services/prompts.rs`	18	Prompt template generation
`services/csv.rs`	18	CSV parsing, serialisation
`models/synthesis.rs`	16	Model validation, serialisation
`services/rate_limiter.rs`	15	Token bucket, concurrency
`services/llm/openai.rs`	13	Response parsing, error handling
`models/source.rs`	12	URL / title validation
`models/settings.rs`	12	Settings validation, defaults
`services/export.rs`	12	Markdown / PDF rendering
`services/llm/gemini.rs`	10	Response parsing, error handling
`models/provider.rs`	10	Provider / model validation
`services/email.rs`	9	Email rendering, bypass mode
`services/encryption.rs`	8	AES-256-GCM encrypt/decrypt
`services/source_scraper.rs`	8	Link extraction, is_article filter
`services/llm/schema.rs`	8	JSON schema generation
`util/token.rs`	8	Token generation, hashing
`models/api_key.rs`	8	API key validation
`middleware/csrf.rs`	7	CSRF header check
`models/rate_limit.rs`	6	Rate limit model validation
`config.rs`	6	Config parsing
`middleware/auth.rs`	5	Session extraction
`services/llm/factory.rs`	5	Provider factory
`handlers/admin.rs`	4	Admin handler validation
`services/brave_search.rs`	1	Brave search (minimal)
`services/llm/mock.rs`	0	Mock provider (no assertions)
`errors.rs`	0	Error types (no unit tests)

1.2 Backend Integration Tests (`backend/tests/`)

Total: 183 tests across 17 files (requires Postgres).

File	# tests	Coverage area
`api_sources_test.rs`	36	Sources CRUD, validation, CSV, bulk import, max limit
`api_admin_test.rs`	30	Provider CRUD, rate limits, user mgmt, audit log, config
`api_keys_test.rs`	18	API key CRUD, encryption, ownership, test endpoint
`api_syntheses_test.rs`	17	Synthesis CRUD, pagination, ownership, generation trigger
`api_auth_test.rs`	16	Register, login, verify, logout, session
`api_export_test.rs`	13	Email send, Markdown export, PDF export
`api_themes_test.rs`	10	Theme CRUD, validation, ownership
`api_schedules_test.rs`	9	Schedule CRUD, validation, ownership
`api_settings_test.rs`	7	Settings CRUD, defaults, boundary, isolation
`pipeline_test.rs`	6	Phase 1 extraction, Phase 2 search, overflow, diversity, dedup, preferred
`api_article_history_test.rs`	4	History list, clear, provenance
`api_csrf_test.rs`	4	CSRF header enforcement
`api_stop_generation_test.rs`	4	Stop job, ownership, 404
`api_llm_logs_test.rs`	3	LLM logs auth, 404, happy path
`api_sources_preferred_test.rs`	3	Preferred sources set/clear/auth
`minimal_test.rs`	2	Infrastructure sanity (oneshot)
`api_health_test.rs`	1	Health check

1.3 Frontend Unit Tests (`vitest`)

Total: 141 tests (131 passing, 10 failing) across 18 files.

File	# tests	Coverage area
`sources-utils.test.ts`	20	Source utilities
`provider-info.test.ts`	11	Provider info helpers
`api-keys.test.ts`	11	API keys client
`synthesis-utils.test.ts`	11	Synthesis utilities
`sse.test.ts`	11	SSE client
`settings-validation.test.ts`	3	Settings validation
`i18n.test.ts`	9	i18n translations
`api-client.test.ts`	7	API client
`config-api.test.ts`	7	Config API
`pages/settings.test.tsx`	10	Settings page
`pages/sources.test.tsx`	8	Sources page
`pages/home.test.tsx`	7	Home page
`pages/generate.test.tsx`	6	Generate page
`synthesis-export.test.ts`	6	Export utilities
`pages/login.test.tsx`	4	Login page
`pages/register.test.tsx`	4	Register page
`auth-context.test.tsx`	3	Auth context
`admin-route-guard.test.tsx`	3	Admin route guard

1.4 E2E Tests (Playwright)

Total: 7 tests across 7 files.

File	Coverage area
`registration.spec.ts`	Full magic link registration flow
`settings.spec.ts`	Settings persistence across reloads
`settings-export.spec.ts`	Settings export/import roundtrip
`sources.spec.ts`	Source CRUD + preferred sources via API
`themes.spec.ts`	Theme CRUD + schedule CRUD via API
`admin-providers.spec.ts`	Admin provider management, settings dropdown
`generation-live.spec.ts`	Full pipeline with real OpenAI key (gated)

2. Feature Coverage Matrix

Feature	Unit Tests	Integration Tests	E2E Tests	Coverage
Auth: register	-	4 tests	1 test	GOOD
Auth: login	-	3 tests	-	GOOD
Auth: magic link verify	-	3 tests	1 test	GOOD
Auth: /me	-	3 tests	-	GOOD
Auth: logout	-	3 tests	-	GOOD
Auth: session expiry	5 (middleware)	-	-	PARTIAL
CSRF protection	7 (middleware)	4 tests	-	GOOD
Settings CRUD	12 (model)	7 tests	2 tests	GOOD
Sources CRUD	12 (model)	36 tests	1 test	GOOD
Sources: CSV import/export	18 (csv)	6 tests	-	GOOD
Sources: preferred	-	3 tests	1 test	GOOD
Sources: max limit	-	2 tests	-	GOOD
Themes CRUD	-	10 tests	1 test	GOOD
Schedules CRUD	-	9 tests	1 test	GOOD
Scheduled execution	0	0	0	NONE
Syntheses CRUD	16 (model)	11 tests	-	GOOD
Syntheses: pagination	-	2 tests	-	GOOD
Syntheses: ownership isolation	-	2 tests	-	GOOD
Generation: trigger	-	4 tests	1 test (gated)	GOOD
Generation: SSE progress	11 (sse client)	0	1 test (gated)	PARTIAL
Generation: stop	-	4 tests	0	GOOD
Pipeline: Phase 1 (scrape)	74+8 (scraper)	1 test	1 test (gated)	GOOD
Pipeline: Phase 2 (search)	-	1 test	1 test (gated)	GOOD
Pipeline: category overflow	36 (synthesis)	1 test	-	GOOD
Pipeline: is_article filter	8 (source_scraper)	0	0	PARTIAL
Pipeline: summary_length	18 (prompts)	0	0	PARTIAL
Pipeline: date extraction	0	0	0	NONE
Pipeline: article history dedup	-	1 test	-	GOOD
Pipeline: source diversity cap	-	1 test	1 test (gated)	GOOD
Pipeline: preferred ordering	-	1 test	-	GOOD
Pipeline: Brave Search	1 (minimal)	0	0	WEAK
API keys CRUD	8 (model)	18 tests	-	GOOD
API keys: encryption at rest	8 (encryption)	1 test	-	GOOD
API keys: test endpoint	-	2 tests	-	GOOD
Admin: provider CRUD	10 (model) + 4 (handler)	9 tests	1 test	GOOD
Admin: rate limits	6 (model) + 15 (limiter)	4 tests	-	GOOD
Admin: user management	-	6 tests	-	GOOD
Admin: audit log	-	2 tests	-	GOOD
Config: providers	-	3 tests	-	GOOD
Export: Markdown	12 (export)	4 tests	-	GOOD
Export: PDF	12 (export)	4 tests	-	GOOD
Export: Email	9 (email)	5 tests	-	GOOD
SSRF protection	74 (scraper)	0	0	PARTIAL
LLM call logging	-	3 tests	1 test (gated)	GOOD
LLM providers (Gemini)	10	-	-	GOOD
LLM providers (OpenAI)	13	-	-	GOOD
LLM providers (Anthropic)	20	-	-	GOOD
Rate limiting	15	0	0	PARTIAL
Turnstile captcha	bypass only	bypass only	bypass only	PARTIAL

3. Coverage Gaps and Recommendations

GAP-01: Scheduled Execution (scheduler.rs) -- No tests

Priority: Critical

The run_scheduled_jobs() function in services/scheduler.rs has zero unit tests and zero integration tests. This is a critical autonomous process that triggers generation and sends emails without user interaction.

Tests to write:

Unit test (scheduler.rs): run_scheduled_jobs triggers generation for themes whose schedule matches the current day+time.
Unit test (scheduler.rs): run_scheduled_jobs does NOT trigger generation for disabled schedules.
Unit test (scheduler.rs): run_scheduled_jobs does NOT trigger generation when the current day is not in the schedule's days list.
Integration test (api_schedules_test.rs): Create a schedule set to "now", verify the scheduler picks it up and a synthesis is created (or at least attempted with a mock provider).
Integration test: Verify that after run_scheduled_jobs executes, the schedule's last_run_at timestamp is updated.
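
The three unit tests above become straightforward if the matching rule is isolated in a pure predicate that receives the current day and time as arguments instead of reading the clock. The following is a minimal sketch under that assumption; `Schedule`, `Weekday`, and `is_due` are hypothetical names, and the real types in services/scheduler.rs may look different.

```rust
// Hypothetical shapes for illustration; the real types in services/scheduler.rs may differ.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Weekday { Mon, Tue, Wed, Thu, Fri, Sat, Sun }

pub struct Schedule {
    pub enabled: bool,
    pub days: Vec<Weekday>,
    pub hour: u8,
    pub minute: u8,
}

/// Returns true when the schedule should fire at the given day and time.
/// The clock is passed in rather than read, so tests stay deterministic.
pub fn is_due(schedule: &Schedule, today: Weekday, hour: u8, minute: u8) -> bool {
    schedule.enabled
        && schedule.days.contains(&today)
        && schedule.hour == hour
        && schedule.minute == minute
}

#[cfg(test)]
mod tests {
    use super::*;

    fn monday_at_eight(enabled: bool) -> Schedule {
        Schedule { enabled, days: vec![Weekday::Mon], hour: 8, minute: 0 }
    }

    #[test]
    fn fires_when_day_and_time_match() {
        assert!(is_due(&monday_at_eight(true), Weekday::Mon, 8, 0));
    }

    #[test]
    fn skips_disabled_schedule() {
        assert!(!is_due(&monday_at_eight(false), Weekday::Mon, 8, 0));
    }

    #[test]
    fn skips_day_not_in_list() {
        assert!(!is_due(&monday_at_eight(true), Weekday::Tue, 8, 0));
    }
}
```

With the predicate covered this way, the integration tests only need to confirm that run_scheduled_jobs wires it to the database and the generation trigger.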

GAP-02: SSE Progress Stream -- No integration test

Priority: High

The SSE progress endpoint (GET /api/v1/syntheses/generate/:job_id/progress) is only tested in the E2E suite (gated behind a real API key). There is no integration test that verifies the SSE connection, event format, or error propagation.

Tests to write:

Integration test (api_syntheses_test.rs): Connect to SSE endpoint for a running job and verify the stream sends well-formed events (progress, complete, or error).
Integration test: SSE endpoint returns 404 for a non-existent job_id.
Integration test: SSE endpoint returns 401 without auth.
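
One way to make "well-formed" concrete is to parse each frame according to the Server-Sent Events wire format (`event:` and `data:` fields, frames separated by a blank line) and reject anything else. The sketch below is a possible test helper, not the project's code: the event names progress, complete, and error come from the test description above, while `SseEvent`, `parse_frame`, and `parse_stream` are hypothetical, and the payload is treated as an opaque string because its shape is not specified here.

```rust
/// One parsed Server-Sent Events frame.
#[derive(Debug, PartialEq, Eq)]
pub struct SseEvent {
    pub event: String,
    pub data: String,
}

/// Parses a single frame (the text between two blank lines).
/// Returns None when the frame has no `data:` field or an unexpected event name.
pub fn parse_frame(frame: &str) -> Option<SseEvent> {
    let mut event = String::from("message"); // SSE default when no `event:` field is sent
    let mut data: Vec<&str> = Vec::new();

    for line in frame.lines() {
        if line.starts_with(':') {
            continue; // comment line, often used as a keep-alive
        }
        // Lines without a colon are ignored in this sketch.
        let Some((field, value)) = line.split_once(':') else { continue };
        // A single leading space after the colon is not part of the value.
        let value = value.strip_prefix(' ').unwrap_or(value);
        match field {
            "event" => event = value.to_string(),
            "data" => data.push(value),
            _ => {} // id, retry, and unknown fields are ignored here
        }
    }

    if data.is_empty() || !matches!(event.as_str(), "progress" | "complete" | "error") {
        return None;
    }
    Some(SseEvent { event, data: data.join("\n") })
}

/// Splits a buffered stream body into frames and parses each one.
pub fn parse_stream(body: &str) -> Vec<Option<SseEvent>> {
    body.split("\n\n")
        .filter(|frame| !frame.trim().is_empty())
        .map(parse_frame)
        .collect()
}
```

An integration test could buffer the response body of a short mock-provider job, run it through `parse_stream`, and assert that every entry is `Some` and that the last event is complete or error.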

GAP-03: Brave Search Pipeline -- Minimal coverage

Priority: High

services/brave_search.rs has only 1 unit test. The Brave Search path in the pipeline (use_brave_search: true) is explicitly marked as untestable in a comment in pipeline_test.rs because it requires a real API key. As a result, the entire search-via-Brave code path is unverified in integration tests.

Tests to write:

Unit test (brave_search.rs): Parse a valid Brave Search API response and extract URLs.
Unit test (brave_search.rs): Handle Brave API error responses (429 rate limit, 401 invalid key).
Unit test (brave_search.rs): Handle malformed Brave API JSON response gracefully.
Integration test (pipeline_test.rs): Use wiremock to mock the Brave API endpoint and run the pipeline with use_brave_search: true, verifying that Brave results feed into the pipeline.
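
The error-handling unit test is easiest to write if the status-to-error mapping lives in a small pure function. The sketch below assumes such a function exists or can be extracted; `BraveError` and `classify_status` are hypothetical names, and only the 429 and 401 cases are taken from the list above.

```rust
/// Hypothetical error type; the variants in services/brave_search.rs may differ.
#[derive(Debug, PartialEq, Eq)]
pub enum BraveError {
    InvalidKey,
    RateLimited,
    Upstream(u16),
}

/// Maps an HTTP status code to a typed result before the body is parsed.
pub fn classify_status(status: u16) -> Result<(), BraveError> {
    match status {
        200..=299 => Ok(()),
        401 => Err(BraveError::InvalidKey),
        429 => Err(BraveError::RateLimited),
        other => Err(BraveError::Upstream(other)),
    }
}
```

Typed variants let the pipeline test assert on the exact failure (for example, that a 429 is surfaced as a rate-limit condition) instead of matching on error strings.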

GAP-04: Date Extraction / max_age_days Filtering -- No tests

Priority: High

Themes carry a max_age_days field that is meant to filter out old articles. However, no test at any level (unit or integration) verifies that articles older than max_age_days are excluded from the synthesis. The prompt includes date extraction instructions, but nothing validates that article age filtering actually works.

Tests to write:

Unit test (synthesis.rs): Articles with a published_at date older than max_age_days are excluded.
Integration test (pipeline_test.rs): Set max_age_days: 1 and provide wiremock articles with old dates, verify they are filtered out.
Unit test (prompts.rs): Verify the date extraction instruction appears in the prompt when max_age_days > 0.
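
The age filter itself can be expressed as a pure function over a list of articles and an injected "now", which is what the first unit test would exercise. The following is a sketch only: `Article` and `filter_by_age` are hypothetical, and two behaviours are assumptions that should be checked against synthesis.rs, namely that max_age_days of 0 disables the filter and that articles without a date are kept.

```rust
use std::time::{Duration, SystemTime};

// Hypothetical article shape; `published_at` is named in the text above, the rest is illustrative.
pub struct Article {
    pub url: String,
    pub published_at: Option<SystemTime>,
}

/// Keeps articles published within `max_age_days` of `now`.
/// Assumptions in this sketch: 0 disables the filter, and undated articles are kept.
pub fn filter_by_age(articles: Vec<Article>, max_age_days: u32, now: SystemTime) -> Vec<Article> {
    if max_age_days == 0 {
        return articles;
    }
    let max_age = Duration::from_secs(u64::from(max_age_days) * 86_400);
    articles
        .into_iter()
        .filter(|article| match article.published_at {
            // duration_since fails when the date lies in the future; such articles are kept.
            Some(published) => now.duration_since(published).map_or(true, |age| age <= max_age),
            None => true,
        })
        .collect()
}
```

Passing `now` explicitly keeps the test independent of the wall clock, so a fixture with a three-day-old article stays three days old on every run.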

GAP-05: Rate Limiting -- No integration test

Priority: High

The rate limiter has 15 unit tests but zero integration tests. There is no test that verifies the rate limiter actually blocks LLM calls when the limit is exceeded in a real pipeline run, nor any test that the admin-configured rate limits are loaded and applied.

Tests to write:

Integration test: Configure a very low rate limit (e.g., max_requests: 1, time_window_seconds: 60), trigger generation with multiple sources, and verify the rate limiter introduces delays (or that the pipeline logs rate-limit waits).
Integration test: Verify that user-level rate limit overrides (from settings rate_limit_max_requests) are applied when set.
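
For reference, the behaviour the first integration test expects from a limit of max_requests: 1 and time_window_seconds: 60 can be sketched with a small token bucket that takes the current instant as a parameter. This is an illustrative model, not the implementation in services/rate_limiter.rs; `TokenBucket` and `try_acquire` are hypothetical names.

```rust
use std::time::Instant;

/// Hypothetical limiter; only `max_requests` and `time_window_seconds` come from the text above.
pub struct TokenBucket {
    capacity: f64,
    tokens: f64,
    refill_per_sec: f64,
    last_refill: Instant,
}

impl TokenBucket {
    /// Starts with a full bucket that refills evenly over the time window.
    pub fn new(max_requests: u32, time_window_seconds: u64, now: Instant) -> Self {
        let capacity = f64::from(max_requests);
        Self {
            capacity,
            tokens: capacity,
            refill_per_sec: capacity / time_window_seconds as f64,
            last_refill: now,
        }
    }

    /// Tries to take one token at time `now`; returns false when the caller must wait.
    pub fn try_acquire(&mut self, now: Instant) -> bool {
        let elapsed = now.saturating_duration_since(self.last_refill).as_secs_f64();
        self.tokens = (self.tokens + elapsed * self.refill_per_sec).min(self.capacity);
        self.last_refill = now;
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}
```

In this model the second LLM call inside the same minute is refused, which is the observable effect (a delay or a logged rate-limit wait) the integration test would assert on.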

GAP-06: SSRF Protection -- No integration test

Priority: Medium

The SSRF IP checks have 74 unit tests in scraper.rs (excellent is_private_ip coverage), but there is no integration test that verifies the full check_ssrf function actually blocks a request to a private IP through the scraper pipeline. The existing integration tests bypass the check by setting SKIP_SSRF_CHECK=1.

Tests to write:

Integration test (pipeline_test.rs): Add a source with a URL pointing to 127.0.0.1 (without SKIP_SSRF_CHECK), verify the scraper rejects it and the pipeline continues with other sources.
Integration test: Verify that redirect to a private IP is blocked (source URL redirects to http://192.168.1.1).
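
The addresses used in these two tests (127.0.0.1 and 192.168.1.1) both fall into ranges that the standard library can classify directly. The sketch below shows one way such a check might look using only `std::net`; `is_blocked_ip` is an illustrative stand-in and not the project's is_private_ip or check_ssrf.

```rust
use std::net::{IpAddr, Ipv4Addr, Ipv6Addr};

fn is_blocked_v4(ip: Ipv4Addr) -> bool {
    ip.is_private()            // 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16
        || ip.is_loopback()    // 127.0.0.0/8
        || ip.is_link_local()  // 169.254.0.0/16
        || ip.is_unspecified() // 0.0.0.0
        || ip.is_broadcast()   // 255.255.255.255
}

fn is_blocked_v6(ip: Ipv6Addr) -> bool {
    // IPv4-mapped addresses (::ffff:a.b.c.d) are judged by their embedded IPv4 address.
    if let Some(v4) = ip.to_ipv4_mapped() {
        return is_blocked_v4(v4);
    }
    let first = ip.segments()[0];
    ip.is_loopback()
        || ip.is_unspecified()
        || (first & 0xfe00) == 0xfc00 // unique local, fc00::/7
        || (first & 0xffc0) == 0xfe80 // link-local, fe80::/10
}

/// Illustrative stand-in for the project's is_private_ip; not its actual implementation.
pub fn is_blocked_ip(ip: IpAddr) -> bool {
    match ip {
        IpAddr::V4(v4) => is_blocked_v4(v4),
        IpAddr::V6(v6) => is_blocked_v6(v6),
    }
}
```

For the redirect case, the integration test needs to confirm that the check runs on the resolved address of every hop, not only on the original source URL.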

GAP-07: Pipeline is_article Filter -- No end-to-end verification

Priority: Medium

The is_article heuristic in source_scraper.rs has 8 unit tests, but there is no integration test that verifies non-article pages (e.g., category index pages, about pages) are actually filtered out during a pipeline run.

Tests to write:

Integration test (pipeline_test.rs): Set up wiremock with a source page linking to both article pages and non-article pages (e.g., /about, /contact, /category), verify only articles make it into the synthesis.

GAP-08: Pipeline summary_length -- No integration test

Priority: Medium

The summary_length field on themes controls the number of sentences in generated summaries. The prompts unit tests verify the instruction appears in the prompt, but no test verifies the LLM response is actually constrained to the requested length.

Tests to write:

Integration test (pipeline_test.rs): Use MockLlmProvider with summary_length: 1 and verify the mock generates 1-sentence summaries (or verify the prompt includes the correct summary_length instruction).

GAP-09: Settings Validation Rejections -- No negative tests

Priority: Medium

The settings integration tests cover boundary values but do not test rejection of out-of-range values. There are no tests for max_articles_per_source: 0, batch_size: -1, max_links_per_source: 999, etc.

Tests to write:

Integration test (api_settings_test.rs): PUT /settings with max_articles_per_source: 0 returns 422.
Integration test (api_settings_test.rs): PUT /settings with batch_size: 0 returns 422.
Integration test (api_settings_test.rs): PUT /settings with max_links_per_source: 999 returns 422.
Integration test (api_settings_test.rs): PUT /settings with article_history_days: -1 returns 422.
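
The four rejections above amount to a range check per field. The sketch below illustrates that shape; the field names come from the tests listed above, but `Settings`, `validate`, and every numeric bound are placeholders chosen for the example, not the limits enforced by models/settings.rs.

```rust
// Field names come from the tests listed above; every bound below is a placeholder,
// not the project's actual limit.
pub struct Settings {
    pub max_articles_per_source: i32,
    pub batch_size: i32,
    pub max_links_per_source: i32,
    pub article_history_days: i32,
}

/// Records `field` as invalid when `value` lies outside the inclusive range.
fn check(field: &'static str, value: i32, min: i32, max: i32, errors: &mut Vec<&'static str>) {
    if value < min || value > max {
        errors.push(field);
    }
}

/// Returns the names of all out-of-range fields; an empty list means the payload is valid.
/// A handler would map a non-empty list to HTTP 422.
pub fn validate(settings: &Settings) -> Vec<&'static str> {
    let mut errors = Vec::new();
    check("max_articles_per_source", settings.max_articles_per_source, 1, 50, &mut errors);
    check("batch_size", settings.batch_size, 1, 100, &mut errors);
    check("max_links_per_source", settings.max_links_per_source, 1, 200, &mut errors);
    check("article_history_days", settings.article_history_days, 0, 365, &mut errors);
    errors
}
```

Returning every failing field at once, rather than stopping at the first, lets a single negative test assert on the full set of rejections.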

GAP-10: Synthesis Export Content with Special Characters -- No test

Priority: Low

Export tests verify structure and content-type but do not test synthesis content with special characters (accented characters, HTML entities, emoji, very long URLs). There is no test for PDF generation with such edge cases.

Tests to write:

Integration test (api_export_test.rs): Insert a synthesis with UTF-8 characters, URLs with query strings, and long summaries. Export as Markdown and verify content integrity.
Integration test (api_export_test.rs): Same synthesis exported as PDF. Verify PDF magic bytes and non-empty content.
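
The "PDF magic bytes" check refers to the `%PDF-` header that PDF files begin with. A possible helper for the export test, with `looks_like_pdf` as a hypothetical name, could be as small as this:

```rust
/// Returns true when the body starts with the PDF header `%PDF-` and has content after it.
pub fn looks_like_pdf(body: &[u8]) -> bool {
    const MAGIC: &[u8] = b"%PDF-";
    body.len() > MAGIC.len() && body.starts_with(MAGIC)
}
```

This only confirms that the response is a non-empty PDF; verifying that accented characters or emoji survive rendering would require extracting text from the PDF, which is outside this sketch.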

GAP-11: Concurrent Generation -- No test

Priority: Low

The generate_twice_returns_error_for_second test verifies that the same user cannot run two jobs at once, but there is no test for two different users generating simultaneously, which is the expected multi-tenant behavior.

Tests to write:

Integration test (api_syntheses_test.rs): Two users trigger generation simultaneously. Both should get 202. Verify they do not interfere with each other.
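
The property under test is that the "one running job" rule is scoped per user, not global. The sketch below models that with an in-memory registry and real threads; `JobRegistry` and `start_concurrently` are hypothetical and stand in for whatever job tracking the backend actually uses.

```rust
use std::collections::HashSet;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical in-memory registry of users that currently have a running job.
#[derive(Default)]
pub struct JobRegistry {
    running: Mutex<HashSet<u64>>,
}

impl JobRegistry {
    /// Returns true if the job was accepted (202 in the API), false if this
    /// user already has a job running.
    pub fn try_start(&self, user_id: u64) -> bool {
        self.running.lock().unwrap().insert(user_id)
    }

    /// Marks the user's job as finished so a new one can be started.
    pub fn finish(&self, user_id: u64) {
        self.running.lock().unwrap().remove(&user_id);
    }
}

/// Starts one job per user from separate threads and reports which were accepted.
pub fn start_concurrently(registry: Arc<JobRegistry>, user_ids: &[u64]) -> Vec<bool> {
    let handles: Vec<_> = user_ids
        .iter()
        .map(|&user_id| {
            let registry = Arc::clone(&registry);
            thread::spawn(move || registry.try_start(user_id))
        })
        .collect();
    handles.into_iter().map(|handle| handle.join().unwrap()).collect()
}
```

The integration test would follow the same pattern at the HTTP level: fire both requests concurrently, expect 202 for each, and then check that each user's synthesis contains only that user's sources.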

GAP-12: Frontend -- Failing tests

Priority: Medium

10 of 141 frontend unit tests are currently failing. These failures need investigation and resolution before any new coverage work.

Action:

Run npx vitest run and capture the 10 failing test names.
Investigate root cause (likely stale mocks or component changes).
Fix or update the failing tests.

GAP-13: Article History Ownership Isolation -- No test

Priority: Medium

The article history endpoint tests cover auth and empty-state, but there is no test verifying that User B cannot see User A's article history.

Tests to write:

Integration test (api_article_history_test.rs): User A generates a synthesis (creating history entries). User B calls GET /article-history and sees empty results.
Integration test (api_article_history_test.rs): User B calls DELETE /article-history and it does not affect User A's history entries.
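
Both tests check the same invariant: every read and delete is keyed by the caller's user id. The sketch below expresses that invariant against a hypothetical in-memory store (`ArticleHistory` is not a type from the codebase); the real tests would make the equivalent assertions through the HTTP endpoints and the database.

```rust
use std::collections::HashMap;

/// Hypothetical per-user history store used only to illustrate the invariant.
#[derive(Default)]
pub struct ArticleHistory {
    by_user: HashMap<u64, Vec<String>>,
}

impl ArticleHistory {
    /// Adds an article URL to the given user's history.
    pub fn record(&mut self, user_id: u64, url: &str) {
        self.by_user.entry(user_id).or_default().push(url.to_string());
    }

    /// Lists only the caller's entries; unknown users get an empty slice.
    pub fn list(&self, user_id: u64) -> &[String] {
        self.by_user.get(&user_id).map(Vec::as_slice).unwrap_or_default()
    }

    /// Clears only the caller's entries.
    pub fn clear(&mut self, user_id: u64) {
        self.by_user.remove(&user_id);
    }
}
```

The second assertion matters as much as the first: a DELETE that forgets the user filter would pass a "User B sees nothing" test while silently wiping User A's history.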

GAP-14: Provenance Ownership Isolation -- No test

Priority: Medium

The provenance endpoint test only covers the 404 case. There is no test verifying that User B cannot access User A's synthesis provenance.

Tests to write:

Integration test (api_article_history_test.rs): User A has a synthesis with provenance. User B calls GET /syntheses/:id/provenance on User A's synthesis and gets 404.

GAP-15: E2E Coverage of Core Flows -- Missing

Priority: Low

Several core user journeys are only covered by integration tests, not E2E tests:

Login flow (only registration has an E2E test)
API key management (add, test, delete)
Synthesis list/detail view
Synthesis deletion
Markdown/PDF export download
Email send from synthesis view

Tests to write (selected):

E2E test: Login via magic link, verify session persists.
E2E test: Add an API key, verify it appears masked in the list, test it, delete it.
E2E test: View synthesis detail page, click export Markdown, verify download.

4. Summary

Overall Assessment

Metric	Value
Backend unit tests	358 (all passing)
Backend integration tests	183
Frontend unit tests	141 (10 failing)
E2E tests	7
Total test count	689
Critical gaps	1 (scheduled execution)
High-priority gaps	4 (SSE stream, Brave Search, date filtering, rate limiting integration)
Medium-priority gaps	7
Low-priority gaps	3

Strengths

SSRF protection has the best unit test coverage in the project (74 tests) covering all private IP ranges, IPv4-mapped IPv6, and redirect blocking.
Sources CRUD is the most thoroughly tested API endpoint (36 integration tests) including CSV import/export, bulk import, max limits, and boundary values.
Admin module has comprehensive access control tests (access denied for non-admin, non-authenticated, each endpoint).
Ownership isolation is consistently tested across most user-scoped endpoints (syntheses, sources, API keys, themes, schedules, stop generation); article history and provenance are the exceptions (see GAP-13 and GAP-14).
Pipeline tests use wiremock and MockLlmProvider to test the full generation flow without real API keys, covering scraping, classification, overflow, diversity, dedup, and preferred ordering.
Encryption verification directly queries the database to confirm API keys are not stored in plaintext.
Audit logging is verified by querying the audit_log table after admin actions.
E2E generation test exercises the complete pipeline with a real OpenAI key including provenance and LLM log verification.

Weaknesses

Scheduled execution has zero test coverage -- a critical autonomous process.
Brave Search pipeline path is untested beyond a single unit test.
Date extraction/age filtering has no tests at any level.
Rate limiting is well unit-tested but has no integration verification.
SSE progress stream has no integration test (only gated E2E).
Frontend has 10 failing tests that need immediate attention.
Settings validation lacks negative boundary tests (rejection of invalid values).
