# QA Integration/E2E Audit Report

Date: 2026-03-27

Scope: `docs/requirements.md`, `docs/functional_specs.md`, `docs/technical_specs.md`, `docs/qa_guidelines.md`, `backend/tests`, `e2e/tests`, test scripts.

## 1) Clarification Questions

1. Should scheduled execution reliability (background scheduler + email fanout) be release-gated with deterministic integration tests, or only monitored in production?
2. Is an external-provider live E2E (`OPENAI_TEST_API_KEY`) acceptable as the *only* end-to-end coverage for SSE/progress completion, or do you want deterministic in-house SSE coverage in CI?

## 2) Assumptions

- CI quality gates should not rely on external LLM providers.
- Core product requirements (scheduled generation, generation progress, fallback paths) should be covered by deterministic integration tests.
- This report prioritizes integration/E2E confidence over unit-test volume.

## 3) Prioritized Findings (P0-P3)

### P0: Scheduled execution path is effectively untested (critical requirement risk)

- Why it matters: Scheduled generation + email delivery is a core requirement. Regressions here can silently fail user deliverables.
- Evidence:
  - Scheduler runtime logic exists in [scheduler.rs:27](/Users/oabrivard/Projects/rust/ai_synth/backend/src/services/scheduler.rs:27) through [scheduler.rs:91](/Users/oabrivard/Projects/rust/ai_synth/backend/src/services/scheduler.rs:91).
  - Only one trivial scheduler unit test exists ([scheduler.rs:98](/Users/oabrivard/Projects/rust/ai_synth/backend/src/services/scheduler.rs:98)).
  - Requirement explicitly expects scheduled reliability ([requirements.md:136](/Users/oabrivard/Projects/rust/ai_synth/docs/requirements.md:136)).
- Direction: Add deterministic integration tests for due schedule selection, double-run prevention (`last_run_at`), job contention behavior, and email send invocation outcomes.
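
The due-selection and double-run rules named in the P0 direction can be made deterministic by passing the clock in as a parameter. The following is a minimal sketch under stated assumptions: the `Schedule` struct, its fields other than `last_run_at`, and the functions `is_due` and `select_runnable` are hypothetical illustrations, not the code in `scheduler.rs`. `mark_run` borrows the name used in this report, but its signature here is assumed.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified view of a schedule row (not the project's real model).
#[derive(Debug, Clone, PartialEq)]
pub struct Schedule {
    pub id: u32,
    pub theme_id: u32,
    /// Time at which the schedule should fire, as Unix seconds (assumed field).
    pub next_due_at: u64,
    /// Time of the last recorded run, if any.
    pub last_run_at: Option<u64>,
}

/// A schedule is due when its due time has passed and no run has been
/// recorded at or after that due time (the double-run guard).
pub fn is_due(s: &Schedule, now: u64) -> bool {
    s.next_due_at <= now && s.last_run_at.map_or(true, |ran| ran < s.next_due_at)
}

/// Select the schedules to run at `now`, skipping any theme that already
/// has an active manual job (the contention rule).
pub fn select_runnable<'a>(
    schedules: &'a [Schedule],
    now: u64,
    active_themes: &HashSet<u32>,
) -> Vec<&'a Schedule> {
    schedules
        .iter()
        .filter(|s| is_due(s, now) && !active_themes.contains(&s.theme_id))
        .collect()
}

/// Record a run so the same due slot is not executed twice.
pub fn mark_run(s: &mut Schedule, now: u64) {
    s.last_run_at = Some(now);
}
```

Because `now` is an argument rather than a call to the system clock, a test can step through "before due", "due", and "already run" without sleeping. A real integration test would apply the same idea against the test database.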

### P1: SSE progress endpoint has no deterministic integration coverage

- Why it matters: Generation UX and cancellation safety depend on SSE correctness.
- Evidence:
  - SSE handler is implemented in [generation.rs:150](/Users/oabrivard/Projects/rust/ai_synth/backend/src/handlers/generation.rs:150).
  - Integration tests validate trigger/duplicate behavior only, not progress stream contract ([api_syntheses_test.rs:565](/Users/oabrivard/Projects/rust/ai_synth/backend/tests/api_syntheses_test.rs:565), [api_syntheses_test.rs:617](/Users/oabrivard/Projects/rust/ai_synth/backend/tests/api_syntheses_test.rs:617)).
  - Existing live SSE check is gated and external ([generation-live.spec.ts:119](/Users/oabrivard/Projects/rust/ai_synth/e2e/tests/generation-live.spec.ts:119)).
- Direction: Add integration tests that subscribe to `/progress`, assert `progress -> complete/error` sequence, ownership enforcement, reconnect semantics, and keepalive stability.

### P1: Brave Search fallback path lacks integration coverage

- Why it matters: Fallback branch is a key functional path and currently high regression risk.
- Evidence:
  - Brave branch in pipeline code: [synthesis.rs:371](/Users/oabrivard/Projects/rust/ai_synth/backend/src/services/synthesis.rs:371).
  - Pipeline test file explicitly states this path is not integration-tested ([pipeline_test.rs:257](/Users/oabrivard/Projects/rust/ai_synth/backend/tests/pipeline_test.rs:257)).
- Direction: Add mock HTTP server + encrypted Brave key fixture flow to execute `use_brave_search=true` end-to-end in integration tests.

### P1: Pipeline integration does not verify rate-limit behavior

- Why it matters: Rate limiting is a non-functional requirement; failures can produce outages or provider bans.
- Evidence:
  - Pipeline tests set user rate limit fields to `null` ([pipeline_test.rs:64](/Users/oabrivard/Projects/rust/ai_synth/backend/tests/pipeline_test.rs:64)).
  - No integration assertions around rate-limited waits/error propagation.
- Direction: Add integration scenarios for strict user/provider limits and verify wait/retry/timeout outcomes.

### P1: Pipeline integration does not verify max-age article filtering behavior

- Why it matters: Freshness is a core content-quality requirement.
- Evidence:
  - Pipeline tests consistently use high `max_age_days` values ([pipeline_test.rs:77](/Users/oabrivard/Projects/rust/ai_synth/backend/tests/pipeline_test.rs:77)).
  - No integration assertion for `filtered_too_old` trace behavior.
- Direction: Add wiremock articles with old publish dates + assertions on filtering and history status.

### P2: E2E suite is heavily API-driven, limited UI journey validation

- Why it matters: UI regressions can pass E2E while backend endpoints stay healthy.
- Evidence:
  - Sources and themes E2E use `page.evaluate(fetch(...))` for most operations ([sources.spec.ts:23](/Users/oabrivard/Projects/rust/ai_synth/e2e/tests/sources.spec.ts:23), [themes.spec.ts:29](/Users/oabrivard/Projects/rust/ai_synth/e2e/tests/themes.spec.ts:29)).
- Direction: Keep API-assisted setup, but assert critical user interactions through UI (form submit, validation messages, control states).

### P2: Article history ownership isolation is not explicitly tested

- Why it matters: Multi-user data isolation is security-sensitive.
- Evidence:
  - Current article history integration tests cover auth + empty/clear/provenance 404 only ([api_article_history_test.rs:24](/Users/oabrivard/Projects/rust/ai_synth/backend/tests/api_article_history_test.rs:24)).
- Direction: Add user A vs user B cross-access tests for history and provenance endpoints.

### P2: QA guidelines are out of sync with current codebase signals

- Why it matters: stale test inventory causes false confidence in planning and release gates.
- Evidence:
  - Documented counts/status in [qa_guidelines.md:7](/Users/oabrivard/Projects/rust/ai_synth/docs/qa_guidelines.md:7) to [qa_guidelines.md:11](/Users/oabrivard/Projects/rust/ai_synth/docs/qa_guidelines.md:11).
  - Current local grep counts: backend unit ~359, backend integration ~187, frontend unit ~135 tests, E2E 7.
- Direction: Automate inventory generation in CI and update `docs/qa_guidelines.md` from machine output.

### P3: Frontend unit test execution environment is currently brittle

- Why it matters: slows the QA feedback loop and hides regressions.
- Evidence:
  - Local run `cd frontend && npx vitest run` failed due to a missing optional Rollup binary (`@rollup/rollup-darwin-x64`).
- Direction: Add a clean install/bootstrap check in CI and pin a known-good Node/npm workflow.

## 4) Coverage Map (Required Capability vs Current Coverage)

| Capability | Unit | Integration | E2E | Status |
| --- | --- | --- | --- | --- |
| Auth (register/login/verify/session) | Medium | Strong (`api_auth_test.rs`) | Medium (`registration.spec.ts`) | Good |
| Theme CRUD | Low | Strong (`api_themes_test.rs`) | Medium (API-driven) | Good |
| Source CRUD/import/export/preferred | Medium | Strong (`api_sources_test.rs`) | Medium (API-driven) | Good |
| On-demand generation trigger/duplicate/stop | Medium | Medium (`api_syntheses_test.rs`, `api_stop_generation_test.rs`) | Medium (live test gated) | Partial |
| SSE progress stream contract | Low | Weak | Weak (only external live) | Gap |
| Pipeline Phase 1 (personalized sources) | Medium | Medium (`pipeline_test.rs`) | Low | Partial |
| Pipeline Phase 2 (LLM web search) | Medium | Medium (`pipeline_test.rs`) | Low | Partial |
| Pipeline Phase 2 (Brave Search) | Low | None | None | Gap |
| Scheduled config CRUD | Low | Medium (`api_schedules_test.rs`) | Medium (API-driven in themes E2E) | Partial |
| Scheduled execution runtime | Low | None | None | Gap |
| Export email/pdf/markdown | Medium | Strong (`api_export_test.rs`) | Low | Good |
| Article history/provenance security | Low | Weak (no ownership isolation) | None | Gap |
| Rate limiting in real generation flow | Medium | None | None | Gap |
| Date freshness filtering in pipeline | Medium (scraper unit) | None | None | Gap |

## 5) Test Architecture Issues (Flakiness / Speed / Isolation / Observability)

- Flakiness risk: `generation-live.spec.ts` depends on external OpenAI availability and behavior ([generation-live.spec.ts:1](/Users/oabrivard/Projects/rust/ai_synth/e2e/tests/generation-live.spec.ts:1)).
- Speed tradeoff: E2E is reasonably stable thanks to the single worker and API-first setup, but this under-tests real UI behavior.
- Isolation strengths: backend integration per-test DB isolation via `TestApp` is strong.
- Observability gap: no dedicated integration assertions for SSE stream semantics and scheduler outcomes.

## 6) Detailed QA / Refactoring Plan

### Phase 1 (1-2 weeks): close highest-risk deterministic gaps

- Add scheduler integration suite:
  - due schedule executes once
  - `last_run_at` blocks double-run
  - active manual job causes skip
  - email send errors are logged and do not crash the loop
- Add SSE integration suite:
  - authorized subscribe receives latest event
  - unauthorized/foreign job denied
  - `complete` and `error` payload schema checks
- Add Brave Search integration path with mocked Brave API and stored encrypted key fixture.

### Phase 2 (1 week): non-functional policy tests

- Add pipeline integration tests for:
  - `max_age_days` filtering (`filtered_too_old` assertions)
  - user/provider rate-limit behavior under contention
  - cancellation mid-batch and partial-save invariants.

### Phase 3 (1 week): E2E realism upgrades

- Convert at least 3 API-heavy E2E scenarios to UI-driven workflows:
  - theme create/update/delete
  - source add/import/preferred/delete
  - schedule form save/delete.
- Keep API shortcuts only for setup/cleanup.
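
For the Phase 1 SSE suite, the `progress -> complete/error` ordering can be checked with a small parser and sequence validator once the response body has been captured as text. The sketch below is illustrative only. It assumes the stream uses `\n` line endings and that the handler names its events `progress`, `complete`, and `error` via the `event:` field. The functions `parse_sse` and `check_sequence` and the struct `SseEvent` are hypothetical helpers, not existing project code.

```rust
/// One parsed Server-Sent Event (only the fields this check needs).
#[derive(Debug, PartialEq)]
pub struct SseEvent {
    pub event: String,
    pub data: String,
}

/// Parse a raw SSE body into events. Frames are separated by a blank line.
/// Lines starting with ':' are comments (often used as keepalives) and are
/// skipped; frames without any `data:` line produce no event.
pub fn parse_sse(body: &str) -> Vec<SseEvent> {
    let mut events = Vec::new();
    for frame in body.split("\n\n") {
        let mut event = String::from("message"); // default SSE event type
        let mut data: Vec<&str> = Vec::new();
        for line in frame.lines() {
            if line.starts_with(':') {
                continue;
            }
            if let Some(v) = line.strip_prefix("event:") {
                event = v.strip_prefix(' ').unwrap_or(v).to_string();
            } else if let Some(v) = line.strip_prefix("data:") {
                data.push(v.strip_prefix(' ').unwrap_or(v));
            }
        }
        if !data.is_empty() {
            events.push(SseEvent { event, data: data.join("\n") });
        }
    }
    events
}

/// Accept zero or more `progress` events followed by exactly one terminal
/// `complete` or `error` event, with nothing after it.
pub fn check_sequence(events: &[SseEvent]) -> Result<(), String> {
    let (last, head) = match events.split_last() {
        Some(parts) => parts,
        None => return Err("stream ended without a terminal event".to_string()),
    };
    if let Some(bad) = head.iter().find(|e| e.event != "progress") {
        return Err(format!("unexpected `{}` event before the terminal event", bad.event));
    }
    match last.event.as_str() {
        "complete" | "error" => Ok(()),
        other => Err(format!("stream ended on `{}` instead of complete/error", other)),
    }
}
```

Keeping parsing separate from validation means the same `parse_sse` output could also feed payload schema checks on the `complete` and `error` events.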

### Phase 4 (2-3 days): documentation and gate hardening

- Generate test inventory automatically (counts, pass/fail) and publish into QA docs.
- Split CI lanes:
  - deterministic required lane (unit/integration/mock-e2e)
  - optional live-provider lane (non-blocking).

## 7) Quick Wins

- Add one integration test for `/syntheses/generate/{job_id}/progress` happy path + ownership check.
- Add one integration test for scheduled execution `mark_run` behavior using a controlled due schedule fixture.
- Add one article-history cross-user isolation test.
- Mark `generation-live.spec.ts` as non-blocking in CI with explicit label/reporting.
- Update `docs/qa_guidelines.md` inventory counts to current observed baseline.

## Execution Notes

- Ran successfully: `cd backend && cargo test --lib` -> 359 passed.
- Could not execute frontend unit tests due to an environment dependency issue (`@rollup/rollup-darwin-x64` missing in local `node_modules`).