ai_synth/audits/2026-03-27/qa-integration-e2e.md

# QA Integration/E2E Audit Report

Date: 2026-03-27
Scope: `docs/requirements.md`, `docs/functional_specs.md`, `docs/technical_specs.md`, `docs/qa_guidelines.md`, `backend/tests`, `e2e/tests`, test scripts.

## 1) Clarification Questions

1. Should scheduled execution reliability (background scheduler + email fanout) be release-gated with deterministic integration tests, or only monitored in production?
2. Is an external-provider live E2E (`OPENAI_TEST_API_KEY`) acceptable as the *only* end-to-end coverage for SSE/progress completion, or do you want deterministic in-house SSE coverage in CI?

## 2) Assumptions

- CI quality gates should not rely on external LLM providers.
- Core product requirements (scheduled generation, generation progress, fallback paths) should be covered by deterministic integration tests.
- This report prioritizes integration/E2E confidence over unit-test volume.

## 3) Prioritized Findings (P0-P3)

### P0 — Scheduled execution path is effectively untested (critical requirement risk)
- Why it matters: Scheduled generation + email delivery is a core requirement. Regressions here can cause scheduled syntheses or emails to go undelivered without any visible error.
- Evidence:
  - Scheduler runtime logic exists in [scheduler.rs:27](/Users/oabrivard/Projects/rust/ai_synth/backend/src/services/scheduler.rs:27) through [scheduler.rs:91](/Users/oabrivard/Projects/rust/ai_synth/backend/src/services/scheduler.rs:91).
  - Only one trivial scheduler unit test exists ([scheduler.rs:98](/Users/oabrivard/Projects/rust/ai_synth/backend/src/services/scheduler.rs:98)).
  - The requirements explicitly call for reliable scheduled execution ([requirements.md:136](/Users/oabrivard/Projects/rust/ai_synth/docs/requirements.md:136)).
- Direction: Add deterministic integration tests for due schedule selection, double-run prevention (`last_run_at`), job contention behavior, and email send invocation outcomes.
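
As a starting point, the due-selection and double-run checks could be expressed against a pure function that takes the clock as an argument. The sketch below is illustrative only: the `Schedule` struct, `select_due`, and the `mark_run` signature are assumptions made for this report, not the actual code in `scheduler.rs`.

```rust
use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// Hypothetical, simplified schedule row (not the backend's real model).
#[derive(Debug, Clone, PartialEq)]
pub struct Schedule {
    pub id: u32,
    /// Instant at which this schedule should fire.
    pub due_at: SystemTime,
    /// Last time the scheduler ran this schedule, if ever.
    pub last_run_at: Option<SystemTime>,
}

/// Returns ids of schedules that are due at `now` and have not already run
/// for the current due slot (the `last_run_at` double-run guard).
pub fn select_due(schedules: &[Schedule], now: SystemTime) -> Vec<u32> {
    schedules
        .iter()
        .filter(|s| s.due_at <= now)
        .filter(|s| s.last_run_at.map_or(true, |ran| ran < s.due_at))
        .map(|s| s.id)
        .collect()
}

/// Records a run so the next tick does not select the schedule again.
pub fn mark_run(schedule: &mut Schedule, now: SystemTime) {
    schedule.last_run_at = Some(now);
}

#[test]
fn due_schedule_runs_exactly_once() {
    let t0 = UNIX_EPOCH + Duration::from_secs(1_000);
    let mut s = Schedule { id: 1, due_at: t0, last_run_at: None };
    let now = t0 + Duration::from_secs(5);
    assert_eq!(select_due(std::slice::from_ref(&s), now), vec![1]);
    mark_run(&mut s, now);
    let next_tick = now + Duration::from_secs(60);
    assert!(select_due(std::slice::from_ref(&s), next_tick).is_empty());
}
```

Passing `now` explicitly keeps the test deterministic; a real integration test would apply the same idea against the per-test database rather than an in-memory slice.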

### P1 — SSE progress endpoint has no deterministic integration coverage
- Why it matters: Generation UX and cancellation safety depend on SSE correctness.
- Evidence:
  - SSE handler is implemented in [generation.rs:150](/Users/oabrivard/Projects/rust/ai_synth/backend/src/handlers/generation.rs:150).
  - Integration tests validate trigger/duplicate behavior only, not progress stream contract ([api_syntheses_test.rs:565](/Users/oabrivard/Projects/rust/ai_synth/backend/tests/api_syntheses_test.rs:565), [api_syntheses_test.rs:617](/Users/oabrivard/Projects/rust/ai_synth/backend/tests/api_syntheses_test.rs:617)).
  - Existing live SSE check is gated and external ([generation-live.spec.ts:119](/Users/oabrivard/Projects/rust/ai_synth/e2e/tests/generation-live.spec.ts:119)).
- Direction: Add integration tests that subscribe to `/progress`, assert `progress -> complete/error` sequence, ownership enforcement, reconnect semantics, and keepalive stability.
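
One way to assert the `progress -> complete/error` contract is to capture the raw stream body and check the order of event names. The helper below is a minimal sketch: the function names and the sample payloads are hypothetical, and it assumes the handler emits named events (`event:` lines) and uses SSE comment lines (starting with `:`) as keepalives.

```rust
/// Extracts the `event:` names from a raw SSE body, skipping comment lines
/// (those starting with `:`), which are assumed to be keepalives.
pub fn event_names(body: &str) -> Vec<String> {
    body.lines()
        .filter(|line| !line.starts_with(':'))
        .filter_map(|line| line.strip_prefix("event:"))
        .map(|name| name.trim().to_string())
        .collect()
}

/// Checks the assumed contract: zero or more `progress` events followed by
/// exactly one terminal `complete` or `error` event.
pub fn is_valid_sequence(names: &[String]) -> bool {
    match names.split_last() {
        Some((last, earlier)) => {
            let terminal = last.as_str() == "complete" || last.as_str() == "error";
            terminal && earlier.iter().all(|n| n.as_str() == "progress")
        }
        None => false,
    }
}

#[test]
fn stream_ends_with_a_single_terminal_event() {
    // Hypothetical captured body; the real payload schema may differ.
    let body = "event: progress\ndata: {\"step\":1}\n\n: keepalive\n\n\
                event: complete\ndata: {}\n\n";
    assert!(is_valid_sequence(&event_names(body)));
}
```

A stream that ends without a terminal event, or that sends `progress` after `complete`, fails the check, which covers the most likely regressions in the handler.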

### P1 — Brave Search fallback path lacks integration coverage
- Why it matters: The fallback branch is a key functional path and currently carries high regression risk.
- Evidence:
  - Brave branch in pipeline code: [synthesis.rs:371](/Users/oabrivard/Projects/rust/ai_synth/backend/src/services/synthesis.rs:371).
  - Pipeline test file explicitly states this path is not integration-tested ([pipeline_test.rs:257](/Users/oabrivard/Projects/rust/ai_synth/backend/tests/pipeline_test.rs:257)).
- Direction: Add mock HTTP server + encrypted Brave key fixture flow to execute `use_brave_search=true` end-to-end in integration tests.
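
The mock-server half of that flow can be illustrated with the standard library alone. This is a std-only stand-in for a dedicated mock HTTP crate, not the project's test harness: `spawn_mock_server`, `http_get`, the `/search` path, and the JSON body are all hypothetical, and the encrypted-key fixture is out of scope here.

```rust
use std::io::{Read, Write};
use std::net::{TcpListener, TcpStream};
use std::thread;

/// Starts a one-shot mock server on an ephemeral local port. Returns its
/// address and a handle that yields the raw request it received.
pub fn spawn_mock_server(body: &'static str) -> (String, thread::JoinHandle<String>) {
    let listener = TcpListener::bind("127.0.0.1:0").expect("bind");
    let addr = listener.local_addr().expect("local addr").to_string();
    let handle = thread::spawn(move || {
        let (mut stream, _) = listener.accept().expect("accept");
        // Read until the end of the request headers.
        let mut request = Vec::new();
        let mut buf = [0u8; 1024];
        while !request.windows(4).any(|w| w == b"\r\n\r\n") {
            let n = stream.read(&mut buf).expect("read");
            if n == 0 {
                break;
            }
            request.extend_from_slice(&buf[..n]);
        }
        let response = format!(
            "HTTP/1.1 200 OK\r\nContent-Type: application/json\r\n\
             Content-Length: {}\r\nConnection: close\r\n\r\n{}",
            body.len(),
            body
        );
        stream.write_all(response.as_bytes()).expect("write");
        String::from_utf8_lossy(&request).into_owned()
    });
    (addr, handle)
}

/// Minimal blocking GET client standing in for the pipeline's HTTP call.
pub fn http_get(addr: &str, path: &str) -> String {
    let mut stream = TcpStream::connect(addr).expect("connect");
    let request = format!("GET {path} HTTP/1.1\r\nHost: {addr}\r\nConnection: close\r\n\r\n");
    stream.write_all(request.as_bytes()).expect("send");
    let mut response = String::new();
    stream.read_to_string(&mut response).expect("receive");
    response
}
```

A test would point the Brave base URL at the returned address, run the pipeline with `use_brave_search=true`, then join the handle and assert on the captured request (path, query, and auth header) as well as on the pipeline result.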

### P1 — Pipeline integration does not verify rate-limit behavior
- Why it matters: Rate limiting is a non-functional requirement; failures can produce outages or provider bans.
- Evidence:
  - Pipeline tests set user rate limit fields to `null` ([pipeline_test.rs:64](/Users/oabrivard/Projects/rust/ai_synth/backend/tests/pipeline_test.rs:64)).
  - No integration assertions around rate-limited waits/error propagation.
- Direction: Add integration scenarios for strict user/provider limits and verify wait/retry/timeout outcomes.
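
Rate-limit scenarios are easiest to keep deterministic when the limiter takes the current time as an input instead of sleeping. The fixed-window limiter below is a hypothetical illustration of that test shape; it is not the backend's limiter, and the names `WindowLimiter` and `Decision` are assumptions.

```rust
use std::time::Duration;

#[derive(Debug, PartialEq)]
pub enum Decision {
    Allowed,
    /// The caller should wait this long before retrying.
    WaitFor(Duration),
}

/// Hypothetical fixed-window limiter driven by an explicit clock value
/// (time elapsed since an arbitrary test epoch), so tests never sleep.
pub struct WindowLimiter {
    max_per_window: u32,
    window: Duration,
    window_start: Duration,
    used: u32,
}

impl WindowLimiter {
    pub fn new(max_per_window: u32, window: Duration) -> Self {
        Self { max_per_window, window, window_start: Duration::ZERO, used: 0 }
    }

    /// `now` must not go backwards between calls.
    pub fn try_acquire(&mut self, now: Duration) -> Decision {
        if now >= self.window_start + self.window {
            // A new window begins: reset the counter.
            self.window_start = now;
            self.used = 0;
        }
        if self.used < self.max_per_window {
            self.used += 1;
            Decision::Allowed
        } else {
            Decision::WaitFor(self.window_start + self.window - now)
        }
    }
}

#[test]
fn third_call_in_window_must_wait() {
    let mut limiter = WindowLimiter::new(2, Duration::from_secs(60));
    assert_eq!(limiter.try_acquire(Duration::from_secs(0)), Decision::Allowed);
    assert_eq!(limiter.try_acquire(Duration::from_secs(1)), Decision::Allowed);
    assert_eq!(
        limiter.try_acquire(Duration::from_secs(10)),
        Decision::WaitFor(Duration::from_secs(50))
    );
}
```

The same pattern extends to timeout outcomes: assert that a `WaitFor` longer than the job's budget surfaces as an error rather than an indefinite stall.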

### P1 — Pipeline integration does not verify max-age article filtering behavior
- Why it matters: Freshness is a core content-quality requirement.
- Evidence:
  - Pipeline tests consistently use high `max_age_days` values ([pipeline_test.rs:77](/Users/oabrivard/Projects/rust/ai_synth/backend/tests/pipeline_test.rs:77)).
  - No integration assertion for `filtered_too_old` trace behavior.
- Direction: Add wiremock articles with old publish dates + assertions on filtering and history status.
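
The boundary cases worth pinning down are "just inside", "exactly at", and "just past" the age limit. The sketch below shows them against a hypothetical `classify` function; the `ArticleOutcome` enum mirrors the `filtered_too_old` trace status by name only, and the treatment of future-dated articles is an assumption.

```rust
use std::time::{Duration, SystemTime, UNIX_EPOCH};

#[derive(Debug, PartialEq)]
pub enum ArticleOutcome {
    Kept,
    FilteredTooOld,
}

/// Classifies an article by age. Articles exactly `max_age_days` old are
/// kept; future-dated articles are also kept (an assumption of this sketch).
pub fn classify(published_at: SystemTime, now: SystemTime, max_age_days: u64) -> ArticleOutcome {
    let max_age = Duration::from_secs(max_age_days * 24 * 60 * 60);
    match now.duration_since(published_at) {
        Ok(age) if age > max_age => ArticleOutcome::FilteredTooOld,
        _ => ArticleOutcome::Kept,
    }
}

#[test]
fn old_articles_are_filtered() {
    let day = Duration::from_secs(24 * 60 * 60);
    let now = UNIX_EPOCH + day * 100;
    assert_eq!(classify(now - day * 10, now, 7), ArticleOutcome::FilteredTooOld);
    assert_eq!(classify(now - day * 7, now, 7), ArticleOutcome::Kept);
    assert_eq!(classify(now - day * 3, now, 7), ArticleOutcome::Kept);
}
```

In the integration version, the mocked articles would carry publish dates on either side of a low `max_age_days`, with assertions on both the saved synthesis and the recorded history status.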

### P2 — E2E suite is heavily API-driven, limited UI journey validation
- Why it matters: Because most steps call the API directly, UI regressions can pass E2E as long as backend endpoints stay healthy.
- Evidence:
  - Sources and themes E2E use `page.evaluate(fetch(...))` for most operations ([sources.spec.ts:23](/Users/oabrivard/Projects/rust/ai_synth/e2e/tests/sources.spec.ts:23), [themes.spec.ts:29](/Users/oabrivard/Projects/rust/ai_synth/e2e/tests/themes.spec.ts:29)).
- Direction: Keep API-assisted setup, but assert critical user interactions through UI (form submit, validation messages, control states).

### P2 — Article history ownership isolation is not explicitly tested
- Why it matters: Multi-user data isolation is security-sensitive.
- Evidence:
  - Current article history integration tests cover auth + empty/clear/provenance 404 only ([api_article_history_test.rs:24](/Users/oabrivard/Projects/rust/ai_synth/backend/tests/api_article_history_test.rs:24)).
- Direction: Add user A vs user B cross-access tests for history and provenance endpoints.
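
The shape of a cross-user test can be shown with an in-memory stand-in for the history store. Everything here is hypothetical (`HistoryStore`, `AccessError`, numeric user ids); the sketch also assumes foreign entries are reported as not found rather than forbidden, so that the response does not reveal whether an entry exists.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
pub enum AccessError {
    NotFound,
}

/// Hypothetical in-memory history store: entry id -> (owner id, article URL).
#[derive(Default)]
pub struct HistoryStore {
    entries: HashMap<u32, (u32, String)>,
}

impl HistoryStore {
    pub fn insert(&mut self, entry_id: u32, owner_id: u32, url: &str) {
        self.entries.insert(entry_id, (owner_id, url.to_string()));
    }

    /// Returns the entry only if `requester_id` owns it. Foreign and missing
    /// entries are indistinguishable to the caller.
    pub fn get(&self, requester_id: u32, entry_id: u32) -> Result<&str, AccessError> {
        match self.entries.get(&entry_id) {
            Some((owner, url)) if *owner == requester_id => Ok(url.as_str()),
            _ => Err(AccessError::NotFound),
        }
    }
}

#[test]
fn user_b_cannot_read_user_a_history() {
    let (user_a, user_b) = (1, 2);
    let mut store = HistoryStore::default();
    store.insert(10, user_a, "https://example.com/article");
    assert_eq!(store.get(user_a, 10), Ok("https://example.com/article"));
    assert_eq!(store.get(user_b, 10), Err(AccessError::NotFound));
}
```

The integration equivalent would create two authenticated sessions through `TestApp` and issue the same requests against the history and provenance endpoints.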

### P2 — QA guidelines are out of sync with current codebase signals
- Why it matters: Stale test inventory causes false confidence in planning and release gates.
- Evidence:
  - Documented counts/status in [qa_guidelines.md:7](/Users/oabrivard/Projects/rust/ai_synth/docs/qa_guidelines.md:7) to [qa_guidelines.md:11](/Users/oabrivard/Projects/rust/ai_synth/docs/qa_guidelines.md:11).
  - Current local grep counts (approximate number of tests): backend unit ~359, backend integration ~187, frontend unit ~135, E2E 7.
- Direction: Automate inventory generation in CI and update `docs/qa_guidelines.md` from machine output.

### P3 — Frontend unit test execution environment is currently brittle
- Why it matters: A brittle local setup slows the QA feedback loop and hides regressions.
- Evidence:
  - Local run `cd frontend && npx vitest run` failed due to a missing optional Rollup binary (`@rollup/rollup-darwin-x64`).
- Direction: Add a clean install/bootstrap check in CI and pin known-good Node/npm workflow.

## 4) Coverage Map (Required Capability vs Current Coverage)

| Capability | Unit | Integration | E2E | Status |
|---|---|---|---|---|
| Auth (register/login/verify/session) | Medium | Strong (`api_auth_test.rs`) | Medium (`registration.spec.ts`) | Good |
| Theme CRUD | Low | Strong (`api_themes_test.rs`) | Medium (API-driven) | Good |
| Source CRUD/import/export/preferred | Medium | Strong (`api_sources_test.rs`) | Medium (API-driven) | Good |
| On-demand generation trigger/duplicate/stop | Medium | Medium (`api_syntheses_test.rs`, `api_stop_generation_test.rs`) | Medium (live test gated) | Partial |
| SSE progress stream contract | Low | Weak | Weak (only external live) | Gap |
| Pipeline Phase 1 (personalized sources) | Medium | Medium (`pipeline_test.rs`) | Low | Partial |
| Pipeline Phase 2 (LLM web search) | Medium | Medium (`pipeline_test.rs`) | Low | Partial |
| Pipeline Phase 2 (Brave Search) | Low | None | None | Gap |
| Scheduled config CRUD | Low | Medium (`api_schedules_test.rs`) | Medium (API-driven in themes E2E) | Partial |
| Scheduled execution runtime | Low | None | None | Gap |
| Export email/pdf/markdown | Medium | Strong (`api_export_test.rs`) | Low | Good |
| Article history/provenance security | Low | Weak (no ownership isolation) | None | Gap |
| Rate limiting in real generation flow | Medium | None | None | Gap |
| Date freshness filtering in pipeline | Medium (scraper unit) | None | None | Gap |

## 5) Test Architecture Issues (Flakiness / Speed / Isolation / Observability)

- Flakiness risk: `generation-live.spec.ts` depends on external OpenAI availability and behavior ([generation-live.spec.ts:1](/Users/oabrivard/Projects/rust/ai_synth/e2e/tests/generation-live.spec.ts:1)).
- Speed tradeoff: E2E is fairly stable thanks to a single worker and API-first setup, but this under-tests real UI behavior.
- Isolation strength: backend integration tests get strong per-test DB isolation via `TestApp`.
- Observability gap: no dedicated integration assertions for SSE stream semantics and scheduler outcomes.

## 6) Detailed QA / Refactoring Plan

### Phase 1 (1-2 weeks): close highest-risk deterministic gaps
- Add scheduler integration suite:
  - due schedule executes once
  - `last_run_at` blocks double-run
  - active manual job causes skip
  - email send errors are logged and do not crash loop
- Add SSE integration suite:
  - authorized subscribe receives latest event
  - unauthorized/foreign job denied
  - `complete` and `error` payload schema checks
- Add Brave Search integration path with mocked Brave API and stored encrypted key fixture.

### Phase 2 (1 week): non-functional policy tests
- Add pipeline integration tests for:
  - `max_age_days` filtering (`filtered_too_old` assertions)
  - user/provider rate-limit behavior under contention
  - cancellation mid-batch and partial-save invariants.

### Phase 3 (1 week): E2E realism upgrades
- Convert at least 3 API-heavy E2E scenarios to UI-driven workflows:
  - theme create/update/delete
  - source add/import/preferred/delete
  - schedule form save/delete.
- Keep API shortcuts only for setup/cleanup.

### Phase 4 (2-3 days): documentation and gate hardening
- Generate test inventory automatically (counts, pass/fail) and publish into QA docs.
- Split CI lanes:
  - deterministic required lane (unit/integration/mock-e2e)
  - optional live-provider lane (non-blocking).

## 7) Quick Wins

- Add one integration test for `/syntheses/generate/{job_id}/progress` happy path + ownership check.
- Add one integration test for scheduled execution `mark_run` behavior using controlled due schedule fixture.
- Add one article-history cross-user isolation test.
- Mark `generation-live.spec.ts` as non-blocking in CI with explicit label/reporting.
- Update `docs/qa_guidelines.md` inventory counts to current observed baseline.

## Execution Notes

- Ran successfully: `cd backend && cargo test --lib` -> 359 passed.
- Could not execute frontend unit tests due to an environment dependency issue (`@rollup/rollup-darwin-x64` missing in local `node_modules`).