QA Integration/E2E Audit Report

Date: 2026-03-27
Scope: docs/requirements.md, docs/functional_specs.md, docs/technical_specs.md, docs/qa_guidelines.md, backend/tests, e2e/tests, test scripts.

1) Clarification Questions

  1. Should scheduled execution reliability (background scheduler + email fanout) be release-gated with deterministic integration tests, or only monitored in production?
  2. Is an external-provider live E2E (OPENAI_TEST_API_KEY) acceptable as the only end-to-end coverage for SSE/progress completion, or do you want deterministic in-house SSE coverage in CI?

2) Assumptions

  • CI quality gates should not rely on external LLM providers.
  • Core product requirements (scheduled generation, generation progress, fallback paths) should be covered by deterministic integration tests.
  • This report prioritizes integration/E2E confidence over unit-test volume.

3) Prioritized Findings (P0-P3)

P0 — Scheduled execution path is effectively untested (critical requirement risk)

  • Why it matters: Scheduled generation + email delivery is a core requirement. Regressions here can cause user deliverables to fail silently.
  • Evidence:
  • Direction: Add deterministic integration tests for due schedule selection, double-run prevention (last_run_at), job contention behavior, and email send invocation outcomes.
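
A minimal sketch of what a deterministic due-selection test could exercise. The scheduler's real schema is not shown in this report, so Schedule, next_run_at, is_due and select_due are hypothetical names; only last_run_at and mark_run appear in the findings. The key idea is that the current time is passed in rather than read from the system clock.

```rust
/// Hypothetical schedule model; only `last_run_at` is a name from the report.
#[derive(Debug, Clone, PartialEq)]
struct Schedule {
    id: u32,
    /// Next planned run as Unix seconds (assumed representation).
    next_run_at: u64,
    /// Last completed run, if any.
    last_run_at: Option<u64>,
}

/// A schedule is due when its planned time has passed and it has not
/// already run at or after that planned time.
fn is_due(s: &Schedule, now: u64) -> bool {
    let planned_time_reached = s.next_run_at <= now;
    let already_ran = matches!(s.last_run_at, Some(t) if t >= s.next_run_at);
    planned_time_reached && !already_ran
}

/// Returns the ids of all due schedules. `now` is injected so that a test
/// can pin the clock.
fn select_due(schedules: &[Schedule], now: u64) -> Vec<u32> {
    schedules
        .iter()
        .filter(|s| is_due(s, now))
        .map(|s| s.id)
        .collect()
}

/// Records a run so that a second scheduler tick selects nothing.
fn mark_run(s: &mut Schedule, now: u64) {
    s.last_run_at = Some(now);
}
```

A test built on this shape would select once, call mark_run, select again with the same clock value, and assert that the second selection is empty.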

P1 — SSE progress endpoint has no deterministic integration coverage

  • Why it matters: Generation UX and cancellation safety depend on SSE correctness.
  • Evidence:
  • Direction: Add integration tests that subscribe to /progress, assert progress -> complete/error sequence, ownership enforcement, reconnect semantics, and keepalive stability.
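
A rough sketch of how the progress -> complete/error sequence could be asserted once a test has captured the response body. The event names progress, complete and error are taken from the wording above; the exact wire payloads are an assumption, and parse_sse / is_valid_sequence are hypothetical helpers. Comment lines starting with ':' (commonly used as keepalives) are skipped, as the SSE format prescribes.

```rust
/// One parsed Server-Sent Event.
#[derive(Debug, PartialEq)]
struct SseEvent {
    event: String,
    data: String,
}

/// Parses an SSE body. Events are terminated by a blank line; lines that
/// start with ':' are comments and are ignored.
fn parse_sse(body: &str) -> Vec<SseEvent> {
    let mut events = Vec::new();
    let mut event = String::new();
    let mut data: Vec<String> = Vec::new();
    for line in body.lines() {
        if line.is_empty() {
            // A blank line dispatches the pending event if it has data.
            if !data.is_empty() {
                let name = if event.is_empty() {
                    "message".to_string()
                } else {
                    event.clone()
                };
                events.push(SseEvent { event: name, data: data.join("\n") });
            }
            event.clear();
            data.clear();
        } else if line.starts_with(':') {
            continue; // comment or keepalive
        } else if let Some(v) = line.strip_prefix("event:") {
            // The format strips a single leading space from the value.
            event = v.strip_prefix(' ').unwrap_or(v).to_string();
        } else if let Some(v) = line.strip_prefix("data:") {
            data.push(v.strip_prefix(' ').unwrap_or(v).to_string());
        }
    }
    events
}

/// Assumed contract: zero or more `progress` events, then exactly one
/// terminal `complete` or `error` event, and nothing after it.
fn is_valid_sequence(events: &[SseEvent]) -> bool {
    match events.split_last() {
        Some((last, rest)) => {
            matches!(last.event.as_str(), "complete" | "error")
                && rest.iter().all(|e| e.event == "progress")
        }
        None => false,
    }
}
```

Keeping the parser separate from the sequence check lets the same helper serve the ownership and reconnect tests, which only differ in what they expect from the parsed events.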

P1 — Brave Search fallback path lacks integration coverage

  • Why it matters: The fallback branch is a key functional path and currently carries high regression risk.
  • Evidence:
  • Direction: Add mock HTTP server + encrypted Brave key fixture flow to execute use_brave_search=true end-to-end in integration tests.

P1 — Pipeline integration does not verify rate-limit behavior

  • Why it matters: Rate limiting is a non-functional requirement; failures can produce outages or provider bans.
  • Evidence:
    • Pipeline tests set user rate limit fields to null (pipeline_test.rs:64).
    • No integration assertions around rate-limited waits/error propagation.
  • Direction: Add integration scenarios for strict user/provider limits and verify wait/retry/timeout outcomes.
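
To illustrate how wait/retry outcomes can be asserted without sleeping, here is a small sketch of a fixed-window limiter with an injected clock. It is not the project's limiter (which this report does not show); FixedWindowLimiter, Decision and all field names are hypothetical, and a fixed window is just one possible policy.

```rust
/// Outcome of asking the limiter for permission (hypothetical type).
#[derive(Debug, PartialEq)]
enum Decision {
    Allowed,
    /// The caller should wait this many seconds before retrying.
    RetryAfter(u64),
}

/// Allows at most `max_per_window` calls per `window_secs` seconds.
struct FixedWindowLimiter {
    max_per_window: u32,
    window_secs: u64,
    window_start: u64,
    used: u32,
}

impl FixedWindowLimiter {
    fn new(max_per_window: u32, window_secs: u64) -> Self {
        Self { max_per_window, window_secs, window_start: 0, used: 0 }
    }

    /// `now` (Unix seconds) is passed in instead of read from the system
    /// clock, so a test controls time exactly.
    fn check(&mut self, now: u64) -> Decision {
        if now >= self.window_start + self.window_secs {
            // The previous window has elapsed; start a new one.
            self.window_start = now;
            self.used = 0;
        }
        if self.used < self.max_per_window {
            self.used += 1;
            Decision::Allowed
        } else {
            Decision::RetryAfter(self.window_start + self.window_secs - now)
        }
    }
}
```

With a limit of 2 per 60 seconds, a test can assert that the third call inside the window yields RetryAfter with the exact remaining time, and that the first call of the next window is allowed again.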

P1 — Pipeline integration does not verify max-age article filtering behavior

  • Why it matters: Freshness is a core content-quality requirement.
  • Evidence:
    • Pipeline tests consistently use high max_age_days values (pipeline_test.rs:77).
    • No integration assertion for filtered_too_old trace behavior.
  • Direction: Add wiremock articles with old publish dates + assertions on filtering and history status.
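
A sketch of the filtering rule such a test would pin down, assuming publish dates are compared against a cutoff derived from max_age_days. Article, Trace and filter_by_age are hypothetical; only max_age_days and the filtered_too_old status are named in this report, and the inclusive boundary is an assumption to confirm against the specs.

```rust
const SECS_PER_DAY: u64 = 86_400;

/// Hypothetical article record with a Unix-seconds publish date.
#[derive(Debug, Clone)]
struct Article {
    url: String,
    published_at: u64,
}

/// Per-article trace entry, mirroring the `filtered_too_old` status.
#[derive(Debug, PartialEq)]
enum Trace {
    Kept(String),
    FilteredTooOld(String),
}

/// Classifies each article as kept or too old. An article published
/// exactly at the cutoff is kept (assumed boundary behavior).
fn filter_by_age(articles: &[Article], now: u64, max_age_days: u64) -> Vec<Trace> {
    let max_age_secs = max_age_days.saturating_mul(SECS_PER_DAY);
    let cutoff = now.saturating_sub(max_age_secs);
    articles
        .iter()
        .map(|a| {
            if a.published_at >= cutoff {
                Trace::Kept(a.url.clone())
            } else {
                Trace::FilteredTooOld(a.url.clone())
            }
        })
        .collect()
}
```

An integration test would serve one fresh, one boundary and one stale article from the mock server, run the pipeline with a small max_age_days, and assert both the filtered output and the recorded trace.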

P2 — E2E suite is heavily API-driven, limited UI journey validation

  • Why it matters: UI regressions can pass E2E while backend endpoints stay healthy.
  • Evidence:
  • Direction: Keep API-assisted setup, but assert critical user interactions through UI (form submit, validation messages, control states).

P2 — Article history ownership isolation is not explicitly tested

  • Why it matters: Multi-user data isolation is security-sensitive.
  • Evidence:
  • Direction: Add user A vs user B cross-access tests for history and provenance endpoints.
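
The cross-access test can be pictured with a small in-memory model. HistoryStore, HistoryEntry and AccessError are hypothetical stand-ins for the real endpoints; whether the API answers a foreign request with "not found" or "forbidden" is an assumption that the specs should settle.

```rust
#[derive(Debug, PartialEq)]
enum AccessError {
    NotFound,
}

/// Hypothetical history record owned by a single user.
#[derive(Debug, PartialEq)]
struct HistoryEntry {
    id: u32,
    owner_id: u32,
    title: String,
}

struct HistoryStore {
    entries: Vec<HistoryEntry>,
}

impl HistoryStore {
    /// Returns the entry only when the requester owns it. Foreign and
    /// missing entries look identical to the caller, so the response does
    /// not reveal whether another user's entry exists.
    fn get(&self, requester_id: u32, entry_id: u32) -> Result<&HistoryEntry, AccessError> {
        self.entries
            .iter()
            .find(|e| e.id == entry_id && e.owner_id == requester_id)
            .ok_or(AccessError::NotFound)
    }
}
```

The real test would create the entry as user A, then request it with user B's session and assert the denial, plus a control request as user A to show the entry is reachable.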

P2 — QA guidelines are out of sync with current codebase signals

  • Why it matters: A stale test inventory causes false confidence in planning and release gates.
  • Evidence:
  • Direction: Automate inventory generation in CI and update docs/qa_guidelines.md from machine output.

P3 — Frontend unit test execution environment is currently brittle

  • Why it matters: Slows the QA feedback loop and can hide regressions.
  • Evidence:
    • Local run of cd frontend && npx vitest run failed due to a missing optional Rollup binary (@rollup/rollup-darwin-x64).
  • Direction: Add a clean install/bootstrap check in CI and pin known-good Node/npm workflow.

4) Coverage Map (Required Capability vs Current Coverage)

| Capability | Unit | Integration | E2E | Status |
|---|---|---|---|---|
| Auth (register/login/verify/session) | Medium | Strong (api_auth_test.rs) | Medium (registration.spec.ts) | Good |
| Theme CRUD | Low | Strong (api_themes_test.rs) | Medium (API-driven) | Good |
| Source CRUD/import/export/preferred | Medium | Strong (api_sources_test.rs) | Medium (API-driven) | Good |
| On-demand generation trigger/duplicate/stop | Medium | Medium (api_syntheses_test.rs, api_stop_generation_test.rs) | Medium (live test gated) | Partial |
| SSE progress stream contract | Low | Weak | Weak (only external live) | Gap |
| Pipeline Phase 1 (personalized sources) | Medium | Medium (pipeline_test.rs) | Low | Partial |
| Pipeline Phase 2 (LLM web search) | Medium | Medium (pipeline_test.rs) | Low | Partial |
| Pipeline Phase 2 (Brave Search) | Low | None | None | Gap |
| Scheduled config CRUD | Low | Medium (api_schedules_test.rs) | Medium (API-driven in themes E2E) | Partial |
| Scheduled execution runtime | Low | None | None | Gap |
| Export email/pdf/markdown | Medium | Strong (api_export_test.rs) | Low | Good |
| Article history/provenance security | Low | Weak (no ownership isolation) | None | Gap |
| Rate limiting in real generation flow | Medium | None | None | Gap |
| Date freshness filtering in pipeline | Medium (scraper unit) | None | None | Gap |

5) Test Architecture Issues (Flakiness / Speed / Isolation / Observability)

  • Flakiness risk: generation-live.spec.ts depends on external OpenAI availability and behavior (generation-live.spec.ts:1).
  • Speed tradeoff: E2E is fairly stable thanks to a single worker and API-first setup, but this under-tests real UI behavior.
  • Isolation strength: backend integration tests get strong per-test DB isolation via TestApp.
  • Observability gap: no dedicated integration assertions for SSE stream semantics and scheduler outcomes.

6) Detailed QA / Refactoring Plan

Phase 1 (1-2 weeks): close highest-risk deterministic gaps

  • Add scheduler integration suite:
    • due schedule executes once
    • last_run_at blocks double-run
    • active manual job causes skip
    • email send errors are logged and do not crash loop
  • Add SSE integration suite:
    • authorized subscribe receives latest event
    • unauthorized/foreign job denied
    • complete and error payload schema checks
  • Add Brave Search integration path with mocked Brave API and stored encrypted key fixture.
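
Two of the scheduler cases above (skip on active manual job, email errors not crashing the loop) can be sketched as a pure function that takes the email sender as a parameter. This is an assumed shape, not the project's scheduler: run_due, DueSchedule and Outcome are hypothetical, and generation itself is left out to keep the focus on the loop's error handling.

```rust
/// Per-schedule result of one scheduler pass (hypothetical type).
#[derive(Debug, PartialEq)]
enum Outcome {
    Delivered,
    SkippedActiveJob,
    EmailFailed(String),
}

/// Minimal stand-in for a due schedule; field names are assumptions.
struct DueSchedule {
    id: u32,
    has_active_manual_job: bool,
}

/// Processes every due schedule. A failing email send is recorded as an
/// outcome instead of being propagated, so later schedules still run.
fn run_due<F>(due: &[DueSchedule], mut send_email: F) -> Vec<(u32, Outcome)>
where
    F: FnMut(u32) -> Result<(), String>,
{
    due.iter()
        .map(|s| {
            let outcome = if s.has_active_manual_job {
                // A manual job is already running for this schedule: skip it.
                Outcome::SkippedActiveJob
            } else {
                match send_email(s.id) {
                    Ok(()) => Outcome::Delivered,
                    Err(e) => Outcome::EmailFailed(e),
                }
            };
            (s.id, outcome)
        })
        .collect()
}
```

Injecting the sender as a closure lets a test force a failure for one schedule and then assert that the remaining schedules were still attempted.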

Phase 2 (1 week): non-functional policy tests

  • Add pipeline integration tests for:
    • max_age_days filtering (filtered_too_old assertions)
    • user/provider rate-limit behavior under contention
    • cancellation mid-batch and partial-save invariants.

Phase 3 (1 week): E2E realism upgrades

  • Convert at least 3 API-heavy E2E scenarios to UI-driven workflows:
    • theme create/update/delete
    • source add/import/preferred/delete
    • schedule form save/delete.
  • Keep API shortcuts only for setup/cleanup.

Phase 4 (2-3 days): documentation and gate hardening

  • Generate test inventory automatically (counts, pass/fail) and publish into QA docs.
  • Split CI lanes:
    • deterministic required lane (unit/integration/mock-e2e)
    • optional live-provider lane (non-blocking).
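
One possible way to produce the inventory counts is to parse the "test result:" summary lines that cargo test prints, one per test binary, and sum them. The sketch below assumes that output format; Counts and parse_summary are hypothetical names, and a real inventory would likely also cover the vitest and E2E runners.

```rust
/// Aggregated counts across all `test result:` lines in the output.
#[derive(Debug, PartialEq, Default)]
struct Counts {
    passed: u32,
    failed: u32,
    ignored: u32,
}

/// Sums passed/failed/ignored over every summary line, for example:
/// `test result: ok. 359 passed; 0 failed; 0 ignored; 0 measured; ...`
fn parse_summary(output: &str) -> Counts {
    let mut total = Counts::default();
    for line in output.lines() {
        let rest = match line.trim().strip_prefix("test result:") {
            Some(rest) => rest,
            None => continue,
        };
        for part in rest.split(';') {
            // Each relevant part ends with "<count> <label>".
            let mut words = part.split_whitespace().rev();
            let label = words.next();
            let count = words.next().and_then(|w| w.parse::<u32>().ok());
            match (label, count) {
                (Some("passed"), Some(n)) => total.passed += n,
                (Some("failed"), Some(n)) => total.failed += n,
                (Some("ignored"), Some(n)) => total.ignored += n,
                _ => {} // "measured", "filtered out", timing, etc.
            }
        }
    }
    total
}
```

The resulting counts can then be written into docs/qa_guidelines.md by the CI job, so the documented baseline always comes from an actual run.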

7) Quick Wins

  • Add one integration test for /syntheses/generate/{job_id}/progress happy path + ownership check.
  • Add one integration test for scheduled execution mark_run behavior using controlled due schedule fixture.
  • Add one article-history cross-user isolation test.
  • Mark generation-live.spec.ts as non-blocking in CI with explicit label/reporting.
  • Update docs/qa_guidelines.md inventory counts to current observed baseline.

Execution Notes

  • Ran successfully: cd backend && cargo test --lib -> 359 passed.
  • Could not execute frontend unit tests due to an environment dependency issue (@rollup/rollup-darwin-x64 missing in local node_modules).