QA Integration/E2E Audit Report
Date: 2026-03-27
Scope: docs/requirements.md, docs/functional_specs.md, docs/technical_specs.md, docs/qa_guidelines.md, backend/tests, e2e/tests, test scripts.
1) Clarification Questions
- Should scheduled execution reliability (background scheduler + email fanout) be release-gated with deterministic integration tests, or only monitored in production?
- Is an external-provider live E2E (`OPENAI_TEST_API_KEY`) acceptable as the only end-to-end coverage for SSE/progress completion, or do you want deterministic in-house SSE coverage in CI?
2) Assumptions
- CI quality gates should not rely on external LLM providers.
- Core product requirements (scheduled generation, generation progress, fallback paths) should be covered by deterministic integration tests.
- This report prioritizes integration/E2E confidence over unit-test volume.
3) Prioritized Findings (P0-P3)
P0 — Scheduled execution path is effectively untested (critical requirement risk)
- Why it matters: Scheduled generation + email delivery is a core requirement. Regressions here can silently fail user deliverables.
- Evidence:
- Scheduler runtime logic exists in scheduler.rs:27 through scheduler.rs:91.
- Only one trivial scheduler unit test exists (scheduler.rs:98).
- Requirement explicitly expects scheduled reliability (requirements.md:136).
- Direction: Add deterministic integration tests for due schedule selection, double-run prevention (`last_run_at`), job contention behavior, and email send invocation outcomes.
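The due-selection and double-run checks can be made deterministic by injecting the clock instead of reading it inside the scheduler. The following is a minimal sketch under that assumption; `Schedule`, `is_due`, and `run_due` are hypothetical names for illustration and are not taken from scheduler.rs.

```rust
use std::time::{Duration, SystemTime};

/// Hypothetical, simplified view of a schedule row; the real model may differ.
#[derive(Debug, Clone)]
pub struct Schedule {
    pub id: u32,
    pub interval: Duration,
    pub last_run_at: Option<SystemTime>,
}

/// A schedule is due when it has never run, or when at least `interval`
/// has elapsed since `last_run_at`. `now` is injected so tests never sleep.
pub fn is_due(schedule: &Schedule, now: SystemTime) -> bool {
    match schedule.last_run_at {
        None => true,
        Some(last) => match now.duration_since(last) {
            Ok(elapsed) => elapsed >= schedule.interval,
            // `last_run_at` lies in the future (clock skew): treat as not due.
            Err(_) => false,
        },
    }
}

/// Runs every due schedule once and stamps `last_run_at`, so a second tick
/// at the same instant does not select the same schedule again.
pub fn run_due(schedules: &mut [Schedule], now: SystemTime) -> Vec<u32> {
    let mut ran = Vec::new();
    for schedule in schedules.iter_mut() {
        if is_due(schedule, now) {
            schedule.last_run_at = Some(now);
            ran.push(schedule.id);
        }
    }
    ran
}
```

An integration test built on this idea would call the tick twice with the same timestamp and assert that the second call selects nothing, which is the double-run guarantee the finding asks for.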
P1 — SSE progress endpoint has no deterministic integration coverage
- Why it matters: Generation UX and cancellation safety depend on SSE correctness.
- Evidence:
- SSE handler is implemented in generation.rs:150.
- Integration tests validate trigger/duplicate behavior only, not progress stream contract (api_syntheses_test.rs:565, api_syntheses_test.rs:617).
- Existing live SSE check is gated and external (generation-live.spec.ts:119).
- Direction: Add integration tests that subscribe to `/progress`, assert `progress -> complete/error` sequence, ownership enforcement, reconnect semantics, and keepalive stability.
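One way to assert the stream contract is to capture the raw response body and validate the event order offline. The sketch below assumes the handler emits named SSE events called `progress`, `complete`, and `error`, and uses comment lines as keepalives; the parser is deliberately simplified, and `parse_sse` and `check_progress_contract` are hypothetical helpers.

```rust
/// One parsed Server-Sent Event: the `event:` name and the joined `data:` lines.
#[derive(Debug, PartialEq, Eq)]
pub struct SseEvent {
    pub name: String,
    pub data: String,
}

/// Parses a raw SSE body. Frames are separated by blank lines; lines starting
/// with `:` are comments (commonly used as keepalives) and are skipped.
/// Simplified: unnamed events keep an empty name, `id:`/`retry:` are ignored.
pub fn parse_sse(body: &str) -> Vec<SseEvent> {
    let mut events = Vec::new();
    let mut name = String::new();
    let mut data: Vec<&str> = Vec::new();
    // A trailing empty line is appended so the final frame is always flushed.
    for line in body.lines().chain(std::iter::once("")) {
        if line.is_empty() {
            if !name.is_empty() || !data.is_empty() {
                events.push(SseEvent { name: name.clone(), data: data.join("\n") });
            }
            name.clear();
            data.clear();
        } else if line.starts_with(':') {
            continue;
        } else if let Some(value) = line.strip_prefix("event:") {
            name = value.trim_start().to_string();
        } else if let Some(value) = line.strip_prefix("data:") {
            data.push(value.strip_prefix(' ').unwrap_or(value));
        }
    }
    events
}

/// Checks the contract: zero or more `progress` events followed by exactly
/// one terminal `complete` or `error` event, with nothing after it.
pub fn check_progress_contract(events: &[SseEvent]) -> Result<(), String> {
    let Some((last, earlier)) = events.split_last() else {
        return Err("stream produced no events".to_string());
    };
    if let Some(bad) = earlier.iter().find(|e| e.name != "progress") {
        return Err(format!("unexpected `{}` before the terminal event", bad.name));
    }
    match last.name.as_str() {
        "complete" | "error" => Ok(()),
        other => Err(format!("stream ended on `{other}` instead of a terminal event")),
    }
}
```

Keeping the parser separate from the contract check lets the same validator run against both the in-process integration test and a recorded body from the live lane.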
P1 — Brave Search fallback path lacks integration coverage
- Why it matters: Fallback branch is a key functional path and currently high regression risk.
- Evidence:
- Brave branch in pipeline code: synthesis.rs:371.
- Pipeline test file explicitly states this path is not integration-tested (pipeline_test.rs:257).
- Direction: Add mock HTTP server + encrypted Brave key fixture flow to execute `use_brave_search=true` end-to-end in integration tests.
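As a rough illustration of the mock-server approach, the sketch below stands up a one-shot HTTP server on a loopback port using only the standard library and routes a search through it when the flag is set. The request path, the response shape, and every function name here are assumptions for illustration; the real suite would more likely use a mocking crate and the pipeline's own HTTP client, and the encrypted-key fixture is not modeled.

```rust
use std::io::{BufRead, BufReader, Read, Write};
use std::net::{TcpListener, TcpStream};
use std::thread;

/// Starts a one-shot mock HTTP server on an ephemeral loopback port. It answers
/// the first request with `body` and returns the request line via the handle.
pub fn spawn_mock(body: &'static str) -> (String, thread::JoinHandle<String>) {
    let listener = TcpListener::bind("127.0.0.1:0").expect("bind loopback");
    let addr = listener.local_addr().expect("local addr").to_string();
    let handle = thread::spawn(move || {
        let (stream, _) = listener.accept().expect("accept");
        let mut reader = BufReader::new(stream);
        let mut request_line = String::new();
        reader.read_line(&mut request_line).expect("read request line");
        // Drain the remaining headers up to the blank line.
        let mut line = String::new();
        loop {
            line.clear();
            let n = reader.read_line(&mut line).expect("read header");
            if n == 0 || line == "\r\n" {
                break;
            }
        }
        let mut stream = reader.into_inner();
        write!(
            stream,
            "HTTP/1.1 200 OK\r\nContent-Type: application/json\r\nContent-Length: {}\r\nConnection: close\r\n\r\n{}",
            body.len(),
            body
        )
        .expect("write response");
        request_line.trim_end().to_string()
    });
    (addr, handle)
}

/// Minimal HTTP GET that returns the response body; a stand-in for the
/// pipeline's real HTTP client.
pub fn http_get(addr: &str, path: &str) -> String {
    let mut stream = TcpStream::connect(addr).expect("connect");
    write!(stream, "GET {path} HTTP/1.1\r\nHost: {addr}\r\nConnection: close\r\n\r\n").expect("send");
    let mut response = String::new();
    stream.read_to_string(&mut response).expect("read response");
    response.split("\r\n\r\n").nth(1).unwrap_or("").to_string()
}

/// Hypothetical branch point: only calls the (mocked) search API when the
/// `use_brave_search` flag is set. The `/search` path is made up.
pub fn brave_search_if_enabled(use_brave_search: bool, addr: &str, query: &str) -> Option<String> {
    if use_brave_search {
        Some(http_get(addr, &format!("/search?q={query}")))
    } else {
        None
    }
}
```

Returning the request line from the server thread lets the test assert not only that the branch produced a result, but that the outbound request was actually made with the expected query.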
P1 — Pipeline integration does not verify rate-limit behavior
- Why it matters: Rate limiting is a non-functional requirement; failures can produce outages or provider bans.
- Evidence:
- Pipeline tests set user rate limit fields to `null` (pipeline_test.rs:64).
- No integration assertions around rate-limited waits/error propagation.
- Direction: Add integration scenarios for strict user/provider limits and verify wait/retry/timeout outcomes.
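A strict-limit scenario is easiest to assert when the limiter takes the current time as an argument, so the test can step through a window without sleeping. The sketch below assumes a fixed-window policy purely for illustration; the project's limiter may use a different algorithm, and `FixedWindowLimiter` and `Decision` are hypothetical types.

```rust
use std::time::Duration;

/// Outcome of a single acquisition attempt.
#[derive(Debug, PartialEq, Eq)]
pub enum Decision {
    Allowed,
    /// The caller must wait this long before the next window opens.
    WaitFor(Duration),
}

/// Hypothetical fixed-window limiter. Time is passed in explicitly as an
/// offset from an arbitrary test epoch, so tests are fully deterministic.
pub struct FixedWindowLimiter {
    max_per_window: u32,
    window: Duration,
    window_start: Duration,
    used: u32,
}

impl FixedWindowLimiter {
    pub fn new(max_per_window: u32, window: Duration) -> Self {
        Self { max_per_window, window, window_start: Duration::ZERO, used: 0 }
    }

    pub fn try_acquire(&mut self, now: Duration) -> Decision {
        // Open a fresh window once the current one has fully elapsed.
        if now >= self.window_start + self.window {
            self.window_start = now;
            self.used = 0;
        }
        if self.used < self.max_per_window {
            self.used += 1;
            Decision::Allowed
        } else {
            Decision::WaitFor(self.window_start + self.window - now)
        }
    }
}
```

With a limit of 2 per 60 seconds, a third call at the 20 second mark yields `WaitFor(40s)`, which gives the pipeline test a concrete value to compare against the wait or timeout it observes.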
P1 — Pipeline integration does not verify max-age article filtering behavior
- Why it matters: Freshness is a core content-quality requirement.
- Evidence:
- Pipeline tests consistently use high `max_age_days` values (pipeline_test.rs:77).
- No integration assertion for `filtered_too_old` trace behavior.
- Direction: Add wiremock articles with old publish dates + assertions on filtering and history status.
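The freshness rule itself can be pinned down with a small pure function before wiring it into mocked articles. This sketch assumes an article is filtered only when it is strictly older than `max_age_days`; the boundary behavior, the `ArticleStatus` enum, and `classify_by_age` are assumptions, not the pipeline's actual types.

```rust
use std::time::{Duration, SystemTime};

/// Hypothetical trace status; `FilteredTooOld` mirrors the `filtered_too_old`
/// trace value named in the finding.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum ArticleStatus {
    Kept,
    FilteredTooOld,
}

/// Classifies an article by publish date. `now` is injected so the test can
/// use fixed timestamps instead of the wall clock.
pub fn classify_by_age(published_at: SystemTime, now: SystemTime, max_age_days: u64) -> ArticleStatus {
    let max_age = Duration::from_secs(max_age_days.saturating_mul(24 * 60 * 60));
    match now.duration_since(published_at) {
        Ok(age) if age > max_age => ArticleStatus::FilteredTooOld,
        // Within the window, or dated in the future: keep the article.
        _ => ArticleStatus::Kept,
    }
}
```

An integration scenario would then serve one fresh and one stale article from the mock server with a low `max_age_days`, and assert that only the stale one is recorded as `filtered_too_old`.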
P2 — E2E suite is heavily API-driven, limited UI journey validation
- Why it matters: UI regressions can pass E2E while backend endpoints stay healthy.
- Evidence:
- Sources and themes E2E use `page.evaluate(fetch(...))` for most operations (sources.spec.ts:23, themes.spec.ts:29).
- Direction: Keep API-assisted setup, but assert critical user interactions through UI (form submit, validation messages, control states).
P2 — Article history ownership isolation is not explicitly tested
- Why it matters: Multi-user data isolation is security-sensitive.
- Evidence:
- Current article history integration tests cover auth + empty/clear/provenance 404 only (api_article_history_test.rs:24).
- Direction: Add user A vs user B cross-access tests for history and provenance endpoints.
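The shape of a cross-access test can be sketched against an in-memory store. This example assumes that a foreign entry is reported as not found rather than forbidden, so its existence is not leaked; that policy, along with `HistoryStore` and its methods, is an assumption for illustration and not the project's API.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq, Eq)]
pub enum HistoryError {
    /// Returned for missing entries and for entries owned by another user.
    NotFound,
}

/// Hypothetical in-memory stand-in for the article history table:
/// entry id -> (owner user id, article URL).
pub struct HistoryStore {
    entries: HashMap<u64, (u64, String)>,
}

impl HistoryStore {
    pub fn new() -> Self {
        Self { entries: HashMap::new() }
    }

    pub fn insert(&mut self, entry_id: u64, owner_id: u64, url: &str) {
        self.entries.insert(entry_id, (owner_id, url.to_string()));
    }

    /// Returns the entry only when `user_id` owns it.
    pub fn get_for_user(&self, user_id: u64, entry_id: u64) -> Result<&str, HistoryError> {
        match self.entries.get(&entry_id) {
            Some((owner, url)) if *owner == user_id => Ok(url.as_str()),
            _ => Err(HistoryError::NotFound),
        }
    }

    /// Lists entry ids owned by `user_id`, sorted for stable assertions.
    pub fn list_for_user(&self, user_id: u64) -> Vec<u64> {
        let mut ids: Vec<u64> = self
            .entries
            .iter()
            .filter(|(_, (owner, _))| *owner == user_id)
            .map(|(id, _)| *id)
            .collect();
        ids.sort_unstable();
        ids
    }
}
```

The real test would perform the same three checks over HTTP: user A reads their own entry, user A is denied user B's entry, and each user's list contains only their own ids.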
P2 — QA guidelines are out of sync with current codebase signals
- Why it matters: stale test inventory causes false confidence in planning and release gates.
- Evidence:
- Documented counts/status in qa_guidelines.md:7 to qa_guidelines.md:11.
- Current local grep counts: backend unit ~359, backend integration ~187, frontend unit ~135 tests, E2E 7.
- Direction: Automate inventory generation in CI and update `docs/qa_guidelines.md` from machine output.
P3 — Frontend unit test execution environment is currently brittle
- Why it matters: slows QA feedback loop and hides regressions.
- Evidence:
- Local run `cd frontend && npx vitest run` failed due to a missing optional Rollup binary (`@rollup/rollup-darwin-x64`).
- Direction: Add a clean install/bootstrap check in CI and pin known-good Node/npm workflow.
4) Coverage Map (Required Capability vs Current Coverage)
| Capability | Unit | Integration | E2E | Status |
|---|---|---|---|---|
| Auth (register/login/verify/session) | Medium | Strong (api_auth_test.rs) | Medium (registration.spec.ts) | Good |
| Theme CRUD | Low | Strong (api_themes_test.rs) | Medium (API-driven) | Good |
| Source CRUD/import/export/preferred | Medium | Strong (api_sources_test.rs) | Medium (API-driven) | Good |
| On-demand generation trigger/duplicate/stop | Medium | Medium (api_syntheses_test.rs, api_stop_generation_test.rs) | Medium (live test gated) | Partial |
| SSE progress stream contract | Low | Weak | Weak (only external live) | Gap |
| Pipeline Phase 1 (personalized sources) | Medium | Medium (pipeline_test.rs) | Low | Partial |
| Pipeline Phase 2 (LLM web search) | Medium | Medium (pipeline_test.rs) | Low | Partial |
| Pipeline Phase 2 (Brave Search) | Low | None | None | Gap |
| Scheduled config CRUD | Low | Medium (api_schedules_test.rs) | Medium (API-driven in themes E2E) | Partial |
| Scheduled execution runtime | Low | None | None | Gap |
| Export email/pdf/markdown | Medium | Strong (api_export_test.rs) | Low | Good |
| Article history/provenance security | Low | Weak (no ownership isolation) | None | Gap |
| Rate limiting in real generation flow | Medium | None | None | Gap |
| Date freshness filtering in pipeline | Medium (scraper unit) | None | None | Gap |
5) Test Architecture Issues (Flakiness / Speed / Isolation / Observability)
- Flakiness risk: `generation-live.spec.ts` depends on external OpenAI availability and behavior (generation-live.spec.ts:1).
- Speed tradeoff: E2E is stable-ish due to a single worker and API-first setup, but this under-tests real UI behavior.
- Isolation strengths: backend integration per-test DB isolation via `TestApp` is strong.
- Observability gap: no dedicated integration assertions for SSE stream semantics and scheduler outcomes.
6) Detailed QA / Refactoring Plan
Phase 1 (1-2 weeks): close highest-risk deterministic gaps
- Add scheduler integration suite:
- due schedule executes once
- `last_run_at` blocks double-run
- active manual job causes skip
- email send errors are logged and do not crash loop
- Add SSE integration suite:
- authorized subscribe receives latest event
- unauthorized/foreign job denied
- `complete` and `error` payload schema checks
- Add Brave Search integration path with mocked Brave API and stored encrypted key fixture.
Phase 2 (1 week): non-functional policy tests
- Add pipeline integration tests for:
- `max_age_days` filtering (`filtered_too_old` assertions)
- user/provider rate-limit behavior under contention
- cancellation mid-batch and partial-save invariants.
Phase 3 (1 week): E2E realism upgrades
- Convert at least 3 API-heavy E2E scenarios to UI-driven workflows:
- theme create/update/delete
- source add/import/preferred/delete
- schedule form save/delete.
- Keep API shortcuts only for setup/cleanup.
Phase 4 (2-3 days): documentation and gate hardening
- Generate test inventory automatically (counts, pass/fail) and publish into QA docs.
- Split CI lanes:
- deterministic required lane (unit/integration/mock-e2e)
- optional live-provider lane (non-blocking).
7) Quick Wins
- Add one integration test for `/syntheses/generate/{job_id}/progress` happy path + ownership check.
- Add one integration test for scheduled execution `mark_run` behavior using controlled due schedule fixture.
- Add one article-history cross-user isolation test.
- Mark `generation-live.spec.ts` as non-blocking in CI with explicit label/reporting.
- Update `docs/qa_guidelines.md` inventory counts to current observed baseline.
Execution Notes
- Ran successfully: `cd backend && cargo test --lib` -> 359 passed.
- Could not execute frontend unit tests due to an environment dependency issue (`@rollup/rollup-darwin-x64` missing in local `node_modules`).