QA Integration/E2E Audit Report

Date: 2026-03-27
Scope: docs/requirements.md, docs/functional_specs.md, docs/technical_specs.md, docs/qa_guidelines.md, backend/tests, e2e/tests, test scripts.

1) Clarification Questions

Should scheduled execution reliability (background scheduler + email fanout) be release-gated with deterministic integration tests, or only monitored in production?
Is the live E2E test against an external provider (gated on OPENAI_TEST_API_KEY) acceptable as the only end-to-end coverage for SSE/progress completion, or do you want deterministic in-house SSE coverage in CI?

2) Assumptions

CI quality gates should not rely on external LLM providers.
Core product requirements (scheduled generation, generation progress, fallback paths) should be covered by deterministic integration tests.
This report prioritizes integration/E2E confidence over unit-test volume.

3) Prioritized Findings (P0-P3)

P0 — Scheduled execution path is effectively untested (critical requirement risk)

Why it matters: Scheduled generation + email delivery is a core requirement. Regressions here can make scheduled deliveries fail silently.
Evidence:
- Scheduler runtime logic exists in scheduler.rs:27 through scheduler.rs:91.
- Only one trivial scheduler unit test exists (scheduler.rs:98).
- The requirements explicitly call for reliable scheduled execution (requirements.md:136).
Direction: Add deterministic integration tests for due schedule selection, double-run prevention (last_run_at), job contention behavior, and email send invocation outcomes.
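
The sketch below shows the decision rules these tests would pin down. It is a minimal model built on several assumptions:
- schedules become due on an interval;
- timestamps are unix seconds;
- a user with an active manual job is skipped without updating last_run_at.

Schedule, TickOutcome, decide and tick are hypothetical names, not the types in scheduler.rs. Injecting `now` instead of reading the clock is what makes the double-run check deterministic.

```rust
// Hypothetical model of scheduler tick rules; not the types in scheduler.rs.
#[derive(Debug, Clone)]
pub struct Schedule {
    pub id: u32,
    pub user_id: u32,
    pub interval_secs: u64,
    pub last_run_at: Option<u64>, // unix seconds; None = never run
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum TickOutcome {
    Run,
    NotDue,
    SkippedActiveJob,
}

/// Decide what one tick does for a schedule at an injected `now`.
pub fn decide(s: &Schedule, now: u64, user_has_active_job: bool) -> TickOutcome {
    let due = match s.last_run_at {
        None => true,
        Some(last) => now.saturating_sub(last) >= s.interval_secs,
    };
    if !due {
        TickOutcome::NotDue
    } else if user_has_active_job {
        // Assumption: contention skips without touching last_run_at.
        TickOutcome::SkippedActiveJob
    } else {
        TickOutcome::Run
    }
}

/// One scheduler tick. Marking last_run_at before returning is what stops a
/// second tick at the same instant from running the schedule again.
pub fn tick(schedules: &mut [Schedule], now: u64, active_users: &[u32]) -> Vec<(u32, TickOutcome)> {
    schedules
        .iter_mut()
        .map(|s| {
            let outcome = decide(s, now, active_users.contains(&s.user_id));
            if outcome == TickOutcome::Run {
                s.last_run_at = Some(now); // mark_run
            }
            (s.id, outcome)
        })
        .collect()
}
```

The real suite would apply the same two-tick pattern to TestApp's database:
1. Seed a due schedule.
2. Tick twice at the same instant.
3. Assert exactly one job and one email send.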

P1 — SSE progress endpoint has no deterministic integration coverage

Why it matters: Generation UX and cancellation safety depend on SSE correctness.
Evidence:
- SSE handler is implemented in generation.rs:150.
- Integration tests validate trigger/duplicate behavior only, not progress stream contract (api_syntheses_test.rs:565, api_syntheses_test.rs:617).
- Existing live SSE check is gated and external (generation-live.spec.ts:119).
Direction: Add integration tests that subscribe to /progress, assert progress -> complete/error sequence, ownership enforcement, reconnect semantics, and keepalive stability.
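
The stream contract can be tested without a live provider by parsing the raw response body and checking event order. This sketch rests on several assumptions:
- the endpoint emits named events called progress, complete and error;
- keepalives are sent as SSE comment lines;
- parse_sse and check_contract are hypothetical helpers;
- the parser handles only the event and data fields of the SSE format.

```rust
/// One parsed Server-Sent Events frame (only `event` and `data` are kept).
#[derive(Debug, PartialEq, Eq)]
pub struct SseEvent {
    pub event: String,
    pub data: String,
}

/// Minimal SSE parser: a blank line ends a frame, lines starting with ':' are
/// comments (keepalives), and repeated `data:` lines are joined with '\n'.
pub fn parse_sse(body: &str) -> Vec<SseEvent> {
    let mut events = Vec::new();
    let mut event = String::new();
    let mut data: Vec<&str> = Vec::new();
    // The trailing "" flushes a final frame that lacks its blank line.
    for line in body.lines().chain(std::iter::once("")) {
        if line.is_empty() {
            if !data.is_empty() {
                let name = if event.is_empty() { "message" } else { event.as_str() };
                events.push(SseEvent { event: name.to_string(), data: data.join("\n") });
            }
            event.clear();
            data.clear();
        } else if line.starts_with(':') {
            // keepalive or comment: ignored
        } else if let Some(v) = line.strip_prefix("event:") {
            event = v.strip_prefix(' ').unwrap_or(v).to_string();
        } else if let Some(v) = line.strip_prefix("data:") {
            data.push(v.strip_prefix(' ').unwrap_or(v));
        }
    }
    events
}

/// Contract: zero or more `progress` events, then exactly one terminal
/// `complete` or `error`, and nothing after it.
pub fn check_contract(events: &[SseEvent]) -> Result<(), String> {
    let terminal = events
        .iter()
        .position(|e| e.event == "complete" || e.event == "error")
        .ok_or("stream ended without a terminal event")?;
    if let Some(bad) = events[..terminal].iter().find(|e| e.event != "progress") {
        return Err(format!("unexpected event before terminal: {}", bad.event));
    }
    if terminal + 1 != events.len() {
        return Err("events emitted after the terminal event".to_string());
    }
    Ok(())
}
```

Checking the data payloads against the complete and error schemas would layer on top of check_contract. Ownership checks stay at the HTTP level: request another user's job_id and assert the status code.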

P1 — Brave Search fallback path lacks integration coverage

Why it matters: The fallback branch is a key functional path and currently carries high regression risk.
Evidence:
- Brave branch in pipeline code: synthesis.rs:371.
- Pipeline test file explicitly states this path is not integration-tested (pipeline_test.rs:257).
Direction: Add mock HTTP server + encrypted Brave key fixture flow to execute use_brave_search=true end-to-end in integration tests.
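
The sketch below shows the seam such a test needs. It assumes Phase 2 search can be routed through an injectable provider. WebSearch, RecordingSearch and phase2_search are hypothetical names.

In the actual integration test, the Brave double would be a wiremock server. The key would be written through the app's own encryption path so that decryption is exercised as well.

```rust
use std::cell::RefCell;

/// Hypothetical Phase 2 search seam.
pub trait WebSearch {
    fn search(&self, query: &str) -> Result<Vec<String>, String>;
}

/// Test double that records each query so a test can prove which branch ran.
pub struct RecordingSearch {
    pub calls: RefCell<Vec<String>>,
    pub canned: Vec<String>,
}

impl RecordingSearch {
    pub fn new(canned: &[&str]) -> Self {
        Self {
            calls: RefCell::new(Vec::new()),
            canned: canned.iter().map(|s| s.to_string()).collect(),
        }
    }
}

impl WebSearch for RecordingSearch {
    fn search(&self, query: &str) -> Result<Vec<String>, String> {
        self.calls.borrow_mut().push(query.to_string());
        Ok(self.canned.clone())
    }
}

/// Route Phase 2 to Brave or LLM web search based on the user setting.
pub fn phase2_search(
    use_brave_search: bool,
    brave: &dyn WebSearch,
    llm: &dyn WebSearch,
    query: &str,
) -> Result<Vec<String>, String> {
    if use_brave_search {
        brave.search(query)
    } else {
        llm.search(query)
    }
}
```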

P1 — Pipeline integration does not verify rate-limit behavior

Why it matters: Rate limiting is a non-functional requirement; failures can produce outages or provider bans.
Evidence:
- Pipeline tests set user rate limit fields to null (pipeline_test.rs:64).
- No integration assertions around rate-limited waits/error propagation.
Direction: Add integration scenarios for strict user/provider limits and verify wait/retry/timeout outcomes.
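
Rate-limit tests stay deterministic when the limiter takes time as an argument instead of sleeping. The token bucket below is a stand-in chosen for illustration; the project's limiter may use a different algorithm. TokenBucket and try_acquire are hypothetical names.

```rust
use std::time::Duration;

/// Token bucket driven by an explicit clock value (time since an arbitrary
/// epoch), so tests can assert exact waits without sleeping.
pub struct TokenBucket {
    capacity: f64,
    tokens: f64,
    refill_per_sec: f64,
    last: Duration,
}

impl TokenBucket {
    /// Starts full: `capacity` requests may burst before any wait.
    pub fn new(capacity: u32, refill_per_sec: f64) -> Self {
        assert!(refill_per_sec > 0.0, "refill rate must be positive");
        Self {
            capacity: capacity as f64,
            tokens: capacity as f64,
            refill_per_sec,
            last: Duration::ZERO,
        }
    }

    /// Take one token at `now`, or return how long the caller must wait.
    pub fn try_acquire(&mut self, now: Duration) -> Result<(), Duration> {
        let elapsed = now.saturating_sub(self.last).as_secs_f64();
        self.tokens = (self.tokens + elapsed * self.refill_per_sec).min(self.capacity);
        self.last = self.last.max(now);
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            Ok(())
        } else {
            Err(Duration::from_secs_f64((1.0 - self.tokens) / self.refill_per_sec))
        }
    }
}
```

A pipeline test would give the user and provider limits their own buckets with strict settings. It would then assert that the returned waits either turn into retries or exceed a timeout and surface as an error.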

P1 — Pipeline integration does not verify max-age article filtering behavior

Why it matters: Freshness is a core content-quality requirement.
Evidence:
- Pipeline tests consistently use high max_age_days values (pipeline_test.rs:77).
- No integration assertion for filtered_too_old trace behavior.
Direction: Serve articles with old publish dates from wiremock and assert on filtering and history status.
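
The sketch below shows the boundary cases worth asserting, using a fixed `now` so results do not drift with the calendar. Article, Trace and filter_by_age are hypothetical names. Keeping an article that is exactly max_age_days old is an assumption; the real scraper rule should confirm it.

```rust
pub const DAY: u64 = 86_400;

/// Hypothetical per-article trace status mirroring filtered_too_old.
#[derive(Debug, PartialEq, Eq)]
pub enum Trace {
    Kept,
    FilteredTooOld,
}

pub struct Article {
    pub url: &'static str,
    pub published_at: u64, // unix seconds
}

/// Drop articles older than max_age_days relative to a fixed `now`.
/// Assumption: exactly max_age_days old is kept; future dates count as age 0.
pub fn filter_by_age(articles: &[Article], now: u64, max_age_days: u64) -> Vec<(&'static str, Trace)> {
    articles
        .iter()
        .map(|a| {
            let age = now.saturating_sub(a.published_at);
            let trace = if age > max_age_days * DAY { Trace::FilteredTooOld } else { Trace::Kept };
            (a.url, trace)
        })
        .collect()
}
```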

P2 — E2E suite is heavily API-driven, limited UI journey validation

Why it matters: UI regressions can pass E2E undetected as long as the backend endpoints stay healthy.
Evidence:
- Sources and themes E2E use page.evaluate(fetch(...)) for most operations (sources.spec.ts:23, themes.spec.ts:29).
Direction: Keep API-assisted setup, but assert critical user interactions through UI (form submit, validation messages, control states).

P2 — Article history ownership isolation is not explicitly tested

Why it matters: Multi-user data isolation is security-sensitive.
Evidence:
- Current article history integration tests cover auth + empty/clear/provenance 404 only (api_article_history_test.rs:24).
Direction: Add user A vs user B cross-access tests for history and provenance endpoints.
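
The cases a cross-user test should cover can be sketched against an in-memory model: user B must not see, read or clear user A's entries. HistoryStore and ApiError are hypothetical names.

Answering a foreign id with NotFound (404) rather than 403 is an assumption. It is chosen here so the response does not reveal that the id exists. The real test would make these calls over HTTP through TestApp as two registered users.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq, Eq)]
pub enum ApiError {
    NotFound,
}

/// In-memory stand-in for article history rows: id -> (owner_id, url).
pub struct HistoryStore {
    next_id: u64,
    rows: HashMap<u64, (u64, String)>,
}

impl HistoryStore {
    pub fn new() -> Self {
        Self { next_id: 1, rows: HashMap::new() }
    }

    pub fn insert(&mut self, owner: u64, url: &str) -> u64 {
        let id = self.next_id;
        self.next_id += 1;
        self.rows.insert(id, (owner, url.to_string()));
        id
    }

    /// List only the caller's entries, sorted by id.
    pub fn list(&self, user: u64) -> Vec<u64> {
        let mut ids: Vec<u64> = self
            .rows
            .iter()
            .filter(|(_, row)| row.0 == user)
            .map(|(id, _)| *id)
            .collect();
        ids.sort();
        ids
    }

    /// Foreign or unknown ids both answer NotFound (assumed policy).
    pub fn provenance(&self, user: u64, id: u64) -> Result<&str, ApiError> {
        match self.rows.get(&id) {
            Some((owner, url)) if *owner == user => Ok(url.as_str()),
            _ => Err(ApiError::NotFound),
        }
    }

    /// Delete only the caller's entries; returns how many were removed.
    pub fn clear(&mut self, user: u64) -> usize {
        let before = self.rows.len();
        self.rows.retain(|_, row| row.0 != user);
        before - self.rows.len()
    }
}
```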

P2 — QA guidelines are out of sync with current codebase signals

Why it matters: A stale test inventory creates false confidence in planning and release gates.
Evidence:
- Documented counts/status in qa_guidelines.md:7 to qa_guidelines.md:11 do not match the current codebase.
- Current local grep counts: backend unit ~359, backend integration ~187, frontend unit ~135, E2E 7.
Direction: Automate inventory generation in CI and update docs/qa_guidelines.md from machine output.

P3 — Frontend unit test execution environment is currently brittle

Why it matters: It slows the QA feedback loop and hides regressions.
Evidence:
- A local run of cd frontend && npx vitest run failed due to a missing optional Rollup binary (@rollup/rollup-darwin-x64).
Direction: Add a clean install/bootstrap check in CI and pin a known-good Node/npm workflow.

4) Coverage Map (Required Capability vs Current Coverage)

| Capability | Unit | Integration | E2E | Status |
|---|---|---|---|---|
| Auth (register/login/verify/session) | Medium | Strong (`api_auth_test.rs`) | Medium (`registration.spec.ts`) | Good |
| Theme CRUD | Low | Strong (`api_themes_test.rs`) | Medium (API-driven) | Good |
| Source CRUD/import/export/preferred | Medium | Strong (`api_sources_test.rs`) | Medium (API-driven) | Good |
| On-demand generation trigger/duplicate/stop | Medium | Medium (`api_syntheses_test.rs`, `api_stop_generation_test.rs`) | Medium (live test gated) | Partial |
| SSE progress stream contract | Low | Weak | Weak (only external live) | Gap |
| Pipeline Phase 1 (personalized sources) | Medium | Medium (`pipeline_test.rs`) | Low | Partial |
| Pipeline Phase 2 (LLM web search) | Medium | Medium (`pipeline_test.rs`) | Low | Partial |
| Pipeline Phase 2 (Brave Search) | Low | None | None | Gap |
| Scheduled config CRUD | Low | Medium (`api_schedules_test.rs`) | Medium (API-driven in themes E2E) | Partial |
| Scheduled execution runtime | Low | None | None | Gap |
| Export email/pdf/markdown | Medium | Strong (`api_export_test.rs`) | Low | Good |
| Article history/provenance security | Low | Weak (no ownership isolation) | None | Gap |
| Rate limiting in real generation flow | Medium | None | None | Gap |
| Date freshness filtering in pipeline | Medium (scraper unit) | None | None | Gap |

5) Test Architecture Issues (Flakiness / Speed / Isolation / Observability)

Flakiness risk: generation-live.spec.ts depends on external OpenAI availability and behavior (generation-live.spec.ts:1).
Speed tradeoff: E2E is reasonably stable because it runs on a single worker with API-first setup, but that approach under-tests real UI behavior.
Isolation strengths: backend integration tests get strong per-test DB isolation via TestApp.
Observability gap: no dedicated integration assertions for SSE stream semantics and scheduler outcomes.

6) Detailed QA / Refactoring Plan

Phase 1 (1-2 weeks): close highest-risk deterministic gaps

Add scheduler integration suite:
- due schedule executes once
- last_run_at blocks double-run
- active manual job causes skip
- email send errors are logged and do not crash the loop
Add SSE integration suite:
- authorized subscribe receives latest event
- unauthorized/foreign job denied
- complete and error payload schema checks
Add Brave Search integration path with mocked Brave API and stored encrypted key fixture.
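
For the email-failure case, the property to assert is that one rejected recipient neither aborts delivery to the others nor panics the loop. The sketch assumes the mailer can be swapped for a test double. Mailer, FlakyMailer and fanout are hypothetical names, and eprintln! stands in for the app's logger.

```rust
/// Hypothetical email seam; the real mailer is not shown in this report.
pub trait Mailer {
    fn send(&mut self, to: &str, body: &str) -> Result<(), String>;
}

/// Fake mailer that rejects configured addresses and records successes.
pub struct FlakyMailer {
    pub fail_for: Vec<String>,
    pub sent: Vec<String>,
}

impl Mailer for FlakyMailer {
    fn send(&mut self, to: &str, _body: &str) -> Result<(), String> {
        if self.fail_for.iter().any(|f| f.as_str() == to) {
            Err(format!("smtp rejected {to}"))
        } else {
            self.sent.push(to.to_string());
            Ok(())
        }
    }
}

#[derive(Debug, Default, PartialEq, Eq)]
pub struct FanoutReport {
    pub delivered: usize,
    pub errors: Vec<String>,
}

/// Send to every recipient; a failure is logged and recorded but never
/// stops delivery to the remaining recipients.
pub fn fanout(mailer: &mut dyn Mailer, recipients: &[&str], body: &str) -> FanoutReport {
    let mut report = FanoutReport::default();
    for to in recipients {
        match mailer.send(to, body) {
            Ok(()) => report.delivered += 1,
            Err(e) => {
                eprintln!("scheduled email failed: {e}"); // stand-in for the app's logger
                report.errors.push(e);
            }
        }
    }
    report
}
```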

Phase 2 (1 week): non-functional policy tests

Add pipeline integration tests for:
- max_age_days filtering (filtered_too_old assertions)
- user/provider rate-limit behavior under contention
- cancellation mid-batch and partial-save invariants.
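
The cancellation invariants can be stated as three rules:
- results are saved item by item;
- nothing new starts after cancellation is observed;
- the saved set is a prefix of the input.

This sketch assumes cancellation is checked between items through an AtomicBool, and that the in-flight item is kept. Whether the real pipeline keeps or discards that item is a product decision the test should pin down. run_batch is a hypothetical name.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Process items in order, committing each result as soon as it is produced,
/// and stop before starting another item once `cancel` is observed.
/// Assumption: an item already in flight when cancel is set is kept.
pub fn run_batch<F: FnMut(u32) -> u32>(items: &[u32], cancel: &AtomicBool, mut process: F) -> Vec<u32> {
    let mut saved = Vec::new();
    for &item in items {
        if cancel.load(Ordering::SeqCst) {
            break; // nothing new starts after cancellation
        }
        saved.push(process(item)); // partial save: per item, not at batch end
    }
    saved
}
```

Setting the flag from inside the processing callback makes the stop point exact, so the test does not depend on thread timing.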

Phase 3 (1 week): E2E realism upgrades

Convert at least 3 API-heavy E2E scenarios to UI-driven workflows:
- theme create/update/delete
- source add/import/preferred/delete
- schedule form save/delete.
Keep API shortcuts only for setup/cleanup.

Phase 4 (2-3 days): documentation and gate hardening

Generate test inventory automatically (counts, pass/fail) and publish into QA docs.
Split CI lanes:
- deterministic required lane (unit/integration/mock-e2e)
- optional live-provider lane (non-blocking).

7) Quick Wins

Add one integration test for /syntheses/generate/{job_id}/progress happy path + ownership check.
Add one integration test for scheduled execution mark_run behavior using controlled due schedule fixture.
Add one article-history cross-user isolation test.
Mark generation-live.spec.ts as non-blocking in CI with explicit label/reporting.
Update docs/qa_guidelines.md inventory counts to current observed baseline.

Execution Notes

Ran successfully: cd backend && cargo test --lib -> 359 passed.
Could not execute frontend unit tests due to an environment dependency issue (@rollup/rollup-darwin-x64 missing in local node_modules).
