# QA Integration/E2E Audit Report

Date: 2026-03-27

Scope: `docs/requirements.md`, `docs/functional_specs.md`, `docs/technical_specs.md`, `docs/qa_guidelines.md`, `backend/tests`, `e2e/tests`, test scripts.

## 1) Clarification Questions

1. Should scheduled execution reliability (background scheduler + email fanout) be release-gated with deterministic integration tests, or only monitored in production?
2. Is an external-provider live E2E (`OPENAI_TEST_API_KEY`) acceptable as the *only* end-to-end coverage for SSE/progress completion, or do you want deterministic in-house SSE coverage in CI?

## 2) Assumptions

- CI quality gates should not rely on external LLM providers.
- Core product requirements (scheduled generation, generation progress, fallback paths) should be covered by deterministic integration tests.
- This report prioritizes integration/E2E confidence over unit-test volume.

## 3) Prioritized Findings (P0-P3)
### P0 — Scheduled execution path is effectively untested (critical requirement risk)
- Why it matters: Scheduled generation + email delivery is a core requirement. Regressions here can cause user deliverables to fail silently.
- Evidence:
  - Scheduler runtime logic exists in [scheduler.rs:27](/Users/oabrivard/Projects/rust/ai_synth/backend/src/services/scheduler.rs:27) through [scheduler.rs:91](/Users/oabrivard/Projects/rust/ai_synth/backend/src/services/scheduler.rs:91).
  - Only one trivial scheduler unit test exists ([scheduler.rs:98](/Users/oabrivard/Projects/rust/ai_synth/backend/src/services/scheduler.rs:98)).
  - The requirements explicitly call for scheduled reliability ([requirements.md:136](/Users/oabrivard/Projects/rust/ai_synth/docs/requirements.md:136)).
- Direction: Add deterministic integration tests for due schedule selection, double-run prevention (`last_run_at`), job contention behavior, and email send invocation outcomes.

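As a starting point for the due-selection and double-run cases, the decision logic can be expressed as a pure function over an explicit `now` value, which makes it testable without a running scheduler. The sketch below is illustrative only: the `Schedule` struct, its field names other than `last_run_at`, and the `is_due`, `select_due`, and `mark_run` helpers are assumptions and may not match the actual code in `scheduler.rs`.

```rust
/// Hypothetical, simplified view of a schedule row; the real model may differ.
#[derive(Debug, Clone, PartialEq)]
pub struct Schedule {
    pub id: u32,
    /// Unix timestamp (seconds) of the planned run.
    pub next_due_at: u64,
    /// Unix timestamp of the last completed run, if any.
    pub last_run_at: Option<u64>,
    /// True while a manual generation job for the same theme is active.
    pub manual_job_active: bool,
}

/// A schedule is due when its due time has passed, it has not already run
/// for that due time, and no manual job is currently active.
pub fn is_due(s: &Schedule, now: u64) -> bool {
    let already_ran = s.last_run_at.map_or(false, |t| t >= s.next_due_at);
    s.next_due_at <= now && !already_ran && !s.manual_job_active
}

/// Returns the ids of the schedules to run at `now`, in input order.
pub fn select_due(schedules: &[Schedule], now: u64) -> Vec<u32> {
    schedules
        .iter()
        .filter(|s| is_due(s, now))
        .map(|s| s.id)
        .collect()
}

/// Records a completed run so the same due time is not executed twice.
pub fn mark_run(s: &mut Schedule, now: u64) {
    s.last_run_at = Some(now);
}
```

Passing `now` as a parameter rather than reading the system clock keeps each case (not yet due, due, already run, blocked by a manual job) a one-line assertion.
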
### P1 — SSE progress endpoint has no deterministic integration coverage
- Why it matters: Generation UX and cancellation safety depend on SSE correctness.
- Evidence:
  - SSE handler is implemented in [generation.rs:150](/Users/oabrivard/Projects/rust/ai_synth/backend/src/handlers/generation.rs:150).
  - Integration tests validate trigger/duplicate behavior only, not progress stream contract ([api_syntheses_test.rs:565](/Users/oabrivard/Projects/rust/ai_synth/backend/tests/api_syntheses_test.rs:565), [api_syntheses_test.rs:617](/Users/oabrivard/Projects/rust/ai_synth/backend/tests/api_syntheses_test.rs:617)).
  - Existing live SSE check is gated and external ([generation-live.spec.ts:119](/Users/oabrivard/Projects/rust/ai_synth/e2e/tests/generation-live.spec.ts:119)).
- Direction: Add integration tests that subscribe to `/progress`, assert `progress -> complete/error` sequence, ownership enforcement, reconnect semantics, and keepalive stability.

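The `progress -> complete/error` sequence check can be prototyped against a captured response body before wiring it into a real integration test. The sketch below is a minimal example under stated assumptions: the stream uses LF line endings, keepalives are SSE comment lines, and the event names are exactly `progress`, `complete`, and `error`. The helper names `parse_event_names` and `check_sequence` are hypothetical, and ownership and reconnect checks are out of scope here.

```rust
/// Extracts SSE event names from a raw `text/event-stream` body.
/// Comment lines (starting with ':') are skipped, and blocks without a
/// `data:` line are ignored because SSE does not dispatch them.
pub fn parse_event_names(body: &str) -> Vec<String> {
    let mut names = Vec::new();
    for block in body.split("\n\n") {
        let mut name: Option<&str> = None;
        let mut has_data = false;
        for line in block.lines() {
            if line.starts_with(':') {
                continue; // keepalive comment
            }
            if let Some(rest) = line.strip_prefix("event:") {
                name = Some(rest.trim());
            } else if line.starts_with("data:") {
                has_data = true;
            }
        }
        if has_data {
            // Events without an explicit name default to "message".
            names.push(name.unwrap_or("message").to_string());
        }
    }
    names
}

/// Accepts zero or more `progress` events followed by exactly one
/// terminal `complete` or `error` event.
pub fn check_sequence(names: &[String]) -> Result<(), String> {
    let (last, earlier) = match names.split_last() {
        Some(parts) => parts,
        None => return Err("stream ended without any event".to_string()),
    };
    if last.as_str() != "complete" && last.as_str() != "error" {
        return Err(format!("last event was `{}`, expected complete or error", last));
    }
    if let Some(bad) = earlier.iter().find(|n| n.as_str() != "progress") {
        return Err(format!("unexpected `{}` before the terminal event", bad));
    }
    Ok(())
}
```

An integration test would collect the response body from the progress endpoint, then call `check_sequence(&parse_event_names(&body))` and assert that the result is `Ok`.
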
### P1 — Brave Search fallback path lacks integration coverage
- Why it matters: The fallback branch is a key functional path and currently carries high regression risk.
- Evidence:
  - Brave branch in pipeline code: [synthesis.rs:371](/Users/oabrivard/Projects/rust/ai_synth/backend/src/services/synthesis.rs:371).
  - Pipeline test file explicitly states this path is not integration-tested ([pipeline_test.rs:257](/Users/oabrivard/Projects/rust/ai_synth/backend/tests/pipeline_test.rs:257)).
- Direction: Add mock HTTP server + encrypted Brave key fixture flow to execute `use_brave_search=true` end-to-end in integration tests.

### P1 — Pipeline integration does not verify rate-limit behavior
- Why it matters: Rate limiting is a non-functional requirement; failures can produce outages or provider bans.
- Evidence:
  - Pipeline tests set user rate limit fields to `null` ([pipeline_test.rs:64](/Users/oabrivard/Projects/rust/ai_synth/backend/tests/pipeline_test.rs:64)).
  - No integration assertions around rate-limited waits/error propagation.
- Direction: Add integration scenarios for strict user/provider limits and verify wait/retry/timeout outcomes.

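Wait and retry outcomes are easiest to assert when the limiter takes time as an input instead of sleeping. The following sketch shows that idea with a fixed-window limiter; it is not the project's limiter. The `FixedWindowLimiter` type, the `Decision` enum, and the fixed-window policy itself are assumptions chosen for illustration.

```rust
use std::time::Duration;

/// Outcome of a single acquisition attempt.
#[derive(Debug, PartialEq)]
pub enum Decision {
    Allowed,
    /// The caller should wait this long before retrying.
    RetryAfter(Duration),
}

/// Hypothetical fixed-window limiter. Time is passed in explicitly as an
/// offset from an arbitrary epoch, so tests never sleep or read the clock.
pub struct FixedWindowLimiter {
    max_per_window: u32,
    window: Duration,
    window_start: Duration,
    used: u32,
}

impl FixedWindowLimiter {
    pub fn new(max_per_window: u32, window: Duration) -> Self {
        Self {
            max_per_window,
            window,
            window_start: Duration::ZERO,
            used: 0,
        }
    }

    pub fn try_acquire(&mut self, now: Duration) -> Decision {
        // Start a fresh window once the current one has fully elapsed.
        if now >= self.window_start + self.window {
            self.window_start = now;
            self.used = 0;
        }
        if self.used < self.max_per_window {
            self.used += 1;
            Decision::Allowed
        } else {
            Decision::RetryAfter(self.window_start + self.window - now)
        }
    }
}
```

With a limit of 2 per 60 seconds, a third call at the 10-second mark yields `RetryAfter` of 50 seconds, which an integration scenario can compare against the wait the pipeline actually performs.
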
### P1 — Pipeline integration does not verify max-age article filtering behavior
- Why it matters: Freshness is a core content-quality requirement.
- Evidence:
  - Pipeline tests consistently use high `max_age_days` values ([pipeline_test.rs:77](/Users/oabrivard/Projects/rust/ai_synth/backend/tests/pipeline_test.rs:77)).
  - No integration assertion for `filtered_too_old` trace behavior.
- Direction: Add wiremock articles with old publish dates + assertions on filtering and history status.

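The expected filtering outcome for each fixture article can be pinned down with a small classification function, so the integration test has an unambiguous oracle. The sketch below is an assumption-laden illustration: the `Article` struct, the `FilterOutcome` enum, and the choice to keep undated articles are hypothetical and should be aligned with the real pipeline's `filtered_too_old` semantics.

```rust
/// Hypothetical article record used only for this example.
#[derive(Debug, Clone, PartialEq)]
pub struct Article {
    pub url: String,
    /// Publish time as a Unix timestamp in seconds, if one was found.
    pub published_at: Option<u64>,
}

#[derive(Debug, PartialEq)]
pub enum FilterOutcome {
    Kept,
    FilteredTooOld,
}

const SECS_PER_DAY: u64 = 86_400;

/// Classifies an article against a `max_age_days` freshness window.
pub fn classify(article: &Article, now: u64, max_age_days: u64) -> FilterOutcome {
    // Saturating arithmetic avoids underflow for very large windows.
    let cutoff = now.saturating_sub(max_age_days.saturating_mul(SECS_PER_DAY));
    match article.published_at {
        Some(ts) if ts < cutoff => FilterOutcome::FilteredTooOld,
        // Assumption: undated articles are kept; the real pipeline may differ.
        _ => FilterOutcome::Kept,
    }
}
```

A fixture with one article older than the window and one inside it is enough to exercise both branches, provided the test uses a low `max_age_days` value.
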
### P2 — E2E suite is heavily API-driven, limited UI journey validation
- Why it matters: UI regressions can go undetected by E2E as long as the backend endpoints stay healthy.
- Evidence:
  - Sources and themes E2E use `page.evaluate(fetch(...))` for most operations ([sources.spec.ts:23](/Users/oabrivard/Projects/rust/ai_synth/e2e/tests/sources.spec.ts:23), [themes.spec.ts:29](/Users/oabrivard/Projects/rust/ai_synth/e2e/tests/themes.spec.ts:29)).
- Direction: Keep API-assisted setup, but assert critical user interactions through UI (form submit, validation messages, control states).

### P2 — Article history ownership isolation is not explicitly tested
- Why it matters: Multi-user data isolation is security-sensitive.
- Evidence:
  - Current article history integration tests cover auth + empty/clear/provenance 404 only ([api_article_history_test.rs:24](/Users/oabrivard/Projects/rust/ai_synth/backend/tests/api_article_history_test.rs:24)).
- Direction: Add user A vs user B cross-access tests for history and provenance endpoints.

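The shape of a cross-user test can be illustrated against an in-memory stand-in for the history store. Everything in the sketch below is hypothetical (`HistoryStore`, `AccessError`, and the method names), and it assumes that a foreign article is reported as not found rather than forbidden; the real endpoints may choose a different status.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
pub enum AccessError {
    NotFound,
}

/// Hypothetical in-memory stand-in for the article history table.
pub struct HistoryStore {
    // article id -> (owner user id, title)
    entries: HashMap<u32, (u32, String)>,
}

impl HistoryStore {
    pub fn new() -> Self {
        Self { entries: HashMap::new() }
    }

    pub fn insert(&mut self, article_id: u32, owner_id: u32, title: &str) {
        self.entries.insert(article_id, (owner_id, title.to_string()));
    }

    /// Returns the entry only to its owner. A foreign id is reported as
    /// NotFound so callers cannot probe for other users' articles.
    pub fn get_for_user(&self, user_id: u32, article_id: u32) -> Result<&str, AccessError> {
        match self.entries.get(&article_id) {
            Some((owner, title)) if *owner == user_id => Ok(title.as_str()),
            _ => Err(AccessError::NotFound),
        }
    }

    /// Lists only the caller's article ids, sorted for stable assertions.
    pub fn list_for_user(&self, user_id: u32) -> Vec<u32> {
        let mut ids: Vec<u32> = self
            .entries
            .iter()
            .filter(|(_, entry)| entry.0 == user_id)
            .map(|(id, _)| *id)
            .collect();
        ids.sort_unstable();
        ids
    }
}
```

The integration version would follow the same pattern: create one record per user, then assert that each user can read and list only their own record through the history and provenance endpoints.
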
### P2 — QA guidelines are out of sync with current codebase signals
- Why it matters: A stale test inventory causes false confidence in planning and release gates.
- Evidence:
  - Documented counts/status in [qa_guidelines.md:7](/Users/oabrivard/Projects/rust/ai_synth/docs/qa_guidelines.md:7) to [qa_guidelines.md:11](/Users/oabrivard/Projects/rust/ai_synth/docs/qa_guidelines.md:11).
  - Current local grep counts: backend unit ~359, backend integration ~187, frontend unit ~135 tests, E2E 7.
- Direction: Automate inventory generation in CI and update `docs/qa_guidelines.md` from machine output.

### P3 — Frontend unit test execution environment is currently brittle
- Why it matters: It slows the QA feedback loop and hides regressions.
- Evidence:
  - Local run `cd frontend && npx vitest run` failed due to a missing optional Rollup binary (`@rollup/rollup-darwin-x64`).
- Direction: Add a clean install/bootstrap check in CI and pin a known-good Node/npm workflow.

## 4) Coverage Map (Required Capability vs Current Coverage)

| Capability | Unit | Integration | E2E | Status |
|---|---|---|---|---|
| Auth (register/login/verify/session) | Medium | Strong (`api_auth_test.rs`) | Medium (`registration.spec.ts`) | Good |
| Theme CRUD | Low | Strong (`api_themes_test.rs`) | Medium (API-driven) | Good |
| Source CRUD/import/export/preferred | Medium | Strong (`api_sources_test.rs`) | Medium (API-driven) | Good |
| On-demand generation trigger/duplicate/stop | Medium | Medium (`api_syntheses_test.rs`, `api_stop_generation_test.rs`) | Medium (live test gated) | Partial |
| SSE progress stream contract | Low | Weak | Weak (only external live) | Gap |
| Pipeline Phase 1 (personalized sources) | Medium | Medium (`pipeline_test.rs`) | Low | Partial |
| Pipeline Phase 2 (LLM web search) | Medium | Medium (`pipeline_test.rs`) | Low | Partial |
| Pipeline Phase 2 (Brave Search) | Low | None | None | Gap |
| Scheduled config CRUD | Low | Medium (`api_schedules_test.rs`) | Medium (API-driven in themes E2E) | Partial |
| Scheduled execution runtime | Low | None | None | Gap |
| Export email/pdf/markdown | Medium | Strong (`api_export_test.rs`) | Low | Good |
| Article history/provenance security | Low | Weak (no ownership isolation) | None | Gap |
| Rate limiting in real generation flow | Medium | None | None | Gap |
| Date freshness filtering in pipeline | Medium (scraper unit) | None | None | Gap |

## 5) Test Architecture Issues (Flakiness / Speed / Isolation / Observability)

- Flakiness risk: `generation-live.spec.ts` depends on external OpenAI availability and behavior ([generation-live.spec.ts:1](/Users/oabrivard/Projects/rust/ai_synth/e2e/tests/generation-live.spec.ts:1)).
- Speed tradeoff: E2E is reasonably stable thanks to a single worker and API-first setup, but this under-tests real UI behavior.
- Isolation strength: per-test DB isolation via `TestApp` in the backend integration tests is strong.
- Observability gap: no dedicated integration assertions for SSE stream semantics and scheduler outcomes.

## 6) Detailed QA / Refactoring Plan
### Phase 1 (1-2 weeks): close highest-risk deterministic gaps
- Add scheduler integration suite:
  - due schedule executes once
  - `last_run_at` blocks double-run
  - active manual job causes skip
  - email send errors are logged and do not crash loop
- Add SSE integration suite:
  - authorized subscribe receives latest event
  - unauthorized/foreign job denied
  - `complete` and `error` payload schema checks
- Add Brave Search integration path with mocked Brave API and stored encrypted key fixture.

### Phase 2 (1 week): non-functional policy tests
- Add pipeline integration tests for:
  - `max_age_days` filtering (`filtered_too_old` assertions)
  - user/provider rate-limit behavior under contention
  - cancellation mid-batch and partial-save invariants.

### Phase 3 (1 week): E2E realism upgrades
- Convert at least 3 API-heavy E2E scenarios to UI-driven workflows:
  - theme create/update/delete
  - source add/import/preferred/delete
  - schedule form save/delete.
- Keep API shortcuts only for setup/cleanup.

### Phase 4 (2-3 days): documentation and gate hardening
- Generate test inventory automatically (counts, pass/fail) and publish into QA docs.
- Split CI lanes:
  - deterministic required lane (unit/integration/mock-e2e)
  - optional live-provider lane (non-blocking).

## 7) Quick Wins

- Add one integration test for `/syntheses/generate/{job_id}/progress` happy path + ownership check.
- Add one integration test for scheduled execution `mark_run` behavior using controlled due schedule fixture.
- Add one article-history cross-user isolation test.
- Mark `generation-live.spec.ts` as non-blocking in CI with explicit label/reporting.
- Update `docs/qa_guidelines.md` inventory counts to current observed baseline.

## Execution Notes

- Ran successfully: `cd backend && cargo test --lib` -> 359 passed.
- Could not execute frontend unit tests due to an environment dependency issue (`@rollup/rollup-darwin-x64` missing in local `node_modules`).