ai_synth/docs/qa_guidelines.md

# QA Guidelines

## Release Gate Policy

- Releases are blocked unless critical flows have deterministic CI coverage.
- Mandatory deterministic CI coverage includes:
  - Scheduler execution path (due schedule selection, run/skip behavior, `last_run_at` handling, email side effects).
  - SSE generation progress contract.
- Tests requiring external providers (for example `generation-live.spec.ts`) are non-blocking supplemental checks and must not be the only coverage for critical flows.

## Test Inventory

| Type | Count | Status | Location |
|------|-------|--------|----------|
| Backend unit tests | 358 | All passing | `backend/src/**/*.rs` (inline `#[cfg(test)]`) |
| Backend integration tests | 183 | All passing | `backend/tests/*.rs` |
| Frontend unit tests | 141 | 131 passing, 10 failing | `frontend/src/**/*.test.{ts,tsx}` |
| E2E tests (Playwright) | 7 | All passing | `e2e/tests/*.spec.ts` |
| **Total** | **689** | | |

### Backend Unit Test Breakdown

| Source file | Tests | Coverage area |
|---|---|---|
| `services/scraper.rs` | 74 | SSRF IP checks, soft-404, redirect, HTML parsing |
| `services/synthesis.rs` | 36 | Pipeline logic, schema building, category overflow |
| `services/llm/anthropic.rs` | 20 | Response parsing, error handling |
| `services/prompts.rs` | 18 | Prompt template generation |
| `services/csv.rs` | 18 | CSV parsing, serialization |
| `models/synthesis.rs` | 16 | Model validation, serialization |
| `services/rate_limiter.rs` | 15 | Token bucket, concurrency |
| `services/llm/openai.rs` | 13 | Response parsing, error handling |
| `models/source.rs` | 12 | URL / title validation |
| `models/settings.rs` | 12 | Settings validation, defaults |
| `services/export.rs` | 12 | Markdown / PDF rendering |
| `services/llm/gemini.rs` | 10 | Response parsing, error handling |
| `models/provider.rs` | 10 | Provider / model validation |
| `services/email.rs` | 9 | Email rendering, bypass mode |
| `services/encryption.rs` | 8 | AES-256-GCM encrypt/decrypt |
| `services/source_scraper.rs` | 8 | Link extraction, is_article filter |
| `services/llm/schema.rs` | 8 | JSON schema generation |
| `util/token.rs` | 8 | Token generation, hashing |
| `models/api_key.rs` | 8 | API key validation |
| `middleware/csrf.rs` | 7 | CSRF header check |
| `models/rate_limit.rs` | 6 | Rate limit model validation |
| `config.rs` | 6 | Config parsing |
| `middleware/auth.rs` | 5 | Session extraction |
| `services/llm/factory.rs` | 5 | Provider factory |
| `handlers/admin.rs` | 4 | Admin handler validation |

### Backend Integration Test Breakdown

| File | Tests | Coverage area |
|---|---|---|
| `api_sources_test.rs` | 36 | Sources CRUD, validation, CSV, bulk import, max limit |
| `api_admin_test.rs` | 30 | Provider CRUD, rate limits, user management, audit log |
| `api_keys_test.rs` | 18 | API key CRUD, encryption, ownership, test endpoint |
| `api_syntheses_test.rs` | 17 | Synthesis CRUD, pagination, ownership, generation trigger |
| `api_auth_test.rs` | 16 | Register, login, verify, logout, session |
| `api_export_test.rs` | 13 | Email send, Markdown export, PDF export |
| `api_themes_test.rs` | 10 | Theme CRUD, validation, ownership |
| `api_schedules_test.rs` | 9 | Schedule CRUD, validation, ownership |
| `api_settings_test.rs` | 7 | Settings CRUD, defaults, boundary values |
| `pipeline_test.rs` | 6 | Phase 1 extraction, Phase 2 search, overflow, diversity, dedup, preferred |
| `api_article_history_test.rs` | 4 | History list, clear, provenance |
| `api_csrf_test.rs` | 4 | CSRF header enforcement |
| `api_stop_generation_test.rs` | 4 | Stop job, ownership, 404 |
| `api_llm_logs_test.rs` | 3 | LLM logs auth, 404, happy path |
| `api_sources_preferred_test.rs` | 3 | Preferred sources set/clear/auth |
| `minimal_test.rs` | 2 | Infrastructure sanity |
| `api_health_test.rs` | 1 | Health check |

### E2E Test Breakdown

| File | Coverage area |
|---|---|
| `registration.spec.ts` | Full magic link registration flow |
| `settings.spec.ts` | Settings persistence across reloads |
| `settings-export.spec.ts` | Settings export/import roundtrip |
| `sources.spec.ts` | Source CRUD + preferred sources via API |
| `themes.spec.ts` | Theme CRUD + schedule CRUD via API |
| `admin-providers.spec.ts` | Admin provider management, settings dropdown |
| `generation-live.spec.ts` | Full pipeline with real OpenAI key (gated on `OPENAI_TEST_API_KEY`) |

---

## Running Tests

### Backend Unit Tests

No database required:

```bash
cd backend && cargo test --lib
```

### Backend Integration Tests

Requires a running Postgres instance. Use the helper script:

```bash
./scripts/run-integration-tests.sh                          # all tests
./scripts/run-integration-tests.sh --test pipeline_test      # one test file
./scripts/run-integration-tests.sh --test api_admin_test config_providers  # one test by name
./scripts/run-integration-tests.sh --lib                     # unit tests only
./scripts/run-integration-tests.sh --db-check                # just check DB connectivity
```

The script automatically:
- Starts the test Postgres container on port 5433 (via `e2e/docker-compose.test.yml`)
- Sets `TEST_DATABASE_URL` and `SKIP_SSRF_CHECK=1`
- Runs `cargo test` with the specified arguments

Manual equivalent:

```bash
cd e2e && docker compose -f docker-compose.test.yml up -d db
cd ../backend
export TEST_DATABASE_URL=postgres://ai_synth_test:testpassword@127.0.0.1:5433/ai_synth_test
export SKIP_SSRF_CHECK=1
cargo test
```

### Frontend Unit Tests

```bash
cd frontend && npx vitest run
```

Type checking (no tests, just compiler verification):

```bash
cd frontend && npx tsc --noEmit
```

### E2E Tests (Playwright)

Use the helper script, which builds the Docker image, starts the full stack, seeds the database, and runs Playwright:

```bash
./scripts/run-e2e-tests.sh                     # all E2E tests
./scripts/run-e2e-tests.sh --headed            # with browser visible
./scripts/run-e2e-tests.sh generation-live      # specific test file
```

The script:
1. Builds the test Docker image (`docker compose -f docker-compose.test.yml build`)
2. Starts the full stack (app + Postgres)
3. Waits for the app health check to pass
4. Installs npm dependencies and Playwright browsers
5. Seeds the test database (`npx tsx seed.ts`)
6. Runs Playwright tests
7. Cleans up on exit (stops containers, removes volumes)

The `generation-live.spec.ts` test requires `OPENAI_TEST_API_KEY` to be set (in `e2e/.env.test` or environment). It is a supplemental non-blocking check and does not replace deterministic CI coverage.

---

## Test Infrastructure

### TestApp (Backend Integration Tests)

`backend/tests/common/mod.rs` provides the `TestApp` struct, which is the foundation for all integration tests.

**What it does:**
- Creates a unique temporary Postgres database per test (named `ai_synth_test_{uuid}`)
- Runs all migrations
- Builds the full Axum router with test configuration (bypassed Turnstile and Resend)
- Provides request helpers: `get`, `post`, `post_without_csrf`, `get_with_session`, `post_with_session`, `put_with_session`, `delete_with_session`, `raw_request_text`, `raw_request_bytes`
- Provides auth helpers: `create_test_user`, `create_authenticated_user`, `create_admin_user`, `register_user_via_api`, `create_magic_link_for_email`
- Provides `insert_test_synthesis` for creating test data without running the pipeline
- Handles cleanup via `Drop` (fire-and-forget) or explicit `cleanup().await`

**Request helpers** automatically:
- Set `Content-Type: application/json` for requests with a body
- Set `X-Requested-With: XMLHttpRequest` (CSRF header) for mutating methods (POST, PUT, DELETE, PATCH)
- Set the session cookie when `session_cookie` is provided
- Parse the response body as JSON (or return `{}` for empty bodies)
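The header rules above can be sketched as a pure function. This is an illustration only, not the code in `backend/tests/common/mod.rs`: `is_mutating` and `default_headers` are hypothetical names, and the caller passes the full `Cookie` header value because the session cookie name is not specified in this document.

```rust
/// Returns true for the methods the request helpers treat as mutating.
pub fn is_mutating(method: &str) -> bool {
    matches!(
        method.to_ascii_uppercase().as_str(),
        "POST" | "PUT" | "DELETE" | "PATCH"
    )
}

/// Builds the default headers for one request (hypothetical helper).
/// `cookie_header` is the complete `Cookie` value, passed through verbatim.
pub fn default_headers(
    method: &str,
    has_body: bool,
    cookie_header: Option<&str>,
) -> Vec<(&'static str, String)> {
    let mut headers = Vec::new();
    if has_body {
        headers.push(("Content-Type", "application/json".to_string()));
    }
    if is_mutating(method) {
        // CSRF header, sent only for mutating methods.
        headers.push(("X-Requested-With", "XMLHttpRequest".to_string()));
    }
    if let Some(cookie) = cookie_header {
        headers.push(("Cookie", cookie.to_string()));
    }
    headers
}
```

Keeping the rule in one place means a CSRF rejection test only has to omit the `X-Requested-With` entry.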

**Usage pattern:**

```rust
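mod common; // backend/tests/common/mod.rs

use common::TestApp;
// Assumption: `StatusCode` is the `http` type re-exported by Axum.
use axum::http::StatusCode;

#[tokio::test]
async fn my_test() {
    // Each TestApp owns a fresh temporary database.
    let app = TestApp::new().await;
    // `_user_id` is unused in this example; drop the underscore if the test needs it.
    let (_user_id, session) = app.create_authenticated_user("user@test.com").await;

    let (status, body) = app.get_with_session("/api/v1/settings", &session).await;
    assert_eq!(status, StatusCode::OK);
    // ...assertions on `body`...

    // Explicit cleanup drops the database deterministically.
    app.cleanup().await;
}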
```

### Wiremock (Pipeline Tests)

Pipeline integration tests use `wiremock` to mock HTTP responses from source websites. The mock server runs on localhost, which is why `SKIP_SSRF_CHECK=1` is required (otherwise the SSRF protection would block requests to localhost).

### MockLlmProvider

`backend/src/services/llm/mock.rs` provides a deterministic mock LLM provider for pipeline tests:

- Returns classify/summarize responses when the system prompt contains "classer" (French for "classify")
- Returns search responses with configurable URLs via `with_search_urls()`
- Uses a configurable default category via `with_default_category()`
- Identifies call types by inspecting French keywords in the system prompt

Usage:

```rust
let mock = MockLlmProvider::new()
    .with_default_category("IA")
    .with_search_urls(vec!["https://example.com/article".into()])
    .into_arc();
```
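The keyword dispatch described above can be sketched with the standard library alone. This is a simplified illustration, not the contents of `mock.rs`: `SketchMockLlm`, `MockReply`, and `reply` are hypothetical, the `"General"` fallback category is invented, and treating every prompt without "classer" as a search call is an assumption.

```rust
use std::sync::Arc;

/// Hypothetical reply type; the real provider returns LLM-shaped responses.
#[derive(Debug, PartialEq)]
pub enum MockReply {
    Classify { category: String },
    Search { urls: Vec<String> },
}

pub struct SketchMockLlm {
    default_category: String,
    search_urls: Vec<String>,
}

impl SketchMockLlm {
    pub fn new() -> Self {
        Self {
            default_category: String::from("General"), // assumed fallback
            search_urls: Vec::new(),
        }
    }

    pub fn with_default_category(mut self, category: &str) -> Self {
        self.default_category = category.to_string();
        self
    }

    pub fn with_search_urls(mut self, urls: Vec<String>) -> Self {
        self.search_urls = urls;
        self
    }

    pub fn into_arc(self) -> Arc<Self> {
        Arc::new(self)
    }

    /// Picks the reply from a French keyword in the system prompt.
    pub fn reply(&self, system_prompt: &str) -> MockReply {
        if system_prompt.to_lowercase().contains("classer") {
            MockReply::Classify { category: self.default_category.clone() }
        } else {
            // Assumption: anything else is treated as a search call.
            MockReply::Search { urls: self.search_urls.clone() }
        }
    }
}
```

Because the reply depends only on the prompt text and the builder configuration, the same input always yields the same output, which is what makes pipeline tests deterministic.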

### E2E Seed Data (seed.ts)

`e2e/seed.ts` seeds the database with known test users, their sessions, and an enabled provider. It is idempotent (uses `ON CONFLICT DO NOTHING`):

- **Admin user**: `admin@test.local` with a known session token
- **Regular user**: `user@test.local` with a known session token
- **Gemini provider**: Enabled for the test environment

Session tokens are SHA-256 hashed before insertion (matching the backend's hashing strategy).

### E2E Auth Helpers (auth.ts)

`e2e/helpers/auth.ts` provides:

- **`loginAsAdmin(page)`**: Injects the admin session cookie.
- **`loginAsUser(page)`**: Injects the regular user session cookie.
- **`registerAndVerify(page, email)`**: Full registration flow: calls the API to register, inserts a magic link token directly in the DB, navigates to the verify URL.
- **`createDbClient()`**: Returns a `pg.Client` connected to the test database.

---

## Writing Integration Tests

### Patterns

1. **Each test gets its own `TestApp`** (and therefore its own database). Tests are fully isolated.

2. **Create users via helpers**, not via the registration API (unless testing registration):

   ```rust
   let (user_id, session) = app.create_authenticated_user("user@test.com").await;
   ```

3. **Test all access control paths** for every endpoint:
   - 401 without authentication
   - 403 for admin-only endpoints with a regular user
   - 404 for accessing another user's resources (ownership isolation)

4. **Settings payload must be complete.** The `PUT /api/v1/settings` endpoint requires every field, so include all of them when sending a settings update in tests:

   ```rust
   let settings = serde_json::json!({
       "max_articles_per_source": 3,
       "max_links_per_source": 10,
       "use_brave_search": false,
       "article_history_days": 30,
       "batch_size": 5,
       "source_extraction_window": 5,
       "search_agent_behavior": "",
       "ai_provider": "gemini",
       "ai_model": "gemini-2.5-flash",
       "ai_model_websearch": "gemini-2.5-flash",
       "rate_limit_max_requests": null,
       "rate_limit_time_window_seconds": null
   });
   ```

5. **Use `post_without_csrf` to test CSRF rejection.**

6. **Use `raw_request_text` / `raw_request_bytes`** for non-JSON responses (CSV exports, PDF exports).

7. **Always call `app.cleanup().await`** at the end of the test for deterministic cleanup.
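The access control paths in pattern 3 can be written down as a small expectation table to drive per-endpoint tests. This is a sketch under assumptions: `Caller`, `Endpoint`, and `expected_status` are hypothetical names, the success code is assumed to be 200, and ownership isolation is assumed to apply to every caller on owned resources.

```rust
/// Hypothetical description of who is calling an endpoint.
#[derive(Clone, Copy, Debug)]
pub struct Caller {
    pub authenticated: bool,
    pub is_admin: bool,
    pub owns_resource: bool,
}

/// The two endpoint kinds that pattern 3 distinguishes.
#[derive(Clone, Copy, Debug)]
pub enum Endpoint {
    AdminOnly,
    OwnedResource,
}

/// Status code a test should expect for a given caller and endpoint.
pub fn expected_status(caller: Caller, endpoint: Endpoint) -> u16 {
    if !caller.authenticated {
        return 401; // no session at all
    }
    match endpoint {
        Endpoint::AdminOnly if !caller.is_admin => 403,
        // Another user's resource is reported as missing, not forbidden.
        Endpoint::OwnedResource if !caller.owns_resource => 404,
        _ => 200, // assumed success code
    }
}
```

Iterating over such a table for each endpoint makes it harder to forget one of the three negative paths.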

### Pipeline Tests

Pipeline integration tests in `pipeline_test.rs` use wiremock + MockLlmProvider:

1. Set up wiremock to serve a mock source page with article links
2. Set up wiremock to serve mock article pages
3. Configure user settings and sources pointing to wiremock URLs
4. Run the pipeline with `MockLlmProvider` via the `provider_override` parameter
5. Assert the resulting synthesis contains the expected categories and articles

---

## Writing E2E Tests

### Playwright Configuration

- Tests run against the Docker-composed stack on `http://localhost:8080`
- Single worker to avoid parallel DB state mutations
- Timeout: 30 seconds per test, 2 retries
- Screenshots on failure, traces on first retry
- Chromium browser only

### Patterns

1. **Use `loginAsAdmin` / `loginAsUser`** from `e2e/helpers/auth.ts` for authentication:

   ```typescript
   import { loginAsUser } from '../helpers/auth';

   test('my test', async ({ page }) => {
     await loginAsUser(page);
     await page.goto('/', { waitUntil: 'domcontentloaded' });
     // ...
   });
   ```

2. **Use `waitUntil: 'domcontentloaded'`** instead of the default `load` for `page.goto()`. This avoids waiting for external resources (Turnstile scripts, fonts) that may not load in the test environment.

3. **Prefer API-based setup over UI interactions** for test data. Use `page.evaluate()` to call the API directly:

   ```typescript
   await page.evaluate(async () => {
     await fetch('/api/v1/sources', {
       method: 'POST',
       headers: { 'Content-Type': 'application/json', 'X-Requested-With': 'XMLHttpRequest' },
       body: JSON.stringify({ title: 'Test', url: 'https://example.com', theme_id: '...' }),
     });
   });
   ```

4. **Use `createDbClient()`** from `e2e/helpers/auth.ts` when you need to verify database state directly.

5. **The `generation-live.spec.ts` test** is gated on `OPENAI_TEST_API_KEY`. Treat it as supplemental coverage only.

---

## Known Limitations

### Drop Deadlock in TestApp

The `Drop` implementation for `TestApp` spawns a background thread to drop the test database. **Do not call `.join()` on this thread.** Joining deadlocks because the spawned thread creates a new tokio runtime whose `block_on` conflicts with the existing runtime's connection pool. The thread runs independently and cleans up asynchronously. For deterministic cleanup, use `app.cleanup().await`.
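The two cleanup paths can be illustrated with a synchronous, standard-library-only sketch. It is not the `TestApp` code: `TempDbGuard` is hypothetical, an `AtomicBool` stands in for the temporary database, and the real `cleanup` is async.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical stand-in for `TestApp`; `db_dropped` plays the role of the database.
pub struct TempDbGuard {
    db_dropped: Arc<AtomicBool>,
    cleaned_up: bool,
}

impl TempDbGuard {
    /// Returns the guard plus a handle for observing whether the "database" is gone.
    pub fn new() -> (Self, Arc<AtomicBool>) {
        let flag = Arc::new(AtomicBool::new(false));
        let guard = Self { db_dropped: Arc::clone(&flag), cleaned_up: false };
        (guard, flag)
    }

    /// Deterministic path: the "database" is gone when this returns.
    pub fn cleanup(mut self) {
        self.db_dropped.store(true, Ordering::SeqCst);
        self.cleaned_up = true;
        // `self` is dropped here; `Drop` sees `cleaned_up` and does nothing.
    }
}

impl Drop for TempDbGuard {
    fn drop(&mut self) {
        if self.cleaned_up {
            return;
        }
        let flag = Arc::clone(&self.db_dropped);
        // Fire-and-forget: the JoinHandle is discarded and never joined.
        let _detached = thread::spawn(move || flag.store(true, Ordering::SeqCst));
    }
}
```

Only the explicit path guarantees that the work has finished by the time the test returns; the `Drop` path finishes at some later point, if the process lives long enough.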

### SSRF Bypass for Integration Tests

`SKIP_SSRF_CHECK=1` is set during integration tests so that wiremock (running on localhost) is not blocked by the SSRF protection. The variable is checked at runtime, not at compile time, so the bypass is also present in release builds. Ensure it is never set in production.
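A minimal sketch of how a runtime bypass interacts with an IP check follows. It is not the logic in `services/scraper.rs`: `is_blocked_ip` and `ssrf_allows` are hypothetical, the range list is deliberately incomplete, and the flag is passed as a parameter (a caller could read it with `std::env::var("SKIP_SSRF_CHECK")`) so the example stays deterministic.

```rust
use std::net::IpAddr;

/// Returns true for addresses a fetcher should refuse (partial list only).
pub fn is_blocked_ip(ip: IpAddr) -> bool {
    match ip {
        IpAddr::V4(v4) => {
            v4.is_loopback() || v4.is_private() || v4.is_link_local() || v4.is_unspecified()
        }
        IpAddr::V6(v6) => match v6.to_ipv4_mapped() {
            // IPv4-mapped IPv6 (::ffff:a.b.c.d) is judged by its IPv4 part.
            Some(v4) => is_blocked_ip(IpAddr::V4(v4)),
            None => v6.is_loopback() || v6.is_unspecified(),
        },
    }
}

/// `skip_flag` is the value of the bypass variable, if set.
pub fn ssrf_allows(ip: IpAddr, skip_flag: Option<&str>) -> bool {
    // Runtime bypass: with the flag set to "1", every address is allowed.
    skip_flag == Some("1") || !is_blocked_ip(ip)
}
```

With the flag set, loopback targets such as the wiremock server pass the check, which is exactly why the variable must stay out of production environments.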

### Flaky generation-live Test

The `generation-live.spec.ts` test depends on a real OpenAI API call. It may fail due to:
- API rate limits
- Slow responses exceeding the 30-second timeout
- Changes in model behavior affecting output format

It is configured with 2 retries to mitigate transient failures.

### Frontend Failing Tests

As of the last audit, 10 of 141 frontend unit tests are failing. Investigate with `cd frontend && npx vitest run` before adding new frontend tests.

---

## Coverage Targets and Gaps

### Well-Covered Areas

- **SSRF protection**: 74 unit tests in `services/scraper.rs` (shared with soft-404 and HTML parsing checks) covering all private IP ranges, IPv4-mapped IPv6, and redirect blocking
- **Sources CRUD**: 36 integration tests including CSV, bulk import, max limits
- **Admin module**: 30 integration tests with access control verification
- **Encryption**: Tests verify API keys are not stored in plaintext by querying the database directly
- **Pipeline**: Uses wiremock + MockLlmProvider for deterministic end-to-end pipeline testing

### Critical Gaps

The following gaps must be addressed to satisfy the release gate policy.

| Gap | Priority | Description |
|-----|----------|-------------|
| Scheduled execution | Critical | `scheduler.rs` has zero tests, although it runs autonomously, generates syntheses, and sends emails. |
| Brave Search pipeline | High | Only 1 unit test. The Brave Search code path in the pipeline is untested in integration. |
| Date filtering | High | No tests verify that `max_age_days` actually filters old articles. |
| Rate limiting integration | High | 15 unit tests but no integration test verifying rate limits are applied during pipeline runs. |
| SSE progress stream | High | No integration test for the SSE endpoint. Only tested in the gated E2E test. |
| Settings validation (negative) | Medium | No tests for rejection of out-of-range values (e.g., `max_articles_per_source: 0`). |
| Article history ownership | Medium | No test verifying User B cannot see User A's article history. |
| Frontend failing tests | Medium | 10 tests need investigation and fixing. |
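
For the scheduled execution gap, due-schedule selection can be covered deterministically if the current time is passed in rather than read from the system clock. The sketch below assumes that design. The `Schedule` struct, its `interval` field, and the functions `is_due` and `select_due` are hypothetical and not taken from `scheduler.rs`; only `last_run_at` appears in the release gate policy.

```rust
use std::time::{Duration, SystemTime};

/// Hypothetical schedule record; only `last_run_at` is named in this document.
#[derive(Debug, Clone, PartialEq)]
pub struct Schedule {
    pub id: u32,
    pub interval: Duration,
    pub last_run_at: Option<SystemTime>,
}

/// Due if the schedule never ran, or if `interval` has elapsed since `last_run_at`.
pub fn is_due(schedule: &Schedule, now: SystemTime) -> bool {
    match schedule.last_run_at {
        None => true,
        Some(last) => match now.duration_since(last) {
            Ok(elapsed) => elapsed >= schedule.interval,
            Err(_) => false, // `last_run_at` lies in the future: skip
        },
    }
}

/// Returns the IDs of all due schedules, in input order.
pub fn select_due(schedules: &[Schedule], now: SystemTime) -> Vec<u32> {
    schedules.iter().filter(|s| is_due(s, now)).map(|s| s.id).collect()
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn never_run_schedule_is_due() {
        let now = SystemTime::UNIX_EPOCH + Duration::from_secs(1_000_000);
        let s = Schedule { id: 1, interval: Duration::from_secs(3600), last_run_at: None };
        assert!(is_due(&s, now));
    }
}
```

With `now` injected, run and skip behavior can be asserted without sleeping or depending on wall-clock time.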