# Design: Synthesis Generation UAT with Real OpenAI API

**Date**: 2026-03-23
**Scope**: Automated Playwright E2E test that exercises the full generation pipeline with a real OpenAI API key

---

## Context

The generation pipeline has never been tested end-to-end with a real LLM API call. Unit and integration tests use fake keys or mock the LLM layer, so runtime bugs (wrong table name, missing schema fields) surfaced only during manual Docker testing. A UAT with a real API key validates the entire pipeline: settings, key encryption, model resolution, LLM call, response parsing, and synthesis storage.

## Approach

Add a single Playwright spec (`e2e/tests/generation-live.spec.ts`) that calls the backend API directly with a real OpenAI key. The test is gated behind a `.env.test` file — it skips when no key is available, so it doesn't break CI or other developers.

## Files

- **Create:** `e2e/tests/generation-live.spec.ts` — the test
- **Create:** `e2e/.env.test.example` — template showing required vars
- **Modify:** `e2e/.gitignore` — add `.env.test`

No changes to existing tests, docker-compose, or Playwright config.

## Test Flow

1. Load `OPENAI_TEST_API_KEY` from `e2e/.env.test` via `dotenv`. Skip if missing.
2. Override test timeout to 180s (`test.setTimeout(180_000)`) — real LLM calls take 30-120s.
3. Log in as the seeded user via the `loginAsUser()` helper (sets the session cookie on the page context).
4. Ensure the OpenAI provider is enabled: PUT to the admin providers API via `page.request.fetch`, or use the seeded admin user.
5. PUT `/api/v1/settings` with **all required fields**:
   ```json
   {
     "theme": "AI Weekly",
     "max_age_days": 7,
     "categories": ["AI News"],
     "max_items_per_category": 5,
     "search_agent_behavior": "",
     "ai_provider": "openai",
     "ai_model": "gpt-4o-mini",
     "ai_model_writing": "gpt-4o-mini"
   }
   ```
6. POST `/api/v1/user/api-keys` — store the real OpenAI key (provider: `"openai"`).
7. POST `/api/v1/sources` — add a source (e.g., `https://openai.com/blog`).
8. POST `/api/v1/syntheses/generate` — trigger generation, get `job_id`.
9. Consume the SSE stream via `page.evaluate()` using `EventSource` in the browser context (where the session cookie is available). Wait for the `complete` event, then JSON-parse its `data` field to extract `synthesis_id`. Timeout: 120s.
10. GET `/api/v1/syntheses/:synthesis_id` — fetch the full synthesis.
11. Validate structure and content (see below).
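
Step 9 could be sketched roughly as follows. This is a minimal illustration, not the final implementation: `waitForSynthesisId`, the structural `EventSourceLike` types, and the treatment of `synthesis_id` as a string or number are all assumptions of this sketch, and the stream URL is not defined in this spec. The function is written to be passed to `page.evaluate()`, so it closes over nothing from the Node side.

```typescript
// Minimal structural types (hypothetical) so the sketch needs no DOM or Playwright typings.
interface SseEvent {
  data: string;
}
interface EventSourceLike {
  addEventListener(type: string, listener: (event: SseEvent) => void): void;
  close(): void;
}
type EventSourceCtor = new (url: string) => EventSourceLike;

// Intended to run in the browser via page.evaluate(); everything it needs arrives in `args`.
function waitForSynthesisId(args: { url: string; timeoutMs: number }): Promise<string> {
  return new Promise<string>((resolve, reject) => {
    const Ctor = (globalThis as unknown as { EventSource: EventSourceCtor }).EventSource;
    const source = new Ctor(args.url);

    // Give up once the step 9 timeout elapses without a `complete` event.
    const timer = setTimeout(() => {
      source.close();
      reject(new Error(`no complete event within ${args.timeoutMs} ms`));
    }, args.timeoutMs);

    // Named SSE events are delivered through addEventListener, not onmessage.
    source.addEventListener("complete", (event) => {
      clearTimeout(timer);
      source.close();
      try {
        // Assumption: the payload is a JSON object with a `synthesis_id` field.
        const payload = JSON.parse(event.data) as { synthesis_id?: string | number | null };
        if (payload.synthesis_id === undefined || payload.synthesis_id === null) {
          reject(new Error("complete event carried no synthesis_id"));
          return;
        }
        resolve(String(payload.synthesis_id));
      } catch (err) {
        reject(err);
      }
    });
  });
}
```

In the spec this would likely be invoked as `await page.evaluate(waitForSynthesisId, { url: streamUrl, timeoutMs: 120_000 })`, where `streamUrl` is whatever SSE endpoint the backend exposes for the `job_id` from step 8. The sketch only handles `complete` and the timeout; if the backend emits a failure event, a listener for it would let the test fail faster.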

**CSRF**: All PUT/POST/DELETE calls must include `X-Requested-With: XMLHttpRequest` header. Use `page.evaluate()` with `fetch()` in the browser context (which shares the session cookie), or use Playwright's `request` fixture with explicit cookie and header management.
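
One way to keep the CSRF header from being forgotten is a small helper pair like the one below. This is a sketch under assumptions: `buildCsrfInit`, `sendInBrowser`, and the `CsrfInit` type are hypothetical names, and the example request body in the usage note is illustrative only, since the spec does not define request schemas beyond step 5.

```typescript
type MutatingMethod = "PUT" | "POST" | "DELETE";

// Hypothetical shape; a subset of the standard fetch() init options.
interface CsrfInit {
  method: MutatingMethod;
  headers: { [name: string]: string };
  credentials: "same-origin";
  body?: string;
}

// Node side: build the request options, always including the CSRF header.
function buildCsrfInit(method: MutatingMethod, body?: unknown): CsrfInit {
  const init: CsrfInit = {
    method,
    headers: { "X-Requested-With": "XMLHttpRequest" },
    credentials: "same-origin", // send the session cookie on same-origin requests
  };
  if (body !== undefined) {
    init.headers["Content-Type"] = "application/json";
    init.body = JSON.stringify(body);
  }
  return init;
}

// Browser side: meant to be passed to page.evaluate() so the session cookie applies.
async function sendInBrowser(args: {
  path: string;
  init: CsrfInit;
}): Promise<{ status: number; text: string }> {
  const response = await fetch(args.path, args.init);
  return { status: response.status, text: await response.text() };
}
```

A call might then look like `await page.evaluate(sendInBrowser, { path: "/api/v1/sources", init: buildCsrfInit("POST", { url: "https://openai.com/blog" }) })`, assuming the sources endpoint accepts a `url` field. Returning the status and raw text keeps the result serializable across the `page.evaluate()` boundary.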

The test uses `gpt-4o-mini` to keep the cost under $0.01 per run.

## Validation Assertions

- Synthesis has status `"completed"`
- At least 1 section exists
- Each section has a `title` field (the category name) that matches one of the configured categories
- Each section has `items` array with at least 1 entry
- Each item has:
  - `title`: non-empty string
  - `url`: starts with `"http"`
  - `summary`: string with length > 50 characters

No assertion on content quality — only structural integrity and non-trivial output.
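
The assertions above could be collected into one pure function that returns a list of violations, which makes failures easier to read than a chain of individual `expect` calls. This is a sketch, not the final test code: `findViolations` and the interface names are hypothetical, and the `sections` key on the synthesis object is an assumption (the spec names `title`, `items`, `url`, `summary`, and `status`, but not the key that holds the sections).

```typescript
interface SynthesisItem {
  title: string;
  url: string;
  summary: string;
}
interface SynthesisSection {
  title: string;
  items: SynthesisItem[];
}
interface Synthesis {
  status: string;
  sections: SynthesisSection[]; // assumed key name
}

// Returns one message per violated rule; an empty array means the synthesis is structurally valid.
function findViolations(synthesis: Synthesis, categories: string[]): string[] {
  const problems: string[] = [];
  if (synthesis.status !== "completed") {
    problems.push(`status is "${synthesis.status}", expected "completed"`);
  }
  if (!Array.isArray(synthesis.sections) || synthesis.sections.length === 0) {
    problems.push("no sections");
    return problems;
  }
  synthesis.sections.forEach((section, s) => {
    if (!categories.includes(section.title)) {
      problems.push(`sections[${s}].title "${section.title}" is not a configured category`);
    }
    if (!Array.isArray(section.items) || section.items.length === 0) {
      problems.push(`sections[${s}] has no items`);
      return;
    }
    section.items.forEach((item, i) => {
      const where = `sections[${s}].items[${i}]`;
      if (typeof item.title !== "string" || item.title.trim() === "") {
        problems.push(`${where}.title is empty`);
      }
      if (typeof item.url !== "string" || !item.url.startsWith("http")) {
        problems.push(`${where}.url does not start with "http"`);
      }
      if (typeof item.summary !== "string" || item.summary.length <= 50) {
        problems.push(`${where}.summary is 50 characters or shorter`);
      }
    });
  });
  return problems;
}
```

In the spec, step 11 could then reduce to `expect(findViolations(body, ["AI News"])).toEqual([])`, so a failing run prints every violated rule at once.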

## Gating

- `.env.test` is gitignored — never committed
- `.env.test.example` is committed as a template: `OPENAI_TEST_API_KEY=sk-your-key-here`
- The test uses `test.skip()` if the env var is not set
- Existing `npx playwright test` (without `.env.test`) continues to work unchanged — this test simply skips
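
The gating rules above might translate into a helper like this. It is a sketch under assumptions: `readTestApiKey` is a hypothetical name, and treating the placeholder value from `.env.test.example` as "not set" is a choice made here, not something the spec requires.

```typescript
// Placeholder from .env.test.example; treating it as "missing" is an assumption of this sketch.
const PLACEHOLDER_KEY = "sk-your-key-here";

// Returns the trimmed key, or null when the test should skip.
function readTestApiKey(env: { [name: string]: string | undefined }): string | null {
  const raw = (env["OPENAI_TEST_API_KEY"] ?? "").trim();
  if (raw === "" || raw === PLACEHOLDER_KEY) {
    return null;
  }
  return raw;
}

// Intended use in e2e/tests/generation-live.spec.ts (imports of dotenv, path,
// and @playwright/test omitted here):
//
//   dotenv.config({ path: path.resolve(__dirname, "../.env.test") });
//
//   test("live generation", async ({ page }) => {
//     const apiKey = readTestApiKey(process.env);
//     test.skip(apiKey === null, "OPENAI_TEST_API_KEY is not set");
//     test.setTimeout(180_000);
//     // ... steps 3 to 11 ...
//   });
```

`dotenv.config()` does not throw when the file is absent, so on machines without `.env.test` the variable stays unset and the test is reported as skipped rather than failed.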

## What does NOT change

- Existing E2E tests and their docker-compose
- Playwright config (the new spec runs alongside existing specs)
- Backend code — no changes
- Frontend code — no changes