Design: Synthesis Generation UAT with Real OpenAI API

Date: 2026-03-23

Scope: Automated Playwright E2E test that exercises the full generation pipeline with a real OpenAI API key


Context

The generation pipeline has never been tested end-to-end with a real LLM API call. Unit and integration tests use fake keys or mock the LLM layer. Runtime bugs (wrong table name, missing schema fields) were only discovered during manual Docker testing. A UAT with a real API key validates the entire pipeline: settings, key encryption, model resolution, LLM call, response parsing, and synthesis storage.

Approach

Add a single Playwright spec (e2e/tests/generation-live.spec.ts) that calls the backend API directly with a real OpenAI key. The test is gated behind a .env.test file — it skips when no key is available, so it doesn't break CI or other developers.

Files

  • Create: e2e/tests/generation-live.spec.ts — the test
  • Create: e2e/.env.test.example — template showing required vars
  • Modify: e2e/.gitignore — add .env.test

No changes to existing tests, docker-compose, or Playwright config.

Test Flow

  1. Load OPENAI_TEST_API_KEY from e2e/.env.test via dotenv. Skip if missing.
  2. Override test timeout to 180s (test.setTimeout(180_000)) — real LLM calls take 30-120s.
  3. Login as seeded user via loginAsUser() helper (sets session cookie on the page context).
  4. Ensure the OpenAI provider is enabled, either with a page.request.fetch PUT to the admin providers API or by using the seeded admin user.
  5. PUT /api/v1/settings with all required fields:
    {
      "theme": "AI Weekly",
      "max_age_days": 7,
      "categories": ["AI News"],
      "max_items_per_category": 5,
      "search_agent_behavior": "",
      "ai_provider": "openai",
      "ai_model": "gpt-4o-mini",
      "ai_model_writing": "gpt-4o-mini"
    }
    
  6. POST /api/v1/user/api-keys — store the real OpenAI key (provider: "openai").
  7. POST /api/v1/sources — add a source (e.g., https://openai.com/blog).
  8. POST /api/v1/syntheses/generate — trigger generation, get job_id.
  9. Consume the SSE stream via page.evaluate() using EventSource in the browser context (where the session cookie is available). Wait for the complete event, then JSON-parse its data field to extract synthesis_id. Timeout: 120s.
  10. GET /api/v1/syntheses/:synthesis_id — fetch the full synthesis.
  11. Validate structure and content (see below).
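
Step 9 depends on turning the raw data string of the complete event into a synthesis_id. A minimal sketch of that parsing step follows. The helper name extractSynthesisId and the CompletePayload type are hypothetical, and the payload is assumed to be a JSON object with a synthesis_id field that may be a string or a number; the real event may carry more fields.

```typescript
// Hypothetical payload shape: the spec only states that the `complete`
// event's data field is JSON containing synthesis_id.
interface CompletePayload {
  synthesis_id: string | number;
}

// Parse the raw `data` string of a `complete` SSE event and return the id.
// Throws with a descriptive message so a malformed event fails the test loudly
// instead of producing a request for /syntheses/undefined.
function extractSynthesisId(rawData: string): string {
  let parsed: unknown;
  try {
    parsed = JSON.parse(rawData);
  } catch {
    throw new Error("complete event data is not valid JSON: " + rawData);
  }
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error("complete event data is not a JSON object");
  }
  const id = (parsed as Partial<CompletePayload>).synthesis_id;
  if (typeof id !== "string" && typeof id !== "number") {
    throw new Error("complete event data has no synthesis_id");
  }
  return String(id);
}
```

In the test, the string passed to this helper would be the event's data property as returned from page.evaluate(); keeping the parsing outside the browser context makes it easy to unit-check on its own.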

CSRF: All PUT/POST/DELETE calls must include the X-Requested-With: XMLHttpRequest header. Either use page.evaluate() with fetch() in the browser context (which shares the session cookie), or use Playwright's request fixture with explicit cookie and header management.
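
One way to avoid forgetting the CSRF header is to route every call through a small helper. The sketch below is illustrative only: withCsrfHeader and ApiCallOptions are hypothetical names, and the helper assumes that only PUT, POST, and DELETE need the header, as stated above.

```typescript
// Methods that must carry the CSRF header according to this spec.
const MUTATING_METHODS = ["PUT", "POST", "DELETE"];

// Hypothetical minimal options shape, compatible with what fetch() accepts.
interface ApiCallOptions {
  method: string;
  headers?: Record<string, string>;
  body?: string;
}

// Return a copy of the options with the method upper-cased and, for mutating
// methods, the X-Requested-With header added. Existing headers are preserved
// and the input object is not modified.
function withCsrfHeader(
  options: ApiCallOptions
): ApiCallOptions & { headers: Record<string, string> } {
  const method = options.method.toUpperCase();
  const headers: Record<string, string> = { ...options.headers };
  if (MUTATING_METHODS.indexOf(method) !== -1) {
    headers["X-Requested-With"] = "XMLHttpRequest";
  }
  return { ...options, method, headers };
}
```

The returned object can be passed as the second argument to fetch() inside page.evaluate(), where the session cookie is sent automatically.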

The test uses gpt-4o-mini to keep the cost under $0.01 per run.

Validation Assertions

  • Synthesis has status "completed"
  • At least 1 section exists
  • Each section has a title field (the category name) matching one of the configured categories
  • Each section has items array with at least 1 entry
  • Each item has:
    • title: non-empty string
    • url: starts with "http"
    • summary: string with length > 50 characters

No assertion on content quality — only structural integrity and non-trivial output.
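
The assertions above could be collected into a single checker that reports every structural problem at once, which makes a failed live run easier to diagnose. This is a sketch under assumptions: the field names status, title, items, url, and summary come from the list above, while the top-level sections key, the type names, and findStructuralProblems are hypothetical.

```typescript
interface SynthesisItem { title: string; url: string; summary: string; }
interface SynthesisSection { title: string; items: SynthesisItem[]; }
// Assumption: sections live under a top-level `sections` key.
interface Synthesis { status: string; sections: SynthesisSection[]; }

// Return a human-readable list of structural problems; an empty list means
// the synthesis satisfies every assertion in this spec.
function findStructuralProblems(synthesis: Synthesis, categories: string[]): string[] {
  const problems: string[] = [];
  if (synthesis.status !== "completed") {
    problems.push(`status is "${synthesis.status}", expected "completed"`);
  }
  if (!Array.isArray(synthesis.sections) || synthesis.sections.length < 1) {
    problems.push("synthesis has no sections");
    return problems;
  }
  synthesis.sections.forEach((section, s) => {
    if (categories.indexOf(section.title) === -1) {
      problems.push(`section ${s}: title "${section.title}" is not a configured category`);
    }
    if (!Array.isArray(section.items) || section.items.length < 1) {
      problems.push(`section ${s}: no items`);
      return;
    }
    section.items.forEach((item, i) => {
      const where = `section ${s}, item ${i}`;
      if (typeof item.title !== "string" || item.title.length === 0) {
        problems.push(`${where}: title is empty`);
      }
      if (typeof item.url !== "string" || item.url.indexOf("http") !== 0) {
        problems.push(`${where}: url does not start with "http"`);
      }
      if (typeof item.summary !== "string" || item.summary.length <= 50) {
        problems.push(`${where}: summary is 50 characters or shorter`);
      }
    });
  });
  return problems;
}
```

In the Playwright spec, the result could be asserted with expect(problems).toEqual([]), so the failure message lists every violated rule rather than only the first.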

Gating

  • .env.test is gitignored — never committed
  • .env.test.example is committed as a template: OPENAI_TEST_API_KEY=sk-your-key-here
  • The test uses test.skip() if the env var is not set
  • Existing npx playwright test (without .env.test) continues to work unchanged — this test simply skips
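
The skip decision can be isolated in a small pure function so that a blank or whitespace-only value is treated the same as a missing one. The sketch below is an assumption about how the gating might be written; resolveTestKey is a hypothetical helper, and the commented usage assumes dotenv has already loaded e2e/.env.test into process.env.

```typescript
// Return the trimmed key, or undefined when the variable is missing or blank.
// Accepts any env-like record so it can be checked without touching process.env.
function resolveTestKey(env: Record<string, string | undefined>): string | undefined {
  const raw = env["OPENAI_TEST_API_KEY"];
  if (raw === undefined) {
    return undefined;
  }
  const key = raw.trim();
  return key.length > 0 ? key : undefined;
}

// Possible usage at the top of the spec file (not executed here):
//   const apiKey = resolveTestKey(process.env);
//   test.skip(!apiKey, "OPENAI_TEST_API_KEY not set in e2e/.env.test");
```

Treating a blank value as missing means a copied but unfilled template line such as OPENAI_TEST_API_KEY= skips the test instead of failing on an authentication error.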

What does NOT change

  • Existing E2E tests and their docker-compose
  • Playwright config (the new spec runs alongside existing specs)
  • Backend code — no changes
  • Frontend code — no changes