Design: Synthesis Generation UAT with Real OpenAI API
Date: 2026-03-23
Scope: Automated Playwright E2E test that exercises the full generation pipeline with a real OpenAI API key
Context
The generation pipeline has never been tested end-to-end with a real LLM API call. Unit and integration tests use fake keys or mock the LLM layer. Runtime bugs (wrong table name, missing schema fields) were only discovered during manual Docker testing. A UAT with a real API key validates the entire pipeline: settings, key encryption, model resolution, LLM call, response parsing, and synthesis storage.
Approach
Add a single Playwright spec (e2e/tests/generation-live.spec.ts) that calls the backend API directly with a real OpenAI key. The test is gated behind an e2e/.env.test file: it skips when no key is available, so CI and other developers' local runs are unaffected.
Files
- Create: e2e/tests/generation-live.spec.ts (the test)
- Create: e2e/.env.test.example (template showing required vars)
- Modify: e2e/.gitignore (add .env.test)
No changes to existing tests, docker-compose, or Playwright config.
Test Flow
- Load OPENAI_TEST_API_KEY from e2e/.env.test via dotenv. Skip if missing.
- Override test timeout to 180s (test.setTimeout(180_000)): real LLM calls take 30-120s.
- Login as seeded user via loginAsUser() helper (sets session cookie on the page context).
- Ensure OpenAI provider is enabled: page.request.fetch PUT to admin providers API, or use the seeded admin user.
- PUT /api/v1/settings with all required fields: { "theme": "AI Weekly", "max_age_days": 7, "categories": ["AI News"], "max_items_per_category": 5, "search_agent_behavior": "", "ai_provider": "openai", "ai_model": "gpt-4o-mini", "ai_model_writing": "gpt-4o-mini" }
- POST /api/v1/user/api-keys: store the real OpenAI key (provider: "openai").
- POST /api/v1/sources: add a source (e.g., https://openai.com/blog).
- POST /api/v1/syntheses/generate: trigger generation, get job_id.
- Consume SSE stream via page.evaluate() using EventSource in the browser context (where the session cookie is available). Wait for the complete event, JSON-parse the data field to extract synthesis_id. Timeout: 120s.
- GET /api/v1/syntheses/:synthesis_id: fetch the full synthesis.
- Validate structure and content (see Validation Assertions).
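The SSE step in the flow above comes down to two small pieces: parsing the data field of the complete event and bounding the wait at 120s. The sketch below shows one possible shape for both. The helper names extractSynthesisId and withTimeout are hypothetical, and accepting either a string or a numeric synthesis_id is an assumption, since the ID type is not specified here.

```typescript
// Hypothetical helper: pull synthesis_id out of the `data` string of the
// SSE "complete" event. Assumes the payload is a JSON object; the ID may be
// a string or a number (assumption, the design does not fix the type).
function extractSynthesisId(data: string): string {
  let parsed: unknown;
  try {
    parsed = JSON.parse(data);
  } catch {
    throw new Error("complete event data is not valid JSON");
  }
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error("complete event data is not a JSON object");
  }
  const id = (parsed as Record<string, unknown>)["synthesis_id"];
  if (typeof id === "string" && id !== "") return id;
  if (typeof id === "number" && Number.isFinite(id)) return String(id);
  throw new Error("complete event has no usable synthesis_id");
}

// Hypothetical helper: reject if `work` has not settled within `ms`.
// The timer is cleared as soon as `work` settles, so nothing is left pending.
function withTimeout<T>(work: Promise<T>, ms: number, label: string): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms} ms`)),
      ms,
    );
    work.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (error) => {
        clearTimeout(timer);
        reject(error);
      },
    );
  });
}
```

In the spec, the promise passed to withTimeout would be the one that resolves with event.data from the EventSource listener inside page.evaluate(), with ms set to 120_000. Keeping the parsing outside the browser context means a malformed payload fails with a readable message in the test output.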
CSRF: All PUT/POST/DELETE calls must include X-Requested-With: XMLHttpRequest header. Use page.evaluate() with fetch() in the browser context (which shares the session cookie), or use Playwright's request fixture with explicit cookie and header management.
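To keep the CSRF header from being forgotten on any mutating call, the request options can be built in one place. The following is a minimal sketch, not the project's actual helper: buildApiRequest and the ApiRequest type are hypothetical names, and sending the body as JSON with a Content-Type header is an assumption about the backend. The settings payload is the one listed in the test flow.

```typescript
type Method = "GET" | "PUT" | "POST" | "DELETE";

// Shaped so it can be passed as the second argument to fetch().
interface ApiRequest {
  method: Method;
  headers: Record<string, string>;
  body?: string;
}

// Hypothetical helper: adds X-Requested-With to every non-GET request and
// serializes the payload as JSON (assumed to be what the backend expects).
function buildApiRequest(method: Method, payload?: unknown): ApiRequest {
  const headers: Record<string, string> = {};
  if (method !== "GET") {
    headers["X-Requested-With"] = "XMLHttpRequest";
  }
  const request: ApiRequest = { method, headers };
  if (payload !== undefined) {
    headers["Content-Type"] = "application/json";
    request.body = JSON.stringify(payload);
  }
  return request;
}

// Request options for the PUT /api/v1/settings step.
const settingsRequest = buildApiRequest("PUT", {
  theme: "AI Weekly",
  max_age_days: 7,
  categories: ["AI News"],
  max_items_per_category: 5,
  search_agent_behavior: "",
  ai_provider: "openai",
  ai_model: "gpt-4o-mini",
  ai_model_writing: "gpt-4o-mini",
});
```

Inside page.evaluate(), the object can be handed to fetch("/api/v1/settings", settingsRequest) so the browser attaches the session cookie automatically.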
Uses gpt-4o-mini to keep cost under $0.01 per run.
Validation Assertions
- Synthesis has status "completed"
- At least 1 section exists
- Each section has a title field (the category name) matching configured categories
- Each section has items array with at least 1 entry
- Each item has:
  - title: non-empty string
  - url: starts with "http"
  - summary: string with length > 50 characters
No assertion on content quality — only structural integrity and non-trivial output.
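The structural checks above can be gathered into one pure function that returns a list of problems, which the spec would then assert to be empty. This is a sketch under assumptions: the response field name sections and the three interfaces are hypothetical, inferred from the assertions rather than taken from the backend schema, and findStructuralProblems is an illustrative name.

```typescript
// Assumed response shape; only the fields the assertions mention.
interface SynthesisItem {
  title: string;
  url: string;
  summary: string;
}
interface SynthesisSection {
  title: string;
  items: SynthesisItem[];
}
interface Synthesis {
  status: string;
  sections: SynthesisSection[]; // field name is an assumption
}

// Returns one message per violated rule; an empty array means the
// synthesis passes every structural check.
function findStructuralProblems(synthesis: Synthesis, categories: string[]): string[] {
  const problems: string[] = [];
  if (synthesis.status !== "completed") {
    problems.push(`status is "${synthesis.status}", expected "completed"`);
  }
  if (synthesis.sections.length < 1) {
    problems.push("no sections");
  }
  synthesis.sections.forEach((section, s) => {
    if (!categories.includes(section.title)) {
      problems.push(`sections[${s}].title "${section.title}" is not a configured category`);
    }
    if (section.items.length < 1) {
      problems.push(`sections[${s}] has no items`);
    }
    section.items.forEach((item, i) => {
      const where = `sections[${s}].items[${i}]`;
      if (item.title.trim() === "") problems.push(`${where}.title is empty`);
      if (!item.url.startsWith("http")) problems.push(`${where}.url does not start with "http"`);
      if (item.summary.length <= 50) problems.push(`${where}.summary is 50 characters or shorter`);
    });
  });
  return problems;
}
```

Asserting that the returned array equals [] reports every violation in a single failure message, which is useful when a live run costs real API calls.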
Gating
- .env.test is gitignored, never committed
- .env.test.example is committed as a template: OPENAI_TEST_API_KEY=sk-your-key-here
- The test uses test.skip() if the env var is not set
- Existing npx playwright test (without .env.test) continues to work unchanged: this test simply skips
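The skip decision can be isolated in a small function so it is easy to reason about. The sketch below uses a hand-written KEY=VALUE parser as a stand-in for dotenv, purely to stay self-contained; parseEnvFile and resolveLiveKey are hypothetical names. Treating the template placeholder as "no key" is an added assumption, meant to cover the case where .env.test.example is copied without being edited.

```typescript
// Stand-in for dotenv: parses simple KEY=VALUE lines, ignoring blank lines
// and lines starting with "#". Not a full dotenv implementation.
function parseEnvFile(text: string): Record<string, string> {
  const vars: Record<string, string> = {};
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue;
    const eq = line.indexOf("=");
    if (eq <= 0) continue; // no key before "=": ignore the line
    const key = line.slice(0, eq).trim();
    let value = line.slice(eq + 1).trim();
    // Strip one pair of matching surrounding quotes, if present.
    const first = value[0];
    if (value.length >= 2 && (first === '"' || first === "'") && value[value.length - 1] === first) {
      value = value.slice(1, -1);
    }
    vars[key] = value;
  }
  return vars;
}

// Placeholder value from .env.test.example.
const PLACEHOLDER_KEY = "sk-your-key-here";

// Returns the key to use for the live test, or undefined when the test
// should skip. `envText` is the content of e2e/.env.test, or undefined if
// the file does not exist.
function resolveLiveKey(envText: string | undefined): string | undefined {
  if (envText === undefined) return undefined;
  const key = parseEnvFile(envText)["OPENAI_TEST_API_KEY"];
  if (!key || key === PLACEHOLDER_KEY) return undefined;
  return key;
}
```

In the spec, the result would feed Playwright's conditional skip, for example test.skip(!apiKey, "OPENAI_TEST_API_KEY not set"), so a missing or placeholder key shows up as a skipped test rather than a failure.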
What does NOT change
- Existing E2E tests and their docker-compose
- Playwright config (the new spec runs alongside existing specs)
- Backend code — no changes
- Frontend code — no changes