Design: Synthesis Generation UAT with Real OpenAI API

Date: 2026-03-23
Scope: Automated Playwright E2E test that exercises the full generation pipeline with a real OpenAI API key

Context

The generation pipeline has never been tested end-to-end with a real LLM API call. Unit and integration tests use fake keys or mock the LLM layer. Runtime bugs (wrong table name, missing schema fields) were only discovered during manual Docker testing. A UAT with a real API key validates the entire pipeline: settings, key encryption, model resolution, LLM call, response parsing, and synthesis storage.

Approach

Add a single Playwright spec (e2e/tests/generation-live.spec.ts) that calls the backend API directly with a real OpenAI key. The test is gated behind a .env.test file — it skips when no key is available, so it doesn't break CI or other developers.

Files

Create: e2e/tests/generation-live.spec.ts — the test
Create: e2e/.env.test.example — template showing required vars
Modify: e2e/.gitignore — add .env.test

No changes to existing tests, docker-compose, or Playwright config.

Test Flow

Load OPENAI_TEST_API_KEY from e2e/.env.test via dotenv. Skip if missing.
Override test timeout to 180s (test.setTimeout(180_000)) — real LLM calls take 30-120s.
Login as seeded user via loginAsUser() helper (sets session cookie on the page context).
Ensure the OpenAI provider is enabled, either with a PUT to the admin providers API via page.request.fetch or by running this step as the seeded admin user.

PUT /api/v1/settings with all required fields:

{
  "theme": "AI Weekly",
  "max_age_days": 7,
  "categories": ["AI News"],
  "max_items_per_category": 5,
  "search_agent_behavior": "",
  "ai_provider": "openai",
  "ai_model": "gpt-4o-mini",
  "ai_model_writing": "gpt-4o-mini"
}

POST /api/v1/user/api-keys — store the real OpenAI key (provider: "openai").
POST /api/v1/sources — add a source (e.g., https://openai.com/blog).
POST /api/v1/syntheses/generate — trigger generation, get job_id.
Consume the SSE stream via page.evaluate() using EventSource in the browser context (where the session cookie is available). Wait for the complete event, then JSON-parse its data field to extract synthesis_id. Timeout: 120s.
GET /api/v1/syntheses/:synthesis_id — fetch the full synthesis.
Validate structure and content (see below).
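
The SSE step can be sketched roughly as follows. This is a minimal illustration, not the spec's actual code: the payload shape ({"synthesis_id": ...}), the helper names extractSynthesisId and waitForComplete, and the choice to reject on the first error event are all assumptions. The EventSourceLike interface is a structural stand-in for the browser's EventSource so the logic can be exercised outside a browser; inside page.evaluate() the same logic would be written inline against new EventSource(url).

```typescript
// Structural stand-ins for the browser's MessageEvent and EventSource
// (assumed shapes, declared so the sketch does not depend on DOM typings).
interface MessageLike {
  data: string;
}
interface EventSourceLike {
  addEventListener(type: string, listener: (ev: MessageLike) => void): void;
  close(): void;
}

// Parse the data field of the "complete" event and return synthesis_id.
// Assumption: the payload is a JSON object such as {"synthesis_id": 42}.
function extractSynthesisId(data: string): string {
  const parsed: unknown = JSON.parse(data);
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error("complete event data is not a JSON object");
  }
  const id = (parsed as { [key: string]: unknown })["synthesis_id"];
  if (typeof id !== "string" && typeof id !== "number") {
    throw new Error("complete event data has no synthesis_id");
  }
  return String(id);
}

// Resolve with the synthesis id once "complete" arrives; reject on a
// stream error or when timeoutMs elapses (120_000 in the flow above).
function waitForComplete(source: EventSourceLike, timeoutMs: number): Promise<string> {
  return new Promise<string>((resolve, reject) => {
    const timer = setTimeout(() => {
      source.close();
      reject(new Error("timed out waiting for complete event"));
    }, timeoutMs);

    source.addEventListener("complete", (ev) => {
      clearTimeout(timer);
      source.close();
      try {
        resolve(extractSynthesisId(ev.data));
      } catch (err) {
        reject(err);
      }
    });

    source.addEventListener("error", () => {
      clearTimeout(timer);
      source.close();
      reject(new Error("SSE stream reported an error"));
    });
  });
}
```

Closing the source in every branch matters because EventSource reconnects automatically; without it, a finished or failed stream would keep reopening for the remainder of the test.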

CSRF: All PUT/POST/DELETE calls must include X-Requested-With: XMLHttpRequest header. Use page.evaluate() with fetch() in the browser context (which shares the session cookie), or use Playwright's request fixture with explicit cookie and header management.
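
One way to keep the CSRF header from being forgotten is to build every mutating request through a single helper. The sketch below is an assumption about how that could look: csrfInit and JsonRequestInit are hypothetical names, and the JSON Content-Type header is assumed rather than stated in this design. The returned object is plain data, so it can be passed as an argument into page.evaluate() and handed to fetch() there.

```typescript
// Plain-data request options (hypothetical type), shaped to be accepted
// by fetch() as its second argument.
interface JsonRequestInit {
  method: "PUT" | "POST" | "DELETE";
  headers: { [name: string]: string };
  body?: string;
}

// Build options for a mutating call, always including the CSRF header
// required for PUT/POST/DELETE.
function csrfInit(method: "PUT" | "POST" | "DELETE", body?: unknown): JsonRequestInit {
  const headers: { [name: string]: string } = {
    "X-Requested-With": "XMLHttpRequest",
  };
  const init: JsonRequestInit = { method, headers };
  if (body !== undefined) {
    // Assumption: the backend expects JSON request bodies.
    headers["Content-Type"] = "application/json";
    init.body = JSON.stringify(body);
  }
  return init;
}

// Possible use from a spec (illustrative only, not executed here):
//   const init = csrfInit("PUT", settingsPayload);
//   const status = await page.evaluate(
//     ([url, opts]) => fetch(url, opts).then((r) => r.status),
//     ["/api/v1/settings", init] as const,
//   );
```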

Uses gpt-4o-mini to keep cost under $0.01 per run.

Validation Assertions

Synthesis has status "completed"
At least 1 section exists
Each section has a title field (the category name) that matches one of the configured categories
Each section has items array with at least 1 entry
Each item has:
- title: non-empty string
- url: starts with "http"
- summary: string with length > 50 characters

No assertion on content quality — only structural integrity and non-trivial output.
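
The structural checks above could be gathered into one pure function so that a failing run reports every problem at once. This is a sketch under assumptions: the response is taken to expose status and a sections array with exactly the fields listed above, and the names Synthesis and findStructuralProblems are illustrative, not taken from the backend.

```typescript
// Assumed response shape for GET /api/v1/syntheses/:synthesis_id.
interface SynthesisItem {
  title: string;
  url: string;
  summary: string;
}
interface SynthesisSection {
  title: string;
  items: SynthesisItem[];
}
interface Synthesis {
  status: string;
  sections: SynthesisSection[];
}

// Return a human-readable list of structural problems; empty means valid.
function findStructuralProblems(synthesis: Synthesis, categories: string[]): string[] {
  const problems: string[] = [];
  if (synthesis.status !== "completed") {
    problems.push('status is "' + synthesis.status + '", expected "completed"');
  }
  if (synthesis.sections.length < 1) {
    problems.push("no sections");
  }
  for (const section of synthesis.sections) {
    if (categories.indexOf(section.title) === -1) {
      problems.push('section "' + section.title + '" is not a configured category');
    }
    if (section.items.length < 1) {
      problems.push('section "' + section.title + '" has no items');
    }
    for (const item of section.items) {
      if (item.title.trim().length === 0) {
        problems.push("item with empty title in " + section.title);
      }
      if (item.url.indexOf("http") !== 0) {
        problems.push("item url does not start with http: " + item.url);
      }
      if (item.summary.length <= 50) {
        problems.push("item summary is 50 characters or shorter: " + item.title);
      }
    }
  }
  return problems;
}
```

In the spec, a single expect(findStructuralProblems(synthesis, ["AI News"])).toEqual([]) would then print every violation in the failure message.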

Gating

.env.test is gitignored — never committed
.env.test.example is committed as a template: OPENAI_TEST_API_KEY=sk-your-key-here
The test uses test.skip() if the env var is not set
Existing npx playwright test (without .env.test) continues to work unchanged — this test simply skips
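
A minimal sketch of the gating logic follows. The helper name resolveTestKey is hypothetical, and treating the template placeholder as "no key" is an added assumption, so that copying .env.test.example without editing it still results in a skip rather than a failed OpenAI call.

```typescript
// Placeholder value shipped in .env.test.example.
const PLACEHOLDER_KEY = "sk-your-key-here";

// Return a usable key, or undefined when the test should be skipped.
// Assumption: blank values and the unedited placeholder count as missing.
function resolveTestKey(env: { [name: string]: string | undefined }): string | undefined {
  const raw = env["OPENAI_TEST_API_KEY"];
  if (raw === undefined) {
    return undefined;
  }
  const key = raw.trim();
  if (key === "" || key === PLACEHOLDER_KEY) {
    return undefined;
  }
  return key;
}

// Assumed wiring inside e2e/tests/generation-live.spec.ts (not executed here):
//   dotenv.config({ path: path.resolve(__dirname, "../.env.test") });
//   const apiKey = resolveTestKey(process.env);
//   test.skip(!apiKey, "OPENAI_TEST_API_KEY not set; skipping live generation test");
```

Loading the file with an explicit path keeps the behavior independent of the directory from which npx playwright test is started.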

What does NOT change

Existing E2E tests and their docker-compose
Playwright config (the new spec runs alongside existing specs)
Backend code — no changes
Frontend code — no changes
