# Design: LLM Call Logging — Track All LLM Interactions Per Synthesis
**Date**: 2026-03-24
**Scope**: Log every LLM call during synthesis generation with full prompt/response, viewable per synthesis
---
## Context
When synthesis quality is poor, there's no way to see what prompts were sent to the LLM or what it returned. Users need visibility into every LLM call to debug prompt effectiveness, model behavior, and pipeline issues.
## Approach
A new `llm_call_log` table stores every LLM call with its full prompt, response, timing, and model info. Rows are linked to syntheses via `job_id`. A dedicated log viewer page is accessible from the synthesis list.
## New Table: `llm_call_log`
```sql
CREATE TABLE llm_call_log (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    job_id UUID NOT NULL,
    call_type TEXT NOT NULL,
    model TEXT NOT NULL,
    system_prompt TEXT NOT NULL DEFAULT '',
    user_prompt TEXT NOT NULL DEFAULT '',
    response_body TEXT NOT NULL DEFAULT '',
    duration_ms INTEGER NOT NULL DEFAULT 0,
    created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE INDEX idx_llm_call_log_job_id ON llm_call_log(job_id);
CREATE INDEX idx_llm_call_log_user_id ON llm_call_log(user_id, created_at);
```
**`call_type` values:** `search`, `classification_phase1`, `classification_phase2`, `rewrite`, `link_extraction`, `article_extraction`, `key_test`
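Because `call_type` is a free-form `TEXT` column, a typo at a call site would go unnoticed. The sketch below shows one way to keep the seven values in a single place. The `CallType` enum and its methods are hypothetical names introduced for illustration; the spec itself passes `call_type` as a plain `&str`.

```rust
/// Hypothetical enum mirroring the `call_type` values listed in this spec.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CallType {
    Search,
    ClassificationPhase1,
    ClassificationPhase2,
    Rewrite,
    LinkExtraction,
    ArticleExtraction,
    KeyTest,
}

impl CallType {
    /// Every variant, in the order the spec lists them.
    pub const ALL: [CallType; 7] = [
        CallType::Search,
        CallType::ClassificationPhase1,
        CallType::ClassificationPhase2,
        CallType::Rewrite,
        CallType::LinkExtraction,
        CallType::ArticleExtraction,
        CallType::KeyTest,
    ];

    /// The exact string stored in the `call_type` column.
    pub fn as_str(self) -> &'static str {
        match self {
            CallType::Search => "search",
            CallType::ClassificationPhase1 => "classification_phase1",
            CallType::ClassificationPhase2 => "classification_phase2",
            CallType::Rewrite => "rewrite",
            CallType::LinkExtraction => "link_extraction",
            CallType::ArticleExtraction => "article_extraction",
            CallType::KeyTest => "key_test",
        }
    }

    /// Parses a stored value back into a variant; unknown strings yield `None`.
    pub fn parse(s: &str) -> Option<CallType> {
        CallType::ALL.iter().copied().find(|c| c.as_str() == s)
    }
}
```

A call site could then pass `CallType::Rewrite.as_str()` wherever the helper expects the `call_type` string.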
## Pipeline Integration
A helper function `log_llm_call` inserts a row after each LLM call:
```rust
async fn log_llm_call(
    pool: &PgPool,
    user_id: Uuid,
    job_id: Uuid,
    call_type: &str,
    model: &str,
    system_prompt: &str,
    user_prompt: &str,
    response: &serde_json::Value,
    duration_ms: u64,
)
```
Timing is measured by calling `std::time::Instant::now()` before each provider call and `elapsed().as_millis()` after it.
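The timing step can be sketched as a small wrapper. This is a minimal, synchronous illustration using only the standard library; the real call sites are async and would await the provider call between the two measurements. The names `timed` and `clamp_millis` are hypothetical. The clamp is an assumption added here because `as_millis()` returns `u128`, the helper takes `u64`, and a PostgreSQL `INTEGER` column holds a 32-bit signed value.

```rust
use std::time::Instant;

/// Runs `call` and returns its result together with the elapsed wall-clock
/// time in milliseconds, clamped so it fits the `duration_ms` column.
pub fn timed<T>(call: impl FnOnce() -> T) -> (T, u64) {
    let started = Instant::now();
    let result = call();
    let duration_ms = clamp_millis(started.elapsed().as_millis());
    (result, duration_ms)
}

/// Narrows the `u128` from `Duration::as_millis()` to the `u64` the helper
/// expects, saturating at `i32::MAX` (the largest value an INTEGER can hold).
pub fn clamp_millis(millis: u128) -> u64 {
    millis.min(i32::MAX as u128) as u64
}
```

The returned `duration_ms` would then be passed to `log_llm_call` together with the prompts and the response.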
**Instrumentation points (7 LLM call sites):**
1. **Search pass**: `call_type: "search"` (Phase 2 web search)
2. **Classification Phase 1**: `call_type: "classification_phase1"`
3. **Classification Phase 2**: `call_type: "classification_phase2"`
4. **Rewrite pass**: `call_type: "rewrite"`
5. **Link extraction** (per source, when LLM enabled): `call_type: "link_extraction"`
6. **Article extraction** (per article, when LLM enabled): `call_type: "article_extraction"`
7. **Key test**: `call_type: "key_test"` (API key test endpoint, optional)
## Cleanup
During the existing generation startup cleanup (alongside `article_history::cleanup_old`), truncate old LLM log entries. For entries older than the `article_history_days` setting:
- Replace `system_prompt`, `user_prompt`, `response_body` with first 500 chars + `\n[truncated]`
- Keep metadata (call_type, model, duration_ms, timestamps) intact
This avoids unbounded storage growth while preserving summary info for old runs.
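The truncation rule above can be sketched as a pure function. This is an illustration only: `truncate_for_retention` is a hypothetical name, and the actual cleanup might just as well run as a single SQL `UPDATE`. It assumes that values of 500 characters or fewer are left untouched, which the spec does not state explicitly.

```rust
/// Number of leading characters kept for old entries.
const KEEP_CHARS: usize = 500;
/// Marker appended to any value that was cut.
const MARKER: &str = "\n[truncated]";

/// Returns the first 500 characters of `text` followed by the marker, or
/// `text` unchanged when it has 500 characters or fewer (assumption).
/// Counts characters rather than bytes, so multi-byte text is never split.
pub fn truncate_for_retention(text: &str) -> String {
    match text.char_indices().nth(KEEP_CHARS) {
        // A 501st character exists: keep only the bytes before it.
        Some((byte_idx, _)) => format!("{}{}", &text[..byte_idx], MARKER),
        // 500 characters or fewer: nothing to cut.
        None => text.to_string(),
    }
}
```

Under that assumption the function is idempotent: running the cleanup again on an already truncated value yields the same 500 characters plus the marker, so repeated startup cleanups do not shrink entries further.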
## API Endpoint
**`GET /api/v1/llm-logs/:job_id`**
Returns all log entries for a generation job, ordered by `created_at`. The endpoint requires authentication and is scoped to the user: the handler verifies that the `job_id` belongs to a synthesis owned by the caller.
Response:
```json
[
  {
    "id": "uuid",
    "call_type": "search",
    "model": "gpt-4o-mini",
    "system_prompt": "Tu es un assistant...",
    "user_prompt": "Aujourd'hui nous sommes...",
    "response_body": "{\"category_0\": [...]}",
    "duration_ms": 12500,
    "created_at": "2026-03-24T..."
  }
]
```
## Frontend
### LLM Logs page (`/llm-logs/:job_id`)
- Shows all LLM calls for a generation run in chronological order
- Each call displayed as a card:
- Header: call_type badge (colored), model name, duration (e.g., "12.5s")
- Three expandable sections: System Prompt, User Prompt, Response
- Text areas are scrollable, monospace font
- Response pretty-printed as JSON when parseable
### Home page — log button
On each synthesis row in the list, add a small icon button (next to the delete button) that navigates to `/llm-logs/:job_id`. The `job_id` comes from the synthesis data. The button is hidden for old syntheses that have no `job_id`.
## Files to Modify
**Backend:**
- **Create:** migration `20260324000017_create_llm_call_log.sql`
- **Create:** `backend/src/db/llm_call_log.rs` — insert, list_by_job_id, truncate_old
- **Modify:** `backend/src/db/mod.rs` — register module
- **Create:** `backend/src/handlers/llm_logs.rs` — handler
- **Modify:** `backend/src/handlers/mod.rs` — register
- **Modify:** `backend/src/router.rs` — add route
- **Modify:** `backend/src/services/synthesis.rs` — add `log_llm_call` helper, wrap each LLM call with timing
- **Modify:** `CLAUDE.md` — migration count to 17
**Frontend:**
- **Create:** `frontend/src/pages/LlmLogs.tsx` — log viewer page
- **Create:** `frontend/src/api/llmLogs.ts` — API client
- **Modify:** `frontend/src/App.tsx` — add route
- **Modify:** `frontend/src/pages/Home.tsx` — add log button on each synthesis row
- **Modify:** `frontend/src/i18n/fr.ts` — labels
- **Modify:** `frontend/src/types.ts`: add the `LlmCallLogEntry` type
**Tests:**
- **Modify:** `e2e/tests/generation-live.spec.ts` — verify LLM logs endpoint returns data
## What Does NOT Change
- LLM provider trait/implementations — logging happens at the call site, not inside providers
- Pipeline logic — no changes to filtering, classification, or rewrite behavior
- Article history — independent feature, both use job_id
- Existing synthesis display — unchanged (only Home page gets the log button)
- Settings — no new settings (reuses `article_history_days` for retention)