docs: add spec for LLM call logging per synthesis
parent f7428191ec
commit 314fb7a037
@ -0,0 +1,141 @@

# Design: LLM Call Logging — Track All LLM Interactions Per Synthesis

**Date**: 2026-03-24
**Scope**: Log every LLM call made during synthesis generation, with its full prompt and response, viewable per synthesis

---

## Context

When synthesis quality is poor, there is currently no way to see which prompts were sent to the LLM or what it returned. Users need visibility into every LLM call to debug prompt effectiveness, model behavior, and pipeline issues.

## Approach

A new `llm_call_log` table stores every LLM call with its full prompt, response, timing, and model info. Entries are linked to syntheses via `job_id`. A dedicated log viewer page is accessible from the synthesis list.

## New Table: `llm_call_log`

```sql
CREATE TABLE llm_call_log (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    job_id UUID NOT NULL,
    call_type TEXT NOT NULL,
    model TEXT NOT NULL,
    system_prompt TEXT NOT NULL DEFAULT '',
    user_prompt TEXT NOT NULL DEFAULT '',
    response_body TEXT NOT NULL DEFAULT '',
    duration_ms INTEGER NOT NULL DEFAULT 0,
    created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE INDEX idx_llm_call_log_job_id ON llm_call_log(job_id);
CREATE INDEX idx_llm_call_log_user_id ON llm_call_log(user_id, created_at);
```

**`call_type` values:** `search`, `classification_phase1`, `classification_phase2`, `rewrite`, `link_extraction`, `article_extraction`, `key_test`
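
Because `call_type` is a free-form `TEXT` column, one way to keep the seven values consistent across call sites is to define them in a single place. The following is a minimal sketch and not part of the spec: the `CallType` enum and its `ALL`, `as_str`, and `parse` members are hypothetical names introduced here for illustration.

```rust
/// Hypothetical enum mirroring the `call_type` values listed in the spec.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CallType {
    Search,
    ClassificationPhase1,
    ClassificationPhase2,
    Rewrite,
    LinkExtraction,
    ArticleExtraction,
    KeyTest,
}

impl CallType {
    /// Every variant, in the order the spec lists them.
    pub const ALL: [CallType; 7] = [
        CallType::Search,
        CallType::ClassificationPhase1,
        CallType::ClassificationPhase2,
        CallType::Rewrite,
        CallType::LinkExtraction,
        CallType::ArticleExtraction,
        CallType::KeyTest,
    ];

    /// The exact string stored in the `call_type` column.
    pub fn as_str(self) -> &'static str {
        match self {
            CallType::Search => "search",
            CallType::ClassificationPhase1 => "classification_phase1",
            CallType::ClassificationPhase2 => "classification_phase2",
            CallType::Rewrite => "rewrite",
            CallType::LinkExtraction => "link_extraction",
            CallType::ArticleExtraction => "article_extraction",
            CallType::KeyTest => "key_test",
        }
    }

    /// Parses a stored value; returns `None` for strings outside the list.
    pub fn parse(s: &str) -> Option<CallType> {
        CallType::ALL.iter().copied().find(|c| c.as_str() == s)
    }
}
```

With something like this, a call site could pass `CallType::Rewrite.as_str()` wherever a `call_type: &str` is expected, so a typo becomes a compile error rather than an unexpected value in the table.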

## Pipeline Integration

A helper function `log_llm_call` inserts a row after each LLM call:

```rust
// Signature only; the body inserts one row into `llm_call_log`.
async fn log_llm_call(
    pool: &PgPool,
    user_id: Uuid,
    job_id: Uuid,
    call_type: &str,
    model: &str,
    system_prompt: &str,
    user_prompt: &str,
    response: &serde_json::Value,
    duration_ms: u64,
)
```

Timing is measured by calling `std::time::Instant::now()` before each provider call and `elapsed().as_millis()` after it.
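
The timing step can be sketched as follows. This is an illustration under stated assumptions rather than the spec's implementation: `timed` and `duration_to_ms` are hypothetical helpers, and a synchronous closure stands in for the real provider call, which is presumably `async`. One detail worth noting is that `as_millis()` returns a `u128`, while the helper takes a `u64` and the column is a 32-bit `INTEGER`, so a conversion has to happen somewhere; the sketch saturates at `i32::MAX`.

```rust
use std::time::{Duration, Instant};

/// Hypothetical helper: runs `call` and returns its result together with
/// the wall-clock time it took.
pub fn timed<T>(call: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let result = call();
    (result, start.elapsed())
}

/// Converts an elapsed duration to whole milliseconds that fit a 32-bit
/// `INTEGER` column. `as_millis()` yields a `u128`, so the value is capped
/// at `i32::MAX` before the cast to avoid wrapping.
pub fn duration_to_ms(elapsed: Duration) -> i32 {
    elapsed.as_millis().min(i32::MAX as u128) as i32
}
```

A call site would then wrap the provider call in `timed(...)` and pass `duration_to_ms(elapsed)` along with the prompts and response to the logging helper.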

**Instrumentation points (7 LLM call sites):**

1. **Search pass** (Phase 2 web search): `call_type: "search"`
2. **Classification Phase 1**: `call_type: "classification_phase1"`
3. **Classification Phase 2**: `call_type: "classification_phase2"`
4. **Rewrite pass**: `call_type: "rewrite"`
5. **Link extraction** (per source, when LLM enabled): `call_type: "link_extraction"`
6. **Article extraction** (per article, when LLM enabled): `call_type: "article_extraction"`
7. **Key test** (API key test endpoint, optional): `call_type: "key_test"`

## Cleanup

During the existing generation startup cleanup (alongside `article_history::cleanup_old`), old LLM log entries are truncated. For entries older than `article_history_days`:

- Replace `system_prompt`, `user_prompt`, and `response_body` with their first 500 characters followed by `\n[truncated]`
- Keep metadata (`call_type`, `model`, `duration_ms`, timestamps) intact

This keeps old entries small instead of letting full prompts and responses accumulate indefinitely, while preserving summary info for old runs.
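
The truncation rule can be illustrated with a short sketch. It assumes the truncation happens in Rust on a per-string basis; the spec does not say whether `truncate_old` does this in application code or in a single SQL `UPDATE`, and the function and constant names below are hypothetical. The sketch counts characters rather than bytes so that multi-byte UTF-8 text (such as accented characters in prompts) is never cut in the middle of a character.

```rust
/// Marker appended to truncated text, as described in the spec.
const TRUNCATION_MARKER: &str = "\n[truncated]";
/// Number of characters kept from the original text.
const KEEP_CHARS: usize = 500;

/// Returns `text` unchanged when it has at most `KEEP_CHARS` characters,
/// otherwise its first `KEEP_CHARS` characters followed by the marker.
pub fn truncate_for_archive(text: &str) -> String {
    match text.char_indices().nth(KEEP_CHARS) {
        // `byte_idx` is the byte offset of the first character to drop,
        // so slicing up to it keeps exactly `KEEP_CHARS` characters.
        Some((byte_idx, _)) => format!("{}{}", &text[..byte_idx], TRUNCATION_MARKER),
        None => text.to_string(),
    }
}
```

Applying the function to already truncated text returns the same text, so re-running the cleanup on every generation startup would not keep shrinking old entries.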

## API Endpoint

**`GET /api/v1/llm-logs/:job_id`**

Returns all log entries for a generation job, ordered by `created_at`. The endpoint is authenticated and scoped to the user: it verifies that `job_id` belongs to a synthesis owned by the caller.

Response:
```json
[
  {
    "id": "uuid",
    "call_type": "search",
    "model": "gpt-4o-mini",
    "system_prompt": "Tu es un assistant...",
    "user_prompt": "Aujourd'hui nous sommes...",
    "response_body": "{\"category_0\": [...]}",
    "duration_ms": 12500,
    "created_at": "2026-03-24T..."
  }
]
```
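
The scoping and ordering rules for this endpoint can be sketched with in-memory data. This is a simplified model, not the handler itself: the `LogEntry` and `Synthesis` structs, the integer IDs, and the `logs_for_job` function are all hypothetical stand-ins for database rows and queries, and the HTTP status returned when the check fails is left open because the spec does not state it.

```rust
/// Hypothetical stand-in for a row of `llm_call_log`; IDs are plain integers here.
#[derive(Debug, Clone, PartialEq)]
pub struct LogEntry {
    pub job_id: u32,
    pub call_type: &'static str,
    /// Used only for ordering in this sketch.
    pub created_at: u64,
}

/// Hypothetical stand-in for a synthesis row.
pub struct Synthesis {
    pub user_id: u32,
    /// Old syntheses may have no `job_id`.
    pub job_id: Option<u32>,
}

/// Returns the log entries for `job_id`, oldest first, or `None` when the
/// job does not belong to a synthesis owned by `caller`.
pub fn logs_for_job(
    caller: u32,
    job_id: u32,
    syntheses: &[Synthesis],
    entries: &[LogEntry],
) -> Option<Vec<LogEntry>> {
    // Ownership check: the job must be attached to one of the caller's syntheses.
    let owns_job = syntheses
        .iter()
        .any(|s| s.user_id == caller && s.job_id == Some(job_id));
    if !owns_job {
        return None;
    }
    let mut found: Vec<LogEntry> = entries
        .iter()
        .filter(|e| e.job_id == job_id)
        .cloned()
        .collect();
    // Chronological order, matching "ordered by `created_at`".
    found.sort_by_key(|e| e.created_at);
    Some(found)
}
```

Performing the ownership check before reading any log rows means a caller cannot learn whether logs exist for a job that belongs to someone else.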

## Frontend

### LLM Logs page (`/llm-logs/:job_id`)

- Shows all LLM calls for a generation run in chronological order
- Each call is displayed as a card:
  - Header: `call_type` badge (colored), model name, duration (e.g., "12.5s")
  - Three expandable sections: System Prompt, User Prompt, Response
  - Text areas are scrollable and use a monospace font
  - Response is pretty-printed as JSON when parseable

### Home page: log button

On each synthesis row in the list, add a small icon button (next to the delete button) that navigates to `/llm-logs/:job_id`. The `job_id` comes from the synthesis data. The button is hidden for old syntheses without a `job_id`.

## Files to Modify

**Backend:**
- **Create:** migration `20260324000017_create_llm_call_log.sql`
- **Create:** `backend/src/db/llm_call_log.rs` (`insert`, `list_by_job_id`, `truncate_old`)
- **Modify:** `backend/src/db/mod.rs` (register module)
- **Create:** `backend/src/handlers/llm_logs.rs` (handler)
- **Modify:** `backend/src/handlers/mod.rs` (register)
- **Modify:** `backend/src/router.rs` (add route)
- **Modify:** `backend/src/services/synthesis.rs` (add `log_llm_call` helper, wrap each LLM call with timing)
- **Modify:** `CLAUDE.md` (migration count to 17)

**Frontend:**
- **Create:** `frontend/src/pages/LlmLogs.tsx` (log viewer page)
- **Create:** `frontend/src/api/llmLogs.ts` (API client)
- **Modify:** `frontend/src/App.tsx` (add route)
- **Modify:** `frontend/src/pages/Home.tsx` (add log button on each synthesis row)
- **Modify:** `frontend/src/i18n/fr.ts` (labels)
- **Modify:** `frontend/src/types.ts` (`LlmCallLogEntry` type)

**Tests:**
- **Modify:** `e2e/tests/generation-live.spec.ts` (verify LLM logs endpoint returns data)

## What Does NOT Change

- LLM provider trait/implementations: logging happens at the call site, not inside providers
- Pipeline logic: no changes to filtering, classification, or rewrite behavior
- Article history: independent feature; both use `job_id`
- Existing synthesis display: unchanged (only the Home page gets the log button)
- Settings: no new settings (reuses `article_history_days` for retention)