diff --git a/CLAUDE.md b/CLAUDE.md
index 7f80257..a46cee5 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -1,63 +1,84 @@
 # AI Weekly Synth
 
 ## Overview
-AI Weekly Synth is a self-hosted web application that generates AI-powered weekly news syntheses. Users configure their topics, categories, and preferred LLM provider, then the app searches the web, validates sources, and produces structured summaries.
+AI Weekly Synth is a self-hosted web application that generates AI-powered weekly news syntheses. Users create themes (topics), configure categories and sources, then the app scrapes sources, classifies articles via LLM, and produces structured summaries. Supports scheduled generation with email delivery.
 
 ## Architecture
 - **Backend**: Rust (Axum) — `backend/`
 - **Frontend**: SolidJS + Tailwind CSS v4 — `frontend/`
 - **Database**: PostgreSQL (via sqlx with runtime-checked queries)
-- **Deployment**: Docker only (`docker-compose.yml`)
+- **Deployment**: Docker (`docker-compose.yml`, `restart: unless-stopped`)
 
 ## Project Structure
 ```
 ai_synth/
 ├── backend/                    Rust/Axum backend
 │   ├── src/
-│   │   ├── main.rs             Entry point, CLI (serve, create-admin)
+│   │   ├── main.rs             Entry point, CLI, background tasks (session cleanup, scheduler)
 │   │   ├── router.rs           All API routes + middleware stack
-│   │   ├── handlers/           HTTP handlers (auth, settings, sources, syntheses, admin, etc.)
-│   │   ├── services/           Business logic (auth, email, encryption, scraper, LLM providers, synthesis pipeline)
+│   │   ├── handlers/           HTTP handlers (auth, settings, sources, themes, schedules, syntheses, admin, etc.)
+│   │   ├── services/           Business logic (synthesis pipeline, job_store, scheduler, LLM providers, scraper, email, encryption)
 │   │   ├── db/                 Database queries (sqlx)
-│   │   ├── models/             Data types + validation
+│   │   ├── models/             Data types + validation (settings, theme, schedule, source, synthesis, etc.)
 │   │   ├── middleware/         Auth session extraction, CSRF check
 │   │   └── util/               Token generation, hashing
-│   ├── migrations/             SQL migrations (9 files)
-│   ├── tests/                  Integration tests (require Postgres)
+│   ├── migrations/             SQL migrations (30 files)
+│   ├── tests/                  Integration tests (17 files, require Postgres)
 │   ├── Cargo.toml
 │   └── Dockerfile              Multi-stage build
 ├── frontend/                   SolidJS frontend
 │   ├── src/
 │   │   ├── App.tsx             Router, layouts, route guards
-│   │   ├── pages/              Login, Register, Home, Settings, Sources, GenerateSynthesis, SynthesisDetail, admin/*
-│   │   ├── components/         Navbar, Layout, AdminLayout, Turnstile, ApiKeyManager, ui/*
-│   │   ├── api/                API clients (auth, settings, sources, syntheses, admin, config, apiKeys)
+│   │   ├── pages/              Home, Settings, ThemeManager, GenerateSynthesis, SynthesisDetail, ArticleHistory, admin/*
+│   │   ├── components/         Navbar, ApiKeyManager, settings/* (BraveSearch, Schedule, RateLimit), ui/*
+│   │   ├── api/                API clients (auth, settings, sources, themes, schedules, syntheses, admin, config, apiKeys)
 │   │   ├── contexts/           AuthContext (session-based)
-│   │   ├── i18n/               French translations (i18n-ready for future languages)
-│   │   └── utils/              SSE client, date formatting, provider info
+│   │   ├── i18n/               French translations
+│   │   └── utils/              SSE client, date formatting, URL utils, provider info
 │   ├── package.json
 │   └── vite.config.ts          SolidJS + Tailwind + dev proxy
-├── docs/                       Analysis reports + implementation plans
+├── e2e/                        E2E tests (Playwright)
+│   ├── tests/                  7 test specs
+│   ├── helpers/                Auth helpers, seed.ts
+│   └── docker-compose.test.yml
+├── scripts/                    Test runner scripts
+│   ├── run-integration-tests.sh
+│   └── run-e2e-tests.sh
+├── docs/                       Consolidated documentation (see below)
 ├── docker-compose.yml          App + Postgres
 ├── .env.example                All required env vars documented
 └── CLAUDE.md                   This file
 ```
 
+## Documentation
+- [`docs/requirements.md`](docs/requirements.md) — Product vision, features, user roles, non-functional requirements
+- [`docs/functional_specs.md`](docs/functional_specs.md) — User journeys, feature details, pipeline description
+- [`docs/architecture.md`](docs/architecture.md) — System design, layers, data model, security, concurrency
+- [`docs/technical_specs.md`](docs/technical_specs.md) — Tech stack, DB schema, API specs, pipeline flow
+- [`docs/dev_guidelines.md`](docs/dev_guidelines.md) — Coding standards, patterns, how-tos, pitfalls
+- [`docs/qa_guidelines.md`](docs/qa_guidelines.md) — Test inventory, infrastructure, writing tests
+- [`docs/deployment.md`](docs/deployment.md) — Docker setup, env vars, monitoring, security
+
 ## Key Features
-- **Authentication**: Email + magic link (passwordless), Cloudflare Turnstile captcha, 30-day session cookies
+- **Multi-Theme**: Users create multiple themes, each with its own categories, sources, and schedule
 - **LLM Providers**: Google Gemini, OpenAI, Anthropic — users bring their own API keys
-- **Generation Pipeline**: 2-pass (search with web grounding → scrape/validate URLs → rewrite summaries), adaptive per provider
+- **Generation Pipeline**: Two-phase (personalized sources → web search fallback), windowed extraction, batched scrape+classify
+- **Brave Search**: Optional alternative to LLM web search for Phase 2
+- **Scheduled Generation**: Per-theme day/time schedule with email delivery to up to 3 addresses
+- **Preferred Sources**: Mark sources as priority — processed first during generation
+- **Stop Generation**: Cancel in-progress generation, saves partial results
 - **Admin Module**: Provider/model curation, rate limit config, user management
-- **Security**: AES-256-GCM encryption for API keys at rest, SSRF prevention in scraper, CSRF via X-Requested-With, HttpOnly/SameSite cookies
-- **Export**: Email via Resend, PDF, Markdown
-- **Real-time**: SSE for generation progress streaming
+- **Security**: AES-256-GCM encryption, SSRF prevention, CSRF, HttpOnly cookies
+- **Export**: Email (Resend), PDF, Markdown
+- **Real-time**: SSE progress streaming with cancellation support
+- **Article Intelligence**: LLM-extracted dates, is_article detection, configurable summary length
 
 ## Running Locally
-### Docker (production-like)
+### Docker (production)
 ```bash
 cp .env.example .env
 # Fill in values
-docker compose up
+docker compose up -d
 ```
 
 ### Development
@@ -71,7 +92,6 @@ cd frontend && npm install && npm run dev
 
 ### CLI
 ```bash
-# Create first admin user
 cd backend && cargo run -- create-admin admin@example.com
 ```
 
@@ -80,8 +100,11 @@ cd backend && cargo run -- create-admin admin@example.com
 # Backend unit tests (no Postgres needed)
 cd backend && cargo test --lib
 
-# Backend integration tests (requires Postgres)
-TEST_DATABASE_URL=postgres://user:pass@localhost:5432/postgres cargo test
+# Integration tests (uses docker-compose.test.yml Postgres)
+./scripts/run-integration-tests.sh
+
+# E2E tests (builds Docker, seeds DB, runs Playwright)
+./scripts/run-e2e-tests.sh
 
 # Frontend unit tests
 cd frontend && npx vitest run
@@ -90,41 +113,13 @@ cd frontend && npx vitest run
 cd frontend && npx tsc --noEmit
 ```
 
-## API Endpoints
-
-### Public
-- `POST /api/v1/auth/register` — create account + magic link
-- `POST /api/v1/auth/login` — request magic link
-- `GET /api/v1/auth/verify` — verify token (email click)
-- `POST /api/v1/auth/verify` — verify token (frontend API)
-- `GET /api/v1/health` — health check
-
-### Authenticated
-- `GET/PUT /api/v1/settings` — user settings
-- `GET/POST/DELETE /api/v1/sources` — sources CRUD + bulk/CSV import/export
-- `GET/DELETE /api/v1/syntheses/:id` — syntheses CRUD
-- `POST /api/v1/syntheses/generate` — trigger async generation
-- `GET /api/v1/syntheses/generate/:job_id/progress` — SSE progress stream
-- `POST /api/v1/syntheses/:id/send-email` — email synthesis
-- `GET /api/v1/syntheses/:id/export/markdown` — Markdown download
-- `GET /api/v1/syntheses/:id/export/pdf` — PDF download
-- `GET/POST/DELETE /api/v1/user/api-keys` — LLM API key management
-- `GET /api/v1/config/providers` — available providers/models
-
-### Admin Only
-- `GET/POST/PUT/DELETE /api/v1/admin/providers` — provider/model config
-- `GET/PUT /api/v1/admin/rate-limits` — rate limit config
-- `GET /api/v1/admin/users` — user list
-- `PUT /api/v1/admin/users/:id/role` — role management
-
 ## Database (30 migrations)
-Tables: `users`, `sessions`, `magic_link_tokens`, `user_settings`, `sources`, `syntheses`, `admin_providers`, `admin_rate_limits`, `user_api_keys`, `audit_log`
+Tables: `users`, `sessions`, `magic_link_tokens`, `settings`, `themes`, `theme_schedules`, `sources`, `syntheses`, `article_history`, `llm_call_log`, `admin_providers`, `admin_rate_limits`, `user_api_keys`, `audit_log`
 
 ## Environment Variables
 See `.env.example` for the complete list. Key ones:
 - `DATABASE_URL` — Postgres connection string
 - `MASTER_ENCRYPTION_KEY` — 64 hex chars for AES-256-GCM
-- `SESSION_SECRET` — at least 64 chars
 - `RESEND_API_KEY` — for email sending
 - `TURNSTILE_SECRET_KEY` / `TURNSTILE_SITE_KEY` — captcha
 - `APP_URL` — public URL (for CORS, magic links, cookies)
@@ -134,5 +129,5 @@ See `.env.example` for the complete list. Key ones:
 - Users bring their own LLM API keys (encrypted at rest)
 - Admin curates available providers/models, users select from the list
 - Single-tenant self-hosted (one instance per deployment)
-- i18n-ready (French only for now, all strings in `frontend/src/i18n/fr.ts`)
-- Adaptive generation pipeline: skips scrape+rewrite when native web grounding is sufficient
+- i18n-ready (French only for now)
+- Per-theme content settings, global infrastructure settings
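
Review note on `MASTER_ENCRYPTION_KEY`: the README describes it as 64 hex characters for AES-256-GCM, which corresponds to a 32-byte key. The sketch below shows one way such a value could be validated and decoded at startup. The names `parse_master_key` and `hex_value` are hypothetical and do not come from the repository, and the diff does not show how the backend actually parses the key.

```rust
/// Hypothetical helper (not from the repository): decodes a 64-character
/// hex string into the 32-byte key size that AES-256-GCM requires.
fn parse_master_key(hex: &str) -> Result<[u8; 32], String> {
    let bytes = hex.as_bytes();
    if bytes.len() != 64 {
        return Err(format!("expected 64 hex chars, got {}", bytes.len()));
    }
    let mut key = [0u8; 32];
    // Each pair of hex digits becomes one key byte.
    for (i, pair) in bytes.chunks(2).enumerate() {
        let hi = hex_value(pair[0])?;
        let lo = hex_value(pair[1])?;
        key[i] = (hi << 4) | lo;
    }
    Ok(key)
}

/// Maps one ASCII hex digit to its numeric value (0..=15).
fn hex_value(c: u8) -> Result<u8, String> {
    match c {
        b'0'..=b'9' => Ok(c - b'0'),
        b'a'..=b'f' => Ok(c - b'a' + 10),
        b'A'..=b'F' => Ok(c - b'A' + 10),
        _ => Err(format!("invalid hex digit: {:?}", c as char)),
    }
}
```

A startup path could pass the result of `std::env::var("MASTER_ENCRYPTION_KEY")` through a check like this so that a malformed key fails fast instead of surfacing later as an encryption error.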