diff --git a/docs/deployment.md b/docs/deployment.md
new file mode 100644
index 0000000..3e33b05
--- /dev/null
+++ b/docs/deployment.md
@@ -0,0 +1,257 @@
+# Deployment Guide
+
+## Docker Deployment
+
+AI Weekly Synth is designed for Docker-only deployment. The `docker-compose.yml` at the project root orchestrates the application and its PostgreSQL database.
+
+### Quick Start
+
+```bash
+# 1. Clone the repository
+git clone
+cd ai_synth
+
+# 2. Create and configure .env
+cp .env.example .env
+# Edit .env and fill in all values (see Environment Variables below)
+
+# 3. Start the stack
+docker compose up -d
+
+# 4. Create the first admin user
+docker exec ai-synth ./ai-synth-backend create-admin admin@example.com
+```
+
+The application will be available at `http://localhost:8080` (or the port configured in `PORT`).
+
+### Docker Compose Services
+
+The `docker-compose.yml` defines two services:
+
+**app** (AI Weekly Synth backend + frontend):
+- Multi-stage Docker image: Node.js builds the frontend, Rust builds the backend, then both are combined into a minimal Debian runtime
+- Runs as a non-root user (`appuser`)
+- Depends on `db` with a health check condition (waits for Postgres to be ready)
+- Health check: `curl -f http://localhost:8080/api/v1/health` every 30 seconds
+- Restart policy: `unless-stopped`
+
+**db** (PostgreSQL 17 Alpine):
+- Data persisted to a named Docker volume (`postgres_data`)
+- Exposed on `127.0.0.1:5432` (localhost only, not accessible from external networks)
+- Health check: `pg_isready` every 10 seconds
+- Shared memory: 128 MB
+- Restart policy: `unless-stopped`
+
+### Dockerfile Details
+
+The `backend/Dockerfile` uses a three-stage build:
+
+1. **frontend-builder** (Node.js 22 Alpine): Runs `npm ci` and `npm run build` to produce the static frontend in `/app/dist/`
+2. **builder** (Rust 1.88 Bookworm): Compiles the Rust backend in release mode with `SQLX_OFFLINE=true` (no live database needed during build)
+3. **runtime** (Debian Bookworm Slim): Installs only `ca-certificates`, `libssl3`, and `curl`. Copies the binary, migrations, and frontend static files. Runs as non-root.
+
+---
+
+## Environment Variables
+
+All environment variables are documented in `.env.example`. The `.env` file is loaded by Docker Compose.
+
+### Required
+
+| Variable | Description | Example |
+|----------|-------------|---------|
+| `DATABASE_URL` | PostgreSQL connection string. In docker-compose, the hostname is `db`. | `postgres://ai_synth:secret@db:5432/ai_synth` |
+| `POSTGRES_PASSWORD` | Password for the PostgreSQL user. Used by the `db` service; must match the password in `DATABASE_URL`. | `a-strong-random-password` |
+| `MASTER_ENCRYPTION_KEY` | 256-bit key for AES-256-GCM encryption of user API keys at rest. Must be exactly 64 hex characters. Generate with `openssl rand -hex 32`. **Back this up securely -- losing it means all stored API keys become unreadable.** | `ab12cd34...` (64 hex chars) |
+| `APP_URL` | Public URL where the app is accessible (no trailing slash). Used for magic link URLs, CORS origin, and cookie domain. | `https://synth.example.com` |
+| `RESEND_API_KEY` | API key for Resend (email service). Required for magic link emails and synthesis email export. Sign up at https://resend.com. | `re_xxxxx` |
+| `EMAIL_FROM` | Sender address for emails. Its domain must be verified in Resend. | `AI Weekly Synth ` |
+| `TURNSTILE_SECRET_KEY` | Server-side secret key for Cloudflare Turnstile captcha. Sign up at https://dash.cloudflare.com/turnstile. | `0x4AAAAAAA...` |
+| `TURNSTILE_SITE_KEY` | Client-side site key for Cloudflare Turnstile. | `0x4BBBBBB...` |
+
+### Optional
+
+| Variable | Description | Default |
+|----------|-------------|---------|
+| `PORT` | Port for the backend HTTP server (inside the container). The docker-compose maps this to the host. | `8080` |
+| `RUST_LOG` | Logging level. Format: `level` or `level,crate=level`. | `info,ai_synth_backend=debug` |
+| `STATIC_DIR` | Path to the built frontend files. In Docker, this is `./static` (set by docker-compose). For local dev, use `../frontend/dist`. | `./static` (Docker) |
+| `SESSION_SECRET` | Secret for session cookie signing. At least 64 characters. If not set, a random value is generated at startup (sessions will not survive restarts). | Random |
+
+---
+
+## Database
+
+### PostgreSQL
+
+The application uses PostgreSQL 17. The `docker-compose.yml` runs it as the `db` service with a named volume for data persistence.
+
+Key configuration:
+- User: `ai_synth` (password set via `POSTGRES_PASSWORD`)
+- Database: `ai_synth`
+- Shared memory: 128 MB (for complex queries)
+- Health check via `pg_isready`
+
+### Automatic Migrations
+
+Database migrations run automatically every time the application starts. The backend calls `sqlx::migrate!("./migrations")` in `main.rs` before starting the HTTP server. There are currently 30 migration files covering all schema changes from initial setup through themes, schedules, article history, and LLM call logging.
+
+No manual migration step is needed. The application will not start serving requests until migrations complete successfully.
+
+### Tables
+
+The database contains the following tables:
+
+| Table | Purpose |
+|-------|---------|
+| `users` | User accounts (email, display name, role) |
+| `sessions` | Active sessions (hashed tokens, expiry) |
+| `magic_link_tokens` | Passwordless login tokens |
+| `user_settings` | Per-user configuration (provider, model, batch size, etc.) |
+| `sources` | User-defined news sources (URLs, titles, themes) |
+| `syntheses` | Generated synthesis results (sections as JSONB) |
+| `admin_providers` | Admin-curated LLM providers and models |
+| `admin_rate_limits` | Admin-configured rate limits per provider |
+| `user_api_keys` | Encrypted LLM API keys |
+| `audit_log` | Admin action audit trail |
+| `article_history` | Previously seen article URLs for dedup |
+| `llm_call_log` | LLM API call logs (prompts, responses, timing) |
+| `themes` | User-defined synthesis themes (topic, categories, settings) |
+| `theme_schedules` | Automated generation schedules per theme |
+
+---
+
+## Background Tasks
+
+The application starts two background tasks automatically on startup. No external cron or scheduler is needed.
+
+### Session Cleanup (hourly)
+
+Every hour, a background task deletes expired sessions from the `sessions` table, which keeps the table from growing without bound. The task logs the number of deleted sessions.
+
+### Scheduled Synthesis Generation (every 60 seconds)
+
+Every 60 seconds, a background task checks for due theme schedules (matching the current day of the week and time in UTC). For each due schedule, it:
+
+1. Runs the synthesis generation pipeline for the associated theme
+2. Sends the result via email to the configured recipients (up to 3)
+3. Marks the schedule as run (updates `last_run_at`) to prevent re-execution on the same day
+
+This is a single-instance scheduler -- it does not use distributed locks. Do not run multiple instances of the application if scheduled generation is enabled: each instance would pick up the same due schedules, producing duplicate syntheses and emails.
+
+---
+
+## Monitoring
+
+### Health Check
+
+The `/api/v1/health` endpoint returns HTTP 200 when the application is running and can serve requests. It is used by:
+
+- Docker's built-in health check (configured in both `docker-compose.yml` and `Dockerfile`)
+- External monitoring tools
+
+```bash
+curl -f http://localhost:8080/api/v1/health
+```
+
+### Logs
+
+The application uses structured logging via the `tracing` crate. The log level is controlled by the `RUST_LOG` environment variable.
+
+Recommended production setting:
+
+```
+RUST_LOG=info,ai_synth_backend=debug
+```
+
+This provides:
+- `info` level for all crates (HTTP requests, startup/shutdown, background tasks)
+- `debug` level for the application code (detailed pipeline progress, LLM call timing)
+
+Logs go to stdout, which Docker captures and makes available via `docker logs ai-synth`.
+
+To view logs:
+
+```bash
+docker logs ai-synth              # all logs
+docker logs ai-synth --tail 100   # last 100 lines
+docker logs ai-synth -f           # follow live
+```
+
+---
+
+## Backup
+
+### Database
+
+The PostgreSQL data volume (`postgres_data`) is the only stateful component. Back up the database regularly with `pg_dump`:
+
+```bash
+# Dump the database
+docker exec ai-synth-db pg_dump -U ai_synth ai_synth > backup_$(date +%Y%m%d).sql
+
+# Restore from a dump (into an empty database)
+cat backup_20260327.sql | docker exec -i ai-synth-db psql -U ai_synth ai_synth
+```
+
+### No File Storage
+
+The application does not store files on disk. All data (syntheses, settings, API keys, article history) lives in PostgreSQL. The frontend is served from static files baked into the Docker image.
+
+### Encryption Key
+
+The `MASTER_ENCRYPTION_KEY` is critical. If lost, all user API keys stored in the database become permanently unreadable. Store it securely (e.g., in a secrets manager) and include it in your disaster recovery plan.
+
+---
+
+## Updating
+
+To update to a new version:
+
+```bash
+# 1. Pull the latest code
+git pull
+
+# 2. Rebuild the Docker image and restart
+docker compose up -d --build
+```
+
+This will:
+1. Rebuild the Docker image (frontend build + Rust compilation)
+2. Restart the `app` container with the new image
+3. Automatically run any new migrations on startup
+4. Leave the `db` container untouched (data persists in the named volume)
+
+The restart causes brief downtime (typically 10-30 seconds until the health check passes). For zero-downtime deployments, consider running behind a reverse proxy with health-check-based routing. Such a setup briefly runs two app instances at once, so keep the single-instance scheduler limitation described under Background Tasks in mind.
+
+---
+
+## Security Checklist
+
+Before deploying to production, verify:
+
+- [ ] **`MASTER_ENCRYPTION_KEY`** is set to a random 64-hex-character value (not the example value), generated with `openssl rand -hex 32`, stored securely, and backed up.
+- [ ] **`POSTGRES_PASSWORD`** is set to a strong random password.
+- [ ] **HTTPS** is configured. Set `APP_URL` to an `https://` URL. The application sets `Secure` on cookies when `APP_URL` starts with `https`. Use a reverse proxy (nginx, Caddy, Traefik) to terminate TLS.
+- [ ] **Turnstile** keys are configured. Without them, the registration and login forms will not work (captcha is required).
+- [ ] **Resend** API key is configured with a verified sending domain.
+- [ ] **`SKIP_SSRF_CHECK`** is NOT set. This env var disables SSRF protection and should only be used in test environments.
+- [ ] **Postgres** is not exposed to the internet. The docker-compose binds it to `127.0.0.1:5432` by default.
+- [ ] **Docker socket** is not exposed. The app does not need Docker access.
+- [ ] **Firewall** allows inbound traffic only on the ports you expose publicly (the reverse proxy's HTTP/HTTPS ports, or the mapped app port, 8080 by default, if no proxy is used).
+- [ ] **Reverse proxy** is configured to forward `X-Forwarded-For` and `X-Forwarded-Proto` headers if the app is behind a proxy.
+
+### Security Features (Built-in)
+
+The application includes the following security measures that require no additional configuration:
+
+- **AES-256-GCM encryption** for user LLM API keys at rest (per-key random nonces)
+- **SSRF prevention** in the web scraper (DNS resolution checks, private IP blocking, redirect validation)
+- **CSRF protection** via `X-Requested-With` header on all mutating API endpoints
+- **Session cookies**: `HttpOnly`, `SameSite=Lax`, `Secure` (when HTTPS)
+- **Security headers**: CSP, X-Frame-Options (DENY), X-Content-Type-Options (nosniff), Referrer-Policy, HSTS (when HTTPS)
+- **Anti-enumeration**: Same response for existent/non-existent emails in auth flows
+- **Error sanitization**: Internal errors and API key patterns are stripped from client-facing error messages
+- **Rate limiting**: Configurable per-provider rate limits for LLM API calls
+- **Non-root container**: The Docker image runs as `appuser`
+- **Graceful shutdown**: SIGTERM/Ctrl+C triggers clean shutdown with database pool closure
diff --git a/docs/dev_guidelines.md b/docs/dev_guidelines.md
new file mode 100644
index 0000000..7ff0934
--- /dev/null
+++ b/docs/dev_guidelines.md
@@ -0,0 +1,340 @@
+# Development Guidelines
+
+## Getting Started
+
+### Prerequisites
+
+- **Rust** (stable, 1.88+) with `cargo`
+- **Node.js** (22+) with `npm`
+- **PostgreSQL** (17+) -- or Docker to run it
+- **Docker** and `docker compose` for containerized development
+
+### Local Development Setup
+
+1. **Start a Postgres instance.** The easiest way is via the test compose file:
+
+   ```bash
+   cd e2e && docker compose -f docker-compose.test.yml up -d db
+   ```
+
+   This starts Postgres on port 5433 with user `ai_synth_test` / password `testpassword`.
+
+2. **Run the backend:**
+
+   ```bash
+   cd backend
+   export DATABASE_URL=postgres://ai_synth_test:testpassword@127.0.0.1:5433/ai_synth_test
+   cargo run -- serve
+   ```
+
+   Migrations run automatically on startup.
+
+3. **Run the frontend (dev server with hot reload):**
+
+   ```bash
+   cd frontend
+   npm install
+   npm run dev
+   ```
+
+   The Vite dev server proxies `/api` requests to the backend on port 8080.
+
+4. **Create an admin user:**
+
+   ```bash
+   cd backend && cargo run -- create-admin admin@example.com
+   ```
+
+### Environment Variables
+
+Copy `.env.example` to `.env` and fill in the values. The critical ones for local dev are `DATABASE_URL`, `MASTER_ENCRYPTION_KEY` (64 hex chars -- generate with `openssl rand -hex 32`), and `APP_URL`.
+
+---
+
+## Project Structure
+
+```
+ai_synth/
+  backend/                  Rust/Axum backend
+    src/
+      main.rs               Entry point, CLI (serve, create-admin)
+      router.rs             All API routes + middleware
+      app_state.rs          Shared application state (Arc-wrapped)
+      errors.rs             Unified AppError enum
+      config.rs             Environment config parsing
+      handlers/             HTTP handlers (thin: validate, delegate, respond)
+      services/             Business logic (auth, synthesis pipeline, LLM providers, scraper, etc.)
+      db/                   Database queries (sqlx, parameterized)
+      models/               Data types + validation
+      middleware/           Auth session extraction, CSRF check
+      util/                 Token generation, hashing
+    migrations/             SQL migrations (30 files, auto-run on startup)
+    tests/                  Integration tests (require Postgres)
+  frontend/                 SolidJS + Tailwind CSS v4
+    src/
+      App.tsx               Router, layouts, route guards
+      pages/                Page-level components
+      components/           Reusable components (Button, LoadingSpinner, settings/*)
+      api/                  API clients (one file per resource)
+      contexts/             AuthContext (session-based)
+      i18n/                 French translations
+      utils/                SSE client, date formatting, provider info
+      types.ts              All TypeScript domain types
+  e2e/                      Playwright E2E tests
+    tests/                  Test specs
+    helpers/                Auth helpers, DB access
+    seed.ts                 Test data seeder
+    scripts/                Test runner scripts
+  docs/                     Architecture reports, plans, specs
+```
+
+### Layer Architecture (Backend)
+
+```
+handlers/ (HTTP layer) --> services/ (business logic) --> db/ (data access)
+    |                                |
+    models/ (shared types) <---------+
+    |
+    errors.rs
+```
+
+- **Handlers** are thin: validate input, call services/db, format responses.
+- **Services** contain business logic. The `LlmProvider` trait and synthesis pipeline live here.
+- **DB** modules contain pure SQL queries returning typed results. No business logic.
+- **Models** define data types and validation. Shared across layers.
+
+---
+
+## Coding Standards
+
+### Rust
+
+#### Error Handling
+
+All errors flow through the unified `AppError` enum (in `backend/src/errors.rs`):
+
+```rust
+#[derive(Debug, thiserror::Error)]
+pub enum AppError {
+    NotFound(String),         // 404
+    Unauthorized(String),     // 401
+    Forbidden(String),        // 403
+    BadRequest(String),       // 400
+    Validation(String),       // 422
+    Internal(anyhow::Error),  // 500 -- details logged, not exposed to client
+    RateLimited(String),      // 429
+}
+```
+
+Key rules:
+- **Never use `unwrap()` in production code.** Use `?`, `ok_or_else`, `map_err`, or `unwrap_or_default` with appropriate logging. `unwrap()` is only acceptable in `#[cfg(test)]` blocks and `LazyLock` static initializers.
+- **`AppError::Internal` hides details** from the client. The full error is logged via `tracing::error!` but the response body only contains `"An internal error occurred"`.
+- **`From` and `From`** conversions are implemented, so you can use `?` with both types.
+- **Validation errors** should use `AppError::Validation(message)` (returns 422).
+
+#### Arc Usage
+
+`Arc` is used to share data across `tokio::spawn` boundaries. Common patterns:
+- `Arc` for the LLM provider (shared across classify tasks)
+- `Arc` for cancellation flags
+- `Arc>` for SSE progress channels
+- `Arc`, `Arc>`, `Arc` for data shared with spawned tasks
+
+#### Auth Middleware Pattern
+
+Authentication uses Axum extractors (in `backend/src/middleware/auth.rs`):
+
+- **`AuthUser`**: Reads the session cookie, looks up the session in the DB, checks expiration, loads the user. Any handler that takes `AuthUser` as a parameter automatically rejects unauthenticated requests with 401.
+- **`AdminUser(AuthUser)`**: Wraps `AuthUser` and additionally checks `UserRole::Admin`. Returns 403 if not admin.
+
+To require authentication, simply add the extractor to your handler signature:
+
+```rust
+async fn my_handler(auth: AuthUser, State(state): State) -> Result<..., AppError> { ... }
+```
+
+For admin-only endpoints:
+
+```rust
+async fn admin_handler(admin: AdminUser, State(state): State) -> Result<..., AppError> { ... }
+```
+
+#### Other Rust Conventions
+
+- All SQL queries use parameterized bindings (`$1`, `$2`) via sqlx. Never interpolate strings into SQL.
+- Prefer `tracing::info!`, `tracing::warn!`, `tracing::error!` over `println!`.
+- Code comments and log messages are in English. User-facing strings are in French (via the i18n system).
+- Module-level `//!` doc comments on every file; function-level `///` doc comments on public items.
+
+### Frontend (SolidJS)
+
+#### Reactive Primitives
+
+- Use `createSignal` for local component state.
+- Use `createResource` for async data that should auto-refetch (preferred over `createEffect` + manual fetch).
+- Use `createMemo` for derived/computed values.
+- Use `createEffect` for side effects that need to react to signal changes.
+- Always use `onCleanup` to clear timers, close connections, and cancel subscriptions.
+
+#### Component Patterns
+
+- Use the `Button` component (`components/ui/Button.tsx`) with `variant`/`loading`/`icon` props instead of raw `