# Development Guidelines
## Getting Started
### Prerequisites
- **Rust** (stable, 1.88+) with `cargo`
- **Node.js** (22+) with `npm`
- **PostgreSQL** (17+) -- or Docker to run it
- **Docker** and `docker compose` for containerized development
### Local Development Setup
1. **Start a Postgres instance.** The easiest way is via the test compose file:
```bash
cd e2e && docker compose -f docker-compose.test.yml up -d db
```
This starts Postgres on port 5433 with user `ai_synth_test` / password `testpassword`.
2. **Run the backend:**
```bash
cd backend
export DATABASE_URL=postgres://ai_synth_test:testpassword@127.0.0.1:5433/ai_synth_test
cargo run -- serve
```
Migrations run automatically on startup.
3. **Run the frontend (dev server with hot reload):**
```bash
cd frontend
npm install
npm run dev
```
The Vite dev server proxies `/api` requests to the backend on port 8080.
4. **Create an admin user:**
```bash
cd backend && cargo run -- create-admin admin@example.com
```
### Environment Variables
Copy `.env.example` to `.env` and fill in the values. The critical ones for local dev are `DATABASE_URL`, `MASTER_ENCRYPTION_KEY` (64 hex chars -- generate with `openssl rand -hex 32`), and `APP_URL`.
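
The 64-hex-character requirement for `MASTER_ENCRYPTION_KEY` can be checked early at startup. The following is a minimal sketch under assumptions: `parse_master_key` is a hypothetical helper name, not a function from this codebase, and the real config parser may validate differently.

```rust
/// Decodes a 64-character hex string into a 32-byte key.
/// Hypothetical helper: returns an error message on bad length or characters.
fn parse_master_key(hex: &str) -> Result<[u8; 32], String> {
    if hex.len() != 64 {
        return Err(format!("expected 64 hex chars, got {}", hex.len()));
    }
    if !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
        return Err("key contains non-hex characters".to_string());
    }
    let mut key = [0u8; 32];
    for (i, byte) in key.iter_mut().enumerate() {
        // Slicing by byte index is safe here: every character is ASCII.
        *byte = u8::from_str_radix(&hex[i * 2..i * 2 + 2], 16)
            .map_err(|e| e.to_string())?;
    }
    Ok(key)
}
```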
---
## Project Structure
```
ai_synth/
  backend/                 Rust/Axum backend
    src/
      main.rs              Entry point, CLI (serve, create-admin)
      router.rs            All API routes + middleware
      app_state.rs         Shared application state (Arc-wrapped)
      errors.rs            Unified AppError enum
      config.rs            Environment config parsing
      handlers/            HTTP handlers (thin: validate, delegate, respond)
      services/            Business logic (auth, synthesis pipeline, LLM providers, scraper, etc.)
      db/                  Database queries (sqlx, parameterized)
      models/              Data types + validation
      middleware/          Auth session extraction, CSRF check
      util/                Token generation, hashing
    migrations/            SQL migrations (30 files, auto-run on startup)
    tests/                 Integration tests (require Postgres)
  frontend/                SolidJS + Tailwind CSS v4
    src/
      App.tsx              Router, layouts, route guards
      pages/               Page-level components
      components/          Reusable components (Button, LoadingSpinner, settings/*)
      api/                 API clients (one file per resource)
      contexts/            AuthContext (session-based)
      i18n/                French translations
      utils/               SSE client, date formatting, provider info
      types.ts             All TypeScript domain types
  e2e/                     Playwright E2E tests
    tests/                 Test specs
    helpers/               Auth helpers, DB access
    seed.ts                Test data seeder
  scripts/                 Test runner scripts
  docs/                    Architecture reports, plans, specs
```
### Layer Architecture (Backend)
```
handlers/ (HTTP layer) --> services/ (business logic) --> db/ (data access)
    |                          |
    models/ (shared types) <---+
    |
    errors.rs
```
- **Handlers** are thin: validate input, call services/db, format responses.
- **Services** contain business logic. The `LlmProvider` trait and synthesis pipeline live here.
- **DB** modules contain pure SQL queries returning typed results. No business logic.
- **Models** define data types and validation. Shared across layers.
---
## Coding Standards
### Rust
#### Error Handling
All errors flow through the unified `AppError` enum (in `backend/src/errors.rs`):
```rust
#[derive(Debug, thiserror::Error)]
pub enum AppError {
    NotFound(String),        // 404
    Unauthorized(String),    // 401
    Forbidden(String),       // 403
    BadRequest(String),      // 400
    Validation(String),      // 422
    Internal(anyhow::Error), // 500 -- details logged, not exposed to client
    RateLimited(String),     // 429
}
```
Key rules:
- **Never use `unwrap()` in production code.** Use `?`, `ok_or_else`, `map_err`, or `unwrap_or_default` with appropriate logging. `unwrap()` is only acceptable in `#[cfg(test)]` blocks and `LazyLock` static initializers.
- **`AppError::Internal` hides details** from the client. The full error is logged via `tracing::error!` but the response body only contains `"An internal error occurred"`.
- **`From<sqlx::Error>` and `From<anyhow::Error>`** conversions are implemented, so you can use `?` with both types.
- **Validation errors** should use `AppError::Validation(message)` (returns 422).
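
To illustrate the rules above, here is a dependency-free sketch of the status mapping and of `?` / `ok_or_else` replacing `unwrap()`. It is a simplified stand-in, not the project's `errors.rs`: `Internal` holds a `String` instead of `anyhow::Error`, and `status_code`, `client_message`, `parse_page`, and the `ParseIntError` conversion are hypothetical names added for the example.

```rust
/// Simplified stand-in for `AppError` (assumption: `String` replaces `anyhow::Error`).
#[derive(Debug)]
enum AppError {
    NotFound(String),
    Unauthorized(String),
    Forbidden(String),
    BadRequest(String),
    Validation(String),
    Internal(String),
    RateLimited(String),
}

impl AppError {
    /// HTTP status for each variant, matching the comments in the enum listing.
    fn status_code(&self) -> u16 {
        match self {
            AppError::NotFound(_) => 404,
            AppError::Unauthorized(_) => 401,
            AppError::Forbidden(_) => 403,
            AppError::BadRequest(_) => 400,
            AppError::Validation(_) => 422,
            AppError::Internal(_) => 500,
            AppError::RateLimited(_) => 429,
        }
    }

    /// Message that is safe to return to the client; internal details stay hidden.
    fn client_message(&self) -> &str {
        match self {
            AppError::Internal(_) => "An internal error occurred",
            AppError::NotFound(m)
            | AppError::Unauthorized(m)
            | AppError::Forbidden(m)
            | AppError::BadRequest(m)
            | AppError::Validation(m)
            | AppError::RateLimited(m) => m.as_str(),
        }
    }
}

/// Hypothetical conversion so that `?` works on a failed integer parse.
impl From<std::num::ParseIntError> for AppError {
    fn from(e: std::num::ParseIntError) -> Self {
        AppError::Validation(format!("invalid number: {e}"))
    }
}

/// Uses `ok_or_else` and `?` where `unwrap()` might otherwise be tempting.
fn parse_page(raw: Option<&str>) -> Result<u32, AppError> {
    let raw = raw.ok_or_else(|| AppError::BadRequest("missing `page`".to_string()))?;
    let page: u32 = raw.parse()?;
    Ok(page)
}
```

A missing parameter maps to 400, an unparsable one to 422, and an `Internal` error only ever exposes the generic message.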
#### Arc Usage
`Arc` is used to share data across `tokio::spawn` boundaries. Common patterns:
- `Arc<dyn LlmProvider>` for the LLM provider (shared across classify tasks)
- `Arc<AtomicBool>` for cancellation flags
- `Arc<watch::Sender<ProgressEvent>>` for SSE progress channels
- `Arc<String>`, `Arc<Vec<String>>`, `Arc<Value>` for data shared with spawned tasks
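
The sharing pattern can be sketched without any dependencies. This example uses `std::thread` as a stand-in for `tokio::spawn` (an assumption made to keep it self-contained); `run_workers` is a hypothetical function, and the cloning and cancellation logic is the part that carries over.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Processes `items` on two worker threads until done or `cancel` is set.
/// Returns how many items were processed.
fn run_workers(items: Arc<Vec<String>>, cancel: Arc<AtomicBool>) -> usize {
    let processed = Arc::new(AtomicUsize::new(0));
    let handles: Vec<_> = (0..2)
        .map(|worker| {
            // Each clone bumps a reference count; the underlying data is not copied.
            let items = Arc::clone(&items);
            let cancel = Arc::clone(&cancel);
            let processed = Arc::clone(&processed);
            thread::spawn(move || {
                // Worker 0 takes even indices, worker 1 takes odd indices.
                for item in items.iter().skip(worker).step_by(2) {
                    if cancel.load(Ordering::Relaxed) {
                        return; // stop early once cancellation is requested
                    }
                    let _ = item.len(); // placeholder for real work
                    processed.fetch_add(1, Ordering::Relaxed);
                }
            })
        })
        .collect();
    for handle in handles {
        // A panicked worker is ignored in this sketch.
        let _ = handle.join();
    }
    processed.load(Ordering::Relaxed)
}
```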
#### Auth Middleware Pattern
Authentication uses Axum extractors (in `backend/src/middleware/auth.rs`):
- **`AuthUser`**: Reads the session cookie, looks up the session in the DB, checks expiration, loads the user. Any handler that takes `AuthUser` as a parameter automatically rejects unauthenticated requests with 401.
- **`AdminUser(AuthUser)`**: Wraps `AuthUser` and additionally checks `UserRole::Admin`. Returns 403 if not admin.
To require authentication, simply add the extractor to your handler signature:
```rust
async fn my_handler(auth: AuthUser, State(state): State<AppState>) -> Result<impl IntoResponse, AppError> { /* ... */ }
```
For admin-only endpoints:
```rust
async fn admin_handler(admin: AdminUser, State(state): State<AppState>) -> Result<impl IntoResponse, AppError> { /* ... */ }
```
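
The decision logic behind the two extractors can be sketched independently of Axum. Everything here is a simplified assumption: the in-memory `HashMap` replaces the database lookup, `Session`, `authenticate`, and `require_admin` are hypothetical names, and errors are plain status codes rather than `AppError`.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
enum UserRole {
    Admin,
    User,
}

#[derive(Debug, Clone, PartialEq)]
struct AuthUser {
    email: String,
    role: UserRole,
}

#[derive(Debug, PartialEq)]
struct AdminUser(AuthUser);

/// Hypothetical session record; the real lookup goes through the database.
struct Session {
    user: AuthUser,
    expires_at: u64,
}

/// Resolves a session cookie to a user, or returns 401.
fn authenticate(
    sessions: &HashMap<String, Session>,
    cookie: Option<&str>,
    now: u64,
) -> Result<AuthUser, u16> {
    let token = cookie.ok_or(401u16)?;
    let session = sessions.get(token).ok_or(401u16)?;
    if session.expires_at <= now {
        return Err(401); // expired sessions are treated as unauthenticated
    }
    Ok(session.user.clone())
}

/// Authenticates first, then additionally requires the admin role, or returns 403.
fn require_admin(
    sessions: &HashMap<String, Session>,
    cookie: Option<&str>,
    now: u64,
) -> Result<AdminUser, u16> {
    let user = authenticate(sessions, cookie, now)?;
    if user.role != UserRole::Admin {
        return Err(403);
    }
    Ok(AdminUser(user))
}
```

Because `require_admin` calls `authenticate` first, an unauthenticated request to an admin route yields 401, not 403.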
#### Other Rust Conventions
- All SQL queries use parameterized bindings (`$1`, `$2`) via sqlx. Never interpolate strings into SQL.
- Prefer `tracing::info!`, `tracing::warn!`, `tracing::error!` over `println!`.
- Code comments and log messages are in English. User-facing strings are in French (via the i18n system).
- Module-level `//!` doc comments on every file; function-level `///` doc comments on public items.
### Frontend (SolidJS)
#### Reactive Primitives
- Use `createSignal` for local component state.
- Use `createResource` for async data that should auto-refetch (preferred over `createEffect` + manual fetch).
- Use `createMemo` for derived/computed values.
- Use `createEffect` for side effects that need to react to signal changes.
- Always use `onCleanup` to clear timers, close connections, and cancel subscriptions.
#### Component Patterns
- Use the `Button` component (`components/ui/Button.tsx`) with `variant`/`loading`/`icon` props instead of raw `<button>` elements with inline Tailwind classes.
- This rule is strict for all frontend UI code (no raw `<button>` in application components).
- Use `<Switch>/<Match>` for mutually exclusive conditional rendering instead of multiple adjacent `<Show>` blocks.
- Use `<For each={...}>` for list rendering.
- Use the `useToast` context for user feedback (success/error notifications).
#### i18n
All user-facing strings go through the translation system in `frontend/src/i18n/fr.ts`. Use the `t()` function:
```tsx
import { t } from '~/i18n/fr';
// ...
<p>{t('settings.saved')}</p>
```
Never hardcode French strings directly in JSX.
#### TypeScript
- `tsconfig.json` has `strict: true`. No escape hatches.
- Domain types live in `frontend/src/types.ts`. Import them from there.
- API clients use generics for type safety (`get<T>`, `post<T>`, etc.).
- Use the `isApiError` type guard from `types.ts` in catch blocks.
#### Import Conventions
Imports of project modules use the `~/` alias (configured in Vite). Do not use relative paths to import across directories.
---
## Common Patterns
### Adding a New Setting
Follow this sequence when adding a new user-configurable setting:
1. **Migration**: Create a new SQL migration in `backend/migrations/` that adds the column with a default value:
```sql
ALTER TABLE user_settings ADD COLUMN my_new_setting INTEGER NOT NULL DEFAULT 5;
```
Naming: `YYYYMMDD00000N_add_my_new_setting.sql`
2. **Model** (`backend/src/models/settings.rs`): Add the field to both `UserSettings` and `UpdateSettingsRequest`. Add validation in `UpdateSettingsRequest::validate()`.
3. **DB** (`backend/src/db/settings.rs`): Update the `get_or_create_default` and `update` queries to include the new column.
4. **Frontend types** (`frontend/src/types.ts`): Add the field to `UserSettings` and `UpdateSettingsPayload`. Also update `DEFAULT_SETTINGS` in `Settings.tsx`.
5. **i18n** (`frontend/src/i18n/fr.ts`): Add translation keys for the label, description, and any validation messages.
6. **Settings UI** (`frontend/src/pages/Settings.tsx`): Add the form control. Use the appropriate input type (number, checkbox, select, etc.).
7. **Important**: The `PUT /settings` endpoint requires the **complete** settings payload (not a partial update). The frontend must always send all fields. If you add a field, update the `DEFAULT_SETTINGS` object to include it with a sensible default.
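
Steps 1 and 2 can be sketched as follows. This is an illustration under assumptions: `migration_filename` is a hypothetical helper that only demonstrates the naming pattern, the struct is reduced to the single example field, the `1..=20` range is invented, and the error is a plain `String` where the real code would presumably return `AppError::Validation`.

```rust
/// Builds a migration filename following `YYYYMMDD00000N_add_my_new_setting.sql`.
/// Hypothetical helper; `date` is expected as `YYYYMMDD`.
fn migration_filename(date: &str, seq: u32, name: &str) -> String {
    // The sequence number is zero-padded to six digits.
    format!("{date}{seq:06}_{name}.sql")
}

/// Reduced, hypothetical version of `UpdateSettingsRequest` with one field.
struct UpdateSettingsRequest {
    my_new_setting: i32,
}

impl UpdateSettingsRequest {
    /// Rejects values outside an assumed 1..=20 range.
    fn validate(&self) -> Result<(), String> {
        if !(1..=20).contains(&self.my_new_setting) {
            return Err(format!(
                "my_new_setting must be between 1 and 20, got {}",
                self.my_new_setting
            ));
        }
        Ok(())
    }
}
```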
### Adding a New API Endpoint
1. **Handler** (`backend/src/handlers/`): Create the handler function. Use `AuthUser` or `AdminUser` extractors as needed. Return `Result<impl IntoResponse, AppError>`.
2. **Router** (`backend/src/router.rs`): Register the route. Place it in the correct section (public, authenticated, admin). Watch for path parameter conflicts -- more specific routes must be registered before generic `{id}` routes.
3. **Integration tests** (`backend/tests/`): Write tests covering:
- Happy path (200/201/204)
- Auth required (401 without session)
- Validation errors (422 for bad input)
- Not found (404 for missing resources)
- Ownership isolation (user A cannot access user B's resources)
- Admin-only access (403 for non-admin if applicable)
4. **Frontend**: Add the API client function in the appropriate `frontend/src/api/` file. Add TypeScript types if needed.
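
The ownership isolation case from step 3 is easy to get wrong, so here is a minimal sketch of the check it exercises. `Resource` and `find_owned` are hypothetical, and answering 404 (rather than 403) for another user's resource is an assumption; check the existing handlers for the status this project actually uses.

```rust
/// Hypothetical resource with an owner.
struct Resource {
    id: u32,
    owner_id: u32,
}

/// Looks up a resource by id, treating another user's resource as missing.
/// Assumption: foreign resources yield 404 so their existence is not revealed.
fn find_owned(resources: &[Resource], id: u32, user_id: u32) -> Result<&Resource, u16> {
    resources
        .iter()
        .find(|r| r.id == id && r.owner_id == user_id)
        .ok_or(404)
}
```

In a real query the same effect comes from filtering on both the resource id and the user id in the `WHERE` clause.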
### Adding a New LLM Provider
The `LlmProvider` trait (in `backend/src/services/llm/mod.rs`) defines the contract:
```rust
#[async_trait]
pub trait LlmProvider: Send + Sync {
    fn provider_id(&self) -> &str;
    async fn call_llm(
        &self,
        model: &str,
        system_prompt: &str,
        user_prompt: &str,
        response_schema: &Value,
    ) -> Result<Value, AppError>;
}
```
Steps:
1. **Implement the trait**: Create `backend/src/services/llm/my_provider.rs`. Implement `LlmProvider`. Use `map_provider_http_error()` from `llm/mod.rs` for HTTP status mapping.
2. **Register in the module**: Add `pub mod my_provider;` to `backend/src/services/llm/mod.rs`.
3. **Add to the factory** (`backend/src/services/llm/factory.rs`): Add a match arm in `create_provider()`:
```rust
"my_provider" => Ok(Arc::new(MyProvider::new(api_key, http_client))),
```
4. **Add factory tests**: Test that `create_provider("my_provider", ...)` returns the correct provider.
5. **Admin setup**: The admin must add the provider via the admin UI (`/admin/providers`) with its available models before users can select it.
No changes to the pipeline are needed -- it uses the `LlmProvider` trait polymorphically.
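
The steps above can be sketched end to end in a dependency-free form. This is not the project's code: the trait here is synchronous, `String` stands in for both `serde_json::Value` and `AppError`, the `response_schema` and `http_client` parameters are dropped, and `MyProvider` simply echoes its input instead of calling an HTTP API.

```rust
use std::sync::Arc;

/// Synchronous, simplified stand-in for the async `LlmProvider` trait.
trait LlmProvider: Send + Sync {
    fn provider_id(&self) -> &str;
    fn call_llm(
        &self,
        model: &str,
        system_prompt: &str,
        user_prompt: &str,
    ) -> Result<String, String>;
}

/// Hypothetical provider that echoes its input.
struct MyProvider {
    api_key: String,
}

impl MyProvider {
    fn new(api_key: String) -> Self {
        Self { api_key }
    }
}

impl LlmProvider for MyProvider {
    fn provider_id(&self) -> &str {
        "my_provider"
    }

    fn call_llm(
        &self,
        model: &str,
        system_prompt: &str,
        user_prompt: &str,
    ) -> Result<String, String> {
        if self.api_key.is_empty() {
            return Err("missing API key".to_string());
        }
        Ok(format!("[{model}] {system_prompt} / {user_prompt}"))
    }
}

/// Factory with one match arm per provider; unknown ids are an error.
fn create_provider(provider_id: &str, api_key: String) -> Result<Arc<dyn LlmProvider>, String> {
    match provider_id {
        "my_provider" => Ok(Arc::new(MyProvider::new(api_key))),
        other => Err(format!("unknown provider: {other}")),
    }
}
```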
---
## Git Workflow
### Commit Messages
Follow the conventional commits format used in this project:
```
type: short description

Longer explanation if needed.
```
Types: `feat`, `fix`, `docs`, `refactor`, `test`, `chore`.
Examples from the repo:
- `fix: rewrite pass schema uses actual scraped item counts, not max setting`
- `fix: filter empty scraped articles + restore URLs after rewrite + E2E assertions`
- `docs: add spec and plan for source priority pipeline redesign`
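
A subject line can be checked mechanically against the type list. The sketch below is illustrative only: `is_valid_subject` is a hypothetical helper, the project is not known to enforce this in a hook, and scoped forms such as `feat(api): ...` are deliberately not handled.

```rust
/// Commit types accepted in this project, as listed above.
const COMMIT_TYPES: [&str; 6] = ["feat", "fix", "docs", "refactor", "test", "chore"];

/// Returns true if `subject` has the form `type: short description`.
fn is_valid_subject(subject: &str) -> bool {
    match subject.split_once(": ") {
        Some((kind, description)) => {
            COMMIT_TYPES.contains(&kind) && !description.trim().is_empty()
        }
        None => false,
    }
}
```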
### Rules
- Never force push to `master`.
- Create feature branches for non-trivial changes.
- Keep commits focused -- one logical change per commit.
---
## Common Pitfalls
### Drop Deadlock in Tests
The `TestApp` struct in `backend/tests/common/mod.rs` uses a `Drop` implementation that spawns a background thread to clean up the test database. **Do not call `.join()` on this thread** -- it causes a deadlock because the spawned thread's `block_on` conflicts with the existing tokio runtime's connection pool.
The `Drop` implementation fires and forgets the cleanup thread intentionally. For explicit cleanup, call `app.cleanup().await` at the end of the test instead.
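
The fire-and-forget shape looks roughly like this. The sketch is a stand-in, not the real `TestApp`: it has no tokio runtime or database, and the `done` channel exists only so the example can observe that cleanup ran.

```rust
use std::sync::mpsc::{self, Sender};
use std::thread;

/// Hypothetical stand-in for `TestApp`; `done` reports when cleanup has run.
struct TestApp {
    db_name: String,
    done: Sender<String>,
}

impl Drop for TestApp {
    fn drop(&mut self) {
        let db_name = self.db_name.clone();
        let done = self.done.clone();
        // Fire and forget: the JoinHandle is dropped on purpose, never joined.
        thread::spawn(move || {
            // Placeholder for dropping the test database.
            let _ = done.send(format!("dropped {db_name}"));
        });
    }
}
```

`drop` returns immediately; nothing in it waits for the spawned thread, which is the property the real implementation relies on.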
### SSRF Bypass Environment Variable
The `SKIP_SSRF_CHECK=1` environment variable disables all SSRF protection in the scraper. It exists for integration tests (which use wiremock on localhost). **Never set this in production.** The `scripts/run-integration-tests.sh` script sets it automatically.
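
To make the effect of the flag concrete, here is a rough sketch of a bypassable address check. The address ranges blocked by the real scraper are not documented in this guide, so the set below (loopback, private, link-local, unspecified) is an assumption, as are the function names and the rule that only the exact value `1` enables the bypass.

```rust
use std::net::IpAddr;

/// Returns true if the scraper should refuse to connect to `ip`.
/// Assumption: the blocked ranges here are illustrative, not the project's list.
fn is_blocked(ip: IpAddr, skip_ssrf_check: bool) -> bool {
    if skip_ssrf_check {
        return false; // all protection is disabled, which is why this is test-only
    }
    match ip {
        IpAddr::V4(v4) => {
            v4.is_loopback() || v4.is_private() || v4.is_link_local() || v4.is_unspecified()
        }
        IpAddr::V6(v6) => v6.is_loopback() || v6.is_unspecified(),
    }
}

/// Reads the flag from the environment (assumption: only `1` enables the bypass).
fn skip_ssrf_check_from_env() -> bool {
    std::env::var("SKIP_SSRF_CHECK")
        .map(|v| v == "1")
        .unwrap_or(false)
}
```

With the flag set, even `127.0.0.1` passes, which is what lets wiremock on localhost work in tests.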
### Settings Payload Completeness
The `PUT /settings` endpoint requires the **complete** settings object, not a partial update. If you send a payload missing a field, the request will fail with a deserialization error. When writing integration tests, always include every field in the settings JSON. When adding a new setting field, update all existing test payloads.
### Pipeline Test Requirements
Pipeline integration tests require:
- A running Postgres instance (via `TEST_DATABASE_URL`)
- `SKIP_SSRF_CHECK=1` (to allow wiremock on localhost)
- Wiremock for mocking HTTP responses from source websites
- `MockLlmProvider` for deterministic LLM responses
The mock provider identifies call types by inspecting the system prompt content (e.g., looking for French keywords like "classer"). If you change prompt wording, the mock may need updating.
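
The keyword-based dispatch can be pictured as below. This is a guess at the shape, not the actual `MockLlmProvider`: `CallKind` and `detect_call_kind` are hypothetical, only the "classer" keyword comes from this guide, and case-insensitive matching is an assumption.

```rust
#[derive(Debug, PartialEq)]
enum CallKind {
    Classify,
    Other,
}

/// Guesses the call type from the system prompt wording.
fn detect_call_kind(system_prompt: &str) -> CallKind {
    if system_prompt.to_lowercase().contains("classer") {
        CallKind::Classify
    } else {
        CallKind::Other
    }
}
```

A prompt reworded to "catégoriser" would silently fall through to `Other`, which is the failure mode to watch for after prompt edits.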
### Gemini API Key in URL
The Gemini provider places the API key in the URL query string (`?key=...`). The error handler avoids logging the full URL, but intermediary proxies or debug-level logging could expose it. Be aware of this when configuring logging levels.
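
If a URL of this kind ever has to be logged, redacting the parameter first is one option. The helper below is a hypothetical sketch, not part of the codebase; it handles only a plain `key` query parameter and ignores URL fragments and percent-encoding.

```rust
/// Replaces the value of the `key` query parameter with `REDACTED`.
fn redact_api_key(url: &str) -> String {
    let Some((base, query)) = url.split_once('?') else {
        return url.to_string(); // no query string, nothing to redact
    };
    let redacted: Vec<String> = query
        .split('&')
        .map(|pair| match pair.split_once('=') {
            Some(("key", _)) => "key=REDACTED".to_string(),
            _ => pair.to_string(),
        })
        .collect();
    format!("{base}?{}", redacted.join("&"))
}
```

This only protects log lines the application writes itself; it does nothing about intermediary proxies.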