# Development Guidelines

## Getting Started

### Prerequisites

- **Rust** (stable, 1.88+) with `cargo`
- **Node.js** (22+) with `npm`
- **PostgreSQL** (17+), or Docker to run it
- **Docker** and `docker compose` for containerized development

### Local Development Setup

Each step below assumes you start from the repository root.

1. **Start a Postgres instance.** The easiest way is via the test compose file:

   ```bash
   cd e2e && docker compose -f docker-compose.test.yml up -d db
   ```

   This starts Postgres on port 5433 with user `ai_synth_test` / password `testpassword`.

2. **Run the backend:**

   ```bash
   cd backend
   export DATABASE_URL=postgres://ai_synth_test:testpassword@127.0.0.1:5433/ai_synth_test
   cargo run -- serve
   ```

   Migrations run automatically on startup.

3. **Run the frontend (dev server with hot reload):**

   ```bash
   cd frontend
   npm install
   npm run dev
   ```

   The Vite dev server proxies `/api` requests to the backend on port 8080.

4. **Create an admin user:**

   ```bash
   cd backend && cargo run -- create-admin admin@example.com
   ```

### Environment Variables

Copy `.env.example` to `.env` and fill in the values. The critical ones for local development are `DATABASE_URL`, `MASTER_ENCRYPTION_KEY` (64 hex characters; generate one with `openssl rand -hex 32`), and `APP_URL`.

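`openssl rand -hex 32` prints 32 random bytes as 64 hex digits, so the key should decode to exactly 32 bytes. The sketch below shows one std-only way to check such a value at startup. `parse_master_key` is a hypothetical name and does not necessarily match what `config.rs` does:

```rust
/// Hypothetical startup check: decode a 64-hex-character MASTER_ENCRYPTION_KEY
/// into 32 raw bytes. Not the project's actual config.rs code.
pub fn parse_master_key(raw: &str) -> Result<[u8; 32], String> {
    let hex = raw.trim().as_bytes();
    if hex.len() != 64 {
        return Err(format!("expected 64 hex characters, got {}", hex.len()));
    }
    let mut key = [0u8; 32];
    for (i, pair) in hex.chunks(2).enumerate() {
        // Each pair of hex digits encodes one byte, high nibble first.
        // Non-hex bytes (including non-ASCII) make to_digit return None.
        let digit = |b: u8| {
            (b as char)
                .to_digit(16)
                .ok_or_else(|| format!("invalid hex digit in byte {i}"))
        };
        key[i] = (digit(pair[0])? * 16 + digit(pair[1])?) as u8;
    }
    Ok(key)
}
```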

---

## Project Structure

```
ai_synth/
  backend/                Rust/Axum backend
    src/
      main.rs             Entry point, CLI (serve, create-admin)
      router.rs           All API routes + middleware
      app_state.rs        Shared application state (Arc-wrapped)
      errors.rs           Unified AppError enum
      config.rs           Environment config parsing
      handlers/           HTTP handlers (thin: validate, delegate, respond)
      services/           Business logic (auth, synthesis pipeline, LLM providers, scraper, etc.)
      db/                 Database queries (sqlx, parameterized)
      models/             Data types + validation
      middleware/         Auth session extraction, CSRF check
      util/               Token generation, hashing
    migrations/           SQL migrations (30 files, auto-run on startup)
    tests/                Integration tests (require Postgres)
  frontend/               SolidJS + Tailwind CSS v4
    src/
      App.tsx             Router, layouts, route guards
      pages/              Page-level components
      components/         Reusable components (Button, LoadingSpinner, settings/*)
      api/                API clients (one file per resource)
      contexts/           AuthContext (session-based)
      i18n/               French translations
      utils/              SSE client, date formatting, provider info
      types.ts            All TypeScript domain types
  e2e/                    Playwright E2E tests
    tests/                Test specs
    helpers/              Auth helpers, DB access
    seed.ts               Test data seeder
  scripts/                Test runner scripts
  docs/                   Architecture reports, plans, specs
```

### Layer Architecture (Backend)

```
handlers/ (HTTP layer) --> services/ (business logic) --> db/ (data access)
   |                           |
models/ (shared types) <-------+
   |
errors.rs
```

- **Handlers** are thin: validate input, call services/db, format responses.
- **Services** contain business logic. The `LlmProvider` trait and synthesis pipeline live here.
- **DB** modules contain pure SQL queries returning typed results. No business logic.
- **Models** define data types and validation. Shared across layers.

---

## Coding Standards

### Rust

#### Error Handling

All errors flow through the unified `AppError` enum (in `backend/src/errors.rs`):

```rust
// Excerpt: the #[error(...)] display attributes that thiserror requires are omitted here.
#[derive(Debug, thiserror::Error)]
pub enum AppError {
    NotFound(String),        // 404
    Unauthorized(String),    // 401
    Forbidden(String),       // 403
    BadRequest(String),      // 400
    Validation(String),      // 422
    Internal(anyhow::Error), // 500: details logged, not exposed to the client
    RateLimited(String),     // 429
}
```

Key rules:

- **Never use `unwrap()` in production code.** Use `?`, `ok_or_else`, `map_err`, or `unwrap_or_default` with appropriate logging. `unwrap()` is only acceptable in `#[cfg(test)]` blocks and `LazyLock` static initializers.
- **`AppError::Internal` hides details** from the client. The full error is logged via `tracing::error!`, but the response body only contains `"An internal error occurred"`.
- **`From<sqlx::Error>` and `From<anyhow::Error>`** conversions are implemented, so you can use `?` with both types.
- **Validation errors** should use `AppError::Validation(message)` (returns 422).

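The following std-only sketch shows how these rules fit together. It makes several simplifying assumptions: `Internal` carries a `String` instead of `anyhow::Error`, `std::io::Error` stands in for `sqlx::Error` in the `From` conversion, `eprintln!` stands in for `tracing::error!`, and `status_and_body`, `load_template`, and `validate_page_size` are hypothetical substitutes for the real Axum `IntoResponse` impl and handlers:

```rust
use std::io;

/// Simplified stand-in for the real AppError (no thiserror, anyhow, or axum).
#[derive(Debug)]
pub enum AppError {
    NotFound(String),
    Unauthorized(String),
    Forbidden(String),
    BadRequest(String),
    Validation(String),
    Internal(String), // real code: anyhow::Error
    RateLimited(String),
}

impl AppError {
    /// Hypothetical stand-in for IntoResponse: (HTTP status, client-visible body).
    pub fn status_and_body(&self) -> (u16, String) {
        match self {
            AppError::NotFound(m) => (404, m.clone()),
            AppError::Unauthorized(m) => (401, m.clone()),
            AppError::Forbidden(m) => (403, m.clone()),
            AppError::BadRequest(m) => (400, m.clone()),
            AppError::Validation(m) => (422, m.clone()),
            AppError::RateLimited(m) => (429, m.clone()),
            AppError::Internal(detail) => {
                // Details go to the server log only (real code: tracing::error!).
                eprintln!("internal error: {detail}");
                (500, "An internal error occurred".to_string())
            }
        }
    }
}

/// Plays the role of From<sqlx::Error>: lets `?` lift I/O failures into a 500.
impl From<io::Error> for AppError {
    fn from(e: io::Error) -> Self {
        AppError::Internal(e.to_string())
    }
}

/// Hypothetical helper: `?` converts io::Error via the From impl above.
pub fn load_template(path: &str) -> Result<String, AppError> {
    let text = std::fs::read_to_string(path)?;
    Ok(text)
}

/// Hypothetical input check: bad input becomes a 422, never a panic.
pub fn validate_page_size(n: u32) -> Result<u32, AppError> {
    if n == 0 || n > 100 {
        return Err(AppError::Validation("page size must be between 1 and 100".into()));
    }
    Ok(n)
}
```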
#### Arc Usage

`Arc` is used to share data across `tokio::spawn` boundaries. Common patterns:

- `Arc<dyn LlmProvider>` for the LLM provider (shared across classify tasks)
- `Arc<AtomicBool>` for cancellation flags
- `Arc<watch::Sender<ProgressEvent>>` for SSE progress channels
- `Arc<String>`, `Arc<Vec<String>>`, `Arc<Value>` for data shared with spawned tasks

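The sketch below shows the cancellation-flag and shared-data patterns, using `std::thread::spawn` in place of `tokio::spawn` so that no runtime is needed. `run_with_cancel` is a hypothetical function. The key points are that each task receives its own `Arc` clone, and that the worker polls the flag between units of work:

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;

/// Processes `items` on a spawned worker that polls a shared cancellation flag
/// before each item; returns how many items were handled before stopping.
/// std::thread stands in for tokio::spawn; the Arc usage is the same.
pub fn run_with_cancel(items: Arc<Vec<String>>, cancel: &Arc<AtomicBool>) -> usize {
    // Cloning the Arc bumps a refcount (no data copy), so the caller keeps its
    // handle and another thread can flip the flag while the worker runs.
    let worker_cancel = Arc::clone(cancel);
    let handle = thread::spawn(move || {
        let mut done = 0;
        for _item in items.iter() {
            // Relaxed suffices: the flag does not guard any other shared data.
            if worker_cancel.load(Ordering::Relaxed) {
                break;
            }
            done += 1; // real code would classify or scrape `_item` here
        }
        done
    });
    // A panicked worker is treated as having processed nothing.
    handle.join().unwrap_or(0)
}
```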
#### Auth Middleware Pattern

Authentication uses Axum extractors (in `backend/src/middleware/auth.rs`):

- **`AuthUser`**: Reads the session cookie, looks up the session in the DB, checks expiration, and loads the user. Any handler that takes `AuthUser` as a parameter automatically rejects unauthenticated requests with 401.
- **`AdminUser(AuthUser)`**: Wraps `AuthUser` and additionally checks for `UserRole::Admin`. Returns 403 if the user is not an admin.

To require authentication, add the extractor to your handler signature:

```rust
async fn my_handler(auth: AuthUser, State(state): State<AppState>) -> Result<..., AppError> { ... }
```

For admin-only endpoints:

```rust
async fn admin_handler(admin: AdminUser, State(state): State<AppState>) -> Result<..., AppError> { ... }
```

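The checks the two extractors perform can be modeled as plain functions. This is a hypothetical, std-only sketch rather than the code in `auth.rs`. The closures stand in for the DB queries, failures are bare status codes instead of `AppError`, and treating a missing user as 401 is an assumption:

```rust
use std::time::SystemTime;

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum UserRole {
    User,
    Admin,
}

/// Hypothetical session row: owner and expiry.
#[derive(Debug, Clone)]
pub struct Session {
    pub user_id: u64,
    pub expires_at: SystemTime,
}

#[derive(Debug)]
pub struct AuthUser {
    pub user_id: u64,
    pub role: UserRole,
}

#[derive(Debug)]
pub struct AdminUser(pub AuthUser);

/// Models the AuthUser checks; Err carries the HTTP status.
/// `find_session` and `find_role` are hypothetical stand-ins for DB queries.
pub fn authenticate(
    cookie: Option<&str>,
    now: SystemTime,
    find_session: impl Fn(&str) -> Option<Session>,
    find_role: impl Fn(u64) -> Option<UserRole>,
) -> Result<AuthUser, u16> {
    let token = cookie.ok_or(401u16)?; // no session cookie
    let session = find_session(token).ok_or(401u16)?; // unknown session
    if session.expires_at <= now {
        return Err(401); // expired session
    }
    let role = find_role(session.user_id).ok_or(401u16)?; // user not found (assumed 401)
    Ok(AuthUser { user_id: session.user_id, role })
}

/// Models AdminUser(AuthUser): an authenticated non-admin gets 403.
pub fn require_admin(user: AuthUser) -> Result<AdminUser, u16> {
    if user.role == UserRole::Admin {
        Ok(AdminUser(user))
    } else {
        Err(403)
    }
}
```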
#### Other Rust Conventions

- All SQL queries use parameterized bindings (`$1`, `$2`) via sqlx. Never interpolate strings into SQL.
- Prefer `tracing::info!`, `tracing::warn!`, and `tracing::error!` over `println!`.
- Code comments and log messages are in English. User-facing strings are in French (via the i18n system).
- Module-level `//!` doc comments on every file; function-level `///` doc comments on public items.

### Frontend (SolidJS)

#### Reactive Primitives

- Use `createSignal` for local component state.
- Use `createResource` for async data that should auto-refetch (preferred over `createEffect` + manual fetch).
- Use `createMemo` for derived/computed values.
- Use `createEffect` for side effects that need to react to signal changes.
- Always use `onCleanup` to clear timers, close connections, and cancel subscriptions.

#### Component Patterns

- Use the `Button` component (`components/ui/Button.tsx`) with `variant`/`loading`/`icon` props instead of raw `<button>` elements with inline Tailwind classes. This rule is strict for all frontend UI code: no raw `<button>` in application components.
- Use `<Switch>`/`<Match>` for mutually exclusive conditional rendering instead of multiple adjacent `<Show>` blocks.
- Use `<For each={...}>` for list rendering.
- Use the `useToast` context for user feedback (success/error notifications).

#### i18n

All user-facing strings go through the translation system in `frontend/src/i18n/fr.ts`. Use the `t()` function:

```tsx
import { t } from '~/i18n/fr';
// ...
<p>{t('settings.saved')}</p>
```

Never hardcode French strings directly in JSX.

#### TypeScript

- `tsconfig.json` has `strict: true`. Do not bypass it with escape hatches such as `any` or `@ts-ignore`.
- Domain types live in `frontend/src/types.ts`. Import them from there.
- API clients use generics for type safety (`get<T>`, `post<T>`, etc.).
- Use the `isApiError` type guard from `types.ts` in catch blocks.

#### Import Conventions

All imports use the `~/` alias (configured in Vite). No relative path imports across directories.

---

## Common Patterns

### Adding a New Setting

Follow this sequence when adding a new user-configurable setting:

1. **Migration**: Create a new SQL migration in `backend/migrations/` that adds the column with a default value:

   ```sql
   ALTER TABLE user_settings ADD COLUMN my_new_setting INTEGER NOT NULL DEFAULT 5;
   ```

   Naming: `YYYYMMDD00000N_add_my_new_setting.sql`

2. **Model** (`backend/src/models/settings.rs`): Add the field to both `UserSettings` and `UpdateSettingsRequest`. Add validation in `UpdateSettingsRequest::validate()`.

3. **DB** (`backend/src/db/settings.rs`): Update the `get_or_create_default` and `update` queries to include the new column.

4. **Frontend types** (`frontend/src/types.ts`): Add the field to `UserSettings` and `UpdateSettingsPayload`. Also update `DEFAULT_SETTINGS` in `Settings.tsx`.

5. **i18n** (`frontend/src/i18n/fr.ts`): Add translation keys for the label, description, and any validation messages.

6. **Settings UI** (`frontend/src/pages/Settings.tsx`): Add the form control. Use the appropriate input type (number, checkbox, select, etc.).

7. **Important**: The `PUT /settings` endpoint requires the **complete** settings payload (not a partial update). The frontend must always send all fields. If you add a field, update the `DEFAULT_SETTINGS` object to include it with a sensible default.

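Here is a minimal sketch of the validation from step 2. It assumes a hypothetical `UpdateSettingsRequest` trimmed down to the new field (Postgres `INTEGER` maps to `i32`) and an arbitrary 1 to 20 range that includes the migration's default of 5. In this sketch, the caller would wrap the error message in `AppError::Validation`:

```rust
/// Hypothetical, trimmed-down request type: only the new field is shown.
pub struct UpdateSettingsRequest {
    pub my_new_setting: i32, // Postgres INTEGER maps to i32
}

impl UpdateSettingsRequest {
    /// Returns a user-facing message on failure; the caller would turn it into
    /// AppError::Validation (422). The 1..=20 bounds are an illustrative choice.
    pub fn validate(&self) -> Result<(), String> {
        if !(1..=20).contains(&self.my_new_setting) {
            return Err(format!(
                "my_new_setting must be between 1 and 20, got {}",
                self.my_new_setting
            ));
        }
        Ok(())
    }
}
```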
### Adding a New API Endpoint

1. **Handler** (`backend/src/handlers/`): Create the handler function. Use the `AuthUser` or `AdminUser` extractors as needed. Return `Result<impl IntoResponse, AppError>`.

2. **Router** (`backend/src/router.rs`): Register the route in the correct section (public, authenticated, admin). Watch for path parameter conflicts: more specific routes must be registered before generic `{id}` routes.

3. **Integration tests** (`backend/tests/`): Write tests covering:
   - Happy path (200/201/204)
   - Auth required (401 without session)
   - Validation errors (422 for bad input)
   - Not found (404 for missing resources)
   - Ownership isolation (user A cannot access user B's resources)
   - Admin-only access (403 for non-admin, if applicable)

4. **Frontend**: Add the API client function in the appropriate `frontend/src/api/` file. Add TypeScript types if needed.

### Adding a New LLM Provider

The `LlmProvider` trait (in `backend/src/services/llm/mod.rs`) defines the contract:

```rust
use async_trait::async_trait;
use serde_json::Value;

use crate::errors::AppError;

#[async_trait]
pub trait LlmProvider: Send + Sync {
    fn provider_id(&self) -> &str;
    async fn call_llm(
        &self,
        model: &str,
        system_prompt: &str,
        user_prompt: &str,
        response_schema: &Value,
    ) -> Result<Value, AppError>;
}
```

Steps:

1. **Implement the trait**: Create `backend/src/services/llm/my_provider.rs` and implement `LlmProvider`. Use `map_provider_http_error()` from `llm/mod.rs` for HTTP status mapping.

2. **Register in the module**: Add `pub mod my_provider;` to `backend/src/services/llm/mod.rs`.

3. **Add to the factory** (`backend/src/services/llm/factory.rs`): Add a match arm in `create_provider()`:

   ```rust
   "my_provider" => Ok(Arc::new(MyProvider::new(api_key, http_client))),
   ```

4. **Add factory tests**: Test that `create_provider("my_provider", ...)` returns the correct provider.

5. **Admin setup**: The admin must add the provider via the admin UI (`/admin/providers`), with its available models, before users can select it.

No changes to the pipeline are needed: it uses the `LlmProvider` trait polymorphically.

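Steps 1, 3 and 4 can be sketched in std-only Rust. This is a simplified, synchronous stand-in, not the real code. The real trait is async (`#[async_trait]`), takes a `response_schema`, and returns `serde_json::Value` / `AppError`, and the real `create_provider()` also receives an HTTP client. `MyProvider` is the placeholder name used above:

```rust
use std::sync::Arc;

/// Synchronous, std-only stand-in for the real async LlmProvider trait.
pub trait LlmProvider: Send + Sync {
    fn provider_id(&self) -> &str;
    fn call_llm(&self, model: &str, system_prompt: &str, user_prompt: &str) -> Result<String, String>;
}

/// Hypothetical provider; a real one would hold an HTTP client and call the vendor API.
pub struct MyProvider {
    api_key: String,
}

impl MyProvider {
    pub fn new(api_key: String) -> Self {
        MyProvider { api_key }
    }
}

impl LlmProvider for MyProvider {
    fn provider_id(&self) -> &str {
        "my_provider"
    }

    fn call_llm(&self, model: &str, _system_prompt: &str, _user_prompt: &str) -> Result<String, String> {
        if self.api_key.is_empty() {
            return Err("missing API key".into());
        }
        // Canned JSON instead of a network call.
        Ok(format!("{{\"model\": \"{model}\"}}"))
    }
}

/// Factory in the shape of create_provider(): one match arm per provider id.
pub fn create_provider(provider_id: &str, api_key: String) -> Result<Arc<dyn LlmProvider>, String> {
    match provider_id {
        "my_provider" => Ok(Arc::new(MyProvider::new(api_key))),
        other => Err(format!("unknown provider: {other}")),
    }
}
```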

---

## Git Workflow

### Commit Messages

Follow the conventional commits format used in this project:

```
type: short description

Longer explanation if needed.
```

Types: `feat`, `fix`, `docs`, `refactor`, `test`, `chore`.

Examples from the repo:

- `fix: rewrite pass schema uses actual scraped item counts, not max setting`
- `fix: filter empty scraped articles + restore URLs after rewrite + E2E assertions`
- `docs: add spec and plan for source priority pipeline redesign`

### Rules

- Never force push to `master`.
- Create feature branches for non-trivial changes.
- Keep commits focused: one logical change per commit.

---

## Common Pitfalls

### Drop Deadlock in Tests

The `TestApp` struct in `backend/tests/common/mod.rs` has a `Drop` implementation that spawns a background thread to clean up the test database. **Do not call `.join()` on this thread**: it causes a deadlock because the spawned thread's `block_on` conflicts with the existing tokio runtime's connection pool.

The `Drop` implementation intentionally fires and forgets the cleanup thread. For explicit cleanup, call `app.cleanup().await` at the end of the test instead.

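The sketch below is a std-only model of the fire-and-forget pattern: `drop` spawns the cleanup thread and returns without joining it. The struct fields are hypothetical, and the channel exists only so the example can observe that cleanup ran:

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical std-only model of TestApp's cleanup-on-drop.
pub struct TestApp {
    pub db_name: String,
    /// Example-only field so a caller can observe cleanup; the real struct has no such field.
    pub cleaned_tx: Option<mpsc::Sender<String>>,
}

impl Drop for TestApp {
    fn drop(&mut self) {
        let db_name = self.db_name.clone();
        let cleaned_tx = self.cleaned_tx.take();
        // Fire and forget: the JoinHandle is discarded and never joined,
        // so `drop` returns at once instead of waiting on the cleanup.
        let _ = thread::spawn(move || {
            // Real code: block_on the async cleanup of the test database here.
            if let Some(tx) = cleaned_tx {
                let _ = tx.send(db_name);
            }
        });
    }
}
```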
### SSRF Bypass Environment Variable

The `SKIP_SSRF_CHECK=1` environment variable disables all SSRF protection in the scraper. It exists for integration tests, which use wiremock on localhost. **Never set this in production.** The `scripts/run-integration-tests.sh` script sets it automatically.

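Here is a minimal sketch of an SSRF guard with a test-only bypass, assuming the bypass triggers only on the exact value `1`. The blocked ranges are a deliberately small subset. A production guard would also need to handle IPv4-mapped IPv6 addresses and check every resolved address. This is not the scraper's actual code:

```rust
use std::net::IpAddr;

/// Hypothetical check: true for addresses a scraper should refuse to fetch.
/// Minimal subset of ranges; not exhaustive.
pub fn is_blocked_ip(ip: IpAddr) -> bool {
    match ip {
        IpAddr::V4(v4) => {
            v4.is_loopback() || v4.is_private() || v4.is_link_local() || v4.is_unspecified()
        }
        IpAddr::V6(v6) => {
            let first = v6.segments()[0];
            v6.is_loopback()
                || v6.is_unspecified()
                || (first & 0xfe00) == 0xfc00 // unique local, fc00::/7
                || (first & 0xffc0) == 0xfe80 // link-local, fe80::/10
        }
    }
}

/// `skip_flag` would come from std::env::var("SKIP_SSRF_CHECK").ok().as_deref().
/// Assumption: only the exact value "1" disables the check.
pub fn ssrf_allowed(ip: IpAddr, skip_flag: Option<&str>) -> bool {
    if skip_flag == Some("1") {
        return true; // test-only bypass; never set in production
    }
    !is_blocked_ip(ip)
}
```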
### Settings Payload Completeness

The `PUT /settings` endpoint requires the **complete** settings object, not a partial update. A payload missing any field fails with a deserialization error. When writing integration tests, always include every field in the settings JSON. When adding a new setting field, update all existing test payloads.

### Pipeline Test Requirements

Pipeline integration tests require:

- A running Postgres instance (via `TEST_DATABASE_URL`)
- `SKIP_SSRF_CHECK=1` (to allow wiremock on localhost)
- Wiremock for mocking HTTP responses from source websites
- `MockLlmProvider` for deterministic LLM responses

The mock provider identifies call types by inspecting the system prompt content (e.g., looking for French keywords like "classer"). If you change prompt wording, the mock may need updating.

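The sketch below shows keyword dispatch of this kind and why rewording a prompt can silently send a call down the wrong branch. Only the keyword "classer" comes from this guide. `CallKind`, `detect_call_kind`, `mock_response`, and the canned JSON bodies are hypothetical:

```rust
/// Hypothetical call categories a mock provider might distinguish.
#[derive(Debug, PartialEq)]
pub enum CallKind {
    Classify,
    Other,
}

/// Routes on prompt wording, as described for MockLlmProvider.
/// Only the keyword "classer" comes from the guide; the rest is illustrative.
pub fn detect_call_kind(system_prompt: &str) -> CallKind {
    // Lowercase first so "Classer ..." also matches.
    if system_prompt.to_lowercase().contains("classer") {
        CallKind::Classify
    } else {
        CallKind::Other
    }
}

/// Canned, deterministic responses per call kind (invented JSON shapes).
pub fn mock_response(system_prompt: &str) -> &'static str {
    match detect_call_kind(system_prompt) {
        CallKind::Classify => r#"{"categories": []}"#,
        CallKind::Other => r#"{"text": "mock"}"#,
    }
}
```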
### Gemini API Key in URL

The Gemini provider places the API key in the URL query string (`?key=...`). The error handler avoids logging the full URL, but intermediary proxies or debug-level logging could still expose it. Be aware of this when configuring logging levels.
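
One way to keep the key out of logs is to mask the `key` query parameter before a URL is ever formatted into a message. The helper below is a hypothetical sketch, not part of the Gemini provider. It ignores URL fragments for simplicity:

```rust
/// Hypothetical helper: replace the value of any `key` query parameter
/// before a URL is logged. Fragments (#...) are not handled in this sketch.
pub fn redact_api_key(url: &str) -> String {
    let Some((base, query)) = url.split_once('?') else {
        return url.to_string(); // no query string, nothing to redact
    };
    let parts: Vec<String> = query
        .split('&')
        .map(|pair| match pair.split_once('=') {
            Some(("key", _)) => "key=REDACTED".to_string(),
            _ => pair.to_string(),
        })
        .collect();
    format!("{base}?{}", parts.join("&"))
}
```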