Development Guidelines

Getting Started

Prerequisites

Rust (stable, 1.88+) with cargo
Node.js (22+) with npm
PostgreSQL (17+) -- or Docker to run it
Docker and docker compose for containerized development

Local Development Setup

Start a Postgres instance. The easiest way is via the test compose file:
```
cd e2e && docker compose -f docker-compose.test.yml up -d db
```
This starts Postgres on port 5433 with user ai_synth_test / password testpassword.

Run the backend:

```
cd backend
export DATABASE_URL=postgres://ai_synth_test:testpassword@127.0.0.1:5433/ai_synth_test
cargo run -- serve
```

Migrations run automatically on startup.

Run the frontend (dev server with hot reload):
```
cd frontend
npm install
npm run dev
```
The Vite dev server proxies /api requests to the backend on port 8080.

Create an admin user:

```
cd backend && cargo run -- create-admin admin@example.com
```

Environment Variables

Copy .env.example to .env and fill in the values. The critical ones for local dev are DATABASE_URL, MASTER_ENCRYPTION_KEY (64 hex chars -- generate with openssl rand -hex 32), and APP_URL.
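
As a rough illustration of the "64 hex chars" requirement, the sketch below parses such a string into a 32-byte key. The function name parse_master_key and its error type are assumptions made for this example; the actual validation in config.rs may look different.

```rust
/// Parses a 64-character hex string into a 32-byte key.
/// Hypothetical helper: name, signature, and error type are illustrative only.
pub fn parse_master_key(hex: &str) -> Result<[u8; 32], String> {
    // Reject wrong lengths and any non-hex character up front.
    if hex.len() != 64 || !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
        return Err("MASTER_ENCRYPTION_KEY must be exactly 64 hex characters".to_string());
    }
    let mut key = [0u8; 32];
    for (i, byte) in key.iter_mut().enumerate() {
        // Slicing by byte index is safe here: the input is pure ASCII.
        *byte = u8::from_str_radix(&hex[i * 2..i * 2 + 2], 16).map_err(|e| e.to_string())?;
    }
    Ok(key)
}
```

The output of openssl rand -hex 32 (32 random bytes printed as 64 hex characters) would pass this check, while a shorter or non-hex value would be rejected at startup.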

Project Structure

```
ai_synth/
  backend/                   Rust/Axum backend
    src/
      main.rs                Entry point, CLI (serve, create-admin)
      router.rs              All API routes + middleware
      app_state.rs           Shared application state (Arc-wrapped)
      errors.rs              Unified AppError enum
      config.rs              Environment config parsing
      handlers/              HTTP handlers (thin: validate, delegate, respond)
      services/              Business logic (auth, synthesis pipeline, LLM providers, scraper, etc.)
      db/                    Database queries (sqlx, parameterized)
      models/                Data types + validation
      middleware/            Auth session extraction, CSRF check
      util/                  Token generation, hashing
    migrations/              SQL migrations (30 files, auto-run on startup)
    tests/                   Integration tests (require Postgres)
  frontend/                  SolidJS + Tailwind CSS v4
    src/
      App.tsx                Router, layouts, route guards
      pages/                 Page-level components
      components/            Reusable components (Button, LoadingSpinner, settings/*)
      api/                   API clients (one file per resource)
      contexts/              AuthContext (session-based)
      i18n/                  French translations
      utils/                 SSE client, date formatting, provider info
      types.ts               All TypeScript domain types
  e2e/                       Playwright E2E tests
    tests/                   Test specs
    helpers/                 Auth helpers, DB access
    seed.ts                  Test data seeder
  scripts/                   Test runner scripts
  docs/                      Architecture reports, plans, specs
```

Layer Architecture (Backend)

```
handlers/ (HTTP layer)  -->  services/ (business logic)  -->  db/ (data access)
                                |                               |
                              models/ (shared types)  <---------+
                                |
                              errors.rs
```

Handlers are thin: validate input, call services/db, format responses.
Services contain business logic. The LlmProvider trait and synthesis pipeline live here.
DB modules contain pure SQL queries returning typed results. No business logic.
Models define data types and validation. Shared across layers.

Coding Standards

Rust

Error Handling

All errors flow through the unified AppError enum (in backend/src/errors.rs):

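```
#[derive(Debug, thiserror::Error)]
pub enum AppError {
    // Each variant also carries an #[error("...")] attribute (thiserror needs
    // it to derive Display); those attributes are omitted from this listing.
    NotFound(String),        // 404
    Unauthorized(String),    // 401
    Forbidden(String),       // 403
    BadRequest(String),      // 400
    Validation(String),      // 422
    Internal(anyhow::Error), // 500 -- details logged, not exposed to client
    RateLimited(String),     // 429
}
```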

Key rules:

Never use unwrap() in production code. Use ?, ok_or_else, map_err, or unwrap_or_default with appropriate logging. unwrap() is only acceptable in #[cfg(test)] blocks and LazyLock static initializers.
AppError::Internal hides details from the client. The full error is logged via tracing::error! but the response body only contains "An internal error occurred".
From<sqlx::Error> and From<anyhow::Error> conversions are implemented, so you can use ? with both types.
Validation errors should use AppError::Validation(message) (returns 422).
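
The rules above can be sketched with a trimmed-down, std-only stand-in for AppError. This is an assumption-laden illustration, not the contents of errors.rs: Internal holds a String instead of anyhow::Error, only three variants are shown, and find_item, status_code, and client_message are hypothetical names.

```rust
/// Simplified stand-in for the project's AppError (std only).
#[derive(Debug)]
pub enum AppError {
    NotFound(String),
    Validation(String),
    Internal(String), // the real variant wraps anyhow::Error
}

impl AppError {
    /// HTTP status code for each variant.
    pub fn status_code(&self) -> u16 {
        match self {
            AppError::NotFound(_) => 404,
            AppError::Validation(_) => 422,
            AppError::Internal(_) => 500,
        }
    }

    /// Message that is safe to send to the client.
    pub fn client_message(&self) -> String {
        match self {
            AppError::NotFound(msg) | AppError::Validation(msg) => msg.clone(),
            // Internal details belong in the logs, never in the response body.
            AppError::Internal(_) => "An internal error occurred".to_string(),
        }
    }
}

/// Lets `?` convert a parse failure into AppError, in the same spirit as the
/// From<sqlx::Error> and From<anyhow::Error> conversions.
impl From<std::num::ParseIntError> for AppError {
    fn from(err: std::num::ParseIntError) -> Self {
        AppError::Validation(format!("invalid number: {err}"))
    }
}

/// Looks up an item by a textual index without calling unwrap().
pub fn find_item(items: &[&str], raw_id: &str) -> Result<String, AppError> {
    let index: usize = raw_id.trim().parse()?; // ParseIntError -> AppError via From
    items
        .get(index)
        .map(|s| s.to_string())
        .ok_or_else(|| AppError::NotFound(format!("item {index} not found")))
}
```

Both failure paths return a typed error that maps to a status code, so the caller never needs unwrap() to get at the value.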

Arc Usage

Arc is used to share data across tokio::spawn boundaries. Common patterns:

Arc<dyn LlmProvider> for the LLM provider (shared across classify tasks)
Arc<AtomicBool> for cancellation flags
Arc<watch::Sender<ProgressEvent>> for SSE progress channels
Arc<String>, Arc<Vec<String>>, Arc<Value> for data shared with spawned tasks
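
A minimal sketch of the cancellation-flag pattern follows. It uses std::thread in place of tokio::spawn so that it needs no external crates; process_all and the "work" it performs are hypothetical.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Processes each item on its own thread unless cancellation was requested.
/// Returns how many items were actually processed.
pub fn process_all(items: Arc<Vec<String>>, cancel: Arc<AtomicBool>) -> usize {
    let processed = Arc::new(AtomicUsize::new(0));
    let handles: Vec<_> = (0..items.len())
        .map(|i| {
            // Cloning an Arc bumps a reference count; the data is not copied.
            let items = Arc::clone(&items);
            let cancel = Arc::clone(&cancel);
            let processed = Arc::clone(&processed);
            thread::spawn(move || {
                if cancel.load(Ordering::SeqCst) {
                    return; // cancelled before this item started
                }
                let _len = items[i].len(); // stand-in for real work
                processed.fetch_add(1, Ordering::SeqCst);
            })
        })
        .collect();
    for handle in handles {
        // A panicked worker simply does not count as processed.
        let _ = handle.join();
    }
    processed.load(Ordering::SeqCst)
}
```

With tokio the shape is the same: clone the Arc handles before the move closure and check the flag at each unit of work.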

Auth Middleware Pattern

Authentication uses Axum extractors (in backend/src/middleware/auth.rs):

AuthUser: Reads the session cookie, looks up the session in the DB, checks expiration, loads the user. Any handler that takes AuthUser as a parameter automatically rejects unauthenticated requests with 401.
AdminUser(AuthUser): Wraps AuthUser and additionally checks UserRole::Admin. Returns 403 if not admin.

To require authentication, simply add the extractor to your handler signature:

```
async fn my_handler(auth: AuthUser, State(state): State<AppState>) -> Result<..., AppError> { ... }
```

For admin-only endpoints:

```
async fn admin_handler(admin: AdminUser, State(state): State<AppState>) -> Result<..., AppError> { ... }
```
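
The checks the two extractors perform can be approximated without Axum as plain functions. Everything here is a simplification for illustration: sessions live in a HashMap instead of the database, and Session, AuthError, authenticate, and authenticate_admin are hypothetical names rather than the project's types.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
pub enum UserRole {
    Admin,
    User,
}

#[derive(Debug, Clone, PartialEq)]
pub struct AuthUser {
    pub email: String,
    pub role: UserRole,
}

/// Hypothetical session row: the owning user and an expiry (Unix seconds).
pub struct Session {
    pub user: AuthUser,
    pub expires_at: u64,
}

#[derive(Debug, PartialEq)]
pub enum AuthError {
    Unauthorized, // 401
    Forbidden,    // 403
}

/// AuthUser steps: read the cookie, look up the session, check expiry, load the user.
pub fn authenticate(
    cookie: Option<&str>,
    sessions: &HashMap<String, Session>,
    now: u64,
) -> Result<AuthUser, AuthError> {
    let token = cookie.ok_or(AuthError::Unauthorized)?;
    let session = sessions.get(token).ok_or(AuthError::Unauthorized)?;
    if session.expires_at <= now {
        return Err(AuthError::Unauthorized);
    }
    Ok(session.user.clone())
}

/// AdminUser steps: authenticate first, then require the Admin role.
pub fn authenticate_admin(
    cookie: Option<&str>,
    sessions: &HashMap<String, Session>,
    now: u64,
) -> Result<AuthUser, AuthError> {
    let user = authenticate(cookie, sessions, now)?;
    if user.role != UserRole::Admin {
        return Err(AuthError::Forbidden);
    }
    Ok(user)
}
```

Because the admin check builds on the plain authentication check, a missing or expired session still yields 401, and only an authenticated non-admin gets 403.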

Other Rust Conventions

All SQL queries use parameterized bindings ($1, $2) via sqlx. Never interpolate strings into SQL.
Prefer tracing::info!, tracing::warn!, tracing::error! over println!.
Code comments and log messages are in English. User-facing strings are in French (via the i18n system).
Module-level //! doc comments on every file; function-level /// doc comments on public items.

Frontend (SolidJS)

Reactive Primitives

Use createSignal for local component state.
Use createResource for async data that should auto-refetch (preferred over createEffect + manual fetch).
Use createMemo for derived/computed values.
Use createEffect for side effects that need to react to signal changes.
Always use onCleanup to clear timers, close connections, and cancel subscriptions.

Component Patterns

Use the Button component (components/ui/Button.tsx) with variant/loading/icon props instead of raw <button> elements with inline Tailwind classes.
This rule is strict for all frontend UI code (no raw <button> in application components).
Use <Switch>/<Match> for mutually exclusive conditional rendering instead of multiple adjacent <Show> blocks.
Use <For each={...}> for list rendering.
Use the useToast context for user feedback (success/error notifications).

i18n

All user-facing strings go through the translation system in frontend/src/i18n/fr.ts. Use the t() function:

```
import { t } from '~/i18n/fr';
// ...
<p>{t('settings.saved')}</p>
```

Never hardcode French strings directly in JSX.

TypeScript

tsconfig.json sets strict: true. Do not work around it with escape hatches such as any, @ts-ignore, or unchecked casts.
Domain types live in frontend/src/types.ts. Import them from there.
API clients use generics for type safety (get<T>, post<T>, etc.).
Use the isApiError type guard from types.ts in catch blocks.

Import Conventions

All imports use the ~/ alias (configured in Vite). No relative path imports across directories.

Common Patterns

Adding a New Setting

Follow this sequence when adding a new user-configurable setting:

Migration: Create a new SQL migration in backend/migrations/ that adds the column with a default value:
```
ALTER TABLE user_settings ADD COLUMN my_new_setting INTEGER NOT NULL DEFAULT 5;
```
Naming: YYYYMMDD00000N_add_my_new_setting.sql
Model (backend/src/models/settings.rs): Add the field to both UserSettings and UpdateSettingsRequest. Add validation in UpdateSettingsRequest::validate().
DB (backend/src/db/settings.rs): Update the get_or_create_default and update queries to include the new column.
Frontend types (frontend/src/types.ts): Add the field to UserSettings and UpdateSettingsPayload. Also update DEFAULT_SETTINGS in Settings.tsx.
i18n (frontend/src/i18n/fr.ts): Add translation keys for the label, description, and any validation messages.
Settings UI (frontend/src/pages/Settings.tsx): Add the form control. Use the appropriate input type (number, checkbox, select, etc.).
Important: The PUT /settings endpoint requires the complete settings payload (not a partial update). The frontend must always send all fields. If you add a field, update the DEFAULT_SETTINGS object to include it with a sensible default.
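
For the model step, a validation method might look roughly like the sketch below. It shows only the new field, returns a plain String error instead of AppError, and the 1..=20 range is an invented example; only the field name and the default of 5 come from the migration shown in this guide.

```rust
/// Hypothetical slice of the request model; the real struct carries every setting.
#[derive(Debug, Clone, PartialEq)]
pub struct UpdateSettingsRequest {
    pub my_new_setting: i32,
}

impl UpdateSettingsRequest {
    /// Returns an error message (for the 422 response) when the value is out of range.
    /// The 1..=20 range is an illustrative assumption.
    pub fn validate(&self) -> Result<(), String> {
        if !(1..=20).contains(&self.my_new_setting) {
            return Err(format!(
                "my_new_setting must be between 1 and 20, got {}",
                self.my_new_setting
            ));
        }
        Ok(())
    }
}

impl Default for UpdateSettingsRequest {
    /// Mirrors the column default from the migration (DEFAULT 5).
    fn default() -> Self {
        Self { my_new_setting: 5 }
    }
}
```

Keeping the Rust default, the SQL default, and the frontend DEFAULT_SETTINGS value identical avoids surprises when a user saves settings for the first time.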

Adding a New API Endpoint

Handler (backend/src/handlers/): Create the handler function. Use AuthUser or AdminUser extractors as needed. Return Result<impl IntoResponse, AppError>.
Router (backend/src/router.rs): Register the route. Place it in the correct section (public, authenticated, admin). Watch for path parameter conflicts -- more specific routes must be registered before generic {id} routes.
Integration tests (backend/tests/): Write tests covering:
- Happy path (200/201/204)
- Auth required (401 without session)
- Validation errors (422 for bad input)
- Not found (404 for missing resources)
- Ownership isolation (user A cannot access user B's resources)
- Admin-only access (403 for non-admin if applicable)
Frontend: Add the API client function in the appropriate frontend/src/api/ file. Add TypeScript types if needed.

Adding a New LLM Provider

The LlmProvider trait (in backend/src/services/llm/mod.rs) defines the contract:

```
#[async_trait]
pub trait LlmProvider: Send + Sync {
    fn provider_id(&self) -> &str;
    async fn call_llm(&self, model: &str, system_prompt: &str, user_prompt: &str, response_schema: &Value) -> Result<Value, AppError>;
}
```

Steps:

Implement the trait: Create backend/src/services/llm/my_provider.rs. Implement LlmProvider. Use map_provider_http_error() from llm/mod.rs for HTTP status mapping.
Register in the module: Add pub mod my_provider; to backend/src/services/llm/mod.rs.
Add to the factory (backend/src/services/llm/factory.rs): Add a match arm in create_provider():
```
"my_provider" => Ok(Arc::new(MyProvider::new(api_key, http_client))),
```
Add factory tests: Test that create_provider("my_provider", ...) returns the correct provider.
Admin setup: The admin must add the provider via the admin UI (/admin/providers) with its available models before users can select it.

No changes to the pipeline are needed -- it uses the LlmProvider trait polymorphically.
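
The steps above can be sketched end to end with a synchronous, std-only approximation. This is not the project's code: the trait below drops async_trait, the system prompt, and the JSON schema; errors are plain Strings; and MyProvider simply echoes its input instead of calling an HTTP API.

```rust
use std::sync::Arc;

/// Trimmed-down, synchronous stand-in for the LlmProvider trait.
pub trait LlmProvider: Send + Sync {
    fn provider_id(&self) -> &str;
    fn call_llm(&self, model: &str, user_prompt: &str) -> Result<String, String>;
}

/// Hypothetical provider used only for this sketch.
pub struct MyProvider {
    api_key: String,
}

impl MyProvider {
    pub fn new(api_key: String) -> Self {
        Self { api_key }
    }
}

impl LlmProvider for MyProvider {
    fn provider_id(&self) -> &str {
        "my_provider"
    }

    fn call_llm(&self, model: &str, user_prompt: &str) -> Result<String, String> {
        if self.api_key.is_empty() {
            return Err("missing API key".to_string());
        }
        // A real provider would send an HTTP request here.
        Ok(format!("[{model}] echo: {user_prompt}"))
    }
}

/// Factory: maps a provider id to a shared trait object.
pub fn create_provider(id: &str, api_key: String) -> Result<Arc<dyn LlmProvider>, String> {
    match id {
        "my_provider" => Ok(Arc::new(MyProvider::new(api_key))),
        other => Err(format!("unknown provider: {other}")),
    }
}
```

Callers only ever hold an Arc<dyn LlmProvider>, which is why adding a provider touches the factory but not the pipeline.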

Git Workflow

Commit Messages

Follow the Conventional Commits format used in this project:

```
type: short description

Longer explanation if needed.
```

Types: feat, fix, docs, refactor, test, chore.

Examples from the repo:

```
fix: rewrite pass schema uses actual scraped item counts, not max setting
fix: filter empty scraped articles + restore URLs after rewrite + E2E assertions
docs: add spec and plan for source priority pipeline redesign
```

Rules

Never force push to master.
Create feature branches for non-trivial changes.
Keep commits focused -- one logical change per commit.

Common Pitfalls

Drop Deadlock in Tests

The TestApp struct in backend/tests/common/mod.rs uses a Drop implementation that spawns a background thread to clean up the test database. Do not call .join() on this thread -- it causes a deadlock because the spawned thread's block_on conflicts with the existing tokio runtime's connection pool.

The Drop implementation fires and forgets the cleanup thread intentionally. For explicit cleanup, call app.cleanup().await at the end of the test instead.
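
The fire-and-forget shape can be illustrated with a std-only stand-in. This TestApp is hypothetical: it reports "cleanup" over a channel so the effect is observable, whereas the real one drops a test database.

```rust
use std::sync::mpsc::Sender;
use std::thread;

/// Hypothetical stand-in for TestApp; not the struct from backend/tests/common/mod.rs.
pub struct TestApp {
    pub db_name: String,
    pub cleanup_done: Sender<String>,
}

impl Drop for TestApp {
    fn drop(&mut self) {
        let db_name = self.db_name.clone();
        let done = self.cleanup_done.clone();
        // Fire and forget: the JoinHandle is discarded, so drop() never blocks
        // on the cleanup thread.
        thread::spawn(move || {
            // The real cleanup would drop the test database here.
            let _ = done.send(db_name);
        });
    }
}
```

Discarding the JoinHandle is the important part: drop() returns immediately and nothing waits on the cleanup thread.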

SSRF Bypass Environment Variable

The SKIP_SSRF_CHECK=1 environment variable disables all SSRF protection in the scraper. It exists for integration tests (which use wiremock on localhost). Never set this in production. The scripts/run-integration-tests.sh script sets it automatically.
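
To make the effect of the flag concrete, here is a rough sketch of an SSRF gate with a bypass. The function names and the exact set of blocked ranges are assumptions; the scraper's real check may cover more cases (DNS resolution, redirects, more IPv6 ranges).

```rust
use std::net::IpAddr;

/// True when SSRF checks should run. `skip_var` is the value of
/// SKIP_SSRF_CHECK, if the variable is set.
pub fn ssrf_check_enabled(skip_var: Option<&str>) -> bool {
    skip_var != Some("1")
}

/// Addresses that point back at the local machine or a private network.
pub fn is_forbidden_target(ip: IpAddr) -> bool {
    match ip {
        IpAddr::V4(v4) => {
            v4.is_loopback() || v4.is_private() || v4.is_link_local() || v4.is_unspecified()
        }
        IpAddr::V6(v6) => v6.is_loopback() || v6.is_unspecified(),
    }
}

/// A fetch is allowed when checks are bypassed or the target is not forbidden.
pub fn may_fetch(ip: IpAddr, skip_var: Option<&str>) -> bool {
    !ssrf_check_enabled(skip_var) || !is_forbidden_target(ip)
}
```

In a real binary the second argument would come from std::env::var("SKIP_SSRF_CHECK").ok().as_deref(). With the variable set to 1, even 127.0.0.1 is allowed, which is what lets wiremock work in tests and what makes the flag dangerous in production.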

Settings Payload Completeness

The PUT /settings endpoint requires the complete settings object, not a partial update. If you send a payload missing a field, the request will fail with a deserialization error. When writing integration tests, always include every field in the settings JSON. When adding a new setting field, update all existing test payloads.

Pipeline Test Requirements

Pipeline integration tests require:

A running Postgres instance (via TEST_DATABASE_URL)
SKIP_SSRF_CHECK=1 (to allow wiremock on localhost)
Wiremock for mocking HTTP responses from source websites
MockLlmProvider for deterministic LLM responses

The mock provider identifies call types by inspecting the system prompt content (e.g., looking for French keywords like "classer"). If you change prompt wording, the mock may need updating.
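
The keyword-based dispatch can be pictured as follows. This is a guess at the shape, not the MockLlmProvider source: CallKind and detect_call_kind are hypothetical, and only the "classer" keyword comes from this guide.

```rust
#[derive(Debug, PartialEq)]
pub enum CallKind {
    Classify,
    Other,
}

/// Decides which canned response to return by scanning the system prompt.
pub fn detect_call_kind(system_prompt: &str) -> CallKind {
    if system_prompt.to_lowercase().contains("classer") {
        CallKind::Classify
    } else {
        CallKind::Other
    }
}
```

A prompt reworded from "classer" to, say, "trier" would fall through to the Other branch, which is the failure mode to watch for after editing prompts.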

Gemini API Key in URL

The Gemini provider places the API key in the URL query string (?key=...). The error handler avoids logging the full URL, but intermediary proxies or debug-level logging could expose it. Be aware of this when configuring logging levels.
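
If a URL ever does need to appear in a log line, one option is to redact the key parameter first. The helper below is a hypothetical sketch, not something the codebase is known to contain, and it uses a placeholder host. It does not handle URL fragments or percent-encoded parameter names.

```rust
/// Replaces the value of the `key` query parameter with a placeholder.
/// Hypothetical helper for illustration only.
pub fn redact_key(url: &str) -> String {
    // URLs without a query string are returned unchanged.
    let Some((base, query)) = url.split_once('?') else {
        return url.to_string();
    };
    let redacted: Vec<String> = query
        .split('&')
        .map(|pair| match pair.split_once('=') {
            Some(("key", _)) => "key=REDACTED".to_string(),
            _ => pair.to_string(),
        })
        .collect();
    format!("{base}?{}", redacted.join("&"))
}
```

This only protects log lines produced by the application itself; proxies and other intermediaries still see the full URL.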
