# Technical Architecture Analysis: AI Weekly Synth Refactoring

## Open Questions and Clarifications Needed

Before implementation, the following points require decisions from stakeholders:

1. **Admin scope**: Is the "admin" a single super-user defined by config, or a full role-based system with multiple admins? This analysis assumes a simple role flag on users plus a single bootstrap admin defined via environment variable.

2. **Google OAuth retention**: The requirements specify email+captcha and magic link auth. Should Google SSO be dropped entirely, or kept as an additional option? This analysis assumes Google SSO is dropped to remove all Google dependencies.

3. **Email sending for syntheses**: The current app sends syntheses via Gmail API with OAuth popup. With Google dependencies removed, should SMTP-based email sending replace this? This analysis assumes yes, using the same SMTP configuration as magic link delivery.

4. **Data migration volume**: How many existing users and syntheses need migrating? This impacts whether a one-shot script suffices or whether incremental migration tooling is needed.

5. **Concurrent users target**: Rate limiter design and session store choice depend on expected load. This analysis assumes a small-to-medium deployment (1-100 concurrent users).

6. **Legacy data**: The current `SynthesisData` has legacy fields (`majorAnnouncements`, `financialSector`, etc.). The requirements say "remove legacy data/formats/code." This analysis assumes legacy fields are dropped during migration; only the `sections[]` format is carried forward.

---

## 1. Rust Backend Architecture

### 1.1 Framework Choice: Axum

**Recommendation: Axum** over Actix-web.

**Justification:**

| Criterion | Axum | Actix-web |
|---|---|---|
| Ecosystem alignment | Built on `tokio` + `tower` + `hyper` -- the de-facto Rust async stack | Has its own runtime layer (though uses tokio underneath) |
| Middleware model | Tower `Layer`/`Service` -- composable, reusable, testable | Its own `Transform`/`Service` traits -- powerful, but not reusable across the tower ecosystem |
| Extractors | Type-safe, ergonomic, uses `FromRequest` traits | Similar, but with `web::Data`, `web::Json` wrappers |
| Community trajectory | Growing faster, backed by the tokio team | Mature, stable, but slower growth |
| Learning curve | Lower for developers already using tokio ecosystem | Slightly higher due to framework-specific abstractions |
| Compile-time type safety | Strong -- handler function signatures are validated at compile time | Strong, but less ergonomic error messages |

Axum's tower-based middleware model is a decisive advantage for this project: the auth middleware, rate limiter, and CORS layer compose naturally as tower `Layer`s. Axum also has first-class support for shared state via the `State` extractor, which maps well to a shared database pool and configuration.

### 1.2 Project Structure

```
ai-synth-backend/
├── Cargo.toml
├── Cargo.lock
├── .env.example
├── migrations/                  # sqlx migrations
│   ├── 001_create_users.sql
│   ├── 002_create_sessions.sql
│   ├── 003_create_settings.sql
│   ├── 004_create_sources.sql
│   ├── 005_create_syntheses.sql
│   ├── 006_create_admin_config.sql
│   └── 007_create_rate_limits.sql
├── src/
│   ├── main.rs                  # Entry point: init tracing, DB, run server
│   ├── config.rs                # Env-based configuration (envy / dotenvy)
│   ├── app_state.rs             # AppState struct (pool, config, http client)
│   ├── error.rs                 # AppError enum, IntoResponse impl
│   ├── router.rs                # All route definitions, middleware wiring
│   ├── middleware/
│   │   ├── mod.rs
│   │   ├── auth.rs              # Session cookie extraction, user injection
│   │   ├── csrf.rs              # Double-submit cookie CSRF protection
│   │   └── rate_limit.rs        # Per-provider, configurable rate limiter
│   ├── models/
│   │   ├── mod.rs
│   │   ├── user.rs              # User, NewUser, UserRole
│   │   ├── session.rs           # Session
│   │   ├── settings.rs          # UserSettings
│   │   ├── source.rs            # Source
│   │   ├── synthesis.rs         # Synthesis, NewsSection, NewsItem
│   │   └── admin.rs             # LlmProviderConfig, RateLimitConfig
│   ├── handlers/
│   │   ├── mod.rs
│   │   ├── auth.rs              # register, login (magic link), verify, logout
│   │   ├── syntheses.rs         # list, get, create (trigger generation), delete
│   │   ├── sources.rs           # CRUD, bulk import, CSV export
│   │   ├── settings.rs          # get, update, export, import
│   │   ├── admin.rs             # LLM config CRUD, rate limit config, user list
│   │   └── email.rs             # Send synthesis by email
│   ├── services/
│   │   ├── mod.rs
│   │   ├── llm/
│   │   │   ├── mod.rs           # LlmProvider trait, factory function
│   │   │   ├── gemini.rs        # Google Gemini implementation
│   │   │   ├── openai.rs        # OpenAI implementation
│   │   │   ├── anthropic.rs     # Anthropic implementation
│   │   │   └── types.rs         # Shared request/response types
│   │   ├── synthesis.rs         # 2-pass generation pipeline orchestration
│   │   ├── scraper.rs           # URL validation, HTML scraping, date extraction
│   │   ├── email.rs             # SMTP email sending (magic links + syntheses)
│   │   └── captcha.rs           # Captcha verification
│   └── db/
│       ├── mod.rs
│       ├── users.rs             # User queries
│       ├── sessions.rs          # Session queries
│       ├── settings.rs          # Settings queries
│       ├── sources.rs           # Source queries
│       ├── syntheses.rs         # Synthesis queries
│       └── admin.rs             # Admin config queries
└── tests/
    ├── api/                     # Integration tests
    └── services/                # Unit tests for services
```

### 1.3 Layered Architecture

The application follows a clean 3-layer architecture:

- **Handlers** (HTTP layer): Extract request data, call services, return responses. No business logic.
- **Services** (Business layer): Orchestrate operations, enforce business rules, call DB and external APIs.
- **DB** (Persistence layer): Raw sqlx queries, mapping to/from model structs.

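
The split between the three layers can be illustrated with a minimal, std-only sketch. Everything in it is hypothetical and chosen for illustration: the `SourceRepo` trait stands in for the sqlx-backed `db/` module, `InMemoryRepo` replaces SQLite, the 50-source limit is an assumed business rule, and the handler returns a bare status code instead of an Axum response.

```rust
use std::collections::HashMap;

/// Persistence layer: storage only (stands in for the sqlx-backed `db/` module).
pub trait SourceRepo {
    fn insert(&mut self, user_id: &str, url: &str);
    fn count(&self, user_id: &str) -> usize;
}

/// Hypothetical in-memory implementation, used here in place of SQLite.
#[derive(Default)]
pub struct InMemoryRepo {
    rows: HashMap<String, Vec<String>>,
}

impl SourceRepo for InMemoryRepo {
    fn insert(&mut self, user_id: &str, url: &str) {
        self.rows.entry(user_id.to_string()).or_default().push(url.to_string());
    }

    fn count(&self, user_id: &str) -> usize {
        self.rows.get(user_id).map_or(0, Vec::len)
    }
}

#[derive(Debug, PartialEq)]
pub enum ServiceError {
    InvalidUrl,
    LimitReached,
}

/// Business layer: enforces the rules, then delegates to the repository.
pub fn add_source<R: SourceRepo>(
    repo: &mut R,
    user_id: &str,
    url: &str,
) -> Result<(), ServiceError> {
    const MAX_SOURCES: usize = 50; // assumed limit, not taken from the requirements
    if !(url.starts_with("http://") || url.starts_with("https://")) {
        return Err(ServiceError::InvalidUrl);
    }
    if repo.count(user_id) >= MAX_SOURCES {
        return Err(ServiceError::LimitReached);
    }
    repo.insert(user_id, url);
    Ok(())
}

/// HTTP layer: no business logic, only maps the service outcome to a status code.
pub fn handle_add_source<R: SourceRepo>(repo: &mut R, user_id: &str, url: &str) -> u16 {
    match add_source(repo, user_id, url) {
        Ok(()) => 201,
        Err(ServiceError::InvalidUrl) => 400,
        Err(ServiceError::LimitReached) => 409,
    }
}
```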
### 1.4 Error Handling

A unified `AppError` enum implements `IntoResponse`:

```rust
use axum::{
    http::{header, HeaderValue, StatusCode},
    response::IntoResponse,
    Json,
};
use serde_json::json;

#[derive(Debug)]
pub enum AppError {
    // Client errors
    BadRequest(String),
    Unauthorized(String),
    Forbidden(String),
    NotFound(String),
    Conflict(String),
    TooManyRequests { retry_after_secs: u64 },
    ValidationError(Vec<FieldError>),

    // Server errors
    Internal(anyhow::Error),
    LlmError(String),
    SmtpError(String),
    ScrapingError(String),
}

impl IntoResponse for AppError {
    fn into_response(self) -> axum::response::Response {
        // Remember the retry delay so the Retry-After header can be attached below.
        let retry_after = match &self {
            AppError::TooManyRequests { retry_after_secs } => Some(*retry_after_secs),
            _ => None,
        };

        let (status, message) = match &self {
            AppError::BadRequest(msg) => (StatusCode::BAD_REQUEST, msg.clone()),
            // Auth failures return a fixed message; the inner detail is not exposed.
            AppError::Unauthorized(_) => (StatusCode::UNAUTHORIZED, "Unauthorized".into()),
            AppError::Forbidden(_) => (StatusCode::FORBIDDEN, "Forbidden".into()),
            AppError::NotFound(msg) => (StatusCode::NOT_FOUND, msg.clone()),
            AppError::TooManyRequests { retry_after_secs } => {
                (StatusCode::TOO_MANY_REQUESTS, format!("Retry after {retry_after_secs}s"))
            }
            AppError::Internal(e) => {
                // Log the full error chain, but return a generic message to the client.
                tracing::error!("Internal error: {e:#}");
                (StatusCode::INTERNAL_SERVER_ERROR, "Internal server error".into())
            }
            // ...
        };

        let mut response = (status, Json(json!({ "error": message }))).into_response();
        if let Some(secs) = retry_after {
            response.headers_mut().insert(header::RETRY_AFTER, HeaderValue::from(secs));
        }
        response
    }
}
```

All handlers return `Result<impl IntoResponse, AppError>`. The `?` operator propagates errors naturally. `From` implementations convert `sqlx::Error`, `reqwest::Error`, etc. into `AppError`.

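
How `?` and `From` work together can be shown with a reduced, std-only sketch. It uses a trimmed-down `AppError` with plain `u16` status codes, and it converts `std::num::ParseIntError` and `std::io::Error` as stand-ins for `sqlx::Error` and `reqwest::Error`; the `parse_page` helper is hypothetical.

```rust
use std::num::ParseIntError;

/// Trimmed-down error type for illustration (not the full enum above).
#[derive(Debug)]
pub enum AppError {
    BadRequest(String),
    NotFound(String),
    TooManyRequests { retry_after_secs: u64 },
    Internal(String),
}

impl AppError {
    /// HTTP status code the error maps to.
    pub fn status_code(&self) -> u16 {
        match self {
            AppError::BadRequest(_) => 400,
            AppError::NotFound(_) => 404,
            AppError::TooManyRequests { .. } => 429,
            AppError::Internal(_) => 500,
        }
    }

    /// Message that is safe to return to the client; internal details stay in the logs.
    pub fn public_message(&self) -> String {
        match self {
            AppError::BadRequest(msg) | AppError::NotFound(msg) => msg.clone(),
            AppError::TooManyRequests { retry_after_secs } => {
                format!("Retry after {retry_after_secs}s")
            }
            AppError::Internal(_) => "Internal server error".to_string(),
        }
    }
}

/// A client-side mistake: bad numeric input becomes a 400.
impl From<ParseIntError> for AppError {
    fn from(e: ParseIntError) -> Self {
        AppError::BadRequest(format!("invalid number: {e}"))
    }
}

/// A server-side failure: I/O problems become a 500.
impl From<std::io::Error> for AppError {
    fn from(e: std::io::Error) -> Self {
        AppError::Internal(e.to_string())
    }
}

/// Hypothetical helper: `?` converts the `ParseIntError` through the `From` impl.
pub fn parse_page(raw: &str) -> Result<u32, AppError> {
    let page: u32 = raw.trim().parse()?;
    if page == 0 {
        return Err(AppError::BadRequest("page must be >= 1".into()));
    }
    Ok(page)
}
```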
### 1.5 SQLite with sqlx: Schema Design

Tables use TEXT primary keys (UUIDs generated by the backend, or a random token for sessions) for portability. Timestamps are stored as `TEXT` in ISO 8601 format (SQLite has no native timestamp type; this also works on Postgres via a `TIMESTAMPTZ` cast).

#### Migration 001: Users

```sql
CREATE TABLE users (
    id           TEXT PRIMARY KEY,              -- UUID
    email        TEXT NOT NULL UNIQUE,
    display_name TEXT,
    role         TEXT NOT NULL DEFAULT 'user',  -- 'user' | 'admin'
    created_at   TEXT NOT NULL,                 -- ISO 8601
    updated_at   TEXT NOT NULL
);
-- Note: the UNIQUE constraint on email already creates an index,
-- so this explicit index is redundant and could be dropped.
CREATE INDEX idx_users_email ON users(email);
```

#### Migration 002: Sessions

```sql
CREATE TABLE sessions (
    id         TEXT PRIMARY KEY,  -- Secure random token (32 bytes, base64url)
    user_id    TEXT NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    created_at TEXT NOT NULL,
    expires_at TEXT NOT NULL,
    ip_address TEXT,
    user_agent TEXT
);
CREATE INDEX idx_sessions_user_id ON sessions(user_id);
CREATE INDEX idx_sessions_expires_at ON sessions(expires_at);
```

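
Storing timestamps as text only keeps comparisons such as `expires_at < now` correct if every value is written in one fixed-width UTC format, because such strings sort lexicographically in chronological order. The sketch below assumes the format `YYYY-MM-DDTHH:MM:SSZ` (the shape used in the API examples in this document); the function names are hypothetical.

```rust
/// Returns true if `ts` has the fixed shape `YYYY-MM-DDTHH:MM:SSZ`
/// (UTC, second precision). Only the shape is checked, not calendar validity.
pub fn is_canonical_utc(ts: &str) -> bool {
    let bytes = ts.as_bytes();
    if bytes.len() != 20 {
        return false;
    }
    bytes.iter().enumerate().all(|(i, &c)| match i {
        4 | 7 => c == b'-',
        10 => c == b'T',
        13 | 16 => c == b':',
        19 => c == b'Z',
        _ => c.is_ascii_digit(),
    })
}

/// Session expiry check by plain string comparison.
/// Returns `None` when either value is not in the canonical format,
/// since lexicographic order is then no longer chronological order.
pub fn is_expired(expires_at: &str, now: &str) -> Option<bool> {
    if !is_canonical_utc(expires_at) || !is_canonical_utc(now) {
        return None;
    }
    Some(expires_at <= now)
}
```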
#### Migration 003: Settings

```sql
CREATE TABLE settings (
    user_id                TEXT PRIMARY KEY REFERENCES users(id) ON DELETE CASCADE,
    theme                  TEXT NOT NULL DEFAULT 'Intelligence Artificielle',
    max_age_days           INTEGER NOT NULL DEFAULT 7,
    categories             TEXT NOT NULL,  -- JSON array stored as TEXT
    max_items_per_category INTEGER NOT NULL DEFAULT 4,
    search_agent_behavior  TEXT NOT NULL DEFAULT '',
    ai_model               TEXT NOT NULL DEFAULT 'gemini-3.1-pro-preview',
    updated_at             TEXT NOT NULL
);
```

#### Migration 004: Sources

```sql
CREATE TABLE sources (
    id         TEXT PRIMARY KEY,
    user_id    TEXT NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    title      TEXT NOT NULL,
    url        TEXT NOT NULL,
    created_at TEXT NOT NULL
);
CREATE INDEX idx_sources_user_id ON sources(user_id);
```

#### Migration 005: Syntheses

```sql
CREATE TABLE syntheses (
    id         TEXT PRIMARY KEY,
    user_id    TEXT NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    week       TEXT NOT NULL,  -- e.g. "2026-W12"
    sections   TEXT NOT NULL,  -- JSON: [{ title, items: [{ title, url, summary }] }]
    created_at TEXT NOT NULL
);
CREATE INDEX idx_syntheses_user_id ON syntheses(user_id);
CREATE INDEX idx_syntheses_created_at ON syntheses(created_at);
```

#### Migration 006: Admin Config (LLM Providers)

```sql
CREATE TABLE llm_providers (
    id           TEXT PRIMARY KEY,
    provider     TEXT NOT NULL,  -- 'gemini' | 'openai' | 'anthropic'
    display_name TEXT NOT NULL,
    api_key      TEXT NOT NULL,  -- Encrypted at rest (AES-256-GCM)
    base_url     TEXT,           -- Optional override for self-hosted/proxy
    models       TEXT NOT NULL,  -- JSON array of available model identifiers
    is_enabled   BOOLEAN NOT NULL DEFAULT 1,
    created_at   TEXT NOT NULL,
    updated_at   TEXT NOT NULL,
    UNIQUE(provider)
);
```

#### Migration 007: Rate Limit Configuration

```sql
CREATE TABLE rate_limits (
    id             TEXT PRIMARY KEY,
    provider_id    TEXT NOT NULL REFERENCES llm_providers(id) ON DELETE CASCADE,
    max_requests   INTEGER NOT NULL DEFAULT 29,
    time_window_ms INTEGER NOT NULL DEFAULT 60000,
    updated_at     TEXT NOT NULL,
    UNIQUE(provider_id)
);

-- Magic link rate limiting
CREATE TABLE magic_link_tokens (
    id         TEXT PRIMARY KEY,
    email      TEXT NOT NULL,
    token_hash TEXT NOT NULL,  -- SHA-256 of the token
    created_at TEXT NOT NULL,
    expires_at TEXT NOT NULL,
    used       BOOLEAN NOT NULL DEFAULT 0
);
CREATE INDEX idx_magic_link_email ON magic_link_tokens(email);
```

### 1.6 SQLite/Postgres Dual Compatibility Strategy

**Option considered: runtime database selection via `sqlx::AnyPool`.** `AnyPool` lets a single binary talk to either database, but it gives up compile-time query checking, so it is not the preferred route here.

**Recommended strategy: feature-flag based conditional compilation.**

```toml
# Cargo.toml
[features]
default = ["sqlite"]
sqlite = ["sqlx/sqlite"]
postgres = ["sqlx/postgres"]
```

For this project, the SQL differences between SQLite and Postgres are minimal:

| Concern | SQLite | Postgres | Resolution |
|---|---|---|---|
| Auto-increment PK | `INTEGER PRIMARY KEY` | `SERIAL` | Use UUID TEXT PKs -- identical on both |
| Timestamps | `TEXT` (ISO 8601) | `TIMESTAMPTZ` | Store as TEXT on both; parse in application layer |
| JSON columns | `TEXT` + app-side JSON parse | `JSONB` | Store as TEXT on both; Postgres can migrate to JSONB later |
| Boolean | `INTEGER` (0/1) | `BOOLEAN` | Use `INTEGER` on SQLite, `BOOLEAN` on Postgres; sqlx handles mapping |
| RETURNING clause | Supported since SQLite 3.35 | Supported | Use `RETURNING` on both |

**Practical approach for v1**: Target SQLite only. Write SQL that is Postgres-compatible by design (UUID text PKs, ISO timestamps, no SQLite-specific functions). When the Postgres upgrade happens, create a parallel `migrations_pg/` folder and swap the connection pool. The query layer (`db/`) is intended to remain unchanged because all queries stick to standard SQL.

Compile-time checking is preserved by using the `sqlx::query!` and `sqlx::query_as!` macros with the `DATABASE_URL` environment variable pointing to an SQLite file during development.

---

## 2. API Design

### 2.1 REST API Endpoints

All endpoints are prefixed with `/api/v1`. Request and response bodies are JSON unless stated otherwise.

#### Authentication

| Method | Path | Auth | Description |
|---|---|---|---|
| `POST` | `/auth/register` | No | Create account (email + captcha) |
| `POST` | `/auth/login` | No | Request magic link (email + captcha) |
| `GET` | `/auth/verify?token=...` | No | Verify magic link token, create session |
| `POST` | `/auth/logout` | Yes | Invalidate session |
| `GET` | `/auth/me` | Yes | Get current user info |

#### Syntheses

| Method | Path | Auth | Description |
|---|---|---|---|
| `GET` | `/syntheses` | Yes | List user's syntheses (paginated) |
| `GET` | `/syntheses/:id` | Yes | Get synthesis detail |
| `POST` | `/syntheses/generate` | Yes | Trigger generation (async, returns job ID) |
| `GET` | `/syntheses/generate/:job_id/status` | Yes | Poll generation status |
| `DELETE` | `/syntheses/:id` | Yes | Delete a synthesis |
| `POST` | `/syntheses/:id/email` | Yes | Send synthesis by email |

#### Sources

| Method | Path | Auth | Description |
|---|---|---|---|
| `GET` | `/sources` | Yes | List user's sources |
| `POST` | `/sources` | Yes | Add a source |
| `DELETE` | `/sources/:id` | Yes | Delete a source |
| `POST` | `/sources/bulk` | Yes | Bulk import (JSON array) |
| `POST` | `/sources/import-csv` | Yes | Import from CSV (multipart upload) |
| `GET` | `/sources/export-csv` | Yes | Export as CSV download |

#### Settings

| Method | Path | Auth | Description |
|---|---|---|---|
| `GET` | `/settings` | Yes | Get user's settings |
| `PUT` | `/settings` | Yes | Update settings |
| `GET` | `/settings/export` | Yes | Export as JSON download |
| `POST` | `/settings/import` | Yes | Import from JSON |

#### Admin

| Method | Path | Auth | Description |
|---|---|---|---|
| `GET` | `/admin/providers` | Admin | List LLM provider configs |
| `POST` | `/admin/providers` | Admin | Add/update provider config |
| `DELETE` | `/admin/providers/:id` | Admin | Remove provider |
| `GET` | `/admin/rate-limits` | Admin | Get rate limit configs |
| `PUT` | `/admin/rate-limits/:provider_id` | Admin | Update rate limit config |
| `GET` | `/admin/users` | Admin | List all users |
| `PUT` | `/admin/users/:id/role` | Admin | Change user role |

#### Public (for frontend config)

| Method | Path | Auth | Description |
|---|---|---|---|
| `GET` | `/config/providers` | Yes | List enabled providers + their model names (no API keys) |

### 2.2 Request/Response Shapes

**POST /auth/register**
```json
// Request
{
  "email": "user@example.com",
  "display_name": "Jane Doe",
  "captcha_token": "hcaptcha-response-token"
}
// Response 200
{
  "message": "A verification link has been sent to your email."
}
```

**POST /syntheses/generate**
```json
// Request (empty body -- uses user's saved settings and sources)
{}
// Response 202
{
  "job_id": "uuid-of-generation-job",
  "status": "pending"
}
```

**GET /syntheses/:id**
```json
// Response 200
{
  "id": "uuid",
  "week": "2026-W12",
  "created_at": "2026-03-21T10:30:00Z",
  "sections": [
    {
      "title": "Annonces majeures",
      "items": [
        {
          "title": "Article title",
          "url": "https://example.com/article",
          "summary": "4-5 line summary..."
        }
      ]
    }
  ]
}
```

**PUT /settings**
```json
// Request
{
  "theme": "Intelligence Artificielle",
  "max_age_days": 7,
  "categories": ["Annonces majeures", "Secteur financier"],
  "max_items_per_category": 4,
  "search_agent_behavior": "Custom instructions...",
  "ai_model": "gemini-3.1-pro-preview"
}
// Response 200
{
  "message": "Settings updated successfully."
}
```

**POST /admin/providers**
```json
// Request
{
  "provider": "openai",
  "display_name": "OpenAI GPT-4o",
  "api_key": "sk-...",
  "base_url": null,
  "models": ["gpt-4o", "gpt-4o-mini"],
  "is_enabled": true
}
```

### 2.3 Authentication Middleware

The auth middleware is a tower `Layer` that:

1. Extracts the session cookie (`ai_synth_session`) from the request.
2. Looks up the session ID in the `sessions` table.
3. Checks `expires_at` has not passed.
4. Loads the `User` from the `users` table.
5. Injects the `User` into request extensions (`request.extensions_mut().insert(user)`).
6. Handlers extract the user via `Extension<User>` or a custom `AuthUser` extractor.

For admin routes, an additional `RequireAdmin` layer checks `user.role == "admin"`.

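
Step 1 of the list above can be sketched without any framework as a pure function over the raw `Cookie` header value. This is only an illustration: the function name is hypothetical, and a real middleware would more likely rely on a cookie-parsing crate than on manual splitting.

```rust
/// Name of the session cookie used throughout this document.
pub const SESSION_COOKIE: &str = "ai_synth_session";

/// Extracts the session ID from a raw `Cookie` header value such as
/// `theme=dark; ai_synth_session=abc123`.
/// Returns `None` when the cookie is absent or has an empty value.
pub fn session_id_from_cookie_header(header: &str) -> Option<&str> {
    header
        .split(';')
        // Each pair looks like `name=value`; pairs without `=` are skipped.
        .filter_map(|pair| pair.trim().split_once('='))
        // Exact name match, so `xai_synth_session` is not accepted.
        .find(|(name, _)| *name == SESSION_COOKIE)
        .map(|(_, value)| value)
        .filter(|value| !value.is_empty())
}
```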
**Session cookie configuration:**

```rust
// `Duration` here is `time::Duration` (the type the `cookie` crate expects),
// not `std::time::Duration`.
Cookie::build(("ai_synth_session", session_id))
    .http_only(true)
    .secure(true) // HTTPS only
    .same_site(SameSite::Lax)
    .path("/")
    .max_age(Duration::days(30))
```

**CSRF Protection:**

Since this is an API consumed by a SPA on the same origin (or proxied), the combination of `SameSite=Lax` cookies and requiring a custom header (`X-Requested-With: XMLHttpRequest`) on mutating requests provides sufficient CSRF protection. This is the "custom header" pattern -- browsers will not send custom headers on cross-origin requests without CORS preflight approval.

For the SPA, every `fetch` call to the API includes:
```javascript
headers: { "X-Requested-With": "XMLHttpRequest" }
```

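
On the server side, the custom-header check reduces to a small predicate. The sketch below is framework-free and hypothetical (in the real middleware the method and headers would come from the Axum request); treating `PATCH` as a mutating method is an added assumption, since the API currently defines no `PATCH` routes.

```rust
/// Returns true when the request may proceed.
/// Safe methods always pass; mutating methods must carry
/// `X-Requested-With: XMLHttpRequest` (header name matched case-insensitively).
pub fn csrf_check_passes(method: &str, headers: &[(&str, &str)]) -> bool {
    let mutating = matches!(
        method.to_ascii_uppercase().as_str(),
        "POST" | "PUT" | "PATCH" | "DELETE" // PATCH is an assumption, see above
    );
    if !mutating {
        return true;
    }
    headers.iter().any(|(name, value)| {
        name.eq_ignore_ascii_case("X-Requested-With") && *value == "XMLHttpRequest"
    })
}
```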
The CSRF middleware rejects `POST/PUT/DELETE` requests missing this header.

---

## 3. LLM Provider Abstraction

### 3.1 Trait Design

```rust
use async_trait::async_trait;

#[async_trait]
pub trait LlmProvider: Send + Sync {
    /// Returns the provider identifier (e.g., "gemini", "openai", "anthropic").
    fn provider_id(&self) -> &str;

    /// Pass 1: Search the web and generate structured news items.
    /// Returns raw JSON matching the category schema.
    async fn generate_search_pass(
        &self,
        model: &str,
        system_prompt: &str,
        user_prompt: &str,
        response_schema: &serde_json::Value,
    ) -> Result<serde_json::Value, AppError>;

    /// Pass 2: Rewrite titles and summaries based on scraped content.
    /// No web search tool needed.
    async fn generate_rewrite_pass(
        &self,
        model: &str,
        system_prompt: &str,
        user_prompt: &str,
        response_schema: &serde_json::Value,
    ) -> Result<serde_json::Value, AppError>;

    /// Lists available models for this provider.
    fn available_models(&self) -> &[String];
}
```

### 3.2 Provider-Specific Web Search Handling

Each provider handles web grounding differently. The trait design abstracts this:

| Provider | Pass 1 (Search) | Pass 2 (Rewrite) |
|---|---|---|
| **Gemini** | Uses `googleSearch` tool in config. Structured output via `responseSchema`. | Standard generation, no tools. `responseSchema` for structured output. |
| **OpenAI** | Uses `web_search` tool (Responses API) or a two-step approach: first call with `browsing` tool, then structured output. | Standard chat completion with `response_format: { type: "json_schema", ... }`. |
| **Anthropic** | Uses `web_search` tool (available on Claude models). Structured output via tool-use pattern or explicit JSON instructions. | Standard message with JSON output instructions. Anthropic does not have native JSON schema enforcement, so the prompt includes the schema and parsing is done server-side with validation. |

**Implementation details for each provider:**

```rust
// Gemini implementation
pub struct GeminiProvider {
    client: reqwest::Client,
    api_key: String,
    base_url: String,
    models: Vec<String>,
}

impl GeminiProvider {
    async fn generate_search_pass(&self, model: &str, ...) -> Result<serde_json::Value, AppError> {
        // POST to /v1beta/models/{model}:generateContent
        // Config includes: tools: [{ googleSearch: {} }]
        // responseMimeType: "application/json"
        // responseSchema: <schema>
    }
}

// OpenAI implementation
pub struct OpenAiProvider {
    client: reqwest::Client,
    api_key: String,
    base_url: String, // default: https://api.openai.com/v1
    models: Vec<String>,
}

// Anthropic implementation
pub struct AnthropicProvider {
    client: reqwest::Client,
    api_key: String,
    base_url: String, // default: https://api.anthropic.com
    models: Vec<String>,
}
```

### 3.3 Provider Factory

```rust
pub fn create_provider(config: &LlmProviderConfig) -> Result<Box<dyn LlmProvider>, AppError> {
    match config.provider.as_str() {
        "gemini" => Ok(Box::new(GeminiProvider::new(
            config.api_key.clone(),
            config.base_url.clone(),
            config.models.clone(),
        ))),
        "openai" => Ok(Box::new(OpenAiProvider::new(...))),
        "anthropic" => Ok(Box::new(AnthropicProvider::new(...))),
        _ => Err(AppError::BadRequest(format!("Unknown provider: {}", config.provider))),
    }
}
```

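
Before the factory can be called, the user's `ai_model` setting has to be mapped to a provider row. One possible lookup, sketched under the assumption that provider configs are loaded into memory with their `models` JSON already parsed, is shown below; `ProviderEntry` and `resolve_provider_for_model` are hypothetical names, not part of the design above.

```rust
/// Hypothetical in-memory view of a row from `llm_providers`
/// (API key and timestamps omitted).
#[derive(Debug, Clone, PartialEq)]
pub struct ProviderEntry {
    pub provider: String,
    pub models: Vec<String>,
    pub is_enabled: bool,
}

/// Finds the first enabled provider that lists `model`.
/// Disabled providers are skipped even if they offer the model.
pub fn resolve_provider_for_model<'a>(
    providers: &'a [ProviderEntry],
    model: &str,
) -> Option<&'a ProviderEntry> {
    providers
        .iter()
        .find(|p| p.is_enabled && p.models.iter().any(|m| m.as_str() == model))
}
```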
### 3.4 Rate Limiter Design

The rate limiter is a server-side, per-provider, in-memory sliding-window limiter (it keeps a log of recent request timestamps per provider), with its configuration stored in the database.

```rust
use std::collections::VecDeque;
use std::sync::Arc;
use std::time::{Duration, Instant};

use dashmap::DashMap;

pub struct RateLimiter {
    state: Arc<DashMap<String, ProviderBucket>>,
}

struct ProviderBucket {
    timestamps: VecDeque<Instant>,
    max_requests: u32,
    time_window: Duration,
}

impl RateLimiter {
    /// Waits until a slot is available for the given provider.
    pub async fn acquire(&self, provider_id: &str) -> Result<(), AppError> {
        loop {
            // Scope the DashMap guard so the lock is released before sleeping.
            let wait_time = {
                let mut bucket = self.state
                    .entry(provider_id.to_string())
                    .or_insert_with(|| self.default_bucket());

                // Copy the window out first: the closure cannot borrow `bucket` again.
                let window = bucket.time_window;
                bucket.timestamps.retain(|t| t.elapsed() < window);

                if bucket.timestamps.len() < bucket.max_requests as usize {
                    bucket.timestamps.push_back(Instant::now());
                    return Ok(());
                }

                // Time until the oldest request leaves the window (never negative).
                let oldest_age = bucket.timestamps.front().map_or(Duration::ZERO, |t| t.elapsed());
                window.saturating_sub(oldest_age)
            };
            tokio::time::sleep(wait_time).await;
        }
    }

    /// Reload configuration from DB (called by admin update endpoint).
    pub async fn reload_config(&self, pool: &SqlitePool) -> Result<(), AppError> {
        // Fetch rate_limits table, update each ProviderBucket
        todo!()
    }
}
```

The rate limiter lives in `AppState` and is shared across all requests. When an admin updates rate limit configuration, `reload_config` is called to hot-reload without restart.

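
The core of the limiter can be tested in isolation if the clock is passed in rather than read inside the function. The following std-only sketch shows the same sliding-window logic for a single provider; `SlidingWindow` and `try_acquire` are hypothetical names, and returning the wait time instead of sleeping is a simplification made for testability.

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

/// Sliding-window log for one provider: at most `max_requests`
/// timestamps may fall inside any `window`-long interval.
pub struct SlidingWindow {
    timestamps: VecDeque<Instant>,
    max_requests: usize,
    window: Duration,
}

impl SlidingWindow {
    pub fn new(max_requests: usize, window: Duration) -> Self {
        Self { timestamps: VecDeque::new(), max_requests, window }
    }

    /// Records a request at `now` and returns `Ok(())` if a slot is free;
    /// otherwise returns how long the caller should wait before retrying.
    pub fn try_acquire(&mut self, now: Instant) -> Result<(), Duration> {
        // Evict timestamps that have left the window.
        while let Some(&oldest) = self.timestamps.front() {
            if now.duration_since(oldest) >= self.window {
                self.timestamps.pop_front();
            } else {
                break;
            }
        }

        if self.timestamps.len() < self.max_requests {
            self.timestamps.push_back(now);
            return Ok(());
        }

        match self.timestamps.front() {
            // Wait until the oldest request ages out of the window.
            Some(&oldest) => Err(self.window.saturating_sub(now.duration_since(oldest))),
            // Only reachable when `max_requests` is 0: no slot ever frees up.
            None => Err(self.window),
        }
    }
}
```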
### 3.5 Two-Pass Generation Pipeline

The `SynthesisService` orchestrates the full pipeline:

```rust
pub struct SynthesisService;

impl SynthesisService {
    pub async fn generate(
        state: &AppState,
        user_id: &str,
    ) -> Result<Synthesis, AppError> {
        // Outline only: `pool`, the prompt strings, and `week_string` are assumed
        // to be derived from `state` and `settings`; their construction is omitted.

        // 1. Load user settings
        let settings = db::settings::get(pool, user_id).await?;

        // 2. Load user sources
        let sources = db::sources::list(pool, user_id).await?;

        // 3. Resolve LLM provider + model
        let (provider, model) = resolve_provider(state, &settings.ai_model).await?;

        // 4. Build dynamic schema from categories
        let schema = build_category_schema(&settings.categories);

        // 5. Rate limit: acquire slot
        state.rate_limiter.acquire(provider.provider_id()).await?;

        // 6. Pass 1: Search
        let raw_results = provider.generate_search_pass(
            &model, &system_prompt, &user_prompt, &schema
        ).await?;

        // 7. Validate & scrape URLs (server-side, no CORS issues)
        let scraped = scraper::validate_and_scrape(
            &state.http_client,
            raw_results,
            settings.max_age_days,
        ).await;

        // 8. Rate limit: acquire slot for pass 2
        state.rate_limiter.acquire(provider.provider_id()).await?;

        // 9. Pass 2: Rewrite with scraped content
        let final_results = provider.generate_rewrite_pass(
            &model, &rewrite_system_prompt, &rewrite_prompt, &schema
        ).await?;

        // 10. Persist
        let synthesis = db::syntheses::create(
            pool, user_id, &week_string, &final_results
        ).await?;

        Ok(synthesis)
    }
}
```

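
The pipeline persists a `week_string` such as `"2026-W12"`. If that label follows ISO 8601 week numbering (an assumption; the source only shows the format), it can be derived from a calendar date with integer arithmetic alone. The sketch below is std-only and uses hypothetical function names; in the real backend the `chrono` crate would probably be the more natural choice.

```rust
/// Days since 1970-01-01 for a proleptic Gregorian date.
fn days_from_civil(year: i64, month: i64, day: i64) -> i64 {
    // Treat March as the first month so the leap day falls at the end of the year.
    let y = if month <= 2 { year - 1 } else { year };
    let era = y.div_euclid(400);
    let year_of_era = y - era * 400;
    let shifted_month = (month + 9) % 12; // March = 0, ..., February = 11
    let day_of_year = (153 * shifted_month + 2) / 5 + day - 1;
    let day_of_era = year_of_era * 365 + year_of_era / 4 - year_of_era / 100 + day_of_year;
    era * 146_097 + day_of_era - 719_468
}

/// Number of ISO weeks (52 or 53) in the given ISO year.
fn weeks_in_iso_year(year: i64) -> i64 {
    let p = |y: i64| (y + y / 4 - y / 100 + y / 400).rem_euclid(7);
    if p(year) == 4 || p(year - 1) == 3 { 53 } else { 52 }
}

/// Formats a date as an ISO 8601 week label, e.g. `2026-W12`.
pub fn iso_week_string(year: i64, month: i64, day: i64) -> String {
    let days = days_from_civil(year, month, day);
    let weekday = (days + 3).rem_euclid(7) + 1; // Monday = 1 ... Sunday = 7
    let ordinal = days - days_from_civil(year, 1, 1) + 1; // day of year, 1-based
    let week = (ordinal - weekday + 10) / 7;

    let (iso_year, iso_week) = if week < 1 {
        // Early January can belong to the last week of the previous year.
        (year - 1, weeks_in_iso_year(year - 1))
    } else if week > weeks_in_iso_year(year) {
        // Late December can belong to week 1 of the next year.
        (year + 1, 1)
    } else {
        (year, week)
    };
    format!("{iso_year}-W{iso_week:02}")
}
```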
### 3.6 Asynchronous Generation

Synthesis generation can take 30-90 seconds. Two options:

**Option A: Synchronous with long timeout.** Simple, but ties up a connection. Acceptable for low-traffic deployments.

**Option B (Recommended): Background task with polling.** The `POST /syntheses/generate` endpoint spawns a tokio task and returns a job ID. The frontend polls `GET /syntheses/generate/:job_id/status`. Job state is kept in an in-memory `DashMap<String, JobStatus>` (not in DB, since jobs are ephemeral).

```rust
enum JobStatus {
    Pending,
    InProgress { step: String }, // "search", "scraping", "rewriting"
    Completed { synthesis_id: String },
    Failed { error: String },
}
```

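
A minimal job registry for Option B might look like the sketch below. It swaps the `DashMap` mentioned above for a std `Mutex<HashMap<..>>` so that the example stays dependency-free; `JobStore` and its methods are hypothetical, and refusing updates once a job is `Completed` or `Failed` is an added assumption about the desired behavior.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

#[derive(Debug, Clone, PartialEq)]
pub enum JobStatus {
    Pending,
    InProgress { step: String }, // "search", "scraping", "rewriting"
    Completed { synthesis_id: String },
    Failed { error: String },
}

impl JobStatus {
    /// Terminal states never change again.
    pub fn is_terminal(&self) -> bool {
        matches!(self, JobStatus::Completed { .. } | JobStatus::Failed { .. })
    }
}

/// In-memory, process-local job registry (lost on restart, by design).
#[derive(Default)]
pub struct JobStore {
    jobs: Mutex<HashMap<String, JobStatus>>,
}

impl JobStore {
    /// Registers a new job in the `Pending` state.
    pub fn create(&self, job_id: &str) {
        self.jobs.lock().unwrap().insert(job_id.to_string(), JobStatus::Pending);
    }

    /// Moves a job to `next`. Returns false if the job is unknown
    /// or has already reached a terminal state.
    pub fn update(&self, job_id: &str, next: JobStatus) -> bool {
        let mut jobs = self.jobs.lock().unwrap();
        match jobs.get_mut(job_id) {
            Some(current) if !current.is_terminal() => {
                *current = next;
                true
            }
            _ => false,
        }
    }

    /// Snapshot returned by the polling endpoint.
    pub fn status(&self, job_id: &str) -> Option<JobStatus> {
        self.jobs.lock().unwrap().get(job_id).cloned()
    }
}
```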
The frontend polls every 3-5 seconds with the same loading UX as the current React app.

---

## 4. URL Scraping / Validation

### 4.1 CORS Elimination

Moving scraping to the backend **completely eliminates CORS issues**. The Rust backend makes direct HTTP requests to target URLs -- no proxies needed. This is the single biggest reliability improvement in the refactoring.

### 4.2 reqwest-Based HTTP Client

```rust
let client = reqwest::Client::builder()
    .user_agent("Mozilla/5.0 (compatible; AISynthBot/1.0; +https://your-domain.com/bot)")
    .timeout(Duration::from_secs(15))
    .redirect(reqwest::redirect::Policy::limited(5))
    .connect_timeout(Duration::from_secs(5))
    .danger_accept_invalid_certs(false)
    .build()?;
```

The HTTP client is created once in `AppState` and reused across all requests (connection pooling).

### 4.3 HTML Parsing with `scraper` Crate

The current app uses the browser's `DOMParser`. The Rust equivalent uses the `scraper` crate (built on `html5ever`):

```rust
use chrono::Utc;
use reqwest::Url;
use scraper::{Html, Selector};

pub async fn validate_and_scrape(
    client: &reqwest::Client,
    items: Vec<RawNewsItem>,
    max_age_days: i64,
) -> Vec<ScrapedNewsItem> {
    let futures = items.into_iter().map(|item| {
        let client = client.clone();
        async move { scrape_single(&client, item, max_age_days).await }
    });

    let results = futures::future::join_all(futures).await;
    // Keep only the items that passed validation.
    results.into_iter().flatten().collect()
}

async fn scrape_single(
    client: &reqwest::Client,
    item: RawNewsItem,
    max_age_days: i64,
) -> Option<ScrapedNewsItem> {
    // 1. Validate URL format
    let url = Url::parse(&item.url).ok()?;

    // 2. Fetch
    let resp = client.get(url).send().await.ok()?;
    if !resp.status().is_success() { return None; }
    let html_text = resp.text().await.ok()?;

    // 3. Parse HTML
    let document = Html::parse_document(&html_text);

    // 4. Soft-404 detection
    let title_sel = Selector::parse("title").unwrap();
    let h1_sel = Selector::parse("h1").unwrap();
    let title_text = document.select(&title_sel).next()
        .map(|el| el.text().collect::<String>().to_lowercase())
        .unwrap_or_default();
    let h1_text = document.select(&h1_sel).next()
        .map(|el| el.text().collect::<String>().to_lowercase())
        .unwrap_or_default();

    let error_keywords = [
        "page not found", "404", "403", "access denied",
        "forbidden", "not found", "introuvable",
    ];
    if error_keywords.iter().any(|kw| title_text.contains(kw) || h1_text.contains(kw)) {
        return None;
    }

    // 5. Date extraction (meta tags, JSON-LD, <time>)
    // Pages without a detectable date are kept; only provably old pages are dropped.
    if let Some(pub_date) = extract_publication_date(&document) {
        let age = Utc::now() - pub_date;
        if age.num_days() > max_age_days {
            return None;
        }
    }

    // 6. Extract body text (remove script, style, nav, etc.)
    let content = extract_body_text(&document, 4000);

    Some(ScrapedNewsItem {
        title: item.title,
        url: item.url,
        summary: item.summary,
        scraped_content: content,
    })
}
```

**Date extraction** mirrors the current logic: check `meta[property="article:published_time"]`, `meta[itemprop="datePublished"]`, `<time datetime>`, and JSON-LD `datePublished`. The `chrono` crate handles date parsing with multiple format attempts.

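
The soft-404 check in step 4 matches keywords as plain substrings, so a title containing a number like `2403` would be rejected because it contains `403`. A slightly stricter variant, sketched here as a pure function with hypothetical names, matches the numeric codes only as standalone digit runs. It reduces false positives but does not remove them: a legitimate headline containing the number `404` on its own would still be dropped.

```rust
/// Phrases that mark an error page (same keywords as the list above).
const ERROR_PHRASES: [&str; 5] =
    ["page not found", "not found", "access denied", "forbidden", "introuvable"];

/// Status codes that mark an error page when they appear as a whole number.
const ERROR_CODES: [&str; 2] = ["404", "403"];

/// True if `code` appears in `text` as a complete run of digits,
/// so "404" matches "error 404" but not "4040" or "2404".
fn contains_code_token(text: &str, code: &str) -> bool {
    text.split(|c: char| !c.is_ascii_digit()).any(|token| token == code)
}

/// Heuristic soft-404 detection on the page title and first `<h1>`.
pub fn looks_like_error_page(title: &str, h1: &str) -> bool {
    [title, h1].iter().any(|raw| {
        let text = raw.to_lowercase();
        ERROR_PHRASES.iter().any(|phrase| text.contains(*phrase))
            || ERROR_CODES.iter().any(|code| contains_code_token(&text, code))
    })
}
```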
### 4.4 Concurrency Control

To avoid overwhelming target sites, scraping runs with bounded concurrency:

```rust
use futures::stream::{self, StreamExt};

let scraped: Vec<ScrapedNewsItem> = stream::iter(items)
    .map(|item| scrape_single(client, item, max_age_days))
    .buffer_unordered(10) // Max 10 concurrent scrapes
    .filter_map(|result| async move { result }) // Drop items that failed validation
    .collect()
    .await;
```

---

## 5. SolidJS Frontend

### 5.1 Build Tooling

SolidJS uses Vite natively. The migration is straightforward:

```js
// vite.config.ts
import { defineConfig } from 'vite';
import solidPlugin from 'vite-plugin-solid';
import tailwindcss from '@tailwindcss/vite';

export default defineConfig({
  plugins: [solidPlugin(), tailwindcss()],
  server: {
    port: 3000,
    proxy: {
      '/api': 'http://localhost:8080', // Proxy to Rust backend during dev
    },
  },
  build: {
    target: 'esnext',
  },
});
```

**package.json dependencies:**
```json
{
  "dependencies": {
    "solid-js": "^1.9",
    "@solidjs/router": "^0.15",
    "lucide-solid": "^0.450",
    "date-fns": "^4.1"
  },
  "devDependencies": {
    "vite": "^6.2",
    "vite-plugin-solid": "^2.11",
    "@tailwindcss/vite": "^4.1",
    "tailwindcss": "^4.1",
    "typescript": "^5.8"
  }
}
```

### 5.2 State Management: React to SolidJS Mapping

| React Pattern | SolidJS Equivalent | Notes |
|---|---|---|
| `useState(value)` | `createSignal(value)` | Returns `[getter, setter]` -- getter is a function call: `count()` |
| `useEffect(() => {}, [deps])` | `createEffect(() => {})` | Auto-tracks dependencies, no dep array needed |
| `useContext(Ctx)` | `useContext(Ctx)` | Nearly identical API |
| `createContext()` | `createContext()` | Same concept |
| `React.FC<Props>` | `Component<Props>` | `import { Component } from 'solid-js'` |
| `{items.map(i => ...)}` | `<For each={items()}>{(item) => ...}</For>` | SolidJS uses `<For>` for efficient list rendering |
| `{condition && <X/>}` | `<Show when={condition()}><X/></Show>` | `<Show>` avoids unnecessary DOM creation |
| `useNavigate()` | `useNavigate()` | Same API from `@solidjs/router` |
| `useParams()` | `useParams()` | Same API |
| `onSnapshot` (realtime) | `createResource` + polling or SSE | SolidJS does not have a Firestore equivalent; use `createResource` for data fetching |

### 5.3 Authentication Context Port

```tsx
// src/context/AuthContext.tsx
import { createContext, useContext, createResource, type ParentComponent } from 'solid-js';

interface User {
  id: string;
  email: string;
  display_name: string | null;
  role: string;
}

interface AuthContextType {
  user: () => User | null | undefined;
  loading: () => boolean;
  logout: () => Promise<void>;
}

const AuthContext = createContext<AuthContextType>();

// Resolves to null when there is no valid session (401)
async function fetchCurrentUser(): Promise<User | null> {
  const resp = await fetch('/api/v1/auth/me', {
    headers: { 'X-Requested-With': 'XMLHttpRequest' },
    credentials: 'include',
  });
  if (resp.status === 401) return null;
  if (!resp.ok) throw new Error('Failed to fetch user');
  return resp.json();
}

export const AuthProvider: ParentComponent = (props) => {
  const [user, { refetch }] = createResource(fetchCurrentUser);

  const logout = async () => {
    await fetch('/api/v1/auth/logout', {
      method: 'POST',
      headers: { 'X-Requested-With': 'XMLHttpRequest' },
      credentials: 'include',
    });
    // Re-query /auth/me so that user() becomes null once the session is gone
    await refetch();
  };

  return (
    <AuthContext.Provider value={{
      user: () => user(),
      loading: () => user.loading,
      logout,
    }}>
      {props.children}
    </AuthContext.Provider>
  );
};

export const useAuth = () => {
  const ctx = useContext(AuthContext);
  if (!ctx) throw new Error('useAuth must be used within AuthProvider');
  return ctx;
};
```

### 5.4 Data Fetching Pattern

The current React app uses Firestore's `onSnapshot` for real-time updates. The REST API backend has no push channel, so data is fetched with `createResource` and refreshed explicitly by calling `refetch` after a mutation:

```tsx
// src/pages/Home.tsx
import { createResource, For, Show } from 'solid-js';
import { A } from '@solidjs/router';
import { fetchApi } from '../lib/api';
// Assumed locations for the shared spinner component and the synthesis type
import { Spinner } from '../components/Spinner';
import type { SynthesisDocument } from '../types';

async function fetchSyntheses() {
  return fetchApi<SynthesisDocument[]>('/api/v1/syntheses');
}

export default function Home() {
  // Call refetch() after creating or deleting a synthesis to reload the list
  const [syntheses, { refetch }] = createResource(fetchSyntheses);

  return (
    <Show when={!syntheses.loading} fallback={<Spinner />}>
      <For each={syntheses()}>
        {(synth) => (
          <A href={`/synthesis/${synth.id}`}>
            {/* card content */}
          </A>
        )}
      </For>
    </Show>
  );
}
```

### 5.5 Tailwind CSS Compatibility

Tailwind CSS v4 works the same way with SolidJS. The `@tailwindcss/vite` plugin scans `.tsx` files for class names regardless of framework, so all existing Tailwind utility classes carry over unchanged. The only mechanical change is the JSX attribute: SolidJS idiomatically uses `class` where React uses `className`. The `lucide-solid` package provides the same icon set as `lucide-react` under the same component names, so icon imports only need the package name changed.

### 5.6 Routing

```tsx
// src/App.tsx
import { Router, Route } from '@solidjs/router';
import { AuthProvider } from './context/AuthContext';
// Page and layout components (Login, ProtectedLayout, Home, ...) are imported
// from their own modules; those imports are omitted here for brevity.

export default function App() {
  return (
    <AuthProvider>
      <Router>
        <Route path="/login" component={Login} />
        <Route path="/" component={ProtectedLayout}>
          <Route path="/" component={Home} />
          <Route path="/sources" component={Sources} />
          <Route path="/settings" component={Settings} />
          <Route path="/generate" component={GenerateSynthesis} />
          <Route path="/synthesis/:id" component={SynthesisDetail} />
        </Route>
      </Router>
    </AuthProvider>
  );
}
```

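The `ProtectedLayout` component checks auth and renders `<Navigate>` if not logged in -- same pattern as the current React `ProtectedRoute` but using SolidJS's `<Navigate>`. It should render a placeholder while `loading()` is true, so that a logged-in user is not redirected to `/login` before the `/auth/me` request has resolved.

---
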
## 6. Authentication System

### 6.1 Magic Link Flow

```
User                      Frontend                            Backend                 SMTP Server
|                         |                                   |                       |
|-- Enter email --------->|                                   |                       |
|                         |-- POST /auth/login -------------->|                       |
|                         |   { email, captcha_token }        |                       |
|                         |                                   |-- verify captcha      |
|                         |                                   |-- generate token      |
|                         |                                   |-- store hash in DB    |
|                         |                                   |-- send email -------->|
|                         |<-- 200 "Check email" -------------|                       |
|                         |                                   |                       |
|<---- Email arrives (link: /auth/verify?token=xxx) ----------------------------------|
|                         |                                   |                       |
|-- Click link ---------->|                                   |                       |
|                         |-- GET /auth/verify?token=xxx ---->|                       |
|                         |                                   |-- hash token          |
|                         |                                   |-- lookup in DB        |
|                         |                                   |-- verify not expired  |
|                         |                                   |-- mark as used        |
|                         |                                   |-- create/get user     |
|                         |                                   |-- create session      |
|                         |<-- 302 redirect + Set-Cookie -----|                       |
|<-- Redirect to / -------|                                   |                       |
```

**Token generation:**
- 32 bytes of cryptographically secure random data (`rand::rngs::OsRng`)
- Base64url-encoded for URL safety
- Only the SHA-256 hash is stored in the DB (the raw token is never stored)
- 15-minute expiry
- Single use (marked `used = true` after verification)

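The verification side of these rules can be sketched as a pure state check. This is a minimal illustration rather than the actual handler: `MagicLinkRecord`, `TokenError`, and `consume_token` are hypothetical names, timestamps are assumed to be Unix seconds, and the lookup by SHA-256 hash is assumed to have happened already.

```rust
/// Token lifetime from the list above: 15 minutes.
pub const TOKEN_TTL_SECS: u64 = 15 * 60;

/// One stored magic-link token (hypothetical shape; the hash column is omitted).
pub struct MagicLinkRecord {
    pub expires_at: u64, // Unix seconds
    pub used: bool,
}

#[derive(Debug, PartialEq)]
pub enum TokenError {
    NotFound,
    AlreadyUsed,
    Expired,
}

/// Accepts a token at most once and only before its expiry.
/// `record` is the row found by hashing the presented token, if any.
pub fn consume_token(record: Option<&mut MagicLinkRecord>, now: u64) -> Result<(), TokenError> {
    let record = match record {
        Some(r) => r,
        None => return Err(TokenError::NotFound),
    };
    if record.used {
        return Err(TokenError::AlreadyUsed);
    }
    if now >= record.expires_at {
        return Err(TokenError::Expired);
    }
    // Mark as used so that a second click on the same link is rejected.
    record.used = true;
    Ok(())
}
```

Against the database, the same check is probably best expressed as one conditional `UPDATE` (set `used` only where it is still false and the expiry is in the future), so that two concurrent clicks cannot both succeed.
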
**Rate limiting on magic link requests:**
- Max 3 requests per email per 15 minutes
- Max 10 requests per IP per hour
- Together these limits prevent email bombing of a target address

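One way to enforce the two limits above is a sliding-window counter per key. The sketch below is std-only and single-threaded for clarity; `SlidingWindowLimiter` and `MagicLinkLimits` are hypothetical names, and a real handler would need to wrap the state in a `Mutex` or a concurrent map such as `dashmap`.

```rust
use std::collections::{HashMap, VecDeque};
use std::time::{Duration, Instant};

/// Allows at most `max_hits` accepted requests per `window` for each key.
pub struct SlidingWindowLimiter {
    max_hits: usize,
    window: Duration,
    hits: HashMap<String, VecDeque<Instant>>,
}

impl SlidingWindowLimiter {
    pub fn new(max_hits: usize, window: Duration) -> Self {
        SlidingWindowLimiter { max_hits, window, hits: HashMap::new() }
    }

    /// Returns true and records the hit if `key` is still under its limit.
    pub fn check(&mut self, key: &str, now: Instant) -> bool {
        let queue = self.hits.entry(key.to_string()).or_default();
        // Drop timestamps that have fallen out of the window.
        while let Some(&oldest) = queue.front() {
            if now.duration_since(oldest) >= self.window {
                queue.pop_front();
            } else {
                break;
            }
        }
        if queue.len() >= self.max_hits {
            return false;
        }
        queue.push_back(now);
        true
    }
}

/// Both limits from the list above: 3 per email per 15 min, 10 per IP per hour.
pub struct MagicLinkLimits {
    per_email: SlidingWindowLimiter,
    per_ip: SlidingWindowLimiter,
}

impl MagicLinkLimits {
    pub fn new() -> Self {
        MagicLinkLimits {
            per_email: SlidingWindowLimiter::new(3, Duration::from_secs(15 * 60)),
            per_ip: SlidingWindowLimiter::new(10, Duration::from_secs(60 * 60)),
        }
    }

    /// The IP check runs first, so an attempt rejected by the email limit
    /// still counts against the caller's IP budget.
    pub fn allow(&mut self, email: &str, ip: &str, now: Instant) -> bool {
        self.per_ip.check(ip, now) && self.per_email.check(&email.to_lowercase(), now)
    }
}
```

Passing `now` in as a parameter keeps the limiter deterministic in tests; a handler would pass `Instant::now()`.
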
### 6.2 Account Registration Flow

1. User submits email + display name + captcha token.
2. Backend verifies captcha with provider.
3. Backend checks email uniqueness.
4. Backend creates user with `role = 'user'` and default settings.
5. Backend sends magic link email for initial verification.
6. User clicks link, session is created.

The first user can be bootstrapped as admin via environment variable:
```
ADMIN_EMAIL=admin@example.com
```
On startup, if a user with this email exists, their role is set to `admin`. If the account is registered only after startup, the promotion takes effect at the next restart unless the registration handler performs the same check.

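The startup check described above amounts to a small piece of decision logic. The following sketch works on an in-memory slice to stay self-contained; the `User` struct and `promote_bootstrap_admin` are hypothetical names, and the case-insensitive email comparison is an assumption, not something the requirements specify.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct User {
    pub email: String,
    pub role: String,
}

/// Promotes the user whose email matches `admin_email`, if present.
/// Returns true only when a role was actually changed.
pub fn promote_bootstrap_admin(users: &mut [User], admin_email: Option<&str>) -> bool {
    // An unset or empty ADMIN_EMAIL means there is nothing to bootstrap.
    let target = match admin_email.map(str::trim) {
        Some(e) if !e.is_empty() => e,
        _ => return false,
    };
    for user in users.iter_mut() {
        if user.email.eq_ignore_ascii_case(target) {
            if user.role == "admin" {
                return false; // already promoted on an earlier startup
            }
            user.role = "admin".to_string();
            return true;
        }
    }
    false
}
```

In the real backend this would presumably be a single `UPDATE` on the `users` table executed after migrations, with `admin_email` read from `ADMIN_EMAIL`.
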
### 6.3 Session Management

Sessions are stored in the `sessions` table. The session ID is a 32-byte random token (base64url-encoded, 43 characters). Session lookup is a single indexed query on the primary key.

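The 43-character figure follows from unpadded base64url: 32 bytes form ten full 3-byte groups (40 characters) plus one 2-byte remainder (3 characters). The std-only encoder below exists only to make that arithmetic checkable; `base64url_nopad` is a hypothetical helper, and production code would use the `base64` crate instead.

```rust
const ALPHABET: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

/// Encodes bytes as base64url without `=` padding.
pub fn base64url_nopad(bytes: &[u8]) -> String {
    let mut out = String::with_capacity((bytes.len() * 4 + 2) / 3);
    for chunk in bytes.chunks(3) {
        // Pack up to three bytes into a 24-bit group, zero-filling the tail.
        let b0 = chunk[0] as u32;
        let b1 = *chunk.get(1).unwrap_or(&0) as u32;
        let b2 = *chunk.get(2).unwrap_or(&0) as u32;
        let n = (b0 << 16) | (b1 << 8) | b2;
        out.push(ALPHABET[((n >> 18) & 63) as usize] as char);
        out.push(ALPHABET[((n >> 12) & 63) as usize] as char);
        // A 1-byte tail yields 2 characters, a 2-byte tail yields 3.
        if chunk.len() > 1 {
            out.push(ALPHABET[((n >> 6) & 63) as usize] as char);
        }
        if chunk.len() > 2 {
            out.push(ALPHABET[(n & 63) as usize] as char);
        }
    }
    out
}
```

The same arithmetic applies to the 32-byte magic link token, which is also 43 characters once encoded.
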
**Session lifecycle:**
- Created on magic link verification
- Expires after 30 days (configurable)
- Refreshed (expiry extended) on each authenticated request
- Deleted on logout
- Periodic cleanup job (tokio interval) removes expired sessions

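The lifecycle above can be modelled with an in-memory map to show how sliding expiry and cleanup interact. This is only a sketch under stated assumptions: `SessionStore` is a hypothetical stand-in for the `sessions` table, timestamps are Unix seconds, and the 30-day TTL is hard-coded rather than configurable.

```rust
use std::collections::HashMap;

/// 30 days, the default lifetime from the list above.
pub const SESSION_TTL_SECS: u64 = 30 * 24 * 60 * 60;

/// In-memory stand-in for the `sessions` table: session ID -> expiry.
pub struct SessionStore {
    sessions: HashMap<String, u64>,
}

impl SessionStore {
    pub fn new() -> Self {
        SessionStore { sessions: HashMap::new() }
    }

    /// Called after a successful magic link verification.
    pub fn create(&mut self, id: &str, now: u64) {
        self.sessions.insert(id.to_string(), now + SESSION_TTL_SECS);
    }

    /// Called on each authenticated request: a valid session has its expiry
    /// extended, an expired one is removed. Returns whether it was valid.
    pub fn touch(&mut self, id: &str, now: u64) -> bool {
        let still_valid = match self.sessions.get(id) {
            Some(&expires_at) => expires_at > now,
            None => false,
        };
        if still_valid {
            self.sessions.insert(id.to_string(), now + SESSION_TTL_SECS);
        } else {
            self.sessions.remove(id);
        }
        still_valid
    }

    /// Called on logout.
    pub fn logout(&mut self, id: &str) {
        self.sessions.remove(id);
    }

    /// Body of the periodic cleanup job; returns how many rows were removed.
    pub fn purge_expired(&mut self, now: u64) -> usize {
        let before = self.sessions.len();
        self.sessions.retain(|_, expires_at| *expires_at > now);
        before - self.sessions.len()
    }
}
```

Writing the new expiry on every request means one `UPDATE` per request; if that becomes a concern, a common variation is to refresh only when the remaining lifetime drops below some threshold.
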
### 6.4 Captcha Integration

**Recommendation: Cloudflare Turnstile.**

| Option | Self-hostable | Privacy | Free tier |
|---|---|---|---|
| hCaptcha | No (SaaS) | Better than reCAPTCHA | Yes (unlimited) |
| Cloudflare Turnstile | No (SaaS) | Excellent (often invisible) | Yes (unlimited) |
| mCaptcha | Yes (open source) | Full control | N/A (self-hosted) |

None of the mainstream captcha services are fully self-hostable. **Cloudflare Turnstile** is recommended for its invisible challenge mode (better UX) and generous free tier. If strict self-hosting is required, **mCaptcha** (Rust-based, open source) is the only viable option, though it requires running a separate service.

Backend verification is a single POST to Turnstile's `siteverify` endpoint:
```rust
use serde::Deserialize;

#[derive(Deserialize)]
struct TurnstileResponse {
    success: bool,
}

// Assumes `AppError` implements `From<reqwest::Error>` so that `?` converts request and decode errors
pub async fn verify_captcha(client: &reqwest::Client, token: &str, secret: &str) -> Result<bool, AppError> {
    let resp = client
        .post("https://challenges.cloudflare.com/turnstile/v0/siteverify")
        .form(&[("secret", secret), ("response", token)])
        .send()
        .await?;
    let result: TurnstileResponse = resp.json().await?;
    Ok(result.success)
}
```

---

## 7. Docker Deployment

### 7.1 Multi-Stage Dockerfile

```dockerfile
# ===== Stage 1: Build Rust backend =====
FROM rust:1.85-bookworm AS backend-builder

WORKDIR /app
COPY Cargo.toml Cargo.lock ./
COPY src/ src/
COPY migrations/ migrations/

# Create a dummy SQLite DB for sqlx compile-time checks
ENV DATABASE_URL="sqlite:///tmp/build.db"
RUN cargo install sqlx-cli --no-default-features --features sqlite \
    && sqlx database create \
    && sqlx migrate run

RUN cargo build --release

# ===== Stage 2: Build SolidJS frontend =====
FROM node:22-alpine AS frontend-builder

WORKDIR /app/frontend
COPY frontend/package.json frontend/package-lock.json ./
RUN npm ci

COPY frontend/ ./
RUN npm run build

# ===== Stage 3: Minimal runtime =====
FROM debian:bookworm-slim AS runtime

# curl is needed by the container healthcheck defined in docker-compose.yml
RUN apt-get update && apt-get install -y --no-install-recommends \
    ca-certificates \
    curl \
    libssl3 \
    && rm -rf /var/lib/apt/lists/*

RUN useradd -ms /bin/bash appuser

WORKDIR /app

# Copy backend binary
COPY --from=backend-builder /app/target/release/ai-synth-backend .
# Copy migrations for runtime migration
COPY --from=backend-builder /app/migrations/ migrations/
# Copy frontend static files
COPY --from=frontend-builder /app/frontend/dist/ static/

# Create data directory for SQLite
RUN mkdir -p /app/data && chown appuser:appuser /app/data

USER appuser

ENV DATABASE_URL="sqlite:///app/data/ai_synth.db"
ENV STATIC_DIR="/app/static"
ENV PORT=8080

EXPOSE 8080

# The binary applies pending migrations on startup, then starts the server
CMD ["./ai-synth-backend"]
```

The Rust backend serves the static SolidJS files directly (via `tower_http::services::ServeDir`), eliminating the need for a separate nginx container. All `/api/*` routes go to handlers; everything else serves `index.html` (SPA fallback).

### 7.2 docker-compose.yml

```yaml
version: "3.9"

services:
  app:
    build:
      context: .
      dockerfile: Dockerfile
    container_name: ai-synth
    restart: unless-stopped
    ports:
      - "${PORT:-8080}:8080"
    volumes:
      - ai_synth_data:/app/data  # SQLite persistence
    environment:
      - DATABASE_URL=sqlite:///app/data/ai_synth.db
      - PORT=8080
      - ADMIN_EMAIL=${ADMIN_EMAIL}
      - SESSION_SECRET=${SESSION_SECRET}  # 64-byte hex for cookie signing
      - SMTP_HOST=${SMTP_HOST}
      - SMTP_PORT=${SMTP_PORT:-587}
      - SMTP_USER=${SMTP_USER}
      - SMTP_PASSWORD=${SMTP_PASSWORD}
      - SMTP_FROM=${SMTP_FROM}
      - CAPTCHA_SECRET=${CAPTCHA_SECRET}
      - CAPTCHA_SITE_KEY=${CAPTCHA_SITE_KEY}
      - ENCRYPTION_KEY=${ENCRYPTION_KEY}  # 32-byte hex for API key encryption
      - RUST_LOG=info
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/api/v1/health"]
      interval: 30s
      timeout: 5s
      retries: 3

  # Optional: Mailpit for local development (SMTP catch-all)
  mailpit:
    image: axllent/mailpit
    container_name: ai-synth-mail
    restart: unless-stopped
    ports:
      - "8025:8025"  # Web UI
      - "1025:1025"  # SMTP
    profiles:
      - dev

volumes:
  ai_synth_data:
    driver: local
```

### 7.3 Volume Mounts for SQLite

The SQLite database file is stored in a Docker named volume (`ai_synth_data`). This ensures:
- Data persists across container restarts and rebuilds
- The volume can be backed up via `docker cp` or volume backup tools; because WAL mode keeps recent writes in the `-wal` file, copy the `-wal` and `-shm` files together with the database, or take a snapshot with SQLite's `.backup` command
- WAL mode is used for concurrent read/write performance

**Important SQLite configuration for production:**
```rust
use sqlx::sqlite::SqlitePoolOptions;
use sqlx::Executor; // brings `conn.execute(...)` into scope

let pool = SqlitePoolOptions::new()
    .max_connections(5) // SQLite handles limited concurrency
    .after_connect(|conn, _| {
        Box::pin(async move {
            conn.execute("PRAGMA journal_mode=WAL").await?;
            conn.execute("PRAGMA synchronous=NORMAL").await?;
            conn.execute("PRAGMA foreign_keys=ON").await?;
            conn.execute("PRAGMA busy_timeout=5000").await?;
            Ok(())
        })
    })
    .connect(&database_url)
    .await?;
```

### 7.4 Environment Variable Configuration

A `.env.example` file documents all required and optional variables:

```env
# === Required ===
DATABASE_URL=sqlite:///app/data/ai_synth.db
SESSION_SECRET=<64-byte-hex-string>
ENCRYPTION_KEY=<32-byte-hex-string>
ADMIN_EMAIL=admin@example.com

# === SMTP (required for magic link auth) ===
SMTP_HOST=smtp.example.com
SMTP_PORT=587
SMTP_USER=user@example.com
SMTP_PASSWORD=password
SMTP_FROM=noreply@example.com

# === Captcha ===
CAPTCHA_SECRET=<turnstile-secret-key>
CAPTCHA_SITE_KEY=<turnstile-site-key>

# === Optional ===
PORT=8080
RUST_LOG=info
# Base URL used when building magic link URLs
BASE_URL=https://your-domain.com
```

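Malformed secrets are easiest to catch at startup rather than on first use. The sketch below validates hex-encoded keys using only the standard library; `parse_hex_key` and `load_key` are hypothetical helpers, and it reads "64-byte hex" as 64 raw bytes (128 hex characters), which is an assumption about the intended format. At startup, `load_key("SESSION_SECRET", 64)` and `load_key("ENCRYPTION_KEY", 32)` would run before the server binds its port.

```rust
use std::env;

/// Decodes a hex string into exactly `expected_bytes` bytes.
pub fn parse_hex_key(name: &str, value: &str, expected_bytes: usize) -> Result<Vec<u8>, String> {
    let value = value.trim();
    if value.len() != expected_bytes * 2 {
        return Err(format!(
            "{} must be {} hex characters ({} bytes), got {}",
            name,
            expected_bytes * 2,
            expected_bytes,
            value.len()
        ));
    }
    // Reject anything outside 0-9, a-f, A-F before slicing by byte index.
    if !value.bytes().all(|b| b.is_ascii_hexdigit()) {
        return Err(format!("{} contains a non-hex character", name));
    }
    Ok((0..value.len())
        .step_by(2)
        .map(|i| u8::from_str_radix(&value[i..i + 2], 16).expect("validated above"))
        .collect())
}

/// Reads a required hex-encoded key from the environment.
pub fn load_key(name: &str, expected_bytes: usize) -> Result<Vec<u8>, String> {
    let raw = env::var(name).map_err(|_| format!("{} is not set", name))?;
    parse_hex_key(name, &raw, expected_bytes)
}
```
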
---

## 8. Migration from Firebase

### 8.1 Data Migration Strategy

The migration runs in two stages: a one-shot export script using the `firebase-admin` SDK, followed by a standalone Rust CLI tool that imports the export into SQLite.

**Step 1: Export Firestore data**

Use `firebase-admin` SDK (Python or Node.js is simplest for this one-shot task):

```python
# migrate_export.py
import json

import firebase_admin
from firebase_admin import auth, credentials, firestore

cred = credentials.Certificate("service-account.json")
firebase_admin.initialize_app(cred)
db = firestore.client()

data = {
    "users": [],
    "syntheses": [],
    "sources": [],
    "settings": [],
}

# Export users from Firebase Auth (provides the UID -> email mapping)
for user in auth.list_users().iterate_all():
    data["users"].append(
        {"uid": user.uid, "email": user.email, "display_name": user.display_name}
    )

# Export syntheses, sources, settings collections
for doc in db.collection("syntheses").stream():
    d = doc.to_dict()
    d["_id"] = doc.id
    data["syntheses"].append(d)

# ... same for sources, settings

# default=str serializes Firestore timestamps as strings
with open("firebase_export.json", "w") as f:
    json.dump(data, f, default=str)
```

**Step 2: Transform and import into SQLite**

A Rust CLI tool reads the JSON export and inserts into SQLite:

```
cargo run --bin migrate -- --input firebase_export.json --db ai_synth.db
```

Key transformations:
- `authorUid` / `userId` (Firebase Auth UID) -> new UUID in the `users` table (a UID-to-UUID mapping table is maintained during migration)
- Firebase `Timestamp` -> ISO 8601 string
- Legacy `SynthesisData` fields (`majorAnnouncements`, `financialSector`, etc.) -> normalized `sections[]` JSON
- Settings doc ID (was `{userId}` in Firestore) -> `user_id` foreign key

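The timestamp transformation can be done without extra dependencies. The sketch below assumes the export represents each Firestore `Timestamp` as whole Unix seconds (for example the `_seconds` field of a Node.js export); an export that already writes formatted date strings would need parsing instead. `unix_to_iso8601` is a hypothetical helper, and the importer could equally use `chrono`, which is already in the dependency list.

```rust
/// Formats Unix seconds as an ISO 8601 UTC timestamp, e.g. "2023-11-14T22:13:20Z".
pub fn unix_to_iso8601(secs: i64) -> String {
    // Split into whole days since the epoch and seconds within the day.
    let days = secs.div_euclid(86_400);
    let rem = secs.rem_euclid(86_400);
    let (hour, minute, second) = (rem / 3_600, (rem % 3_600) / 60, rem % 60);

    // Days-to-civil-date conversion on the proleptic Gregorian calendar,
    // using 400-year eras that start on March 1st.
    let z = days + 719_468;
    let era = z.div_euclid(146_097);
    let doe = z.rem_euclid(146_097); // day of era, 0..=146096
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365; // year of era
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // day of year, March-based
    let mp = (5 * doy + 2) / 153; // month index, 0 = March
    let day = doy - (153 * mp + 2) / 5 + 1;
    let month = if mp < 10 { mp + 3 } else { mp - 9 };
    let year = yoe + era * 400 + if month <= 2 { 1 } else { 0 };

    format!(
        "{:04}-{:02}-{:02}T{:02}:{:02}:{:02}Z",
        year, month, day, hour, minute, second
    )
}
```

Storing timestamps in this fixed-width UTC form keeps them sortable as plain text in SQLite.
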
**Step 3: User notification**

Since authentication changes from Google SSO to email+magic link, existing users need to be notified that they must use the magic link flow. Their email addresses (from Firebase Auth) are imported into the `users` table, so the first magic link login with that email resolves to the existing account and its migrated data.

### 8.2 Mapping Firestore Security Rules to Rust

The Firestore rules enforce three categories of protection that map to backend patterns:

| Firestore Rule | Rust Equivalent |
|---|---|
| `isAuthenticated()` | Auth middleware layer (rejects with 401 if there is no valid session) |
| `isDocOwner()` / `request.auth.uid == resource.data.authorUid` | Query-level filtering: `WHERE user_id = $1` with the authenticated user's ID |
| `isValidSynthesis()` / `isValidSettings()` / `isValidSource()` | Request validation using `validator` crate or manual checks in handlers |
| `uidUnchanged()` / `uidNotModified()` | Not applicable -- `user_id` is never in the request body; it is injected server-side from the session |
| `request.resource.data.createdAt == resource.data.createdAt` | `created_at` is set server-side and never updatable via API |
| Field type checks (string, number, timestamp) | Serde deserialization + custom validators |
| Size limits (e.g., `title.size() < 500`) | Validator annotations: `#[validate(length(max = 500))]` |

**Example validation in Rust:**

```rust
use serde::Deserialize;
use validator::Validate;

#[derive(Deserialize, Validate)]
pub struct CreateSourceRequest {
    #[validate(length(min = 1, max = 200))]
    pub title: String,

    #[validate(url, length(max = 1000))]
    pub url: String,
}
```

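The "manual checks in handlers" alternative mentioned in the table might look like the following std-only sketch. `validate_source` is a hypothetical function that mirrors the limits of `CreateSourceRequest`; restricting URLs to `http` and `https` is an added assumption, motivated by the scraper fetching these URLs.

```rust
/// Checks the same limits as `CreateSourceRequest` and collects every failure.
pub fn validate_source(title: &str, url: &str) -> Result<(), Vec<&'static str>> {
    let mut errors = Vec::new();

    // Lengths are counted in characters rather than bytes.
    let title_len = title.chars().count();
    if title_len < 1 || title_len > 200 {
        errors.push("title must be 1-200 characters");
    }
    if url.chars().count() > 1000 {
        errors.push("url must be at most 1000 characters");
    }

    // Require an http(s) scheme followed by a non-empty host.
    let has_scheme = url.starts_with("http://") || url.starts_with("https://");
    let host = url
        .splitn(2, "://")
        .nth(1)
        .unwrap_or("")
        .split('/')
        .next()
        .unwrap_or("");
    if !has_scheme || host.is_empty() {
        errors.push("url must be an absolute http(s) URL");
    }

    if errors.is_empty() {
        Ok(())
    } else {
        Err(errors)
    }
}
```

Returning all failures at once lets the handler report every invalid field in a single 400 response instead of one per round trip.
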
The key architectural difference: in Firestore, rules are the *only* security layer (the client has direct DB access). In the Rust backend, security is enforced in the request path (authentication middleware + query scoping + input validation). The database is never directly accessible from the client.

**Ownership enforcement pattern:**

Every query that reads or mutates user data includes `WHERE user_id = ?` with the authenticated user's ID. This is not a "rule" but a structural guarantee: the user ID comes from the session, never from the request, so a query written this way cannot reach another user's rows.

```rust
// db/syntheses.rs
use sqlx::SqlitePool;

pub async fn get_by_id(
    pool: &SqlitePool,
    user_id: &str,
    synthesis_id: &str,
) -> Result<Option<Synthesis>, sqlx::Error> {
    sqlx::query_as!(
        Synthesis,
        "SELECT * FROM syntheses WHERE id = ? AND user_id = ?",
        synthesis_id,
        user_id
    )
    .fetch_optional(pool)
    .await
}
```

If the synthesis belongs to another user, this returns `None`, and the handler returns 404, which also avoids revealing whether the ID exists. Update and delete queries carry the same `user_id` condition, so a user cannot query, update, or delete another user's data.

---

## Summary of Key Crate Dependencies

| Purpose | Crate | Version Guidance |
|---|---|---|
| Web framework | `axum` | ^0.8 |
| Async runtime | `tokio` | ^1 (full features) |
| Stream combinators (concurrent scraping) | `futures` | ^0.3 |
| Database | `sqlx` | ^0.8 (features: sqlite, runtime-tokio) |
| HTTP client | `reqwest` | ^0.12 (features: json, cookies) |
| HTML parsing | `scraper` | ^0.22 |
| Serialization | `serde`, `serde_json` | ^1 |
| Date/time | `chrono` | ^0.4 |
| Token hashing | `sha2` | ^0.10 |
| Random tokens | `rand` | ^0.8 |
| SMTP | `lettre` | ^0.11 |
| Logging | `tracing`, `tracing-subscriber` | ^0.1 / ^0.3 |
| Config | `dotenvy` | ^0.15 |
| Validation | `validator` | ^0.19 |
| Concurrent map | `dashmap` | ^6 |
| Static file serving | `tower-http` | ^0.6 (features: fs, cors, trace) |
| Cookie handling | `axum-extra` | ^0.10 (features: cookie) |
| Encryption (API keys) | `aes-gcm` | ^0.10 |
| Base64 | `base64` | ^0.22 |
| UUID | `uuid` | ^1 (features: v4) |
| Error handling | `anyhow`, `thiserror` | ^1 |

---

## Architecture Diagram (Text)

```
                          ┌───────────────────────────────┐
                          │       Docker Container        │
                          │                               │
Browser ◄──── HTTPS ────► │  ┌─────────────────────────┐  │
(SolidJS SPA)             │  │ Axum Web Server         │  │
                          │  │                         │  │
                          │  │ /static/* ──► ServeDir  │  │
                          │  │ /api/v1/* ──► Router    │  │
                          │  │                         │  │
                          │  │ ┌─ Auth Middleware ───┐ │  │
                          │  │ │ Session Cookie      │ │  │
                          │  │ │ CSRF Check          │ │  │
                          │  │ └─────────────────────┘ │  │
                          │  │                         │  │
                          │  │ ┌─ Handlers ──────────┐ │  │
                          │  │ │ auth, syntheses,    │ │  │
                          │  │ │ sources, settings,  │ │  │
                          │  │ │ admin, email        │ │  │
                          │  │ └──────────┬──────────┘ │  │
                          │  │            │            │  │
                          │  │ ┌─ Services ──────────┐ │  │
                          │  │ │ LLM providers       │─┼──┼──► Gemini API
                          │  │ │ (trait-based)       │─┼──┼──► OpenAI API
                          │  │ │                     │─┼──┼──► Anthropic API
                          │  │ │ Scraper (reqwest)   │─┼──┼──► Target URLs
                          │  │ │ Email (lettre)      │─┼──┼──► SMTP Server
                          │  │ │ Captcha             │─┼──┼──► Turnstile API
                          │  │ └──────────┬──────────┘ │  │
                          │  │            │            │  │
                          │  │ ┌─ DB Layer (sqlx) ───┐ │  │
                          │  │ │ SQLite (WAL)        │ │  │
                          │  │ └─────────────────────┘ │  │
                          │  └────────────┬────────────┘  │
                          │               │               │
                          │  ┌────────────▼────────────┐  │
                          │  │ /app/data/              │  │
                          │  │ ai_synth.db             │  │
                          │  │ (Docker volume)         │  │
                          │  └─────────────────────────┘  │
                          └───────────────────────────────┘
```