You cannot select more than 25 topics
Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
243 lines
10 KiB
Markdown
243 lines
10 KiB
Markdown
# Design: Architect Assessment Remediation
|
|
|
|
**Date**: 2026-03-22
|
|
**Source**: docs/architect_assessment.md
|
|
**Scope**: 4 fixes — 2 API contract breaks, 1 SSRF vulnerability, 1 test reliability improvement
|
|
|
|
---
|
|
|
|
## Context
|
|
|
|
An architect assessment identified issues in the current codebase. After exploration, 3 are real issues, 1 is a test quality problem, 2 were already fixed, and 1 is not applicable (in-memory state is intentional for single-tenant design).
|
|
|
|
**Dropped from scope:**
|
|
- Dedicated hardened HTTP client — already implemented (`build_scraper_client()` with timeouts, redirect limits, User-Agent)
|
|
- Streaming body limits — **accepted risk**. The current code downloads the full response into memory via `response.bytes()` then checks `bytes.len() > MAX_BODY_SIZE` (5 MB). A malicious server could exhaust memory before the check runs. True streaming would require a chunked reader with an accumulator cap. Accepted because: reqwest's `.timeout()` (15s) limits the window, and the 5 MB practical limit is enforced. A multi-GB response would hit the timeout before filling memory.
|
|
- Job/rate-limit state backing to Redis/DB — not applicable per decision Q-UX-3 (single-tenant, single-instance)
|
|
|
|
---
|
|
|
|
## Fix 1: `/syntheses` list contract alignment
|
|
|
|
### Problem
|
|
Backend `SynthesisListItem` returns preview metadata (`first_section_title: Option<String>`, `first_section_item_count: usize`). Frontend `SynthesisListItem` expects `sections: NewsSection[]`. The frontend tries to index into `sections[0]` which doesn't exist in the API response.
|
|
|
|
### Solution
|
|
Fix the frontend to align with the backend response.
|
|
|
|
### Backend (no changes)
|
|
The backend design is correct — returning only preview data for list views is the right performance trade-off.
|
|
|
|
### Frontend changes
|
|
|
|
**`frontend/src/types.ts`** — update `SynthesisListItem`:
|
|
```typescript
|
|
export interface SynthesisListItem {
|
|
id: string;
|
|
week: string;
|
|
status: string;
|
|
created_at: string;
|
|
first_section_title: string | null;
|
|
first_section_item_count: number;
|
|
}
|
|
```
|
|
Remove `sections: NewsSection[]` from this type. (The full `Synthesis` type keeps `sections` — it's used by the detail view.)
|
|
|
|
**`frontend/src/pages/Home.tsx`** — update card preview rendering:
|
|
- Replace `synth.sections?.[0]?.items?.length > 0` with `synth.first_section_title !== null`
|
|
- Replace `synth.sections[0].items.slice(0, 4).map(...)` with a simple display of `first_section_title` and `first_section_item_count`
|
|
- Card preview shows: "{first_section_title} ({first_section_item_count} articles)" or the no-preview fallback
|
|
|
|
### Files to modify
|
|
- `frontend/src/types.ts`
|
|
- `frontend/src/pages/Home.tsx`
|
|
|
|
---
|
|
|
|
## Fix 2: Admin rate-limits update path
|
|
|
|
### Problem
|
|
Backend route is `PUT /admin/rate-limits/{provider_name}` (expects a string like "gemini"). Frontend calls `api.put(/admin/rate-limits/${id})` passing a UUID. The request 404s because no provider has a UUID as its name.
|
|
|
|
### Solution
|
|
Fix the frontend to pass `provider_name` instead of `id`.
|
|
|
|
### Backend (no changes)
|
|
The backend design is correct — rate limits are keyed by provider name.
|
|
|
|
### Frontend changes
|
|
|
|
**`frontend/src/api/admin.ts`** — update the rate limits API:
|
|
```typescript
|
|
// Before:
|
|
update: (id: string, data: UpdateRateLimitRequest) => api.put(`/admin/rate-limits/${id}`, data)
|
|
// After:
|
|
update: (providerName: string, data: UpdateRateLimitRequest) => api.put(`/admin/rate-limits/${providerName}`, data)
|
|
```
|
|
|
|
**`frontend/src/pages/admin/RateLimits.tsx`** — update **only the API call** in the save handler to pass `limit.provider_name` instead of `limit.id`. The `limit.id` field continues to be used for UI state management (`savingId`, `updateLocal`) — those do NOT change.
|
|
|
|
### Files to modify
|
|
- `frontend/src/api/admin.ts`
|
|
- `frontend/src/pages/admin/RateLimits.tsx`
|
|
|
|
---
|
|
|
|
## Fix 3: SSRF redirect validation per hop
|
|
|
|
### Problem
|
|
The scraper performs SSRF checks (DNS resolution + `is_private_ip()`) only on the initial URL. Reqwest then follows up to 3 redirects without re-checking. An attacker can host a public domain that 302-redirects to `http://169.254.169.254/` (cloud metadata) or `http://127.0.0.1/`.
|
|
|
|
### Solution
|
|
Replace reqwest's default redirect policy with a custom one that validates each hop.
|
|
|
|
### Implementation
|
|
|
|
**`backend/src/services/scraper.rs`** — update `build_scraper_client()`:
|
|
|
|
Replace `redirect(reqwest::redirect::Policy::limited(3))` with a custom policy:
|
|
|
|
```rust
|
|
use std::net::ToSocketAddrs;
|
|
|
|
let redirect_policy = reqwest::redirect::Policy::custom(|attempt| {
|
|
// Enforce max 3 hops
|
|
if attempt.previous().len() >= 3 {
|
|
return attempt.error("Too many redirects");
|
|
}
|
|
|
|
let url = attempt.url();
|
|
|
|
// Validate scheme — reject non-HTTP redirects
|
|
if url.scheme() != "http" && url.scheme() != "https" {
|
|
return attempt.error("Blocked redirect to non-HTTP scheme");
|
|
}
|
|
|
|
// Resolve DNS and check for private IPs (sync — acceptable in redirect handler)
|
|
if let Some(host) = url.host_str() {
|
|
let port = url.port().unwrap_or(if url.scheme() == "https" { 443 } else { 80 });
|
|
let addr_str = format!("{}:{}", host, port);
|
|
match addr_str.to_socket_addrs() {
|
|
Ok(addrs) => {
|
|
for addr in addrs {
|
|
if is_private_ip(addr.ip()) {
|
|
return attempt.error("Blocked redirect to private/internal IP");
|
|
}
|
|
}
|
|
}
|
|
Err(_) => {
|
|
// DNS resolution failed — block the redirect to prevent TOCTOU/rebinding attacks.
|
|
// If DNS fails here but succeeds when reqwest retries, the resolved IP could
|
|
// be different (DNS rebinding). Safer to fail closed.
|
|
return attempt.error("DNS resolution failed for redirect target");
|
|
}
|
|
}
|
|
}
|
|
|
|
attempt.follow()
|
|
});
|
|
```
|
|
|
|
**Key decisions:**
|
|
- **`attempt.error()` instead of `attempt.stop()`**: `stop()` silently returns the 3xx response (no body, scraper tries to parse garbage). `error()` produces a reqwest error that propagates to the caller as an `AppError`, which is the correct behavior for a blocked SSRF attempt.
|
|
- **DNS failure blocks the redirect**: allowing a redirect when DNS fails opens a TOCTOU window (DNS rebinding). Fail closed.
|
|
|
|
The `is_private_ip()` function is already defined and covers all IPv4/IPv6 private ranges (loopback, private, link-local, ULA, documentation, discard). It's a module-level function, accessible from the redirect policy closure.
|
|
|
|
**Note on sync DNS:** reqwest's redirect policy closure is synchronous. `ToSocketAddrs` performs a blocking DNS lookup. This is acceptable because:
|
|
1. Redirects happen rarely (most URLs don't redirect)
|
|
2. DNS lookups are fast (< 100ms typically)
|
|
3. The alternative (disabling all redirects) breaks legitimate sites
|
|
|
|
**Residual risk (DNS rebinding TOCTOU):** Even with per-hop DNS checks, there is a theoretical TOCTOU gap: the DNS check resolves to a public IP, then reqwest's actual connection resolves to a private IP. Fully preventing this would require a custom reqwest connector that validates IPs at the TCP connect level. This is accepted as a known limitation — the per-hop check raises the bar significantly from the current zero-check state.
|
|
|
|
### Tests
|
|
- The `is_private_ip()` function is already thoroughly tested (13 tests covering all IP ranges)
|
|
- Add a doc comment on `build_scraper_client()` explaining the per-hop SSRF protection
|
|
|
|
### Files to modify
|
|
- `backend/src/services/scraper.rs`
|
|
|
|
---
|
|
|
|
## Fix 4: Shared test fixtures to prevent mock drift
|
|
|
|
### Problem
|
|
Frontend test mocks are hand-crafted inline in each test file. They've already diverged from backend response shapes (e.g., Home.tsx mocks include `sections` which the backend doesn't return in list responses). This divergence means tests pass but the real app would fail.
|
|
|
|
### Solution
|
|
Create a shared fixtures file typed against the actual TypeScript interfaces. All page tests import from it. Type changes in `types.ts` cause compilation failures in the fixture file, catching drift immediately.
|
|
|
|
### Implementation
|
|
|
|
**Create `frontend/src/__tests__/fixtures.ts`**:
|
|
```typescript
|
|
import type {
|
|
SynthesisListItem,
|
|
Synthesis,
|
|
Source,
|
|
UserSettings,
|
|
ProviderConfig,
|
|
AdminProvider,
|
|
AdminRateLimit,
|
|
} from '~/types';
|
|
|
|
// ---- Syntheses ----
|
|
export const MOCK_SYNTHESIS_LIST_ITEM: SynthesisListItem = {
|
|
id: 'test-synth-1',
|
|
week: '2026-W12',
|
|
status: 'completed',
|
|
created_at: '2026-03-21T10:00:00Z',
|
|
first_section_title: 'Annonces majeures',
|
|
first_section_item_count: 3,
|
|
};
|
|
|
|
export const MOCK_SYNTHESIS_LIST: SynthesisListItem[] = [
|
|
MOCK_SYNTHESIS_LIST_ITEM,
|
|
{ ...MOCK_SYNTHESIS_LIST_ITEM, id: 'test-synth-2', week: '2026-W11' },
|
|
];
|
|
|
|
// ---- Sources ----
|
|
// Fill with valid values matching the Source interface
|
|
export const MOCK_SOURCE: Source = { id: 'src-1', user_id: 'u1', title: 'Test Blog', url: 'https://test.example.com/blog', created_at: '2026-03-21T10:00:00Z' };
|
|
|
|
// ---- Settings ----
|
|
// Use DEFAULT_SETTINGS as base, override specific fields
|
|
export const MOCK_SETTINGS: UserSettings = { ...DEFAULT_SETTINGS, theme: 'Test Theme', ai_provider: 'gemini', ai_model: 'gemini-2.5-flash' };
|
|
|
|
// ---- Providers ----
|
|
export const MOCK_PROVIDER_CONFIG: ProviderConfig = { provider_name: 'gemini', display_name: 'Google Gemini', models: [{ model_id: 'gemini-2.5-flash', display_name: 'Gemini 2.5 Flash' }] };
|
|
```
|
|
|
|
**Update all page test files** to import from fixtures instead of defining inline mocks.
|
|
|
|
### Files to modify
|
|
- Create: `frontend/src/__tests__/fixtures.ts`
|
|
- Modify: `frontend/src/__tests__/pages/home.test.tsx`
|
|
- Modify: `frontend/src/__tests__/pages/settings.test.tsx`
|
|
- Modify: `frontend/src/__tests__/pages/sources.test.tsx`
|
|
- Modify: `frontend/src/__tests__/pages/generate.test.tsx`
|
|
|
|
---
|
|
|
|
## Implementation Order
|
|
|
|
1. **Fix 1** (syntheses list contract) — must be first because it changes `SynthesisListItem` type used by fixtures
|
|
2. **Fix 2** (rate-limits path) — independent, quick
|
|
3. **Fix 4** (shared fixtures) — depends on Fix 1 for the corrected type
|
|
4. **Fix 3** (SSRF redirect) — independent, backend-only
|
|
|
|
---
|
|
|
|
## Verification
|
|
|
|
### Backend
|
|
```bash
|
|
cd backend && cargo check && cargo test --lib
|
|
```
|
|
|
|
### Frontend
|
|
```bash
|
|
cd frontend && npx tsc --noEmit && npx vitest run && npx vite build
|
|
```
|