You cannot select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
ai_synth/docs/audit/00-consolidated-summary.md

128 lines
6.0 KiB
Markdown

# Code Audit — Consolidated Summary
**Date**: 2026-03-26
**Auditors**: Software Architect, Frontend Expert, Rust Expert, Tech Lead, Devil's Advocate
## Overall Assessment
The codebase is well-structured with strong fundamentals: clean layer separation, zero `unsafe` code, zero `unwrap()` in production paths, comprehensive security measures (AES-256-GCM encryption, SSRF prevention, anti-enumeration timing), and good test coverage (363+ unit tests, 9 integration test files). The code is a learning project that has evolved significantly through rapid iteration.
The main issues cluster around **complexity in the synthesis pipeline** and **accumulated duplication** from feature layering.
---
## P0 — Fix Immediately
These are bugs that can crash or corrupt the application.
| # | Issue | Source | Files |
|---|-------|--------|-------|
| 1 | **UTF-8 panic**: `&msg[..200]` panics on multi-byte chars | Rust Expert, Devil's Advocate | `synthesis.rs:1313` |
| 2 | **Race condition in job creation**: double-click creates duplicate jobs (DashMap check + insert not atomic) | Devil's Advocate | `synthesis.rs` (job management) |
| 3 | **XSS via `innerHTML`**: renders user-controlled `theme` unsafely | Frontend Expert | `GenerateSynthesis.tsx` |
---
## P1 — Fix Within 30 Days
| # | Issue | Source | Files |
|---|-------|--------|-------|
| 4 | **Expired sessions never cleaned up**: `sessions::delete_expired()` exists but is never called | Devil's Advocate | startup / scheduled task |
| 5 | **No pipeline timeout**: hung LLM call blocks generation slot indefinitely | Devil's Advocate | `synthesis.rs` |
| 6 | **SSRF bypass via IPv4-mapped IPv6**: `::ffff:127.0.0.1` passes all current checks | Rust Expert, Devil's Advocate | `scraper.rs` |
| 7 | **Missing SSRF check on initial source page fetch** | Rust Expert | `source_scraper.rs` |
| 8 | **Spawned generation tasks swallow panics**: SSE clients hang forever | Rust Expert | `generation handler` |
| 9 | **Missing `onCleanup` for `setTimeout`**: timers fire after component unmount | Frontend Expert | `Home.tsx`, `ArticleHistory.tsx`, `SynthesisDetail.tsx` |
---
## Structural Refactoring — High Priority
These are the main maintainability issues, confirmed by 3+ auditors.
### 1. Decompose `synthesis.rs` (1,686 lines)
**Confirmed by**: Architect, Tech Lead, Rust Expert, Devil's Advocate
The `run_generation_inner` function is ~650 lines with the scrape+classify batch loop duplicated 3 times (Phase 1, Phase 2 Brave, Phase 2 LLM). Article filtering (homepage, dedup, history, diversity) is also duplicated across paths.
**Refactoring plan**:
- Extract `process_article_batch()` — shared scrape+classify loop (~150 lines saved)
- Extract `filter_candidate_url()` — shared filtering logic
- Extract `assign_category()` — shared category assignment + overflow logic
- Consider splitting into `synthesis/pipeline.rs`, `synthesis/phases.rs`, `synthesis/helpers.rs`
**Effort**: Large | **Impact**: Critical
### 2. Eliminate `SettingsResponse` struct
**Confirmed by**: Tech Lead
`SettingsResponse` is identical to `UserSettings` minus `user_id` and `updated_at`. Every new setting requires changes to 4 Rust structs + SQL queries + frontend types. Serialize `UserSettings` directly with `#[serde(skip)]` on internal fields.
**Effort**: Small | **Impact**: High (reduces per-setting change cost)
### 3. Decompose `Settings.tsx` (1,025 lines)
**Confirmed by**: Frontend Expert, Tech Lead
Split into sub-components: `SettingsGeneral`, `SettingsProvider`, `SettingsAdvanced`, `SettingsBraveSearch`, `SettingsRateLimit`.
**Effort**: Medium | **Impact**: High
### 4. Extract shared LLM error mapping
**Confirmed by**: Tech Lead
`map_openai_error`, `map_gemini_error`, `map_anthropic_error` are near-identical. Extract to a shared `map_http_error` function.
**Effort**: Small | **Impact**: Medium
### 5. Replace `trace_article` 11-parameter function with struct
**Confirmed by**: Tech Lead, Rust Expert
```rust
struct ArticleTrace { url, title, source_type, source_url, category, synthesis_id, status, scraped_ok }
```
**Effort**: Small | **Impact**: Medium
---
## Medium Priority Improvements
| # | Issue | Source |
|---|-------|--------|
| 10 | N+1 individual INSERTs for article history — batch with `unnest()` | Rust Expert |
| 11 | 59 `.clone()` calls in pipeline — use `Arc<str>` for shared immutables | Rust Expert |
| 12 | CSS selectors re-parsed every call — use `LazyLock` | Rust Expert |
| 13 | Inconsistent data fetching in frontend (`createResource` vs `onMount` + signals) | Frontend Expert |
| 14 | Reusable `Button` component exists but unused — inline Tailwind duplicated | Frontend Expert |
| 15 | Default settings diverge between frontend and backend | Tech Lead |
| 16 | `AppConfig` carries secrets in cloneable `String` — use `Arc`-wrapped | Rust Expert, Devil's Advocate |
| 17 | In-memory rate limits reset on restart | Devil's Advocate |
| 18 | `SESSION_SECRET` env var validated but never used | Devil's Advocate |
---
## Positive Highlights
All auditors noted these strengths:
- **Zero `unsafe` code**, zero `unwrap()` in production
- **Strong security posture**: AES-256-GCM with zeroized keys, SSRF prevention (14 tests), anti-enumeration timing, parameterized SQL
- **Clean error handling**: `AppError` enum with proper HTTP mapping, internal errors never leaked
- **Well-designed `LlmProvider` trait**: clean abstraction over 3 providers
- **Comprehensive test suite**: good coverage on scraper, LLM providers, rate limiter, encryption
- **Excellent module documentation**: every module has `//!` doc comments
---
## Detailed Reports
- [Architect Report](architect-report.md) — SOLID principles, design patterns, architecture
- [Frontend Report](frontend-report.md) — SolidJS patterns, component architecture
- [Rust Expert Report](rust-expert-report.md) — Idiomatic Rust, async, safety
- [Tech Lead Report](tech-lead-report.md) — Complexity, duplication, refactoring plan
- [Devil's Advocate Report](devils-advocate-report.md) — Hidden risks, security, failure modes