V2 Tech Lead Audit Report — AI Weekly Synth

Date: 2026-03-27
Scope: Full codebase (backend + frontend), complexity, duplication, readability, maintainability

Executive Summary

The codebase is well-structured for a learning project and demonstrates solid engineering practices: clean error handling, SSRF protection, rate limiting, encryption at rest, and thorough test coverage for utility functions. However, organic growth has introduced one critical complexity hotspot (synthesis.rs at 2010 lines), significant frontend duplication between Sources.tsx and ThemeManager.tsx, and several patterns that will impede future development if not addressed.

Priority: findings are ranked P1 (do first) through P4 (nice to have); section 8 turns 14 of them into a prioritized action plan.

1. Complexity Hotspots

1.1 [P1] `backend/src/services/synthesis.rs` — 2010 lines, God Function

File: /Users/oabrivard/Projects/rust/ai_synth/backend/src/services/synthesis.rs

run_generation_inner() spans approximately 800 lines (lines 246-1038). It handles initialization, source rotation, link extraction, article history filtering, preferred-source shuffling, batch scraping, LLM classification, date filtering, category assignment, Brave Search fallback, LLM web search fallback, final assembly, and database persistence — all in a single function.

Specific issues:

Deep nesting: The wave loop ('wave_loop) contains a batch loop (while !done), which contains a JoinSet collection loop, which contains match arms with multiple continue branches. This is 4-5 levels of nesting.
Duplicated scrape+classify logic: The Phase 1 scrape+classify block (lines 471-632) and the Brave Search scrape+classify block (lines 704-888) are near-identical. Both build a JoinSet, spawn scrape tasks, collect results, build another JoinSet for LLM classification, parse responses, check is_article, filter by date, handle no-date articles, and assign categories.
12 calls to build_trace_entry() with the same boilerplate ArticleTrace struct construction scattered throughout.
7 flush-pending-traces blocks (check !pending_traces.is_empty(), call batch_insert_entries, call pending_traces.clear()).

Recommendation: Extract into a pipeline module with distinct phases:

services/pipeline/mod.rs        — orchestrator (run_generation_inner)
services/pipeline/phase1.rs     — personalized source processing
services/pipeline/phase2.rs     — web search fallback (Brave + LLM)
services/pipeline/classify.rs   — shared scrape+classify batch logic
services/pipeline/tracing.rs    — ArticleTrace builder + flush helper
services/pipeline/progress.rs   — ProgressEvent + emit_progress
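
The seven flush-pending-traces blocks could collapse into one helper in the proposed tracing module. The following is a minimal sketch, not the project's code: `TraceBuffer` and its methods are hypothetical names, the `ArticleTrace` fields are simplified assumptions (the real struct is not shown in this report), and a synchronous closure stands in for the async `batch_insert_entries` call.

```rust
/// Simplified stand-in for the pipeline's ArticleTrace (fields are assumed).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ArticleTrace {
    pub url: String,
    pub outcome: String,
}

/// Hypothetical buffer that owns the pending traces and the flush logic.
#[derive(Debug, Default)]
pub struct TraceBuffer {
    pending: Vec<ArticleTrace>,
}

impl TraceBuffer {
    pub fn new() -> Self {
        Self::default()
    }

    /// Queues one trace entry.
    pub fn push(&mut self, url: &str, outcome: &str) {
        self.pending.push(ArticleTrace {
            url: url.to_string(),
            outcome: outcome.to_string(),
        });
    }

    pub fn len(&self) -> usize {
        self.pending.len()
    }

    pub fn is_empty(&self) -> bool {
        self.pending.is_empty()
    }

    /// Writes all pending entries through `sink`, then clears the buffer.
    /// Returns the number of entries written. The sink is skipped when
    /// nothing is pending, and entries are kept if the sink fails.
    pub fn flush<E, F>(&mut self, sink: F) -> Result<usize, E>
    where
        F: FnOnce(&[ArticleTrace]) -> Result<(), E>,
    {
        if self.pending.is_empty() {
            return Ok(0);
        }
        sink(&self.pending)?;
        let written = self.pending.len();
        self.pending.clear();
        Ok(written)
    }
}
```

Each call site then shrinks to a single `flush` call, and the empty check and the clear can no longer drift apart between copies.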

1.2 [P2] `backend/src/services/scraper.rs` — 1280 lines

File: /Users/oabrivard/Projects/rust/ai_synth/backend/src/services/scraper.rs

This file is reasonably well-organized but large. The 600+ lines of tests (starting at line 678) constitute nearly half the file. The SSRF validation, HTML parsing, date extraction, and soft-404 detection are logically distinct concerns.

Recommendation: Move tests to backend/src/services/scraper/tests.rs using a #[cfg(test)] mod tests; pattern. Consider splitting the file into scraper/ssrf.rs, scraper/html.rs, scraper/dates.rs if it continues to grow.

1.3 [P3] `frontend/src/pages/ThemeManager.tsx` — 935 lines, monolithic component

File: /Users/oabrivard/Projects/rust/ai_synth/frontend/src/pages/ThemeManager.tsx

This single component manages 20+ signals, handles theme CRUD, source CRUD, bulk import, CSV import/export, preferred sources, category editing, and a schedule sub-component. The render function alone (lines 429-931) is 500 lines of JSX.

Recommendation: Extract sub-components:

ThemeContentForm — name, topic, categories, max age/items, summary length
ThemeSourceList — source list, add, delete, preferred toggle
ThemeImportExport — CSV and bulk import sections

2. Code Duplication

2.1 [P1] Sources.tsx and ThemeManager.tsx — ~80% duplicated source management logic

Files:

/Users/oabrivard/Projects/rust/ai_synth/frontend/src/pages/Sources.tsx (481 lines)
/Users/oabrivard/Projects/rust/ai_synth/frontend/src/pages/ThemeManager.tsx (935 lines)

Nearly every source-management function in ThemeManager.tsx is a copy-paste of Sources.tsx with minor adaptations (adding theme_id parameter):

handleAddSource — identical validation logic, same error handling pattern
handleDeleteClick / performDelete — identical two-click confirmation with timer
handleExportCsv / handleImportCsv — identical
handleBulkImport — identical line parsing, same semicolon splitting

The JSX for source list rendering (star toggle, delete button, link display) is also duplicated.

Recommendation: Extract a SourceManager component that accepts an optional themeId prop. Both pages delegate to it. The normalizeUrl and isValidUrl functions are already exported from Sources.tsx and imported by ThemeManager.tsx — this pattern should extend to the full source management UI.

2.2 [P1] Synthesis pipeline: duplicated scrape+classify blocks

As noted in 1.1, the Phase 1 and Brave Search paths in synthesis.rs duplicate approximately 120 lines of scrape-then-classify logic. The only differences are:

Phase 1 tracks source_url per article; Brave does not
Phase 1 uses (String, String, String, String) tuples; Brave uses (String, String, String)

Recommendation: Create a scrape_and_classify_batch() function parameterized by source type and optional source URL. This eliminates the duplication and makes adding future search backends (e.g., Google Search, Bing) trivial.
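
One way to reconcile the two tuple shapes is a single candidate struct whose `source_url` is optional. The sketch below is an assumption-heavy illustration: the field names, `SourceKind`, and `ClassifiedArticle` are hypothetical, and plain closures replace the JoinSet-based scraping and LLM classification so the shape of the shared function stays visible.

```rust
/// Which path produced a candidate link (hypothetical enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SourceKind {
    Personalized,
    BraveSearch,
}

/// One link to scrape. `source_url` is `Some` only for Phase 1 candidates,
/// which replaces the separate 4-tuple and 3-tuple shapes.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Candidate {
    pub url: String,
    pub title: String,
    pub source_url: Option<String>,
}

/// A candidate that was scraped and assigned a category.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ClassifiedArticle {
    pub candidate: Candidate,
    pub kind: SourceKind,
    pub category: String,
}

/// Scrapes each candidate, classifies the scraped body, and keeps only the
/// candidates for which both steps succeed. Synchronous stand-in for the
/// concurrent version in the real pipeline.
pub fn scrape_and_classify_batch<S, C>(
    kind: SourceKind,
    candidates: Vec<Candidate>,
    scrape: S,
    classify: C,
) -> Vec<ClassifiedArticle>
where
    S: Fn(&str) -> Option<String>,
    C: Fn(&str) -> Option<String>,
{
    candidates
        .into_iter()
        .filter_map(|candidate| {
            let body = scrape(&candidate.url)?;
            let category = classify(&body)?;
            Some(ClassifiedArticle { candidate, kind, category })
        })
        .collect()
}
```

A new search backend would then only need a new `SourceKind` variant and its own way of producing `Candidate` values.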

2.3 [P2] Frontend error handling boilerplate — 40+ occurrences

The pattern catch (err) { if (isApiError(err)) { setX(err.message) } else { setX(t('...')) } } appears 40+ times across 14 files. It is mechanical and could be simplified.

Recommendation: Create a handleApiError(err, t, fallbackKey) utility:

function handleApiError(err: unknown, t: TFunction, fallbackKey: string): string {
  return isApiError(err) ? err.message : t(fallbackKey);
}

2.4 [P3] Admin audit logging boilerplate

In admin.rs, every handler follows the same pattern: perform action, then call db::audit::create_entry with a CreateAuditLog struct. This is 5 occurrences, each ~15 lines.

Recommendation: Consider an audit middleware or macro that captures the action, target type, and details from the handler return value.
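
A lighter alternative to a macro is a small wrapper that derives the audit entry from a successful handler result. This is only a sketch under stated assumptions: `AuditRecord`, `Auditable`, `with_audit`, and `UserDeleted` are hypothetical names, the record fields are guesses at what `CreateAuditLog` carries, and a closure stands in for `db::audit::create_entry`.

```rust
/// Assumed shape of an audit entry (the real CreateAuditLog may differ).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct AuditRecord {
    pub action: String,
    pub target_type: String,
    pub details: String,
}

/// Implemented by handler return values that know how to describe themselves.
pub trait Auditable {
    fn audit_record(&self) -> AuditRecord;
}

/// Passes the result through unchanged and records an audit entry only
/// when the handler succeeded.
pub fn with_audit<T, E, F>(result: Result<T, E>, mut record: F) -> Result<T, E>
where
    T: Auditable,
    F: FnMut(AuditRecord),
{
    if let Ok(value) = &result {
        record(value.audit_record());
    }
    result
}

/// Example handler output (hypothetical).
pub struct UserDeleted {
    pub user_id: u32,
}

impl Auditable for UserDeleted {
    fn audit_record(&self) -> AuditRecord {
        AuditRecord {
            action: "user.delete".to_string(),
            target_type: "user".to_string(),
            details: format!("user_id={}", self.user_id),
        }
    }
}
```

Each handler keeps its business logic and ends with one `with_audit(...)` call instead of a hand-built struct.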

3. Readability

3.1 [P2] French/English mixing in backend code

User-facing strings in synthesis.rs and prompts.rs are hardcoded in French:

Progress messages: "Chargement des parametres...", "Analyse des sources personnalisees..."
Error messages: "Aucun article valide trouve. Verifiez vos sources et categories."
Prompt text: entire system/user prompts in French

Meanwhile, code comments, doc strings, log messages, and error variants are in English. This inconsistency makes it harder for non-French speakers to contribute and prevents future i18n.

Recommendation: Move all user-facing strings to constants or a backend i18n module. Keep code, comments, and logs in English.

3.2 [P3] `#[allow(clippy::too_many_arguments)]` — 3 occurrences

Files: synthesis.rs, prompts.rs, llm_call_log.rs

These suppressions indicate functions whose parameter count exceeds Clippy's too_many_arguments threshold (by default the lint fires above 7 parameters). They are code smells signaling that parameters should be grouped into structs.

build_search_prompt takes 9 parameters
log_llm_call takes 10 parameters
insert in llm_call_log.rs takes 10 parameters

Recommendation: Introduce parameter structs:

struct SearchPromptParams<'a> {
    theme: &'a str,
    categories: &'a [String],
    max_items_per_category: i32,
    // ...
}
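
To show how a call site changes, here is a hedged sketch of a builder that takes the parameter struct. Only the three fields listed above are included, and the English prompt wording is a placeholder, not the project's actual prompt.

```rust
/// Groups the prompt builder's arguments. Only the three fields named in
/// the report are shown; the remaining ones are omitted here.
pub struct SearchPromptParams<'a> {
    pub theme: &'a str,
    pub categories: &'a [String],
    pub max_items_per_category: i32,
}

/// Illustrative builder with placeholder wording.
pub fn build_search_prompt(params: &SearchPromptParams<'_>) -> String {
    format!(
        "Theme: {}\nCategories: {}\nMax items per category: {}",
        params.theme,
        params.categories.join(", "),
        params.max_items_per_category
    )
}
```

Named fields at the call site also guard against swapping two arguments of the same type, which positional parameters cannot catch.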

3.3 [P4] Magic strings for category keys

Category keys like "category_0", "category_autre", "category_no_date" are used as HashMap keys throughout synthesis.rs and in schema.rs. These appear as raw string literals in ~15 places.

Recommendation: Define constants or an enum:

const CATEGORY_OTHER: &str = "category_autre";
const CATEGORY_NO_DATE: &str = "category_no_date";
fn category_key(index: usize) -> String { format!("category_{}", index) }
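
The enum alternative could look like the sketch below. `CategoryKey` and its `parse` method are hypothetical names; the string forms match the keys quoted in this finding.

```rust
use std::fmt;

/// Typed replacement for the raw category key strings.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum CategoryKey {
    Indexed(usize),
    Other,
    NoDate,
}

impl fmt::Display for CategoryKey {
    /// Renders the key in the string form used as the HashMap key today.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            CategoryKey::Indexed(i) => write!(f, "category_{i}"),
            CategoryKey::Other => f.write_str("category_autre"),
            CategoryKey::NoDate => f.write_str("category_no_date"),
        }
    }
}

impl CategoryKey {
    /// Parses a stored key back into the enum; unknown strings yield None.
    pub fn parse(key: &str) -> Option<Self> {
        match key {
            "category_autre" => Some(Self::Other),
            "category_no_date" => Some(Self::NoDate),
            _ => key
                .strip_prefix("category_")?
                .parse::<usize>()
                .ok()
                .map(Self::Indexed),
        }
    }
}
```

Because the enum derives `Hash` and `Eq`, it can serve directly as the map key, and a mistyped variant fails at compile time instead of silently creating a new bucket.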

4. Maintainability Risks

4.1 [P2] Tight coupling between synthesis pipeline and database

run_generation_inner() directly calls db::settings::get_or_create_default, db::themes::get_by_id, db::sources::list_for_user, db::article_history::*, db::llm_call_log::insert, and db::syntheses::create, and resolve_model issues a raw sqlx::query_scalar (lines 1419-1429). The function takes AppState, which bundles the database pool, HTTP client, job store, and rate limiters.

Impact: Unit testing the pipeline logic requires either a real Postgres database or a complete mock of AppState. The existing E2E tests use a mock LLM provider (good) but still need Postgres (expensive).

Recommendation: Introduce a PipelineContext trait or struct that abstracts data access. This would allow testing the orchestration logic with in-memory implementations.
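
As a rough illustration of the trait variant, the sketch below defines a narrow data-access boundary and an in-memory implementation for tests. All method names, `InMemoryContext`, and `filter_new_articles` are assumptions made for this example, and the real trait would be async and cover more operations.

```rust
use std::collections::HashSet;

/// Hypothetical data-access boundary for the pipeline.
pub trait PipelineContext {
    /// Source URLs configured for the current user and theme.
    fn list_source_urls(&self) -> Vec<String>;
    /// True when the article already appeared in a previous synthesis.
    fn is_in_history(&self, article_url: &str) -> bool;
    /// Persists the final synthesis and returns its identifier.
    fn save_synthesis(&mut self, body: &str) -> u64;
}

/// In-memory implementation for unit tests; no database involved.
#[derive(Debug, Default)]
pub struct InMemoryContext {
    pub sources: Vec<String>,
    pub history: HashSet<String>,
    pub saved: Vec<String>,
}

impl PipelineContext for InMemoryContext {
    fn list_source_urls(&self) -> Vec<String> {
        self.sources.clone()
    }

    fn is_in_history(&self, article_url: &str) -> bool {
        self.history.contains(article_url)
    }

    fn save_synthesis(&mut self, body: &str) -> u64 {
        self.saved.push(body.to_string());
        // Identifiers start at 1 and follow insertion order.
        self.saved.len() as u64
    }
}

/// Orchestration logic written against the trait: drops already-seen links.
pub fn filter_new_articles<C: PipelineContext>(ctx: &C, links: &[String]) -> Vec<String> {
    links
        .iter()
        .filter(|link| !ctx.is_in_history(link.as_str()))
        .cloned()
        .collect()
}
```

The production implementation would wrap the existing db:: functions, so the orchestration code stays the same while tests swap in `InMemoryContext`.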

4.2 [P2] Raw SQL inline in `resolve_model()`

File: /Users/oabrivard/Projects/rust/ai_synth/backend/src/services/synthesis.rs, lines 1419-1429

let model = sqlx::query_scalar::<_, String>(
    r#"SELECT m->>'model_id' FROM admin_providers, ..."#,
)

This is the only place in the service layer that contains raw SQL. All other queries go through the db/ module, maintaining a clean separation. This breaks the pattern.

Recommendation: Move to db::providers::get_default_scraping_model(pool, provider_name).

4.3 [P3] Duplicated HTTP request handling across LLM providers

Each of the three providers (gemini.rs, openai.rs, anthropic.rs) implements the same pattern:

1. Build request body (provider-specific)
2. Send HTTP request
3. Map network errors with is_timeout() / is_connect() classification
4. Parse response JSON
5. Check HTTP status and map errors
6. Extract content from provider-specific response structure

Steps 2-4 are identical across all three providers (~20 lines each). Only steps 1, 5, and 6 differ.

Recommendation: Extract a send_llm_request() helper in llm/mod.rs:

async fn send_llm_request(
    client: &reqwest::Client,
    url: &str,
    body: &Value,
    headers: &[(String, String)],
    provider_name: &str,
) -> Result<(u16, Value), AppError>

4.4 [P3] `Providers.tsx` — 854 lines, complex inline state management

File: /Users/oabrivard/Projects/rust/ai_synth/frontend/src/pages/admin/Providers.tsx

The admin Providers page manages local editable copies of provider state in a Record<string, ProviderFormState> map, with functions for model array manipulation (add, remove, toggle default), scraping vs. websearch model lists, and inline validation. This is the most complex admin page and would benefit from splitting the model list editor into a reusable ModelListEditor component.

4.5 [P4] `Settings.tsx` — 694 lines, growing form complexity

File: /Users/oabrivard/Projects/rust/ai_synth/frontend/src/pages/Settings.tsx

The settings page has already been partially decomposed (SettingsBraveSearch, SettingsRateLimit, ApiKeyManager), which is good. The remaining monolithic JSX sections (provider selection, model dropdowns, import/export) could follow the same pattern for consistency.

5. Simplification Opportunities

5.1 [P3] `Sources.tsx` may be dead code

With the introduction of ThemeManager.tsx, which subsumes all source management under themes, the standalone Sources.tsx page may no longer be reachable by users. It is still registered in the router, but if all sources must now belong to a theme, the standalone page serves no purpose.

Action: Verify whether Sources.tsx is still linked in the navigation. If not, remove it and its route to eliminate 481 lines of duplicated code.

5.2 [P3] `list_for_user` query branch duplication

File: /Users/oabrivard/Projects/rust/ai_synth/backend/src/db/sources.rs, lines 15-44

The function has two nearly identical SQL queries, one with AND theme_id = $2 and one without. The only difference is that optional theme filter.

Recommendation: Use a single query with a conditional clause:

sqlx::query_as::<_, Source>(
    "SELECT ... FROM sources WHERE user_id = $1 AND ($2::uuid IS NULL OR theme_id = $2) ORDER BY ..."
)
.bind(user_id)
.bind(theme_id)

5.3 [P4] `bulk_create` uses sequential inserts instead of batch

File: /Users/oabrivard/Projects/rust/ai_synth/backend/src/db/sources.rs, lines 97-127

Sources are inserted one by one in a loop. For bulk imports of 50-100 sources, this generates 50-100 round-trips to the database.

Recommendation: Build a single INSERT ... VALUES ($1, $2), ($3, $4), ... query, for example with sqlx's QueryBuilder::push_values. This is a performance optimization, not a correctness issue.
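
If the statement is assembled by hand, the placeholder numbering is the only tricky part. The helper below is a standalone sketch: `build_bulk_insert_sql` is a hypothetical function, and the `sources` table and column names in the usage note are assumptions, not the project's schema.

```rust
/// Builds a multi-row INSERT with numbered Postgres placeholders.
/// `table` and `columns` are interpolated into the SQL text, so they must
/// be trusted constants, never user input. Returns None when there is
/// nothing to insert.
pub fn build_bulk_insert_sql(table: &str, columns: &[&str], row_count: usize) -> Option<String> {
    if columns.is_empty() || row_count == 0 {
        return None;
    }
    let width = columns.len();
    let rows: Vec<String> = (0..row_count)
        .map(|row| {
            // Row 0 uses $1..$width, row 1 continues from $width+1, and so on.
            let placeholders: Vec<String> = (1..=width)
                .map(|col| format!("${}", row * width + col))
                .collect();
            format!("({})", placeholders.join(", "))
        })
        .collect();
    Some(format!(
        "INSERT INTO {} ({}) VALUES {}",
        table,
        columns.join(", "),
        rows.join(", ")
    ))
}
```

For two rows with columns `user_id` and `url`, this yields `INSERT INTO sources (user_id, url) VALUES ($1, $2), ($3, $4)`; the caller then binds values row by row in the same order.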

5.4 [P4] Hardcoded snippet sizes

In synthesis.rs, the snippet size is computed from summary_length in two separate places (Phase 1 at line 437 and Brave at lines 766-770):

let snippet_size = match theme.summary_length { 1 => 500, 2 => 2000, _ => 4000 };

Recommendation: Extract to a function fn snippet_size_for_length(summary_length: i32) -> usize.
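
A direct extraction of the match shown above might read as follows. The numeric values come from the quoted line; what each summary_length level means to the user is not stated in this report, so the comments stay neutral.

```rust
/// Maps the theme's summary_length setting to a snippet size.
/// Any value other than 1 or 2 falls back to the largest size,
/// matching the wildcard arm of the existing match.
pub fn snippet_size_for_length(summary_length: i32) -> usize {
    match summary_length {
        1 => 500,
        2 => 2000,
        _ => 4000,
    }
}
```

Both the Phase 1 and Brave paths would call this function, so a future change to the sizes happens in one place.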

6. File Size Summary

Backend (top 10 by line count)

| File | Lines | Assessment |
| --- | --- | --- |
| `services/synthesis.rs` | 2010 | Needs decomposition (P1) |
| `services/scraper.rs` | 1280 | Acceptable, extract tests |
| `services/rate_limiter.rs` | 471 | Clean |
| `services/llm/anthropic.rs` | 471 | Minor shared-code opportunity |
| `services/export.rs` | 459 | Clean |
| `handlers/admin.rs` | 438 | Audit boilerplate |
| `models/synthesis.rs` | 416 | Clean |
| `services/email.rs` | 384 | Clean |
| `handlers/auth.rs` | 381 | Clean |
| `services/llm/openai.rs` | 373 | Minor shared-code opportunity |

Frontend (top 10 by line count)

| File | Lines | Assessment |
| --- | --- | --- |
| `pages/ThemeManager.tsx` | 935 | Needs decomposition (P1/P3) |
| `pages/admin/Providers.tsx` | 854 | Extract ModelListEditor (P3) |
| `pages/Settings.tsx` | 694 | Partially decomposed, continue (P4) |
| `pages/SynthesisDetail.tsx` | 548 | Acceptable |
| `pages/Sources.tsx` | 481 | Possibly dead code (P3) |
| `pages/GenerateSynthesis.tsx` | 471 | Clean |
| `i18n/fr.ts` | 462 | Expected size for translations |
| `pages/ArticleHistory.tsx` | 371 | Clean |
| `pages/Home.tsx` | 345 | Clean |
| `components/settings/SettingsSchedule.tsx` | 286 | Clean |

7. Positive Observations

These aspects of the codebase are well-executed and should be preserved:

Error handling: AppError enum with IntoResponse is clean, consistent, and hides internal details. Tests verify that secrets are never leaked.
Security: SSRF prevention with DNS resolution checks, AES-256-GCM encryption for API keys, CSRF protection via the X-Requested-With header, timing-attack mitigation in auth, and sensitive data scrubbing in error messages.
LLM provider abstraction: The LlmProvider trait + factory pattern makes adding new providers straightforward.
Documentation: Module-level //! doc comments on every file, function-level /// doc comments with examples, and clear CLAUDE.md project instructions.
Frontend component extraction: SettingsBraveSearch, SettingsRateLimit, SettingsSchedule, and ApiKeyManager demonstrate good instincts for decomposition.
Type safety: Frontend types.ts is clean, well-organized, and provides isApiError type guard.
Test coverage: Unit tests for error handling, SSRF checks, URL normalization, job store, role validation, and CSV parsing.

8. Prioritized Action Plan

| Priority | Item | Effort | Impact |
| --- | --- | --- | --- |
| P1 | Decompose `synthesis.rs` into pipeline module (1.1) | Large | Reduces complexity, enables testing |
| P1 | Extract shared `SourceManager` component (2.1) | Medium | Eliminates ~300 lines of duplication |
| P1 | Extract shared scrape+classify function (2.2) | Medium | Eliminates ~120 lines of duplication |
| P2 | Move hardcoded French strings to constants (3.1) | Medium | Enables future i18n, improves consistency |
| P2 | Frontend error-handling helper (2.3) | Small | Reduces boilerplate in 14 files |
| P2 | Abstract data access from pipeline (4.1) | Large | Enables unit testing without Postgres |
| P2 | Move inline SQL from `resolve_model` to db module (4.2) | Small | Maintains architecture consistency |
| P2 | Extract scraper tests to separate file (1.2) | Small | Improves file navigation |
| P3 | Decompose `ThemeManager.tsx` into sub-components (1.3) | Medium | Improves readability |
| P3 | Introduce parameter structs for long signatures (3.2) | Small | Removes clippy suppressions |
| P3 | Define category key constants (3.3) | Small | Prevents typo bugs |
| P3 | Audit whether `Sources.tsx` is dead code (5.1) | Small | Potential -481 lines |
| P3 | Consolidate LLM HTTP request handling (4.3) | Medium | Reduces duplication across 3 files |
| P4 | Batch insert for `bulk_create` (5.3) | Small | Performance improvement |
