V2 Tech Lead Audit Report — AI Weekly Synth
Date: 2026-03-27
Scope: Full codebase (backend + frontend), complexity, duplication, readability, maintainability
Executive Summary
The codebase is well-structured for a learning project and demonstrates solid engineering practices: clean error handling, SSRF protection, rate limiting, encryption at rest, and thorough test coverage for utility functions. However, organic growth has introduced one critical complexity hotspot (synthesis.rs at 2010 lines), significant frontend duplication between Sources.tsx and ThemeManager.tsx, and several patterns that will impede future development if not addressed.
Priority: 14 findings ranked P1 (do first) through P4 (nice to have).
1. Complexity Hotspots
1.1 [P1] backend/src/services/synthesis.rs — 2010 lines, God Function
File: /Users/oabrivard/Projects/rust/ai_synth/backend/src/services/synthesis.rs
run_generation_inner() spans approximately 800 lines (lines 246-1038). It handles initialization, source rotation, link extraction, article history filtering, preferred-source shuffling, batch scraping, LLM classification, date filtering, category assignment, Brave Search fallback, LLM web search fallback, final assembly, and database persistence — all in a single function.
Specific issues:
- Deep nesting: The wave loop (`'wave_loop`) contains a batch loop (`while !done`), which contains a `JoinSet` collection loop, which contains match arms with multiple `continue` branches. This is 4-5 levels of nesting.
- Duplicated scrape+classify logic: The Phase 1 scrape+classify block (lines 471-632) and the Brave Search scrape+classify block (lines 704-888) are near-identical. Both build a `JoinSet`, spawn scrape tasks, collect results, build another `JoinSet` for LLM classification, parse responses, check `is_article`, filter by date, handle no-date articles, and assign categories.
- 12 calls to `build_trace_entry()` with the same boilerplate `ArticleTrace` struct construction scattered throughout.
- 7 flush-pending-traces blocks (check `!pending_traces.is_empty()`, call `batch_insert_entries`, call `pending_traces.clear()`).
Recommendation: Extract into a pipeline module with distinct phases:
- `services/pipeline/mod.rs`: orchestrator (`run_generation_inner`)
- `services/pipeline/phase1.rs`: personalized source processing
- `services/pipeline/phase2.rs`: web search fallback (Brave + LLM)
- `services/pipeline/classify.rs`: shared scrape+classify batch logic
- `services/pipeline/tracing.rs`: `ArticleTrace` builder + flush helper
- `services/pipeline/progress.rs`: `ProgressEvent` + `emit_progress`
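The `tracing.rs` helper could absorb both the repeated `build_trace_entry()` construction and the seven flush blocks. The sketch below is hypothetical: `TraceBuffer` and its methods are invented for illustration, `ArticleTrace` is reduced to two assumed fields, and the real flush (an async `batch_insert_entries` call) is replaced by a caller-supplied closure so the example needs only the standard library.

```rust
/// Simplified stand-in for the real `ArticleTrace` (fields are hypothetical).
#[derive(Debug, Clone, PartialEq)]
pub struct ArticleTrace {
    pub url: String,
    pub outcome: String,
}

/// Hypothetical buffer that owns the pending traces and the flush logic.
#[derive(Default)]
pub struct TraceBuffer {
    pending: Vec<ArticleTrace>,
}

impl TraceBuffer {
    /// Replaces the repeated "build a trace entry, push it" boilerplate.
    pub fn record(&mut self, url: &str, outcome: &str) {
        self.pending.push(ArticleTrace {
            url: url.to_string(),
            outcome: outcome.to_string(),
        });
    }

    /// Replaces the "if not empty { batch insert; clear }" blocks.
    /// `insert` stands in for `batch_insert_entries`. The buffer is cleared
    /// only when the insert succeeds, so failed traces stay queued.
    pub fn flush<E, F>(&mut self, insert: F) -> Result<usize, E>
    where
        F: FnOnce(&[ArticleTrace]) -> Result<(), E>,
    {
        if self.pending.is_empty() {
            return Ok(0);
        }
        insert(self.pending.as_slice())?;
        let flushed = self.pending.len();
        self.pending.clear();
        Ok(flushed)
    }
}
```

Keeping traces after a failed insert is a choice made in this sketch; whether it matches the intended behavior depends on how the current code treats trace-insert failures.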
1.2 [P2] backend/src/services/scraper.rs — 1280 lines
File: /Users/oabrivard/Projects/rust/ai_synth/backend/src/services/scraper.rs
This file is reasonably well-organized but large. The 600+ lines of tests (starting at line 678) constitute nearly half the file. The SSRF validation, HTML parsing, date extraction, and soft-404 detection are logically distinct concerns.
Recommendation: Move tests to backend/src/services/scraper/tests.rs using a #[cfg(test)] mod tests; pattern. Consider splitting the file into scraper/ssrf.rs, scraper/html.rs, scraper/dates.rs if it continues to grow.
1.3 [P3] frontend/src/pages/ThemeManager.tsx — 935 lines, monolithic component
File: /Users/oabrivard/Projects/rust/ai_synth/frontend/src/pages/ThemeManager.tsx
This single component manages 20+ signals, handles theme CRUD, source CRUD, bulk import, CSV import/export, preferred sources, category editing, and a schedule sub-component. The render function alone (lines 429-931) is 500 lines of JSX.
Recommendation: Extract sub-components:
- `ThemeContentForm`: name, topic, categories, max age/items, summary length
- `ThemeSourceList`: source list, add, delete, preferred toggle
- `ThemeImportExport`: CSV and bulk import sections
2. Code Duplication
2.1 [P1] Sources.tsx and ThemeManager.tsx — ~80% duplicated source management logic
Files:
- `/Users/oabrivard/Projects/rust/ai_synth/frontend/src/pages/Sources.tsx` (481 lines)
- `/Users/oabrivard/Projects/rust/ai_synth/frontend/src/pages/ThemeManager.tsx` (935 lines)
Nearly every source-management function in ThemeManager.tsx is a copy-paste of Sources.tsx with minor adaptations (adding theme_id parameter):
- `handleAddSource`: identical validation logic, same error handling pattern
- `handleDeleteClick` / `performDelete`: identical two-click confirmation with timer
- `handleExportCsv` / `handleImportCsv`: identical
- `handleBulkImport`: identical line parsing, same semicolon splitting
The JSX for source list rendering (star toggle, delete button, link display) is also duplicated.
Recommendation: Extract a SourceManager component that accepts an optional themeId prop. Both pages delegate to it. The normalizeUrl and isValidUrl functions are already exported from Sources.tsx and imported by ThemeManager.tsx — this pattern should extend to the full source management UI.
2.2 [P1] Synthesis pipeline: duplicated scrape+classify blocks
As noted in 1.1, the Phase 1 and Brave Search paths in synthesis.rs duplicate approximately 120 lines of scrape-then-classify logic. The only differences are:
- Phase 1 tracks `source_url` per article; Brave does not
- Phase 1 uses `(String, String, String, String)` tuples; Brave uses `(String, String, String)`
Recommendation: Create a scrape_and_classify_batch() function parameterized by source type and optional source URL. This eliminates the duplication and makes adding future search backends (e.g., Google Search, Bing) trivial.
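One possible shape for that function, sketched under stated assumptions: the real blocks are async and fan out through `JoinSet`, while this version is synchronous and takes the scrape and classification steps as closures so it compiles with only the standard library. `Candidate`, `SourceKind`, `Classified`, and their field names are hypothetical. The key idea is that an `Option<String>` for the source URL replaces the two different tuple shapes.

```rust
/// Hypothetical unified input replacing both tuple shapes: Phase 1 fills
/// `source_url`, Brave Search leaves it `None`. Field names are assumptions.
#[derive(Debug, Clone)]
pub struct Candidate {
    pub url: String,
    pub title: String,
    pub source_url: Option<String>,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SourceKind {
    Personalized,
    BraveSearch,
}

#[derive(Debug)]
pub struct Classified {
    pub candidate: Candidate,
    pub kind: SourceKind,
    pub category: String,
}

/// Shared scrape-then-classify step. `scrape` returns page text (None on
/// failure); `classify` returns a category key (None if the page is not an
/// article or fails the date filter). Both stand in for the real async calls.
pub fn scrape_and_classify_batch<S, C>(
    kind: SourceKind,
    candidates: Vec<Candidate>,
    mut scrape: S,
    mut classify: C,
) -> Vec<Classified>
where
    S: FnMut(&Candidate) -> Option<String>,
    C: FnMut(&Candidate, &str) -> Option<String>,
{
    candidates
        .into_iter()
        .filter_map(|candidate| {
            let text = scrape(&candidate)?;
            let category = classify(&candidate, &text)?;
            Some(Classified { candidate, kind, category })
        })
        .collect()
}
```

Adding another search backend would then mean adding a `SourceKind` variant and producing `Candidate` values with `source_url: None`.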
2.3 [P2] Frontend error handling boilerplate — 40+ occurrences
The pattern catch (err) { if (isApiError(err)) { setX(err.message) } else { setX(t('...')) } } appears 40 times across 14 files. This is mechanical and could be simplified.
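Recommendation: Create a `handleApiError(err, t, fallbackKey)` utility that returns the message to display:
```typescript
// isApiError is the type guard from types.ts; TFunction is the type of the i18n translate function.
export function handleApiError(err: unknown, t: TFunction, fallbackKey: string): string {
  return isApiError(err) ? err.message : t(fallbackKey);
}
// Call site: catch (err) { setX(handleApiError(err, t, 'some.fallback_key')) }
```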
2.4 [P3] Admin audit logging boilerplate
In admin.rs, every handler follows the same pattern: perform action, then call db::audit::create_entry with a CreateAuditLog struct. This is 5 occurrences, each ~15 lines.
Recommendation: Consider an audit middleware or macro that captures the action, target type, and details from the handler return value.
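Short of a macro, a plain wrapper function already removes most of the boilerplate. A minimal sketch, assuming a hypothetical `Audited` trait implemented by handler return values and a simplified `AuditEntry` standing in for `CreateAuditLog`; the real version would be async and call `db::audit::create_entry` instead of pushing to a `Vec`.

```rust
/// Simplified stand-in for `CreateAuditLog` (fields are hypothetical).
#[derive(Debug, Clone, PartialEq)]
pub struct AuditEntry {
    pub action: &'static str,
    pub target_type: &'static str,
    pub details: String,
}

/// Implemented by handler return values that can describe themselves.
pub trait Audited {
    fn audit_details(&self) -> String;
}

/// Runs `action_fn` and records one audit entry only if it succeeds.
/// `log` stands in for the database call.
pub fn with_audit<T, E, F>(
    action: &'static str,
    target_type: &'static str,
    log: &mut Vec<AuditEntry>,
    action_fn: F,
) -> Result<T, E>
where
    T: Audited,
    F: FnOnce() -> Result<T, E>,
{
    let value = action_fn()?;
    log.push(AuditEntry {
        action,
        target_type,
        details: value.audit_details(),
    });
    Ok(value)
}
```

Recording only on success keeps the "perform action, then log" order described above; a declarative macro could later generate the `Audited` impls if those turn out to be repetitive as well.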
3. Readability
3.1 [P2] French/English mixing in backend code
User-facing strings in synthesis.rs and prompts.rs are hardcoded in French:
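- Progress messages: "Chargement des parametres..." ("Loading settings..."), "Analyse des sources personnalisees..." ("Analyzing personalized sources...")
- Error messages: "Aucun article valide trouve. Verifiez vos sources et categories." ("No valid article found. Check your sources and categories.")
- Prompt text: entire system/user prompts in French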
Meanwhile, code comments, doc strings, log messages, and error variants are in English. This inconsistency makes it harder for non-French speakers to contribute and prevents future i18n.
Recommendation: Move all user-facing strings to constants or a backend i18n module. Keep code, comments, and logs in English.
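A minimal sketch of such a module, assuming a hypothetical `Msg` enum and `Locale` type (neither exists in the codebase today). The French strings are the ones quoted above; the English ones are illustrative translations.

```rust
/// Hypothetical catalogue of user-facing backend messages.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Msg {
    LoadingSettings,
    AnalyzingSources,
    NoValidArticles,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Locale {
    Fr,
    En,
}

impl Msg {
    /// Single place where user-facing text lives; code and logs stay English.
    pub fn text(self, locale: Locale) -> &'static str {
        match (self, locale) {
            (Msg::LoadingSettings, Locale::Fr) => "Chargement des parametres...",
            (Msg::LoadingSettings, Locale::En) => "Loading settings...",
            (Msg::AnalyzingSources, Locale::Fr) => "Analyse des sources personnalisees...",
            (Msg::AnalyzingSources, Locale::En) => "Analyzing personalized sources...",
            (Msg::NoValidArticles, Locale::Fr) => {
                "Aucun article valide trouve. Verifiez vos sources et categories."
            }
            (Msg::NoValidArticles, Locale::En) => {
                "No valid article found. Check your sources and categories."
            }
        }
    }
}
```

Because the `match` over `(Msg, Locale)` is exhaustive, the compiler rejects a new message that lacks a translation, which a set of loose string constants would not catch.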
3.2 [P3] #[allow(clippy::too_many_arguments)] — 3 occurrences
Files: synthesis.rs, prompts.rs, llm_call_log.rs
These suppressions indicate functions whose parameter count exceeds Clippy's `too_many_arguments` threshold (by default, more than 7 parameters). They are code smells signaling that parameters should be grouped into structs:
- `build_search_prompt` takes 9 parameters
- `log_llm_call` takes 10 parameters
- `insert` in `llm_call_log.rs` takes 10 parameters
Recommendation: Introduce parameter structs:
```rust
struct SearchPromptParams<'a> {
    theme: &'a str,
    categories: &'a [String],
    max_items_per_category: i32,
    // ...
}
```
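To show what changes at the call site, here is a hypothetical `build_search_prompt` that takes the struct by reference. Only the three fields listed above are included, and the prompt text is invented for the example; the real prompts are in French and longer.

```rust
/// Only the three fields shown in the audit; the real struct would carry
/// all nine parameters of `build_search_prompt`.
pub struct SearchPromptParams<'a> {
    pub theme: &'a str,
    pub categories: &'a [String],
    pub max_items_per_category: i32,
}

/// Hypothetical body: the real prompt text differs. The point is the
/// signature, which no longer needs `#[allow(clippy::too_many_arguments)]`.
pub fn build_search_prompt(params: &SearchPromptParams<'_>) -> String {
    format!(
        "Find up to {} recent articles per category about \"{}\". Categories: {}.",
        params.max_items_per_category,
        params.theme,
        params.categories.join(", ")
    )
}
```

With named fields, two adjacent parameters of the same type (several `&str` or `i32` values, for instance) can no longer be swapped silently at the call site.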
3.3 [P4] Magic strings for category keys
Category keys like "category_0", "category_autre", "category_no_date" are used as HashMap keys throughout synthesis.rs and in schema.rs. These appear as raw string literals in ~15 places.
Recommendation: Define constants or an enum:
```rust
const CATEGORY_OTHER: &str = "category_autre";
const CATEGORY_NO_DATE: &str = "category_no_date";
fn category_key(index: usize) -> String { format!("category_{}", index) }
```
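If an enum is preferred over constants, a sketch could look like this. `CategoryKey` is a hypothetical name; its `Display` output reproduces the string keys cited above, so the string form remains available wherever string keys are still needed.

```rust
use std::fmt;

/// Hypothetical typed alternative to raw "category_*" strings.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum CategoryKey {
    Index(usize),
    Other,
    NoDate,
}

impl CategoryKey {
    /// Parses a string key back into the enum; returns None for unknown keys.
    pub fn parse(key: &str) -> Option<Self> {
        match key {
            "category_autre" => Some(CategoryKey::Other),
            "category_no_date" => Some(CategoryKey::NoDate),
            _ => key
                .strip_prefix("category_")?
                .parse::<usize>()
                .ok()
                .map(CategoryKey::Index),
        }
    }
}

impl fmt::Display for CategoryKey {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            CategoryKey::Index(i) => write!(f, "category_{}", i),
            CategoryKey::Other => f.write_str("category_autre"),
            CategoryKey::NoDate => f.write_str("category_no_date"),
        }
    }
}
```

Deriving `Hash` and `Eq` lets the enum itself serve as the `HashMap` key inside the pipeline, with conversion to a string only at the serialization boundary.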
4. Maintainability Risks
4.1 [P2] Tight coupling between synthesis pipeline and database
run_generation_inner() directly calls db::settings::get_or_create_default, db::themes::get_by_id, db::sources::list_for_user, db::article_history::*, db::llm_call_log::insert, db::syntheses::create, and a raw sqlx::query_scalar (lines 1419-1429, in resolve_model). The function takes AppState, which bundles the database pool, HTTP client, job store, and rate limiters.
Impact: Unit testing the pipeline logic requires either a real Postgres database or a complete mock of AppState. The existing E2E tests use a mock LLM provider (good) but still need Postgres (expensive).
Recommendation: Introduce a PipelineContext trait or struct that abstracts data access. This would allow testing the orchestration logic with in-memory implementations.
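A minimal sketch of the idea, with simplifications stated up front: the trait and method names are hypothetical, the methods are synchronous (the real ones would be async), user IDs are `u64` instead of UUIDs, and only two of the data-access calls listed above are modeled. A Postgres-backed implementation would wrap the existing `db::` functions.

```rust
use std::collections::HashSet;

/// Hypothetical data-access seam for the pipeline.
pub trait PipelineContext {
    fn list_source_urls(&self, user_id: u64) -> Vec<String>;
    fn is_in_history(&self, user_id: u64, article_url: &str) -> bool;
}

/// In-memory implementation for unit tests: no Postgres required.
#[derive(Default)]
pub struct InMemoryContext {
    pub sources: Vec<(u64, String)>,
    pub history: HashSet<(u64, String)>,
}

impl PipelineContext for InMemoryContext {
    fn list_source_urls(&self, user_id: u64) -> Vec<String> {
        self.sources
            .iter()
            .filter(|(owner, _)| *owner == user_id)
            .map(|(_, url)| url.clone())
            .collect()
    }

    fn is_in_history(&self, user_id: u64, article_url: &str) -> bool {
        self.history.contains(&(user_id, article_url.to_string()))
    }
}

/// Orchestration logic written against the trait, so it runs unchanged
/// against the in-memory context in tests and a database context in production.
pub fn unseen_articles(
    ctx: &impl PipelineContext,
    user_id: u64,
    extracted: Vec<String>,
) -> Vec<String> {
    extracted
        .into_iter()
        .filter(|url| !ctx.is_in_history(user_id, url))
        .collect()
}
```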
4.2 [P2] Raw SQL inline in resolve_model()
File: /Users/oabrivard/Projects/rust/ai_synth/backend/src/services/synthesis.rs, lines 1419-1429
```rust
let model = sqlx::query_scalar::<_, String>(
    r#"SELECT m->>'model_id' FROM admin_providers, ..."#,
)
```
This is the only place in the service layer that contains raw SQL. All other queries go through the db/ module, maintaining a clean separation. This breaks the pattern.
Recommendation: Move to db::providers::get_default_scraping_model(pool, provider_name).
4.3 [P3] LLM provider implementations share identical HTTP error handling
Each of the three providers (gemini.rs, openai.rs, anthropic.rs) implements the same pattern:
1. Build request body (provider-specific)
2. Send HTTP request
3. Map network errors with `is_timeout()`/`is_connect()` classification
4. Parse response JSON
5. Check HTTP status and map errors
6. Extract content from provider-specific response structure
Steps 2-4 are identical across all three providers (~20 lines each). Only steps 1, 5, and 6 differ.
Recommendation: Extract a send_llm_request() helper in llm/mod.rs:
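```rust
use serde_json::Value;

/// Shared steps 2-4; returns the HTTP status and parsed JSON so each provider keeps its own status mapping (step 5).
async fn send_llm_request(
    client: &reqwest::Client,
    url: &str,
    body: &Value,
    headers: &[(String, String)],
    provider_name: &str,
) -> Result<(u16, Value), AppError> {
    // Send, map is_timeout()/is_connect() errors, parse the JSON body.
    todo!()
}
```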
4.4 [P3] Providers.tsx — 854 lines, complex inline state management
File: /Users/oabrivard/Projects/rust/ai_synth/frontend/src/pages/admin/Providers.tsx
The admin Providers page manages local editable copies of provider state in a Record<string, ProviderFormState> map, with functions for model array manipulation (add, remove, toggle default), scraping vs. websearch model lists, and inline validation. This is the most complex admin page and would benefit from splitting the model list editor into a reusable ModelListEditor component.
4.5 [P4] Settings.tsx — 694 lines, growing form complexity
File: /Users/oabrivard/Projects/rust/ai_synth/frontend/src/pages/Settings.tsx
The settings page has already been partially decomposed (SettingsBraveSearch, SettingsRateLimit, ApiKeyManager), which is good. The remaining monolithic JSX sections (provider selection, model dropdowns, import/export) could follow the same pattern for consistency.
5. Simplification Opportunities
5.1 [P3] Sources.tsx may be dead code
With the introduction of ThemeManager.tsx, which subsumes all source management under themes, the standalone Sources.tsx page may no longer be reachable by users. It is still registered in the router, but if all sources must now belong to a theme, the standalone page serves no purpose.
Action: Verify whether Sources.tsx is still linked in the navigation. If not, remove it and its route to eliminate 481 lines of duplicated code.
5.2 [P3] list_for_user query branch duplication
File: /Users/oabrivard/Projects/rust/ai_synth/backend/src/db/sources.rs, lines 15-44
The function has two nearly identical SQL queries — one with AND theme_id = $2 and one without. The only difference is the optional WHERE clause.
Recommendation: Use a single query with a conditional clause:
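```rust
sqlx::query_as::<_, Source>(
    "SELECT ... FROM sources WHERE user_id = $1 AND ($2::uuid IS NULL OR theme_id = $2) ORDER BY ..."
)
.bind(user_id)
.bind(theme_id) // assuming theme_id: Option<Uuid>; None binds SQL NULL, which disables the theme filter
.fetch_all(pool)
.await
```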
5.3 [P4] bulk_create uses sequential inserts instead of batch
File: /Users/oabrivard/Projects/rust/ai_synth/backend/src/db/sources.rs, lines 97-127
Sources are inserted one by one in a loop. For bulk imports of 50-100 sources, this generates 50-100 round-trips to the database.
Recommendation: Use sqlx's `QueryBuilder::push_values`, or build a single `INSERT ... VALUES ($1, $2), ($3, $4), ...` query by hand. This is a performance optimization, not a correctness issue.
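For the hand-built variant, the placeholder list can be generated rather than written out. A minimal sketch; `values_placeholders` is a hypothetical helper, and the column list in the usage note below is assumed.

```rust
/// Builds the VALUES part of a multi-row INSERT with Postgres-style
/// numbered placeholders, numbering row by row:
/// rows = 2, cols = 3 gives "($1, $2, $3), ($4, $5, $6)".
pub fn values_placeholders(rows: usize, cols: usize) -> String {
    (0..rows)
        .map(|row| {
            let params: Vec<String> = (1..=cols)
                .map(|col| format!("${}", row * cols + col))
                .collect();
            format!("({})", params.join(", "))
        })
        .collect::<Vec<_>>()
        .join(", ")
}
```

A query would then be assembled as `format!("INSERT INTO sources (user_id, url) VALUES {}", values_placeholders(urls.len(), 2))`, with each row's values bound in the same row-major order. Postgres caps a single statement at 65535 bind parameters, so a 50-100 source import is far below the limit.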
5.4 [P4] Hardcoded snippet sizes
In synthesis.rs, the snippet size is computed from summary_length in two separate places (Phase 1 at line 437 and Brave at lines 766-770):
```rust
let snippet_size = match theme.summary_length { 1 => 500, 2 => 2000, _ => 4000 };
```
Recommendation: Extract to a function fn snippet_size_for_length(summary_length: i32) -> usize.
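The extracted helper is small. The values below are copied from the existing `match`, and the unit of the returned size is whatever the current code uses.

```rust
/// Maps a theme's `summary_length` setting to the snippet size used when
/// scraping. Any value other than 1 or 2 falls back to the largest size.
pub fn snippet_size_for_length(summary_length: i32) -> usize {
    match summary_length {
        1 => 500,
        2 => 2000,
        _ => 4000,
    }
}
```

Both call sites would then read `let snippet_size = snippet_size_for_length(theme.summary_length);`, and a unit test can pin the mapping in one place.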
6. File Size Summary
Backend (top 10 by line count)
| File | Lines | Assessment |
|---|---|---|
| services/synthesis.rs | 2010 | Needs decomposition (P1) |
| services/scraper.rs | 1280 | Acceptable, extract tests |
| services/rate_limiter.rs | 471 | Clean |
| services/llm/anthropic.rs | 471 | Minor shared-code opportunity |
| services/export.rs | 459 | Clean |
| handlers/admin.rs | 438 | Audit boilerplate |
| models/synthesis.rs | 416 | Clean |
| services/email.rs | 384 | Clean |
| handlers/auth.rs | 381 | Clean |
| services/llm/openai.rs | 373 | Minor shared-code opportunity |
Frontend (top 10 by line count)
| File | Lines | Assessment |
|---|---|---|
| pages/ThemeManager.tsx | 935 | Needs decomposition (P1/P3) |
| pages/admin/Providers.tsx | 854 | Extract ModelListEditor (P3) |
| pages/Settings.tsx | 694 | Partially decomposed, continue (P4) |
| pages/SynthesisDetail.tsx | 548 | Acceptable |
| pages/Sources.tsx | 481 | Possibly dead code (P3) |
| pages/GenerateSynthesis.tsx | 471 | Clean |
| i18n/fr.ts | 462 | Expected size for translations |
| pages/ArticleHistory.tsx | 371 | Clean |
| pages/Home.tsx | 345 | Clean |
| components/settings/SettingsSchedule.tsx | 286 | Clean |
7. Positive Observations
These aspects of the codebase are well-executed and should be preserved:
- Error handling: `AppError` enum with `IntoResponse` is clean, consistent, and hides internal details. Tests verify that secrets are never leaked.
- Security: SSRF prevention with DNS resolution checks, AES-256-GCM encryption for API keys, CSRF protection via the `X-Requested-With` header, timing-attack mitigation in auth, and sensitive data scrubbing in error messages.
- LLM provider abstraction: The `LlmProvider` trait + factory pattern makes adding new providers straightforward.
- Documentation: Module-level `//!` doc comments on every file, function-level `///` doc comments with examples, and clear CLAUDE.md project instructions.
- Frontend component extraction: `SettingsBraveSearch`, `SettingsRateLimit`, `SettingsSchedule`, and `ApiKeyManager` demonstrate good instincts for decomposition.
- Type safety: Frontend `types.ts` is clean, well-organized, and provides an `isApiError` type guard.
- Test coverage: Unit tests for error handling, SSRF checks, URL normalization, job store, role validation, and CSV parsing.
8. Prioritized Action Plan
| Priority | Item | Effort | Impact |
|---|---|---|---|
| P1 | Decompose synthesis.rs into pipeline module (1.1) | Large | Reduces complexity, enables testing |
| P1 | Extract shared SourceManager component (2.1) | Medium | Eliminates ~300 lines of duplication |
| P1 | Extract shared scrape+classify function (2.2) | Medium | Eliminates ~120 lines of duplication |
| P2 | Move hardcoded French strings to constants (3.1) | Medium | Enables future i18n, improves consistency |
| P2 | Frontend error-handling helper (2.3) | Small | Reduces boilerplate in 14 files |
| P2 | Abstract data access from pipeline (4.1) | Large | Enables unit testing without Postgres |
| P2 | Move inline SQL from resolve_model to db module (4.2) | Small | Maintains architecture consistency |
| P2 | Extract scraper tests to separate file (1.2) | Small | Improves file navigation |
| P3 | Decompose ThemeManager.tsx into sub-components (1.3) | Medium | Improves readability |
| P3 | Introduce parameter structs for long signatures (3.2) | Small | Removes clippy suppressions |
| P3 | Define category key constants (3.3) | Small | Prevents typo bugs |
| P3 | Audit whether Sources.tsx is dead code (5.1) | Small | Potential -481 lines |
| P3 | Consolidate LLM HTTP request handling (4.3) | Medium | Reduces duplication across 3 files |
| P4 | Batch insert for bulk_create (5.3) | Small | Performance improvement |