V2 Tech Lead Audit Report — AI Weekly Synth

Date: 2026-03-27
Scope: Full codebase (backend + frontend): complexity, duplication, readability, maintainability


Executive Summary

The codebase is well-structured for a learning project and demonstrates solid engineering practices: clean error handling, SSRF protection, rate limiting, encryption at rest, and thorough test coverage for utility functions. However, organic growth has introduced one critical complexity hotspot (synthesis.rs at 2010 lines), significant frontend duplication between Sources.tsx and ThemeManager.tsx, and several patterns that will impede future development if not addressed.

Priority: 19 findings ranked P1 (do first) through P4 (nice to have); 14 of them are scheduled in the Prioritized Action Plan (section 8).


1. Complexity Hotspots

1.1 [P1] backend/src/services/synthesis.rs — 2010 lines, God Function

File: /Users/oabrivard/Projects/rust/ai_synth/backend/src/services/synthesis.rs

run_generation_inner() spans approximately 800 lines (lines 246-1038). It handles initialization, source rotation, link extraction, article history filtering, preferred-source shuffling, batch scraping, LLM classification, date filtering, category assignment, Brave Search fallback, LLM web search fallback, final assembly, and database persistence — all in a single function.

Specific issues:

  • Deep nesting: The wave loop ('wave_loop) contains a batch loop (while !done), which contains a JoinSet collection loop, which contains match arms with multiple continue branches. This is 4-5 levels of nesting.
  • Duplicated scrape+classify logic: The Phase 1 scrape+classify block (lines 471-632) and the Brave Search scrape+classify block (lines 704-888) are near-identical. Both build a JoinSet, spawn scrape tasks, collect results, build another JoinSet for LLM classification, parse responses, check is_article, filter by date, handle no-date articles, and assign categories.
  • 12 calls to build_trace_entry() with the same boilerplate ArticleTrace struct construction scattered throughout.
  • 7 flush-pending-traces blocks (check !pending_traces.is_empty(), call batch_insert_entries, call pending_traces.clear()).

Recommendation: Extract into a pipeline module with distinct phases:

services/pipeline/mod.rs        — orchestrator (run_generation_inner)
services/pipeline/phase1.rs     — personalized source processing
services/pipeline/phase2.rs     — web search fallback (Brave + LLM)
services/pipeline/classify.rs   — shared scrape+classify batch logic
services/pipeline/tracing.rs    — ArticleTrace builder + flush helper
services/pipeline/progress.rs   — ProgressEvent + emit_progress
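
As a rough illustration of what the proposed tracing.rs helper could contain, the sketch below folds the repeated "check, insert, clear" blocks into one flush method. The ArticleTrace fields, the TraceBuffer type, and the sink closure (standing in for batch_insert_entries) are assumptions made for this example, not the project's actual definitions.

```rust
/// Simplified stand-in for the real ArticleTrace (fields are hypothetical).
#[derive(Debug, Clone, PartialEq)]
pub struct ArticleTrace {
    pub url: String,
    pub stage: String,
    pub outcome: String,
}

/// Buffers trace entries and flushes them through a caller-supplied sink.
pub struct TraceBuffer {
    pending: Vec<ArticleTrace>,
}

impl TraceBuffer {
    pub fn new() -> Self {
        Self { pending: Vec::new() }
    }

    /// One call replaces the struct-construction boilerplate at each call site.
    pub fn record(&mut self, url: &str, stage: &str, outcome: &str) {
        self.pending.push(ArticleTrace {
            url: url.to_owned(),
            stage: stage.to_owned(),
            outcome: outcome.to_owned(),
        });
    }

    /// Number of entries waiting to be flushed.
    pub fn len(&self) -> usize {
        self.pending.len()
    }

    /// Replaces each "if !pending.is_empty() { insert; clear }" block.
    /// `sink` stands in for the database batch insert. Returns how many
    /// entries were written; on error the entries stay buffered.
    pub fn flush<E>(
        &mut self,
        sink: impl FnOnce(&[ArticleTrace]) -> Result<(), E>,
    ) -> Result<usize, E> {
        if self.pending.is_empty() {
            return Ok(0);
        }
        sink(&self.pending)?;
        let flushed = self.pending.len();
        self.pending.clear();
        Ok(flushed)
    }
}
```

In the real pipeline the sink would be async and backed by the database; the synchronous closure here only keeps the sketch self-contained.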

1.2 [P2] backend/src/services/scraper.rs — 1280 lines

File: /Users/oabrivard/Projects/rust/ai_synth/backend/src/services/scraper.rs

This file is reasonably well-organized but large. The 600+ lines of tests (starting at line 678) constitute nearly half the file. The SSRF validation, HTML parsing, date extraction, and soft-404 detection are logically distinct concerns.

Recommendation: Move tests to backend/src/services/scraper/tests.rs using a #[cfg(test)] mod tests; pattern. Consider splitting the file into scraper/ssrf.rs, scraper/html.rs, scraper/dates.rs if it continues to grow.

1.3 [P3] frontend/src/pages/ThemeManager.tsx — 935 lines, monolithic component

File: /Users/oabrivard/Projects/rust/ai_synth/frontend/src/pages/ThemeManager.tsx

This single component manages 20+ signals, handles theme CRUD, source CRUD, bulk import, CSV import/export, preferred sources, category editing, and a schedule sub-component. The render function alone (lines 429-931) is 500 lines of JSX.

Recommendation: Extract sub-components:

  • ThemeContentForm — name, topic, categories, max age/items, summary length
  • ThemeSourceList — source list, add, delete, preferred toggle
  • ThemeImportExport — CSV and bulk import sections

2. Code Duplication

2.1 [P1] Sources.tsx and ThemeManager.tsx — ~80% duplicated source management logic

Files:

  • /Users/oabrivard/Projects/rust/ai_synth/frontend/src/pages/Sources.tsx (481 lines)
  • /Users/oabrivard/Projects/rust/ai_synth/frontend/src/pages/ThemeManager.tsx (935 lines)

Nearly every source-management function in ThemeManager.tsx is a copy-paste of Sources.tsx with minor adaptations (adding theme_id parameter):

  • handleAddSource — identical validation logic, same error handling pattern
  • handleDeleteClick / performDelete — identical two-click confirmation with timer
  • handleExportCsv / handleImportCsv — identical
  • handleBulkImport — identical line parsing, same semicolon splitting

The JSX for source list rendering (star toggle, delete button, link display) is also duplicated.

Recommendation: Extract a SourceManager component that accepts an optional themeId prop. Both pages delegate to it. The normalizeUrl and isValidUrl functions are already exported from Sources.tsx and imported by ThemeManager.tsx — this pattern should extend to the full source management UI.

2.2 [P1] Synthesis pipeline: duplicated scrape+classify blocks

As noted in 1.1, the Phase 1 and Brave Search paths in synthesis.rs duplicate approximately 120 lines of scrape-then-classify logic. The only differences are:

  • Phase 1 tracks source_url per article; Brave does not
  • Phase 1 uses (String, String, String, String) tuples; Brave uses (String, String, String)

Recommendation: Create a scrape_and_classify_batch() function parameterized by source type and optional source URL. This eliminates the duplication and makes adding future search backends (e.g., Google Search, Bing) trivial.
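
A minimal sketch of the shared function, assuming the two paths differ only in whether a source URL is attached. The Origin, Candidate, and Classified types and the closure parameters are hypothetical; the real code is async and uses JoinSet, which this synchronous version deliberately leaves out.

```rust
/// Where a candidate link came from. Only Phase 1 carries a source URL.
#[derive(Debug, Clone, PartialEq)]
pub enum Origin {
    PersonalSource { source_url: String },
    BraveSearch,
}

/// A link waiting to be scraped (replaces the 3- and 4-element tuples).
#[derive(Debug, Clone, PartialEq)]
pub struct Candidate {
    pub url: String,
    pub title: String,
    pub origin: Origin,
}

/// A candidate that was scraped and accepted as an article.
#[derive(Debug, Clone, PartialEq)]
pub struct Classified {
    pub candidate: Candidate,
    pub category: String,
}

/// One scrape-then-classify pass shared by every search backend.
/// `scrape` returns the page text, or None when scraping fails.
/// `classify` returns a category key, or None when the page is not an article.
pub fn scrape_and_classify_batch(
    candidates: Vec<Candidate>,
    scrape: impl Fn(&str) -> Option<String>,
    classify: impl Fn(&str) -> Option<String>,
) -> Vec<Classified> {
    candidates
        .into_iter()
        .filter_map(|candidate| {
            let text = scrape(&candidate.url)?;
            let category = classify(&text)?;
            Some(Classified { candidate, category })
        })
        .collect()
}
```

With this shape, a new backend would only add an Origin variant and produce Candidate values; the date filtering and no-date handling described in 1.1 would live inside the shared function as well.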

2.3 [P2] Frontend error handling boilerplate — 40+ occurrences

The pattern catch (err) { if (isApiError(err)) { setX(err.message) } else { setX(t('...')) } } appears 40 times across 14 files. This is mechanical and could be simplified.

Recommendation: Create a handleApiError(err, t, fallbackKey) utility that returns the message to display:

function handleApiError(err: unknown, t: TFunction, fallbackKey: string): string {
  return isApiError(err) ? err.message : t(fallbackKey);
}

2.4 [P3] Admin audit logging boilerplate

In admin.rs, every handler follows the same pattern: perform action, then call db::audit::create_entry with a CreateAuditLog struct. This is 5 occurrences, each ~15 lines.

Recommendation: Consider an audit middleware or macro that captures the action, target type, and details from the handler return value.
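
One lightweight variant of this idea is a wrapper function rather than a full middleware or macro. The sketch below is only an assumption about how it might look: AuditEntry is a simplified stand-in for CreateAuditLog, the in-memory Vec stands in for db::audit::create_entry, and Auditable, RoleChanged, and with_audit are hypothetical names.

```rust
/// One audit row; a simplified stand-in for the CreateAuditLog struct.
#[derive(Debug, Clone, PartialEq)]
pub struct AuditEntry {
    pub action: String,
    pub target_type: String,
    pub details: String,
}

/// Implemented by handler results so the wrapper can describe what happened.
pub trait Auditable {
    fn audit_details(&self) -> String;
}

/// Example handler result (hypothetical).
pub struct RoleChanged {
    pub user_id: u32,
    pub new_role: String,
}

impl Auditable for RoleChanged {
    fn audit_details(&self) -> String {
        format!("user {} -> role {}", self.user_id, self.new_role)
    }
}

/// Runs the action and records an audit entry only if it succeeded.
pub fn with_audit<T: Auditable, E>(
    log: &mut Vec<AuditEntry>,
    action: &str,
    target_type: &str,
    run: impl FnOnce() -> Result<T, E>,
) -> Result<T, E> {
    let value = run()?;
    log.push(AuditEntry {
        action: action.to_owned(),
        target_type: target_type.to_owned(),
        details: value.audit_details(),
    });
    Ok(value)
}
```

Each handler would then shrink to the action itself plus one with_audit call, instead of roughly 15 lines of struct construction.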


3. Readability

3.1 [P2] French/English mixing in backend code

User-facing strings in synthesis.rs and prompts.rs are hardcoded in French:

  • Progress messages: "Chargement des parametres...", "Analyse des sources personnalisees..."
  • Error messages: "Aucun article valide trouve. Verifiez vos sources et categories."
  • Prompt text: entire system/user prompts in French

Meanwhile, code comments, doc strings, log messages, and error variants are in English. This inconsistency makes it harder for non-French speakers to contribute and prevents future i18n.

Recommendation: Move all user-facing strings to constants or a backend i18n module. Keep code, comments, and logs in English.
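
A backend i18n module could start as small as the sketch below. The Message and Locale enums are hypothetical names; the French strings are the ones quoted above, and the English strings are illustrative translations added for this example.

```rust
/// Supported locales (hypothetical; the backend currently only emits French).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Locale {
    Fr,
    En,
}

/// Keys for user-facing messages, so call sites never embed literal text.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Message {
    LoadingSettings,
    AnalyzingSources,
    NoValidArticle,
}

impl Message {
    /// Resolves a message key to its text for the given locale.
    pub fn text(self, locale: Locale) -> &'static str {
        match (self, locale) {
            (Message::LoadingSettings, Locale::Fr) => "Chargement des parametres...",
            (Message::LoadingSettings, Locale::En) => "Loading settings...",
            (Message::AnalyzingSources, Locale::Fr) => {
                "Analyse des sources personnalisees..."
            }
            (Message::AnalyzingSources, Locale::En) => "Analyzing personalized sources...",
            (Message::NoValidArticle, Locale::Fr) => {
                "Aucun article valide trouve. Verifiez vos sources et categories."
            }
            (Message::NoValidArticle, Locale::En) => {
                "No valid article found. Check your sources and categories."
            }
        }
    }
}
```

Because the match is exhaustive, adding a message or a locale without providing every translation becomes a compile error rather than a runtime gap.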

3.2 [P3] #[allow(clippy::too_many_arguments)] — 3 occurrences

Files: synthesis.rs, prompts.rs, llm_call_log.rs

These suppressions mark functions that exceed the threshold of Clippy's too_many_arguments lint (more than 7 parameters by default). They are code smells signaling that parameters should be grouped into structs.

  • build_search_prompt takes 9 parameters
  • log_llm_call takes 10 parameters
  • insert in llm_call_log.rs takes 10 parameters

Recommendation: Introduce parameter structs:

struct SearchPromptParams<'a> {
    theme: &'a str,
    categories: &'a [String],
    max_items_per_category: i32,
    // ...
}
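
To show how a call site changes, here is a sketch that uses only the three fields listed above; the remaining parameters are left out rather than guessed. The body of build_search_prompt and its English output are illustrative assumptions, not the project's actual prompt.

```rust
/// Groups the arguments of build_search_prompt (only the fields named in
/// the report; the real struct would carry the remaining parameters too).
pub struct SearchPromptParams<'a> {
    pub theme: &'a str,
    pub categories: &'a [String],
    pub max_items_per_category: i32,
}

/// Takes one struct instead of a long positional argument list, so the
/// #[allow(clippy::too_many_arguments)] attribute is no longer needed.
pub fn build_search_prompt(params: &SearchPromptParams<'_>) -> String {
    format!(
        "Theme: {}\nCategories: {}\nMax items per category: {}",
        params.theme,
        params.categories.join(", "),
        params.max_items_per_category,
    )
}
```

Named fields also remove the risk of swapping two positional arguments of the same type, which a 9- or 10-parameter signature invites.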

3.3 [P4] Magic strings for category keys

Category keys like "category_0", "category_autre", "category_no_date" are used as HashMap keys throughout synthesis.rs and in schema.rs. These appear as raw string literals in ~15 places.

Recommendation: Define constants or an enum:

const CATEGORY_OTHER: &str = "category_autre";
const CATEGORY_NO_DATE: &str = "category_no_date";
fn category_key(index: usize) -> String { format!("category_{}", index) }
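
The enum alternative could look like the sketch below, which keeps the same string keys for storage while giving the compiler something to check. CategoryKey and its parse method are hypothetical names introduced for illustration.

```rust
use std::fmt;

/// Typed replacement for the raw "category_*" string keys.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum CategoryKey {
    Indexed(usize),
    Other,
    NoDate,
}

impl fmt::Display for CategoryKey {
    /// Renders the key exactly as the existing string literals spell it.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            CategoryKey::Indexed(i) => write!(f, "category_{}", i),
            CategoryKey::Other => f.write_str("category_autre"),
            CategoryKey::NoDate => f.write_str("category_no_date"),
        }
    }
}

impl CategoryKey {
    /// Parses a stored key; returns None for anything unrecognized.
    pub fn parse(raw: &str) -> Option<Self> {
        match raw {
            "category_autre" => Some(CategoryKey::Other),
            "category_no_date" => Some(CategoryKey::NoDate),
            _ => raw
                .strip_prefix("category_")?
                .parse::<usize>()
                .ok()
                .map(CategoryKey::Indexed),
        }
    }
}
```

Since the enum derives Hash and Eq, it can be used directly as the HashMap key, and a misspelled variant fails to compile instead of silently creating a new bucket.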

4. Maintainability Risks

4.1 [P2] Tight coupling between synthesis pipeline and database

run_generation_inner() directly calls db::settings::get_or_create_default, db::themes::get_by_id, db::sources::list_for_user, db::article_history::*, db::llm_call_log::insert, and db::syntheses::create, and it reaches a raw sqlx::query_scalar through resolve_model (lines 1419-1429). The function takes AppState, which bundles the database pool, HTTP client, job store, and rate limiters.

Impact: Unit testing the pipeline logic requires either a real Postgres database or a complete mock of AppState. The existing E2E tests use a mock LLM provider (good) but still need Postgres (expensive).

Recommendation: Introduce a PipelineContext trait or struct that abstracts data access. This would allow testing the orchestration logic with in-memory implementations.
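
The sketch below illustrates the trait variant with an in-memory implementation. The method names, the u32 user id, and the select_and_save function are assumptions made for the example; a production trait would be async and cover the db::* calls listed above.

```rust
use std::collections::HashSet;

/// Hypothetical data-access boundary for the pipeline.
pub trait PipelineContext {
    fn source_urls(&self, user_id: u32) -> Vec<String>;
    fn already_seen(&self, url: &str) -> bool;
    fn save_synthesis(&mut self, user_id: u32, urls: &[String]);
}

/// In-memory implementation for unit tests; no Postgres required.
#[derive(Default)]
pub struct InMemoryContext {
    pub sources: Vec<String>,
    pub seen: HashSet<String>,
    pub saved: Vec<(u32, Vec<String>)>,
}

impl PipelineContext for InMemoryContext {
    fn source_urls(&self, _user_id: u32) -> Vec<String> {
        self.sources.clone()
    }

    fn already_seen(&self, url: &str) -> bool {
        self.seen.contains(url)
    }

    fn save_synthesis(&mut self, user_id: u32, urls: &[String]) {
        self.saved.push((user_id, urls.to_vec()));
    }
}

/// Orchestration logic that depends only on the trait: drop URLs already
/// in the article history, persist the rest, and report how many were kept.
pub fn select_and_save(ctx: &mut impl PipelineContext, user_id: u32) -> usize {
    let fresh: Vec<String> = ctx
        .source_urls(user_id)
        .into_iter()
        .filter(|url| !ctx.already_seen(url))
        .collect();
    ctx.save_synthesis(user_id, &fresh);
    fresh.len()
}
```

A test can then populate InMemoryContext, call the orchestration function, and assert on the saved field, while the Postgres-backed implementation simply forwards to the existing db module.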

4.2 [P2] Raw SQL inline in resolve_model()

File: /Users/oabrivard/Projects/rust/ai_synth/backend/src/services/synthesis.rs, lines 1419-1429

let model = sqlx::query_scalar::<_, String>(
    r#"SELECT m->>'model_id' FROM admin_providers, ..."#,
)

This is the only place in the service layer that contains raw SQL. All other queries go through the db/ module, maintaining a clean separation. This breaks the pattern.

Recommendation: Move to db::providers::get_default_scraping_model(pool, provider_name).

4.3 [P3] LLM provider implementations share identical HTTP error handling

Each of the three providers (gemini.rs, openai.rs, anthropic.rs) implements the same pattern:

  1. Build request body (provider-specific)
  2. Send HTTP request
  3. Map network errors with is_timeout() / is_connect() classification
  4. Parse response JSON
  5. Check HTTP status and map errors
  6. Extract content from provider-specific response structure

Steps 2-4 are identical across all three providers (~20 lines each). Only steps 1, 5, and 6 differ.

Recommendation: Extract a send_llm_request() helper in llm/mod.rs:

async fn send_llm_request(
    client: &reqwest::Client,
    url: &str,
    body: &Value,
    headers: &[(String, String)],
    provider_name: &str,
) -> Result<(u16, Value), AppError>

4.4 [P3] Providers.tsx — 854 lines, complex inline state management

File: /Users/oabrivard/Projects/rust/ai_synth/frontend/src/pages/admin/Providers.tsx

The admin Providers page manages local editable copies of provider state in a Record<string, ProviderFormState> map, with functions for model array manipulation (add, remove, toggle default), scraping vs. websearch model lists, and inline validation. This is the most complex admin page and would benefit from splitting the model list editor into a reusable ModelListEditor component.

4.5 [P4] Settings.tsx — 694 lines, growing form complexity

File: /Users/oabrivard/Projects/rust/ai_synth/frontend/src/pages/Settings.tsx

The settings page has already been partially decomposed (SettingsBraveSearch, SettingsRateLimit, ApiKeyManager), which is good. The remaining monolithic JSX sections (provider selection, model dropdowns, import/export) could follow the same pattern for consistency.


5. Simplification Opportunities

5.1 [P3] Sources.tsx may be dead code

With the introduction of ThemeManager.tsx, which subsumes all source management under themes, the standalone Sources.tsx page may no longer be reachable by users. It is still registered in the router, but if all sources must now belong to a theme, the standalone page serves no purpose.

Action: Verify whether Sources.tsx is still linked in the navigation. If not, remove it and its route to eliminate 481 lines of duplicated code.

5.2 [P3] list_for_user query branch duplication

File: /Users/oabrivard/Projects/rust/ai_synth/backend/src/db/sources.rs, lines 15-44

The function has two nearly identical SQL queries: one with AND theme_id = $2 and one without. The only difference is that optional theme_id condition.

Recommendation: Use a single query with a conditional clause:

sqlx::query_as::<_, Source>(
    "SELECT ... FROM sources WHERE user_id = $1 AND ($2::uuid IS NULL OR theme_id = $2) ORDER BY ..."
)
.bind(user_id)
.bind(theme_id)

5.3 [P4] bulk_create uses sequential inserts instead of batch

File: /Users/oabrivard/Projects/rust/ai_synth/backend/src/db/sources.rs, lines 97-127

Sources are inserted one by one in a loop. For bulk imports of 50-100 sources, this generates 50-100 round-trips to the database.

Recommendation: Build a single INSERT ... VALUES ($1, $2), ($3, $4), ... statement, for example with sqlx's QueryBuilder::push_values, so that the whole import costs one round-trip. This is a performance optimization, not a correctness issue.
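
To make the shape of the single statement concrete, the sketch below generates the numbered placeholders for a given row count using only the standard library. The function name and the table and column names in the usage are hypothetical; binding the values and executing the query are left to the database layer.

```rust
/// Builds "INSERT INTO <table> (<cols>) VALUES ($1, $2), ($3, $4), ..."
/// for `rows` rows. Returns None when there is nothing to insert.
/// `table` and `columns` must be trusted constants, never user input.
pub fn multi_row_insert_sql(table: &str, columns: &[&str], rows: usize) -> Option<String> {
    if rows == 0 || columns.is_empty() {
        return None;
    }
    let width = columns.len();
    let tuples: Vec<String> = (0..rows)
        .map(|row| {
            // Placeholders are numbered continuously across rows.
            let placeholders: Vec<String> = (1..=width)
                .map(|col| format!("${}", row * width + col))
                .collect();
            format!("({})", placeholders.join(", "))
        })
        .collect();
    Some(format!(
        "INSERT INTO {} ({}) VALUES {}",
        table,
        columns.join(", "),
        tuples.join(", ")
    ))
}
```

For example, multi_row_insert_sql("sources", &["user_id", "url"], 2) yields INSERT INTO sources (user_id, url) VALUES ($1, $2), ($3, $4). PostgreSQL caps a statement at 65535 bind parameters, so very large imports would need to be chunked; at 50-100 sources this is not a concern.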

5.4 [P4] Hardcoded snippet sizes

In synthesis.rs, the snippet size is computed from summary_length in two separate places (Phase 1 at line 437 and Brave at lines 766-770):

let snippet_size = match theme.summary_length { 1 => 500, 2 => 2000, _ => 4000 };

Recommendation: Extract to a function fn snippet_size_for_length(summary_length: i32) -> usize.
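
The extracted function is a direct lift of the match expression quoted above; only the function wrapper is new.

```rust
/// Maps a theme's summary_length setting to the snippet size
/// (same values as the two inline match expressions).
pub fn snippet_size_for_length(summary_length: i32) -> usize {
    match summary_length {
        1 => 500,
        2 => 2000,
        // Any other value falls back to the largest snippet.
        _ => 4000,
    }
}
```

Both call sites then become let snippet_size = snippet_size_for_length(theme.summary_length);, and a future change to the sizes happens in one place.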


6. File Size Summary

Backend (top 10 by line count)

File                       Lines  Assessment
services/synthesis.rs      2010   Needs decomposition (P1)
services/scraper.rs        1280   Acceptable, extract tests
services/rate_limiter.rs   471    Clean
services/llm/anthropic.rs  471    Minor shared-code opportunity
services/export.rs         459    Clean
handlers/admin.rs          438    Audit boilerplate
models/synthesis.rs        416    Clean
services/email.rs          384    Clean
handlers/auth.rs           381    Clean
services/llm/openai.rs     373    Minor shared-code opportunity

Frontend (top 10 by line count)

File                                      Lines  Assessment
pages/ThemeManager.tsx                    935    Needs decomposition (P1/P3)
pages/admin/Providers.tsx                 854    Extract ModelListEditor (P3)
pages/Settings.tsx                        694    Partially decomposed, continue (P4)
pages/SynthesisDetail.tsx                 548    Acceptable
pages/Sources.tsx                         481    Possibly dead code (P3)
pages/GenerateSynthesis.tsx               471    Clean
i18n/fr.ts                                462    Expected size for translations
pages/ArticleHistory.tsx                  371    Clean
pages/Home.tsx                            345    Clean
components/settings/SettingsSchedule.tsx  286    Clean

7. Positive Observations

These aspects of the codebase are well-executed and should be preserved:

  1. Error handling: AppError enum with IntoResponse is clean, consistent, and hides internal details. Tests verify that secrets are never leaked.
  2. Security: SSRF prevention with DNS resolution checks, AES-256-GCM encryption for API keys, CSRF via X-Requested-With, timing-attack mitigation in auth, and sensitive data scrubbing in error messages.
  3. LLM provider abstraction: The LlmProvider trait + factory pattern makes adding new providers straightforward.
  4. Documentation: Module-level //! doc comments on every file, function-level /// doc comments with examples, and clear CLAUDE.md project instructions.
  5. Frontend component extraction: SettingsBraveSearch, SettingsRateLimit, SettingsSchedule, and ApiKeyManager demonstrate good instincts for decomposition.
  6. Type safety: Frontend types.ts is clean, well-organized, and provides isApiError type guard.
  7. Test coverage: Unit tests for error handling, SSRF checks, URL normalization, job store, role validation, and CSV parsing.

8. Prioritized Action Plan

Priority  Item                                                   Effort  Impact
P1        Decompose synthesis.rs into pipeline module (1.1)      Large   Reduces complexity, enables testing
P1        Extract shared SourceManager component (2.1)           Medium  Eliminates ~300 lines of duplication
P1        Extract shared scrape+classify function (2.2)          Medium  Eliminates ~120 lines of duplication
P2        Move hardcoded French strings to constants (3.1)       Medium  Enables future i18n, improves consistency
P2        Frontend error-handling helper (2.3)                   Small   Reduces boilerplate in 14 files
P2        Abstract data access from pipeline (4.1)               Large   Enables unit testing without Postgres
P2        Move inline SQL from resolve_model to db module (4.2)  Small   Maintains architecture consistency
P2        Extract scraper tests to separate file (1.2)           Small   Improves file navigation
P3        Decompose ThemeManager.tsx into sub-components (1.3)   Medium  Improves readability
P3        Introduce parameter structs for long signatures (3.2)  Small   Removes clippy suppressions
P3        Define category key constants (3.3)                    Small   Prevents typo bugs
P3        Audit whether Sources.tsx is dead code (5.1)           Small   Potential -481 lines
P3        Consolidate LLM HTTP request handling (4.3)            Medium  Reduces duplication across 3 files
P4        Batch insert for bulk_create (5.3)                     Small   Performance improvement