AI Weekly Synth -- Architecture Audit Report (v2)

Date: 2026-03-27
Scope: Full backend codebase (Rust/Axum), key frontend architecture observations
Auditor: Software Architect (automated)

Executive Summary

AI Weekly Synth is a well-structured Rust/Axum application that has grown substantially from its initial design. The codebase demonstrates strong fundamentals: consistent error handling, good security practices, clean layer separation between handlers/services/db, and idiomatic use of Axum extractors. Unit test coverage is solid across models, services, and middleware.

However, the growth -- particularly the addition of multi-theme synthesis, scheduled generation, Brave Search, windowed source extraction, and article history tracing -- has introduced several architectural tensions. The synthesis pipeline (synthesis.rs) has become a 1500+ line monolith carrying at least six distinct responsibilities. The scheduler bypasses the job store abstraction. Several cross-cutting concerns (provider resolution, rate limiting, history tracking) are also tightly coupled to concrete implementations, making the system harder to test, extend, and reason about.

This report organizes findings by SOLID principles, design patterns, architecture, and dependency management, then closes with prioritized recommendations.

1. SOLID Principles

1.1 Single Responsibility Principle (SRP)

Critical: synthesis.rs is a God Module

At 1500+ lines, services/synthesis.rs carries at least six distinct responsibilities:

Responsibility	Lines (approx.)	Should be
Job store (in-memory concurrent map)	1-193	Own module `services/job_store.rs`
Progress event types + emission	37-71, 1063-1071	Own module or part of job store
Pipeline orchestration (phases 1 + 2 + save)	200-1038	`services/pipeline.rs` or `services/generation/mod.rs`
Article scraping + classification logic	471-616, 700-880	`services/article_processor.rs`
URL filtering, normalization, hashing	1255-1339	`services/url_utils.rs`
Provider/model/key resolution	1342-1446	`services/provider_resolver.rs`

The run_generation_inner function alone is ~840 lines. It manages five HashMap/HashSet tracking structures, two nested loop levels (waves and batches), two separate pipeline phases (personalized sources and web search fallback), and three code paths in Phase 2 (Brave Search, LLM search, skip). This makes the function extremely difficult to test in isolation, review for correctness, or extend with new pipeline stages.

Moderate: scheduler.rs duplicates pipeline invocation logic

The scheduler constructs its own watch::channel and AtomicBool, calls run_generation_inner directly, and handles email sending inline. It bypasses the JobStore entirely, which means:

Scheduled jobs are invisible to the SSE progress API
The one-job-per-user guard does not apply (it only checks job_store.has_active_job)
Email sending logic (fetch synthesis, iterate recipients, call email::send_synthesis_email) is duplicated -- the handler version is in syntheses.rs handler
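A minimal sketch of what a shared guard could look like if the scheduler went through the job store. The names here (`try_start`, `finish`, `JobOrigin`, the `u64` user id) are hypothetical stand-ins, not the real `JobStore` API, and the real store would also carry the watch channel and cancellation flag:

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Hypothetical stand-in for the real user id type (likely a UUID).
type UserId = u64;

/// Where a generation request came from. Both origins share one guard.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum JobOrigin {
    Manual,
    Scheduled,
}

/// Simplified job store: tracks at most one active job per user.
#[derive(Default)]
pub struct JobStore {
    active: Mutex<HashMap<UserId, JobOrigin>>,
}

impl JobStore {
    /// Registers a job for `user` under a single lock. Returns `false`
    /// if the user already has an active job, whatever its origin.
    pub fn try_start(&self, user: UserId, origin: JobOrigin) -> bool {
        let mut active = self.active.lock().unwrap();
        if active.contains_key(&user) {
            return false;
        }
        active.insert(user, origin);
        true
    }

    /// Marks the user's job as finished so a new one can start.
    pub fn finish(&self, user: UserId) {
        self.active.lock().unwrap().remove(&user);
    }
}
```

With this shape, the scheduler would call `try_start(user, JobOrigin::Scheduled)` before invoking the pipeline and skip the run if it returns `false`, so scheduled and manual jobs cannot overlap for the same user.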

Moderate: AppState accumulates responsibilities

AppState holds configuration, database pool, HTTP client, auth rate limiter, provider rate limiter, per-user rate limiters, and the job store. While Clone-cheap (all Arc-based), it acts as a service locator, making it unclear which components a given handler actually depends on. With 8 fields, this is approaching the point where injecting specific dependencies would improve clarity.

Minor: Handler-level response types

Some response types are defined in handlers (AdminUserResponse in handlers/admin.rs, GenerateResponse in handlers/generation.rs, ListResponse in handlers/syntheses.rs) while others are in models. This inconsistency is minor but creates ambiguity about where to look for types.

1.2 Open/Closed Principle (OCP)

Well-applied: LLM Provider abstraction

The LlmProvider trait + factory pattern is the cleanest abstraction in the codebase. Adding a new provider (e.g., Mistral) requires:

A new module implementing LlmProvider
A new match arm in factory.rs
No changes to the pipeline

This is textbook OCP.

Violation: Pipeline Phase 2 branching

Phase 2 of the pipeline has a hard-coded if settings.use_brave_search { ... } else { ... } branch that selects between two entirely different code paths (Brave Search vs. LLM web search). Each path contains ~150 lines of nearly identical scrape+classify logic. Adding a third search strategy (e.g., Bing, Perplexity, SearXNG) would require another else if branch with the same duplicated scrape/classify logic.
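One way to remove the branch is a strategy trait. The sketch below is synchronous and std-only for brevity; the real code would be async, and `SearchStrategy`, `SearchHit`, `FixedResults`, and `run_phase_two` are illustrative names, not existing items in the codebase:

```rust
/// A candidate article returned by a search backend.
#[derive(Debug, Clone, PartialEq)]
pub struct SearchHit {
    pub url: String,
    pub title: String,
}

/// Hypothetical abstraction over Phase 2 search backends.
pub trait SearchStrategy {
    /// Label recorded as the source type for articles found this way.
    fn source_type(&self) -> &'static str;

    /// Returns at most `max_results` hits for `query`.
    fn search(&self, query: &str, max_results: usize) -> Result<Vec<SearchHit>, String>;
}

/// Test double that returns canned results instead of calling an API.
pub struct FixedResults(pub Vec<SearchHit>);

impl SearchStrategy for FixedResults {
    fn source_type(&self) -> &'static str {
        "fixed"
    }

    fn search(&self, _query: &str, max_results: usize) -> Result<Vec<SearchHit>, String> {
        Ok(self.0.iter().take(max_results).cloned().collect())
    }
}

/// Phase 2 entry point: no branching on which backend is in use.
/// `None` models the "skip" path.
pub fn run_phase_two(
    strategy: Option<&dyn SearchStrategy>,
    query: &str,
    max_results: usize,
) -> Result<Vec<(String, SearchHit)>, String> {
    let Some(strategy) = strategy else {
        return Ok(Vec::new());
    };
    let hits = strategy.search(query, max_results)?;
    Ok(hits
        .into_iter()
        .map(|hit| (strategy.source_type().to_string(), hit))
        .collect())
}
```

A third backend would then be one more `impl SearchStrategy`, with the shared scrape and classify steps applied to whatever `search` returns.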

Violation: Provider resolution fallback defaults

resolve_model contains hard-coded fallback model names ("gemini-2.5-pro", "gpt-4o", "claude-sonnet-4-20250514"). These will silently become stale as providers release new models. The fallback chain should be configurable or fail loudly.
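As an illustration of "configurable or fail loudly", the sketch below takes defaults from a map (for example, loaded from configuration) and returns an error when none is configured. The function name, signature, and error type are assumptions; the real `resolve_model` signature is not shown in this report:

```rust
use std::collections::HashMap;
use std::fmt;

/// Error raised when no model can be determined for a provider.
#[derive(Debug, PartialEq, Eq)]
pub enum ModelResolutionError {
    NoDefaultConfigured { provider: String },
}

impl fmt::Display for ModelResolutionError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::NoDefaultConfigured { provider } => {
                write!(f, "no default model configured for provider '{provider}'")
            }
        }
    }
}

impl std::error::Error for ModelResolutionError {}

/// Hypothetical replacement for the hard-coded fallback chain.
/// Prefers the user's explicit choice, then a configured default,
/// and otherwise fails instead of guessing a model name.
pub fn resolve_model_or_fail(
    user_model: Option<&str>,
    provider: &str,
    defaults: &HashMap<String, String>,
) -> Result<String, ModelResolutionError> {
    if let Some(model) = user_model.map(str::trim).filter(|m| !m.is_empty()) {
        return Ok(model.to_string());
    }
    defaults
        .get(provider)
        .cloned()
        .ok_or_else(|| ModelResolutionError::NoDefaultConfigured {
            provider: provider.to_string(),
        })
}
```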

1.3 Liskov Substitution Principle (LSP)

Generally well-respected. The LlmProvider trait implementations are fully substitutable. The MockLlmProvider correctly implements the same interface and is used in tests via the provider_override parameter.

Minor concern: The mock provider identifies call types by inspecting the system prompt content (sys_lower.contains("classer"), sys_lower.contains("precis")). This couples the mock to the French-language prompt wording, making it fragile if prompts change.
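One possible way to decouple the mock from prompt wording is to carry the call type explicitly. This is a sketch under the assumption that the request type can be extended; `CallKind`, `LlmRequest`, and `MockProvider::call` are hypothetical and do not reflect the actual `LlmProvider` signature:

```rust
/// Explicit call type, so test doubles need not inspect prompt text.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CallKind {
    Classify,
    Summarize,
}

/// Hypothetical request shape carrying the call kind next to the prompts.
pub struct LlmRequest<'a> {
    pub kind: CallKind,
    pub system_prompt: &'a str,
    pub user_prompt: &'a str,
}

/// Mock that dispatches on `kind` only.
pub struct MockProvider;

impl MockProvider {
    /// Returns a canned response chosen by call kind, ignoring prompt wording.
    pub fn call(&self, request: &LlmRequest<'_>) -> String {
        match request.kind {
            CallKind::Classify => r#"{"is_article": true}"#.to_string(),
            CallKind::Summarize => "mock summary".to_string(),
        }
    }
}
```

Rewording or translating a prompt then leaves the mock's behavior unchanged.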

1.4 Interface Segregation Principle (ISP)

Well-applied: Axum extractors

AuthUser and AdminUser are clean, focused extractors. Handlers declare exactly the auth level they need. The AdminUser wrapper pattern (newtype over AuthUser) is idiomatic and minimal.

Opportunity: LlmProvider trait could be narrower

The current trait has two methods: provider_id() and call_llm(). If the codebase later needs streaming, embedding, or tool-calling capabilities, the trait should be split rather than extended, per ISP.

1.5 Dependency Inversion Principle (DIP)

Critical: Pipeline depends on concrete implementations

run_generation_inner directly calls:

db::settings::get_or_create_default (concrete DB queries)
db::themes::get_by_id (concrete DB queries)
db::sources::list_for_user (concrete DB queries)
db::article_history::check_urls_exist (concrete DB queries)
db::article_history::batch_insert_entries (concrete DB queries)
crate::services::scraper::scrape_url (concrete HTTP scraping)
source_scraper::extract_article_links (concrete link extraction)
crate::services::brave_search::search (concrete Brave API)
encryption::decrypt / encryption::MasterKey::from_hex (concrete crypto)

None of these are injected as traits. The provider_override parameter for LlmProvider is the only dependency that can be swapped -- and it was added specifically for testing. This makes the pipeline untestable without a live Postgres database and network access.

Moderate: ProviderRateLimiter embeds its own SQL

The ProviderRateLimiter::reload_from_db method contains raw sqlx::query_as calls rather than going through the db::rate_limits module. The comment says "to avoid circular dependency," but this violates the layer boundary and duplicates the DB schema knowledge.

2. Design Patterns

2.1 Well-Applied Patterns

Pattern	Where	Assessment
Factory Method	`llm/factory.rs`	Clean, tested, extensible
Strategy	`LlmProvider` trait	Proper polymorphism via `Arc<dyn LlmProvider>`
Observer	`watch::channel` for SSE	Elegant use of tokio primitives; late subscribers get latest state
Repository	`db/` modules	Clean separation of SQL from business logic
Extractor	`AuthUser`, `AdminUser`	Idiomatic Axum; composable auth
Builder	`AppState::new`, `build_scraper_client`	Consistent construction patterns
Newtype	`AdminUser(AuthUser)`, `MasterKey`	Type safety for authorization and crypto

2.2 Missing or Needed Patterns

Pipeline / Chain of Responsibility

The synthesis generation is conceptually a pipeline with discrete stages:

Load settings + theme + sources
Phase 1: Source extraction (windowed)
Phase 1: Scrape + classify (batched)
Phase 2: Web search fallback (Brave or LLM)
Phase 2: Scrape + classify fallback results
Assemble sections + save

Each stage could be a separate struct implementing a PipelineStage trait, with shared context passed through. This would make the pipeline testable per-stage, enable adding/removing stages declaratively, and reduce run_generation_inner from 840 lines to ~50.
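The shape of such a trait might look like the following. It is a deliberately small, synchronous sketch: `PipelineContext`, `PipelineStage`, `run_pipeline`, and the two toy stages are hypothetical, and the real stages would be async and carry far more state:

```rust
/// Shared state passed from stage to stage.
#[derive(Debug, Default)]
pub struct PipelineContext {
    pub candidate_urls: Vec<String>,
    pub accepted_urls: Vec<String>,
    /// Names of stages that completed, in order.
    pub log: Vec<&'static str>,
}

/// One discrete step of the generation pipeline.
pub trait PipelineStage {
    fn name(&self) -> &'static str;
    fn run(&self, ctx: &mut PipelineContext) -> Result<(), String>;
}

/// Runs stages in order and stops at the first failure.
pub fn run_pipeline(
    stages: &[Box<dyn PipelineStage>],
    ctx: &mut PipelineContext,
) -> Result<(), String> {
    for stage in stages {
        stage
            .run(ctx)
            .map_err(|e| format!("stage '{}' failed: {e}", stage.name()))?;
        ctx.log.push(stage.name());
    }
    Ok(())
}

/// Toy stage: seeds the context with a fixed list of URLs.
pub struct LoadSources(pub Vec<String>);

impl PipelineStage for LoadSources {
    fn name(&self) -> &'static str {
        "load_sources"
    }
    fn run(&self, ctx: &mut PipelineContext) -> Result<(), String> {
        ctx.candidate_urls.extend(self.0.iter().cloned());
        Ok(())
    }
}

/// Toy stage: keeps only HTTPS candidates and fails if none remain.
pub struct KeepHttps;

impl PipelineStage for KeepHttps {
    fn name(&self) -> &'static str {
        "keep_https"
    }
    fn run(&self, ctx: &mut PipelineContext) -> Result<(), String> {
        ctx.accepted_urls = ctx
            .candidate_urls
            .iter()
            .filter(|u| u.starts_with("https://"))
            .cloned()
            .collect();
        if ctx.accepted_urls.is_empty() {
            return Err("no usable URLs".to_string());
        }
        Ok(())
    }
}
```

Each stage can be exercised on its own with a hand-built `PipelineContext`, which is the per-stage testability described above.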

Unit of Work / Transaction Manager

Article history tracing uses a manual pending_traces buffer that is flushed at "logical boundaries." This ad-hoc batching is scattered across 7 locations in the pipeline. A dedicated TraceBatcher struct could encapsulate the buffer, auto-flush thresholds, and error handling.
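A possible `TraceBatcher` is sketched below. The sink closure stands in for the asynchronous `batch_insert_entries` call, `eprintln!` stands in for a `tracing::warn!`, and the threshold-based auto-flush is an assumed policy rather than current behavior:

```rust
/// Buffers trace entries and flushes them to `sink` in batches.
pub struct TraceBatcher<T, F>
where
    F: FnMut(Vec<T>) -> Result<(), String>,
{
    buffer: Vec<T>,
    threshold: usize,
    sink: F,
    failed_flushes: usize,
}

impl<T, F> TraceBatcher<T, F>
where
    F: FnMut(Vec<T>) -> Result<(), String>,
{
    /// `threshold` is the buffer size that triggers an automatic flush
    /// (clamped to at least 1).
    pub fn new(threshold: usize, sink: F) -> Self {
        Self {
            buffer: Vec::new(),
            threshold: threshold.max(1),
            sink,
            failed_flushes: 0,
        }
    }

    /// Queues an entry and flushes once the threshold is reached.
    pub fn push(&mut self, entry: T) {
        self.buffer.push(entry);
        if self.buffer.len() >= self.threshold {
            self.flush();
        }
    }

    /// Writes all pending entries. Failures are counted and reported
    /// instead of being silently discarded.
    pub fn flush(&mut self) {
        if self.buffer.is_empty() {
            return;
        }
        let batch = std::mem::take(&mut self.buffer);
        if let Err(err) = (self.sink)(batch) {
            self.failed_flushes += 1;
            eprintln!("trace flush failed: {err}");
        }
    }

    pub fn pending(&self) -> usize {
        self.buffer.len()
    }

    pub fn failed_flushes(&self) -> usize {
        self.failed_flushes
    }
}
```

The pipeline would call `push` wherever it records a trace today and `flush` once at the end of each phase, leaving buffer management and error reporting in one place.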

Event Sourcing (lightweight)

The ProgressEvent enum is close to an event-sourced model but is currently fire-and-forget via watch::channel (which only retains the latest value). If the system needs progress history for debugging or UI replay, the events should be collected in a log alongside the watch channel.

2.3 Anti-Patterns

Copy-Paste Programming (Critical)

The scrape+classify logic appears nearly identically in three places:

Phase 1 source processing (lines ~470-616)
Phase 2 Brave Search processing (lines ~700-880)
Phase 2 LLM search validation (lines ~936-960, simpler variant)

Each instance: spawns a JoinSet for scraping, collects results, checks rate limits, spawns a JoinSet for classification, parses LLM responses, checks is_article, extracts dates, assigns categories, updates tracking maps. The only differences are: which source_type string is recorded and whether source_url is Some.

A single scrape_and_classify_batch function parameterized by source type would eliminate ~300 lines of duplication.
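The following is a rough, synchronous sketch of that function's shape. The scrape and classify steps are passed as closures so the example stays self-contained; the real version would spawn JoinSet tasks, check rate limits, and update the tracking maps. `ClassifiedArticle`, `Classification`, and the source type labels used in the usage note are illustrative assumptions:

```rust
/// Result of processing one URL, tagged with where it came from.
#[derive(Debug, Clone, PartialEq)]
pub struct ClassifiedArticle {
    pub url: String,
    pub category: String,
    pub source_type: String,
    pub source_url: Option<String>,
}

/// Simplified classifier verdict.
pub struct Classification {
    pub is_article: bool,
    pub category: String,
}

/// Shared scrape + classify step. Only `source_type` and `source_url`
/// differ between Phase 1, Brave Search, and LLM search callers.
pub fn scrape_and_classify_batch<S, C>(
    urls: &[String],
    source_type: &str,
    source_url: Option<&str>,
    scrape: S,
    classify: C,
) -> Vec<ClassifiedArticle>
where
    S: Fn(&str) -> Option<String>,
    C: Fn(&str) -> Classification,
{
    urls.iter()
        .filter_map(|url| {
            // Skip URLs that could not be scraped.
            let body = scrape(url)?;
            let verdict = classify(&body);
            // Drop pages the classifier does not consider articles.
            if !verdict.is_article {
                return None;
            }
            Some(ClassifiedArticle {
                url: url.clone(),
                category: verdict.category,
                source_type: source_type.to_string(),
                source_url: source_url.map(str::to_string),
            })
        })
        .collect()
}
```

Phase 1 would call it with a source type label and `Some(source_url)`, while the search paths would pass their own label and `None`.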

Primitive Obsession

The pipeline uses six tracking variables as raw state: the article_scraped, source_counts, url_source, filled_counts, and seen_urls maps and sets, plus the pending_traces vec. Together they represent one coherent concept -- "pipeline context" or "generation state" -- and should be bundled into a struct:

use std::collections::{HashMap, HashSet};

// NewsItem and ArticleHistoryEntry are assumed to be the pipeline's existing types.
struct GenerationContext {
    articles_by_category: HashMap<String, Vec<NewsItem>>,
    source_domain_counts: HashMap<String, usize>,
    url_to_source: HashMap<String, String>,
    category_fill_counts: HashMap<String, usize>,
    seen_urls: HashSet<String>,
    pending_traces: Vec<ArticleHistoryEntry>,
}

Magic Strings

Category keys like "category_0", "category_autre", "category_no_date", and status strings like "filtered_history", "filtered_diversity", "filtered_not_article", "filtered_too_old", "filtered_empty", "filtered_homepage", "filtered_cross_phase_dedup", "used" are scattered as string literals. These should be constants or an enum.
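For the status strings, an enum along these lines would keep the stored values identical while giving the compiler a closed set to check. The variant names, `as_str`, and `parse` are suggestions; only the string values come from the list above:

```rust
/// Article trace statuses, mapped one-to-one to the existing string values.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ArticleStatus {
    Used,
    FilteredHistory,
    FilteredDiversity,
    FilteredNotArticle,
    FilteredTooOld,
    FilteredEmpty,
    FilteredHomepage,
    FilteredCrossPhaseDedup,
}

impl ArticleStatus {
    /// Every variant, useful for validation and round-trip tests.
    pub const ALL: [ArticleStatus; 8] = [
        ArticleStatus::Used,
        ArticleStatus::FilteredHistory,
        ArticleStatus::FilteredDiversity,
        ArticleStatus::FilteredNotArticle,
        ArticleStatus::FilteredTooOld,
        ArticleStatus::FilteredEmpty,
        ArticleStatus::FilteredHomepage,
        ArticleStatus::FilteredCrossPhaseDedup,
    ];

    /// The string form currently written to the database.
    pub const fn as_str(self) -> &'static str {
        match self {
            ArticleStatus::Used => "used",
            ArticleStatus::FilteredHistory => "filtered_history",
            ArticleStatus::FilteredDiversity => "filtered_diversity",
            ArticleStatus::FilteredNotArticle => "filtered_not_article",
            ArticleStatus::FilteredTooOld => "filtered_too_old",
            ArticleStatus::FilteredEmpty => "filtered_empty",
            ArticleStatus::FilteredHomepage => "filtered_homepage",
            ArticleStatus::FilteredCrossPhaseDedup => "filtered_cross_phase_dedup",
        }
    }

    /// Parses a stored status string; unknown values yield `None`.
    pub fn parse(value: &str) -> Option<Self> {
        Self::ALL.iter().copied().find(|s| s.as_str() == value)
    }
}
```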

Stringly-Typed Configuration

Several settings use strings where enums would be safer:

settings.ai_provider (could be enum Provider { Gemini, OpenAi, Anthropic })
settings.search_agent_behavior (free-form, but could at least validate non-HTML)
synthesis.status (always "completed" in the codebase, but stored as String)
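For `settings.ai_provider`, a sketch of the enum with validated parsing might look like this. The accepted spellings ("gemini", "openai", "anthropic") are assumptions about what is stored today, and `UnknownProvider` is a hypothetical error type:

```rust
use std::fmt;
use std::str::FromStr;

/// Supported LLM providers as a closed set.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Provider {
    Gemini,
    OpenAi,
    Anthropic,
}

/// Returned when a stored or submitted provider name is not recognized.
#[derive(Debug, PartialEq, Eq)]
pub struct UnknownProvider(pub String);

impl fmt::Display for UnknownProvider {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "unknown provider '{}'", self.0)
    }
}

impl std::error::Error for UnknownProvider {}

impl FromStr for Provider {
    type Err = UnknownProvider;

    /// Case-insensitive parse that ignores surrounding whitespace.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.trim().to_ascii_lowercase().as_str() {
            "gemini" => Ok(Provider::Gemini),
            "openai" => Ok(Provider::OpenAi),
            "anthropic" => Ok(Provider::Anthropic),
            _ => Err(UnknownProvider(s.to_string())),
        }
    }
}
```

Parsing once at the boundary (request validation or row decoding) means the factory and resolver can match on `Provider` exhaustively instead of comparing strings.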

3. Architecture

3.1 Layer Separation

The codebase follows a three-layer architecture:

handlers/ (HTTP layer) --> services/ (business logic) --> db/ (data access)
                            |
                          models/ (shared types)

Assessment: Good but with leaks.

Handlers are thin and focused. They validate input, call services/db, and format responses. This is excellent.
The db/ layer is clean -- pure SQL queries returning typed results. No business logic leaks into SQL.
The services/ layer is where responsibilities blur. synthesis.rs calls db:: modules directly (bypassing any service abstraction), constructs its own SQL in resolve_model, and embeds scraping/classification logic that could be separate services.

Concern: The scheduler sits at the service layer but orchestrates at the handler level

The scheduler calls synthesis::run_generation_inner (a service) but also does email sending (another service), DB fetching (data layer), and schedule marking (data layer) all inline. It should either be a handler (if it needs to compose services) or delegate to a higher-level "generation + notification" service.

3.2 Error Handling

Strengths:

Unified AppError enum with IntoResponse -- all errors produce consistent JSON
Internal errors log full details but return generic messages to clients (security-conscious)
From<sqlx::Error> and From<anyhow::Error> conversions are clean
Error sanitization in sanitize_error_message prevents API key leakage

Weaknesses:

Errors are silently swallowed in multiple places via .ok():
- db::article_history::batch_insert_entries(...).await.ok() (7 occurrences) -- if tracing fails, there is no indication
- db::llm_call_log::truncate_old(...).await.ok() -- cleanup failure is invisible
- db::schedules::mark_run(...).await.ok() -- if this fails, the schedule may fire again next minute
unwrap_or_default() on serde_json::from_value calls silently drops malformed data (e.g., theme.categories deserialization). A warning log would be more appropriate.
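Both patterns can keep their "continue on failure" behavior while still leaving a trace. The helpers below are a hypothetical sketch; `eprintln!` is used only so the example has no dependencies, and in the codebase it would be a `tracing::warn!` call:

```rust
use std::fmt::Display;

/// Like `.ok()`, but reports the error before discarding it.
pub fn ok_or_warn<T, E: Display>(context: &str, result: Result<T, E>) -> Option<T> {
    match result {
        Ok(value) => Some(value),
        Err(err) => {
            // Stand-in for tracing::warn!(...).
            eprintln!("WARN {context}: {err}");
            None
        }
    }
}

/// Like `.unwrap_or_default()`, but reports why the default was used.
pub fn default_or_warn<T: Default, E: Display>(context: &str, result: Result<T, E>) -> T {
    ok_or_warn(context, result).unwrap_or_default()
}
```

A call site such as the history insert would become `ok_or_warn("article history insert", insert_result)`, where `insert_result` is the awaited `Result`.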

3.3 State Management

In-memory state:

JobStore (DashMap-based) -- well-designed with TTL cleanup, user locking, and cancellation support
RateLimiter / ProviderRateLimiter -- properly Arc-wrapped for Clone-cheap sharing
user_rate_limiters: DashMap<Uuid, UserRateLimitEntry> -- handles setting changes atomically

Concern: No persistence for job state

If the server restarts during a generation, the in-memory job is lost with no way to recover. For a self-hosted single-instance app this may be acceptable, but if resilience is a goal, the job state should be backed by the database.

Concern: Scheduled job email state is fire-and-forget

The scheduler sends emails to up to 3 recipients per schedule. If one email fails, the others still send, but there is no retry or notification mechanism. mark_run is called unconditionally after a successful generation, even if all emails failed.

3.4 Concurrency Model

Strengths:

Proper use of tokio::task::JoinSet for parallel scraping and classification
DashMap + DashSet for low-contention concurrent access to shared state (sharded locking rather than a single global mutex)
AtomicBool for cooperative cancellation (avoids mutex overhead)
watch::channel for fan-out progress notifications

Weakness: Global rate limiter shared across scheduled + manual jobs

The ProviderRateLimiter is global. A scheduled job and a manual job for different users hitting the same provider share the same bucket. Under load, scheduled jobs could starve manual users (or vice versa). The architecture should consider per-user or per-job rate tracking for fairness.

3.5 Security Architecture

Strengths:

AES-256-GCM encryption for API keys at rest with per-key nonces
SSRF prevention in both scraper.rs and source_scraper.rs (IP allowlist checking, redirect validation)
CSRF protection via X-Requested-With header check on all mutating API endpoints
Session cookies are HttpOnly, SameSite=Lax, optionally Secure
Anti-enumeration in auth (same response for existent/non-existent emails, timing attack mitigation)
Error sanitization prevents API key leakage in SSE error events
CSP, X-Frame-Options, HSTS, Referrer-Policy headers

Concern: Gemini API key in URL

The GeminiProvider constructs the API URL as ...?key={api_key}. While the error handler carefully avoids logging the full URL, the key is still in the URL query string. This means:

It may appear in HTTP access logs on intermediary proxies
reqwest may include it in error messages despite the kind-only logging
If tracing is set to DEBUG level, the URL may be logged by tower-http's TraceLayer

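Query-string API keys are a long-standing Gemini API usage pattern, so this is not a defect specific to this codebase, but the risk should be documented (see the P3 recommendation on header-based authentication).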

4. Dependency Management and Testability

4.1 Test Architecture

Strengths:

Unit tests for all model validation logic (settings, theme, schedule, source, synthesis)
Unit tests for error handling, rate limiting, URL normalization, link extraction
Mock LLM provider enables end-to-end pipeline testing without real API calls
Factory tests verify correct provider instantiation

Weaknesses:

The core pipeline (run_generation_inner) cannot be unit-tested. It requires:
- A live PgPool (for all db:: calls)
- A real AppState (for config, rate limiters, job store)
- Network access (for scraping via http_client)
- Only the LLM provider can be mocked (via provider_override)
No integration tests for the scheduler
No tests for the Brave Search integration path
No tests for Phase 2 (web search fallback) at all

4.2 Dependency Injection

The codebase uses Axum's State(AppState) extractor as its sole DI mechanism. This works well for handlers but breaks down for services:

Services receive &AppState directly, gaining access to everything
There is no trait boundary between the pipeline and its dependencies (db, scraper, search)
The provider_override: Option<Arc<dyn LlmProvider>> parameter proves the value of DI -- it is the only seam that enables testing

Recommendation: Introduce a PipelineDeps trait (or struct of trait objects) that the pipeline receives, encapsulating:

Database access (settings, sources, themes, article history)
Scraping (source page scraping, article scraping)
Search (Brave, LLM web search)
Rate limiting
Key resolution

This would allow the entire pipeline to be tested with in-memory fakes.
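As a starting point, the article history check could be the first dependency placed behind a trait. The sketch below is synchronous and uses a `u64` user id for brevity; `ArticleHistory`, `InMemoryHistory`, `PipelineDeps`, and `filter_unseen` are hypothetical names, and the production implementation would wrap the existing `db::article_history` queries:

```rust
use std::collections::HashSet;

/// Abstraction over the article history lookup used by the pipeline.
pub trait ArticleHistory {
    /// Returns the subset of `urls` already recorded for `user_id`.
    fn known_urls(&self, user_id: u64, urls: &[String]) -> HashSet<String>;
}

/// In-memory fake for tests; no database required.
pub struct InMemoryHistory {
    seen: HashSet<(u64, String)>,
}

impl InMemoryHistory {
    pub fn with_entries(entries: &[(u64, &str)]) -> Self {
        Self {
            seen: entries
                .iter()
                .map(|(user, url)| (*user, url.to_string()))
                .collect(),
        }
    }
}

impl ArticleHistory for InMemoryHistory {
    fn known_urls(&self, user_id: u64, urls: &[String]) -> HashSet<String> {
        urls.iter()
            .filter(|url| self.seen.contains(&(user_id, url.to_string())))
            .cloned()
            .collect()
    }
}

/// Bundle of pipeline dependencies; more fields (scraper, search,
/// rate limiter) would be added as they are extracted.
pub struct PipelineDeps<'a> {
    pub history: &'a dyn ArticleHistory,
}

/// Example pipeline step written against the trait, not the database.
pub fn filter_unseen(deps: &PipelineDeps<'_>, user_id: u64, urls: Vec<String>) -> Vec<String> {
    let known = deps.history.known_urls(user_id, &urls);
    urls.into_iter().filter(|url| !known.contains(url)).collect()
}
```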

4.3 Module Coupling

The dependency graph is mostly clean:

handlers --> services --> db
                |          |
              models <-----+
                |
              errors

Exceptions:

synthesis.rs calls db:: directly (bypasses service layer for settings, themes, sources, history, llm_call_log, syntheses)
synthesis.rs calls crate::services::prompts, crate::services::llm::schema, crate::services::scraper, crate::services::source_scraper, crate::services::brave_search, crate::services::encryption -- essentially importing the entire services layer
rate_limiter.rs contains its own SQL queries
handlers/syntheses.rs::list constructs a Synthesis model manually from the SynthesisWithThemeName join row, duplicating field mapping

5. Specific Code-Level Findings

5.1 The `#[allow(clippy::too_many_arguments)]` Smell

Two functions suppress this lint:

build_search_prompt (9 parameters)
log_llm_call (10 parameters)

Both are symptoms of missing context objects. build_search_prompt should take a SearchPromptConfig struct. log_llm_call should take a LlmCallRecord struct.
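To illustrate the context-object idea for `log_llm_call`, the sketch below groups a handful of plausible fields into one struct. The field list is invented for the example (the actual ten parameters are not listed in this report), and `format_llm_call` is a hypothetical consumer:

```rust
/// Hypothetical record replacing a long positional parameter list.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct LlmCallRecord {
    pub provider: String,
    pub model: String,
    pub call_type: String,
    pub prompt_tokens: u32,
    pub completion_tokens: u32,
    pub duration_ms: u64,
    pub error: Option<String>,
}

/// Example consumer: takes one argument instead of ten.
pub fn format_llm_call(record: &LlmCallRecord) -> String {
    let outcome = match &record.error {
        Some(err) => format!("error={err}"),
        None => "ok".to_string(),
    };
    format!(
        "{}/{} {} tokens={} {}ms {}",
        record.provider,
        record.model,
        record.call_type,
        record.prompt_tokens + record.completion_tokens,
        record.duration_ms,
        outcome
    )
}
```

Named fields with `..Default::default()` make call sites self-describing and let new fields be added without touching every caller.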

5.2 `run_generation_inner` Parameter List

The function takes 7 parameters: job_id, state, user_id, theme_id, tx, provider_override, cancelled. Three of them (job_id, user_id, theme_id) describe "what to generate"; the other four (state, tx, provider_override, cancelled) are "infrastructure." A GenerationJob struct and a PipelineInfra struct would make intent clearer.

5.3 Inconsistent `serde_json::Value` vs. Typed Models

theme.categories is stored as serde_json::Value and deserialized inline with serde_json::from_value(...).unwrap_or_default(). This pattern appears at least 5 times across the codebase (themes, schedules, syntheses). Consider using sqlx::types::Json<Vec<String>> for typed extraction at the query level.

5.4 French/English Mixing

User-facing messages are consistently in French (good for i18n consistency), but code-level strings mix languages:

Status strings: "filtered_history" (English), "category_autre" (French), "Articles sans date" (French)
Log messages: English
Error messages to users: French

The recommendation is to keep all internal identifiers in English and use the i18n layer for user-facing strings.

6. Prioritized Recommendations

P0 -- High Impact, Do First

Extract GenerationContext struct from the 6 tracking variables in run_generation_inner. This is a safe refactor that immediately improves readability and reduces parameter passing.
Extract scrape_and_classify_batch function to eliminate the 300-line duplication between Phase 1, Brave Search, and LLM search paths. Parameterize by source_type: &str and source_url: Option<&str>.
Move JobStore to its own module (services/job_store.rs). It is already self-contained with no dependencies on synthesis logic. This reduces synthesis.rs by ~190 lines.

P1 -- Important, Do Soon

Introduce TraceBatcher struct to encapsulate the pending traces buffer, batch insert calls, and error handling. Replace the 7 manual flush sites with batcher.flush().
Make the scheduler use JobStore for scheduled jobs. This provides visibility into scheduled generation progress and prevents race conditions between scheduled and manual jobs. Add email sending as a post-completion hook rather than inline in the scheduler.
Replace magic strings with constants or enums for article statuses ("filtered_history", etc.), category keys ("category_0", "category_autre", "category_no_date"), and synthesis status.
Add structured logging to .ok() calls. Replace batch_insert_entries(...).await.ok() with if let Err(e) = batch_insert_entries(...).await { tracing::warn!(...) }.

P2 -- Improve Quality

Split Phase 2 search strategies into a SearchStrategy trait with BraveSearchStrategy and LlmSearchStrategy implementations. The pipeline would call strategy.search(query, max_results) without knowing which backend is used.
Extract provider_resolver.rs for the provider/key/model resolution logic (~100 lines currently in synthesis.rs).
Introduce PipelineDeps trait or struct to enable full pipeline testing without Postgres/network. Start with the article history check as the first dependency to extract, since it is the most frequently called.
Remove inline SQL from rate_limiter.rs and synthesis.rs::resolve_model. Route all queries through db/ modules.

P3 -- Nice to Have

Type the categories field as sqlx::types::Json<Vec<String>> instead of serde_json::Value so that deserialization happens once at the query level and the inline from_value(...).unwrap_or_default() calls can be removed.
Consolidate response types -- either all in models/ or all in handlers/, with a consistent convention.
Add a SearchPromptConfig struct to replace the 9-parameter build_search_prompt function.
Document the key-exposure risk in the Gemini API key URL pattern (the key travels in the query string) and consider using the x-goog-api-key header instead (if supported by the Gemini API version in use).

7. What the Codebase Does Well

It is important to acknowledge the strengths that should be preserved during refactoring:

Error handling discipline: The AppError enum is consistently used everywhere. No panics in production code. Internal details are never leaked.
Security-first design: SSRF prevention, encrypted secrets, CSRF protection, anti-enumeration, session management -- all implemented correctly.
Idiomatic Axum usage: Extractors, state management, middleware composition, SSE streaming -- all follow framework conventions.
Test coverage on leaf components: Models, utils, and isolated services have thorough unit tests with boundary cases.
Documentation: Module-level doc comments, function-level doc comments, and inline comments explaining non-obvious decisions (e.g., the TOCTOU note in scraper.rs).
Operational features: Graceful shutdown, session cleanup, job TTL, rate limit hot-reload, cooperative cancellation -- these show production-mindedness.

Appendix: File Size Summary

File	Lines	Assessment
`services/synthesis.rs`	~1550	Critical -- needs decomposition
`services/scraper.rs`	~400	Acceptable
`services/rate_limiter.rs`	~470	Acceptable (includes thorough tests)
`services/prompts.rs`	~370	Acceptable (includes thorough tests)
`handlers/auth.rs`	~380	Acceptable
`handlers/sources.rs`	~280	Good
`handlers/admin.rs`	~440	Acceptable
`handlers/syntheses.rs`	~240	Good
`handlers/generation.rs`	~180	Good
`models/settings.rs`	~260	Good (includes thorough tests)
`models/synthesis.rs`	~415	Acceptable (includes thorough tests)
`errors.rs`	~173	Good
`app_state.rs`	~82	Good
`router.rs`	~178	Good
`scheduler.rs`	~93	Good size, but needs architectural changes
