You cannot select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.

238 lines
22 KiB
Markdown

# Architecture & SOLID Audit Report
## Executive Summary
AI Weekly Synth demonstrates strong foundational architecture with clean layer separation (handlers, services, DB), a well-designed trait abstraction for LLM providers, and robust error handling. However, the synthesis pipeline (`services/synthesis.rs`) has accumulated significant complexity that violates Single Responsibility and makes the code difficult to test, extend, and maintain. The codebase would benefit most from decomposing the pipeline into smaller, composable stages and introducing trait-based dependency injection for external services.
## SOLID Analysis
### Single Responsibility
**Mostly followed in the outer layers; severely violated in the synthesis pipeline.**
**Good examples:**
- `errors.rs` has a single job: map application errors to HTTP responses. Clean, focused, well-tested.
- `middleware/auth.rs` does exactly one thing: extract authenticated users from session cookies. The `AdminUser` wrapper cleanly composes on top.
- `services/encryption.rs` is a focused module for AES-256-GCM operations with no extraneous concerns.
- `services/auth.rs` cleanly coordinates token creation, email sending, and session management without overstepping.
- Each file in `db/` handles queries for exactly one table (users, sessions, syntheses, etc.).
**Critical violation: `services/synthesis.rs` (~1250 lines)**
This single file is responsible for:
1. Job lifecycle management (create, subscribe, cleanup via `JobStore`)
2. Progress event definition and emission
3. Provider and API key resolution
4. Rate limiting coordination
5. Source rotation strategy
6. Parallel scraping orchestration (Phase 1)
7. LLM classification and summarization orchestration (Phase 1)
8. Brave Search integration orchestration (Phase 2)
9. LLM web search fallback orchestration (Phase 2)
10. URL normalization, hashing, and deduplication
11. Article history filtering
12. Source diversity enforcement
13. Section assembly and persistence
14. Article provenance tracing
15. Error sanitization
The `run_generation_inner` function alone spans ~640 lines. The nearly identical scrape-then-classify batch processing loop is duplicated between Phase 1 (lines 406-551) and the Brave Search path (lines 627-758), with only minor differences in tuple shapes. This duplication is a strong signal that a shared abstraction is missing.
**Other SRP observations:**
- `handlers/auth.rs` mixes HTTP concern (cookie construction in `build_session_cookie`) with business logic (token verification in `verify_token`). The cookie construction could live in a utility or the auth service.
- `AppState` (in `app_state.rs`) holds both infrastructure (pool, config) and domain state (job_store, rate_limiters). This is acceptable at current scale but should be watched as the application grows.
### Open/Closed
**Good for LLM providers; poor for the synthesis pipeline.**
**Good examples:**
- The `LlmProvider` trait plus `factory::create_provider` follow OCP well. Adding a new provider (e.g., Mistral) requires implementing the trait and adding one match arm in the factory -- no existing code needs modification.
- `AppError` uses enum variants with `thiserror` derives. Adding a new error category is additive.
- The `FromRequestParts` extractors (`AuthUser`, `AdminUser`) compose cleanly -- adding new authorization levels means adding new extractors.
**Violations:**
- The synthesis pipeline has hardcoded knowledge of all search strategies (LLM search, Brave Search, personalized sources). Adding a new search backend (e.g., Serper, SearXNG) requires modifying the 640-line orchestration function. There is no `SearchBackend` trait or strategy pattern.
- `factory::create_provider` uses a match statement that must be modified for each new provider. A registration-based factory (e.g., `HashMap<&str, fn(String, Client) -> Arc<dyn LlmProvider>>`) would eliminate this.
- The `ProviderRateLimiter` embeds SQL queries directly (line 173 of `rate_limiter.rs`) rather than going through the `db` module, bypassing the architectural boundary. The comment says "to avoid circular dependency" -- this signals a structural issue.
### Liskov Substitution
**Correctly applied where traits are used.**
- All three `LlmProvider` implementations (`GeminiProvider`, `OpenAiProvider`, `AnthropicProvider`) correctly implement the trait contract. Callers use `Arc<dyn LlmProvider>` throughout and never downcast.
- The `AnthropicProvider` implementation is noteworthy: it adapts to Anthropic's lack of native JSON schema enforcement by embedding the schema in the system prompt. This is a legitimate behavioral difference that is hidden behind the trait -- exactly what LSP intends. The contract is "return structured JSON matching the schema," not "use native schema enforcement."
- `FromRequestParts` implementations for `AuthUser` and `AdminUser` correctly maintain the contract expectations.
**No LSP violations detected.** The trait abstractions are lean and appropriately designed.
### Interface Segregation
**Traits are lean and focused -- no violations.**
- `LlmProvider` has only two methods: `provider_id()` and `call_llm()`. This is appropriately minimal. If web search support needed to be optional (e.g., only some providers support grounded search), a separate `WebSearchProvider` trait would be warranted, but currently the pipeline handles this externally.
- The codebase uses free functions in modules (e.g., `db::users::find_by_email`) rather than trait objects for the database layer. This is idiomatic Rust and avoids over-engineering trait hierarchies.
**Observation:** The absence of traits for services like scraping, email, and encryption is acceptable given the project's scale, but limits testability (see Dependency Inversion section).
### Dependency Inversion
**Partially applied -- the weakest SOLID principle in this codebase.**
**Good:**
- Handlers depend on `AppState` (an abstraction container), not directly on database implementations.
- The synthesis pipeline depends on `Arc<dyn LlmProvider>`, not on concrete provider types.
**Significant gaps:**
- **No abstraction over the database layer.** All services and the synthesis pipeline call `db::*` functions directly (e.g., `db::settings::get_or_create_default`, `db::article_history::check_urls_exist`). This means the synthesis pipeline cannot be unit-tested without a live Postgres instance.
- **No abstraction over the scraper.** The synthesis pipeline calls `scraper::scrape_url` and `source_scraper::extract_article_links` as concrete functions. This prevents testing pipeline logic with deterministic, fake scraped content.
- **No abstraction over the email service.** `services/email.rs` is called directly. Testing email-sending behavior requires either a live Resend API key or the hardcoded test bypass.
- **`AppState` is a concrete struct, not a trait.** All handlers are bound to `State<AppState>`. While this is standard Axum practice, it means handler logic cannot be tested without constructing a full `AppState` with a real database pool.
The integration tests in `tests/common/mod.rs` confirm this: `TestApp` must create a real Postgres database for every test, run migrations, and tear down afterward. The test helper works well, but the inability to unit-test service logic in isolation is a direct consequence of missing abstractions.
## Design Patterns
### Well-Used Patterns
1. **Strategy Pattern (LLM Providers):** The `LlmProvider` trait + `factory::create_provider` is a textbook Strategy implementation. Providers encapsulate provider-specific API differences (request format, response parsing, error mapping) behind a uniform interface. The factory cleanly maps configuration to concrete strategy.
2. **Observer Pattern (Progress Streaming):** The `watch` channel pattern for SSE progress streaming is elegant. The `JobStore` manages pub/sub lifecycle, and the `WatchStream` adapter in the handler converts it to an SSE stream. This design correctly handles reconnection (subscribers immediately get the latest state).
3. **Extractor Pattern (Authentication):** Axum's `FromRequestParts` is used idiomatically for `AuthUser` and `AdminUser`. This is effectively a Decorator pattern -- handlers declare their auth requirements via type parameters, and the framework handles the cross-cutting concern transparently.
4. **Builder Pattern (Configuration):** `AppConfig::from_env()` + `validate()` is a clean two-phase construction pattern. Validation is separated from loading, allowing the application to fail fast with clear error messages.
5. **Value Object Pattern:** `MasterKey` is a well-designed value object. It enforces invariants on construction (`from_hex`), prevents accidental cloning (no `Clone` derive), and zeroizes memory on drop. This is security-conscious Rust.
6. **Repository Pattern (DB layer):** Each `db::*` module acts as a repository, encapsulating SQL queries behind typed function signatures. The separation between the `UserRow` (DB representation) and `User` (domain model) in `db/users.rs` is a good Data Mapper implementation.
### Missing Patterns
1. **Pipeline/Chain of Responsibility:** The synthesis pipeline is a sequential series of stages (load settings -> resolve provider -> scrape sources -> classify articles -> web search fallback -> assemble -> save). This is a textbook pipeline pattern that would benefit from explicit stage modeling:
```
trait PipelineStage {
async fn execute(&self, ctx: &mut PipelineContext) -> Result<(), AppError>;
}
```
Each stage would be independently testable and composable.
2. **Unit of Work / Transaction:** The synthesis pipeline performs many database writes (article history inserts, LLM call logs, final synthesis save) without a unifying transaction. If the synthesis save fails after article history has been written, the data is inconsistent. A transactional boundary around the save phase would prevent orphaned records.
3. **Circuit Breaker:** The rate limiter waits up to 60 seconds for rate limit windows to pass, but there is no circuit breaker for provider failures. If a provider returns 5xx errors repeatedly, the pipeline will keep retrying individual articles until the batch is exhausted. A circuit breaker that short-circuits after N consecutive failures would save time and API quota.
4. **Result Collector / Accumulator:** The batch processing loops manually manage `scraped_articles`, `filled_counts`, `source_counts`, and `seen_urls` as mutable state threaded through the loop. An accumulator pattern would encapsulate this state and its invariants (e.g., "never exceed max_items_per_category") into a dedicated struct with methods like `try_add_article()`.
### Anti-Patterns
1. **God Function:** `run_generation_inner` (~640 lines) is the most impactful anti-pattern. It manages too many concerns, has deeply nested control flow, and is untestable in isolation. The function's length makes it difficult to reason about edge cases and encourages bugs during modification.
2. **Duplicated Logic (DRY Violation):** The scrape-then-classify batch loop appears in near-identical form in three places:
- Phase 1 personalized sources (lines ~406-551)
- Phase 2 Brave Search path (lines ~627-758)
- The response parsing differs slightly, but the core pattern (spawn scrape tasks -> collect results -> spawn classify tasks -> collect and categorize) is the same.
A shared `process_article_batch()` function accepting a list of `(url, source_identifier)` would eliminate ~150 lines of duplication.
3. **Primitive Obsession:** Categories are tracked throughout the pipeline using raw strings and index-based keys (`"category_0"`, `"category_autre"`). A `CategoryIndex` newtype or enum would prevent bugs like off-by-one errors and make the intent clearer.
4. **Implicit Protocol (Stringly Typed):** The `status` field in article history uses raw strings (`"filtered_history"`, `"filtered_diversity"`, `"used"`, `"filtered_homepage"`, etc.). An enum would provide compile-time guarantees and prevent typos.
5. **Configuration Object Overload:** `UserSettings` has 16 fields that are all passed around as a single struct. The synthesis pipeline destructures individual fields from it in many places (e.g., `settings.max_items_per_category`, `settings.batch_size`, `settings.use_brave_search`). This makes it hard to understand which settings each stage actually depends on.
## Architecture Assessment
### Layer Separation
**Clean three-layer architecture with appropriate boundaries.**
```
handlers/ (HTTP) -> services/ (Business Logic) -> db/ (Data Access)
```
- **Handlers** correctly limit themselves to request parsing, validation, calling services/DB, and response formatting. No handler contains business logic.
- **Services** implement the business rules. The auth service coordinates between DB, email, and token utilities. The encryption service is self-contained.
- **DB modules** are pure data access with no business logic. SQL queries are parameterized and safe.
**Minor boundary violations:**
- `ProviderRateLimiter::reload_from_db` in `services/rate_limiter.rs` contains raw SQL queries instead of calling through `db::rate_limits`. The comment acknowledges this is to "avoid circular dependency," suggesting the module boundaries need adjustment.
- `resolve_model` in `services/synthesis.rs` also contains raw SQL (`sqlx::query_scalar` with inline SQL) instead of going through `db::providers`. This bypasses the DB layer abstraction.
### Coupling
**Low coupling between modules; high coupling within the synthesis pipeline.**
- Handler modules are independent of each other. Each handler file imports only from `app_state`, `errors`, `middleware`, relevant `models`, and relevant `services`/`db` modules.
- The LLM provider implementations are fully decoupled -- they share no state and have no knowledge of each other.
- The `db` modules have zero inter-module dependencies (no `db::users` calling `db::sessions`, etc.).
**High coupling in the synthesis pipeline:**
- `services/synthesis.rs` imports from 9 different modules: `app_state`, `db` (5 submodules), `errors`, `models` (2 submodules), `services` (5 submodules). It is the most connected module in the codebase and acts as a "gravity well" that pulls in all dependencies.
- The `AppState` struct couples all concerns: config, pool, HTTP client, auth rate limiter, provider rate limiter, user rate limiters, and job store. Any change to any of these affects the shared state type.
### Error Handling
**Excellent error handling strategy -- one of the strongest aspects of the codebase.**
- **Single error type:** `AppError` is the only error type in the application. All handlers return `Result<impl IntoResponse, AppError>`. This is clean and consistent.
- **Automatic conversions:** `From<sqlx::Error>` and `From<anyhow::Error>` provide ergonomic `?` operator usage while ensuring errors are properly categorized.
- **Security-conscious:** Internal errors log the full details server-side but return only "An internal error occurred" to the client. The `sanitize_error_message` function in the synthesis pipeline ensures API keys are never exposed in SSE error events.
- **No `unwrap()` in production paths:** Consistent with the project's stated policy. The few `unwrap()` calls are in test code or behind `expect()` with clear messages (e.g., `"valid minutes value"`).
- **Error variants are semantically meaningful:** `NotFound`, `Unauthorized`, `Forbidden`, `BadRequest`, `Validation`, `Internal`, `RateLimited` -- each maps to a specific HTTP status code with appropriate behavior.
**Minor observation:** The `Validation` variant accepts a `String`, but a structured error type (e.g., with field name + message) would be more useful for frontends that want to highlight specific form fields.
### State Management
**In-memory state is well-designed with appropriate concurrency primitives.**
- `DashMap` is used for lock-free concurrent access in `RateLimiter`, `ProviderRateLimiter`, `JobStore`, and `user_rate_limiters`. This avoids `Mutex` contention under load.
- `Arc<watch::Sender>` for progress events is the right choice -- it provides "latest value" semantics so reconnecting SSE clients immediately get caught up.
- The `JobStore` has TTL-based cleanup (1 hour) and post-completion cleanup (5 minutes), preventing memory leaks from abandoned jobs.
**Potential concern:** The `RateLimiter` in `rate_limiter.rs` stores timestamps in a `Vec<Instant>` (for the per-key limiter) rather than a `VecDeque`. Old timestamps are evicted via `retain()` which is O(n), whereas `VecDeque::pop_front()` is O(1). The `ProviderRateLimiter` correctly uses `VecDeque`. This inconsistency is minor but worth unifying.
**State growth risk:** The `user_rate_limiters: DashMap<Uuid, UserRateLimitEntry>` in `AppState` grows unboundedly as users trigger generation jobs. Entries are only removed when a user's settings change to have no rate limits. A periodic cleanup (e.g., remove entries for users who haven't generated in 24 hours) would prevent slow memory growth.
## Prioritized Recommendations
1. **Decompose `services/synthesis.rs` into a multi-stage pipeline.** Extract the monolithic `run_generation_inner` into discrete, testable stages: `SettingsStage`, `SourceScrapingStage`, `ClassificationStage`, `WebSearchStage`, `AssemblyStage`. Each stage should operate on a shared `PipelineContext` struct. This addresses the God Function anti-pattern, the DRY violation in batch processing, and enables unit testing of individual stages.
- Files: `backend/src/services/synthesis.rs` (1250+ lines)
- Impact: Testability, maintainability, extensibility
2. **Extract a shared `process_article_batch()` function** as an immediate, lower-risk improvement. The near-identical scrape-then-classify loop in Phase 1 and the Brave Search path can be unified into a single function that accepts `Vec<(String, Option<String>)>` (URL + optional source URL) and returns classified `NewsItem`s. This can be done without the full pipeline refactor.
- Files: `backend/src/services/synthesis.rs` (lines ~406-551 and ~627-758)
- Impact: ~150 lines of duplication removed, fewer bugs from divergent copies
3. **Introduce trait-based abstractions for external services** to enable unit testing. Define traits for `ArticleScraper`, `SearchService`, and `ArticleStore` (history). The synthesis pipeline would depend on these traits, allowing tests to inject fakes. Start with the DB layer trait since it blocks the most test coverage.
- Files: `backend/src/services/synthesis.rs`, `backend/src/services/scraper.rs`, `backend/src/db/article_history.rs`
- Impact: Enables unit testing of synthesis pipeline without Postgres
4. **Replace stringly-typed status values with enums.** The article history `status` field uses raw strings (`"filtered_history"`, `"filtered_diversity"`, `"used"`, etc.). Define an `ArticleStatus` enum with `Display`/`FromStr` implementations. Similarly, replace the `"category_0"` / `"category_autre"` string keys with a `CategoryKey` newtype.
- Files: `backend/src/services/synthesis.rs`, `backend/src/db/article_history.rs`
- Impact: Type safety, compile-time correctness guarantees
5. **Move raw SQL out of service modules.** `ProviderRateLimiter::reload_from_db` and `resolve_model` in `synthesis.rs` contain inline SQL queries that bypass the `db/` layer. Move these queries to `db/rate_limits.rs` and `db/providers.rs` respectively. If circular dependencies are the issue, restructure the module hierarchy (e.g., make the rate limiter accept configuration data rather than a pool).
- Files: `backend/src/services/rate_limiter.rs` (lines 171-199), `backend/src/services/synthesis.rs` (lines 1191-1220)
- Impact: Architectural consistency, single source of truth for SQL
6. **Introduce an article accumulator struct** to encapsulate the mutable tracking state in the synthesis pipeline. Currently, `article_scraped`, `source_counts`, `url_source`, `filled_counts`, and `seen_urls` are managed as independent `HashMap`/`HashSet` variables threaded through loops. A `SynthesisAccumulator` struct with methods like `try_add_article(url, category, source) -> bool` would encapsulate invariants (max per category, max per source, deduplication) and simplify the pipeline code.
- Files: `backend/src/services/synthesis.rs`
- Impact: Reduced cognitive complexity, fewer state-management bugs
7. **Add a circuit breaker for LLM provider failures.** If a provider returns errors for N consecutive calls, stop sending requests and fail fast. The current behavior retries each article independently, wasting time and API quota when the provider is down.
- Files: `backend/src/services/synthesis.rs`
- Impact: Resilience, user experience during outages
8. **Unify `Vec<Instant>` to `VecDeque<Instant>` in `RateLimiter`.** The per-key `RateLimiter` uses `Vec::retain()` (O(n)) while `ProviderRateLimiter` uses `VecDeque::pop_front()` (O(1)). Switch the per-key limiter to `VecDeque` for consistency and performance under high request volumes.
- Files: `backend/src/services/rate_limiter.rs` (lines 37-77)
- Impact: Minor performance improvement, code consistency
9. **Add periodic cleanup for `user_rate_limiters` in `AppState`.** The `DashMap<Uuid, UserRateLimitEntry>` grows unboundedly. Add a background task (similar to `JobStore` TTL cleanup) that removes entries for users inactive for more than 24 hours.
- Files: `backend/src/app_state.rs`, `backend/src/main.rs`
- Impact: Prevents slow memory growth in long-running deployments
10. **Consider making `UpdateSettingsRequest::validate()` return `Result<(), AppError>` instead of `Result<(), String>`.** Currently, handlers must map the string error: `body.validate().map_err(AppError::Validation)`. Having validate return the correct error type directly would be more ergonomic and consistent with the rest of the codebase.
- Files: `backend/src/models/settings.rs`
- Impact: Minor ergonomic improvement, consistency