docs: add implementation plan for polish and optimization items

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
master
oabrivard 3 months ago
parent 825c7212e1
commit 2e87ae854a

@ -0,0 +1,415 @@
# Polish & Optimization — Implementation Plan
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
**Goal:** 8 polish/optimization items from the code audit: batch INSERTs, Arc clones, LazyLock selectors, createResource standardization, Button component usage, default settings alignment, SESSION_SECRET removal, Arc for master key.
**Architecture:** All tasks are independent — can be executed in any order. No behavioral changes. Pure optimization and cleanup.
**Tech Stack:** Rust (sqlx, std::sync::LazyLock), SolidJS, TypeScript
**Spec:** `docs/superpowers/specs/2026-03-26-polish-optimizations-design.md`
---
### Task 1: Remove unused `SESSION_SECRET`
**Files:**
- Modify: `backend/src/config.rs`
- Modify: `.env.example`
The simplest task — pure dead code removal.
- [ ] **Step 1: Remove `session_secret` from `AppConfig`**
In `backend/src/config.rs`:
- Remove `pub session_secret: String` from the `AppConfig` struct
- Remove `let session_secret = required_var("SESSION_SECRET")?;` from `from_env()`
- Remove `session_secret,` from the struct literal in `from_env()`
- Remove the `session_secret` length validation in `validate()`
- Remove `session_secret: "a".repeat(64),` from all test fixtures
- Remove the `test_validate_short_session_secret` test
- [ ] **Step 2: Remove from `.env.example`**
Remove the `SESSION_SECRET=` line from `.env.example`. Also remove any comment about it.
- [ ] **Step 3: Build and test**
Run: `cd backend && cargo build && cargo test --lib`
Expected: All pass
- [ ] **Step 4: Commit**
```bash
git add backend/src/config.rs .env.example
git commit -m "chore: remove unused SESSION_SECRET env var and config field"
```
---
### Task 2: Wrap `AppConfig` master key in `Arc`
**Files:**
- Modify: `backend/src/config.rs`
- [ ] **Step 1: Change field type to `Arc<String>`**
In `backend/src/config.rs`:
- Add `use std::sync::Arc;` to imports
- Change `pub master_encryption_key: String` to `pub master_encryption_key: Arc<String>`
- In `from_env()`, wrap: `master_encryption_key: Arc::new(master_encryption_key),`
- In test fixtures, wrap: `master_encryption_key: Arc::new("a".repeat(64)),`
All call sites use `&state.config.master_encryption_key` which works with `Arc<String>` via `Deref`.
- [ ] **Step 2: Build and test**
Run: `cd backend && cargo build && cargo test --lib`
Expected: All pass
- [ ] **Step 3: Commit**
```bash
git add backend/src/config.rs
git commit -m "chore: wrap master_encryption_key in Arc to reduce secret copies"
```
---
### Task 3: Cache CSS selectors with `LazyLock`
**Files:**
- Modify: `backend/src/services/source_scraper.rs`
- Modify: `backend/src/services/scraper.rs`
- [ ] **Step 1: Fix `source_scraper.rs` selectors**
In `backend/src/services/source_scraper.rs`, add at the top (after imports):
```rust
use std::sync::LazyLock;
static ANCHOR_SELECTOR: LazyLock<Selector> = LazyLock::new(|| Selector::parse("a[href]").unwrap());
```
Then replace both `Selector::parse("a[href]").unwrap()` calls (lines 80 and 137) with `&ANCHOR_SELECTOR`.
- [ ] **Step 2: Fix `scraper.rs` selectors**
In `backend/src/services/scraper.rs`, there are 14 `Selector::parse` calls. Many use `if let Ok(sel) = Selector::parse(...)` which handles parse failure gracefully. For the static selectors with `.unwrap()` (like `"title"`, `"h1"`, `"body"`), convert to `LazyLock`. For selectors inside `if let Ok(...)`, leave as-is since the pattern already handles failure.
Add at the top:
```rust
use std::sync::LazyLock;
static SEL_TITLE: LazyLock<Selector> = LazyLock::new(|| Selector::parse("title").unwrap());
static SEL_H1: LazyLock<Selector> = LazyLock::new(|| Selector::parse("h1").unwrap());
static SEL_BODY: LazyLock<Selector> = LazyLock::new(|| Selector::parse("body").unwrap());
```
Read the file carefully. Only convert `Selector::parse` calls that use `.unwrap()` to `LazyLock`. For calls using `if let Ok(sel) = Selector::parse(...)`, leave them as-is — these are error-handled and the selectors may vary.
- [ ] **Step 3: Build and test**
Run: `cd backend && cargo build && cargo test --lib`
Expected: All pass
- [ ] **Step 4: Commit**
```bash
git add backend/src/services/source_scraper.rs backend/src/services/scraper.rs
git commit -m "perf: cache CSS selectors with LazyLock to avoid re-parsing"
```
---
### Task 4: Reduce `.clone()` in pipeline with `Arc`
**Files:**
- Modify: `backend/src/services/synthesis.rs`
- [ ] **Step 1: Identify clone targets**
Read `backend/src/services/synthesis.rs`. In `run_generation_inner`, find values that are cloned into every spawned task but never mutated:
- `model_research` (String) — cloned ~4 times per batch iteration
- `classify_schema` (serde_json::Value) — cloned ~2 times per batch
- `classification_categories` (Vec<String>) — cloned ~2 times per batch
These should be wrapped in `Arc` at the point of creation, before the batch loops.
- [ ] **Step 2: Wrap immutable values in `Arc`**
At the point where these values are first created (before the batch loops), wrap them:
```rust
let model_research = Arc::new(model_research); // was String
let classify_schema = Arc::new(classify_schema); // was serde_json::Value
let classification_categories = Arc::new(classification_categories); // was Vec<String>
```
Add `use std::sync::Arc;` if not already imported (it likely is).
Then in the spawned tasks, change:
```rust
// Before:
let model = model_research.clone(); // clones String
// After:
let model = Arc::clone(&model_research); // clones Arc pointer
```
For `model` usage inside the task: `call_llm(&model, ...)` works because `Arc<String>` derefs to `&str`.
For `classify_schema`: `Arc<Value>` derefs to `&Value`. Check that `.clone()` calls in the task body that need an owned `Value` are updated if needed (the task may need `(*schema).clone()` if it passes owned values somewhere — read carefully).
For `classification_categories`: `Arc<Vec<String>>` derefs to `&[String]`. Update any place that calls `.clone()` on the inner vec to use `Arc::clone()` instead.
**Important:** Only change the variables that are cloned into spawned tasks. Don't change variables that are mutated (like `filled_counts`, `article_scraped`, `source_counts`).
- [ ] **Step 3: Build and test**
Run: `cd backend && cargo build && cargo test --lib`
Expected: All pass
- [ ] **Step 4: Commit**
```bash
git add backend/src/services/synthesis.rs
git commit -m "perf: use Arc for immutable values in pipeline to reduce cloning"
```
---
### Task 5: Align frontend default settings with backend
**Files:**
- Modify: `frontend/src/types.ts`
- [ ] **Step 1: Update `DEFAULT_SETTINGS`**
In `frontend/src/types.ts`, update `DEFAULT_SETTINGS` to match the backend's `Default for UserSettings` in `backend/src/models/settings.rs`:
```typescript
export const DEFAULT_SETTINGS: UserSettings = {
theme: 'Intelligence Artificielle',
max_age_days: 7,
max_items_per_category: 4,
max_articles_per_source: 3,
use_llm_for_source_links: false,
use_brave_search: false,
article_history_days: 90,
batch_size: 5,
search_agent_behavior: '', // backend default is empty string
ai_model: '',
ai_model_websearch: '',
ai_provider: '',
rate_limit_max_requests: null,
rate_limit_time_window_seconds: null,
categories: [
'Annonces majeures',
'Recherche et innovation',
'Industrie et entreprises',
'Secteur public',
'Opinions et analyses',
],
};
```
Key changes:
- `search_agent_behavior`: `"Tu peux..."``''` (backend defaults to empty)
- `categories`: update to match backend's 5 categories exactly
- [ ] **Step 2: TypeScript check**
Run: `cd frontend && npx tsc --noEmit`
Expected: No errors
- [ ] **Step 3: Commit**
```bash
git add frontend/src/types.ts
git commit -m "chore: align frontend DEFAULT_SETTINGS with backend defaults"
```
---
### Task 6: Standardize frontend data fetching on `createResource`
**Files:**
- Modify: Multiple frontend pages
- [ ] **Step 1: Identify pages to convert**
Read these pages and check which use `onMount` + `createSignal` for data loading:
- `frontend/src/pages/Home.tsx`
- `frontend/src/pages/ArticleHistory.tsx`
- `frontend/src/pages/SynthesisDetail.tsx`
- `frontend/src/pages/LlmLogs.tsx`
For each, the pattern to look for is:
```tsx
const [data, setData] = createSignal<T[]>([]);
const [loading, setLoading] = createSignal(true);
onMount(async () => {
try { setData(await api.fetch()); }
catch { setError(...); }
finally { setLoading(false); }
});
```
- [ ] **Step 2: Convert to `createResource`**
Replace with:
```tsx
const [data] = createResource(() => api.fetch());
```
Then use `data.loading` instead of `loading()`, `data()` instead of `data()` (same), and `data.error` for error state.
Only convert pages where it's a clean substitution. Skip pages with complex conditional fetching, SSE, or mutation-heavy patterns.
- [ ] **Step 3: TypeScript check**
Run: `cd frontend && npx tsc --noEmit`
Expected: No errors
- [ ] **Step 4: Commit**
```bash
git add frontend/src/pages/
git commit -m "refactor: standardize data fetching on createResource"
```
---
### Task 7: Use existing Button component
**Files:**
- Modify: Multiple frontend pages
- [ ] **Step 1: Read the Button component**
Read `frontend/src/components/ui/Button.tsx` to understand its API (props, variants, sizes).
- [ ] **Step 2: Find and replace obvious inline buttons**
Search for inline buttons with duplicated Tailwind classes across pages. Replace clear-cut cases where the inline button matches the `Button` component's API. Focus on primary action buttons (save, submit, generate) — don't force icon buttons or toggle buttons into the component.
- [ ] **Step 3: TypeScript check**
Run: `cd frontend && npx tsc --noEmit`
Expected: No errors
- [ ] **Step 4: Commit**
```bash
git add frontend/src/pages/ frontend/src/components/
git commit -m "refactor: use existing Button component to reduce inline Tailwind duplication"
```
---
### Task 8: Batch article history INSERTs
**Files:**
- Modify: `backend/src/db/article_history.rs`
- Modify: `backend/src/services/synthesis.rs`
- [ ] **Step 1: Add `batch_insert_entries` in `article_history.rs`**
Add a batch insert function using PostgreSQL `unnest()`:
```rust
/// Insert multiple article history entries in a single query.
pub async fn batch_insert_entries(pool: &PgPool, entries: &[ArticleHistoryEntry]) -> Result<(), AppError> {
if entries.is_empty() {
return Ok(());
}
let user_ids: Vec<Uuid> = entries.iter().map(|e| e.user_id).collect();
let urls: Vec<&str> = entries.iter().map(|e| e.url.as_str()).collect();
let url_hashes: Vec<&str> = entries.iter().map(|e| e.url_hash.as_str()).collect();
let titles: Vec<&str> = entries.iter().map(|e| e.title.as_str()).collect();
let source_types: Vec<&str> = entries.iter().map(|e| e.source_type.as_str()).collect();
let source_urls: Vec<Option<&str>> = entries.iter().map(|e| e.source_url.as_deref()).collect();
let categories: Vec<Option<&str>> = entries.iter().map(|e| e.category.as_deref()).collect();
let synthesis_ids: Vec<Option<Uuid>> = entries.iter().map(|e| e.synthesis_id).collect();
let statuses: Vec<&str> = entries.iter().map(|e| e.status.as_str()).collect();
let scraped_oks: Vec<bool> = entries.iter().map(|e| e.scraped_ok).collect();
let job_ids: Vec<Uuid> = entries.iter().map(|e| e.job_id).collect();
sqlx::query(
r#"
INSERT INTO article_history (user_id, url, url_hash, title, source_type, source_url, category, synthesis_id, status, scraped_ok, job_id)
SELECT * FROM unnest($1::uuid[], $2::text[], $3::text[], $4::text[], $5::text[], $6::text[], $7::text[], $8::uuid[], $9::text[], $10::bool[], $11::uuid[])
"#,
)
.bind(&user_ids)
.bind(&urls)
.bind(&url_hashes)
.bind(&titles)
.bind(&source_types)
.bind(&source_urls)
.bind(&categories)
.bind(&synthesis_ids)
.bind(&statuses)
.bind(&scraped_oks)
.bind(&job_ids)
.execute(pool)
.await?;
Ok(())
}
```
- [ ] **Step 2: Update `synthesis.rs` to batch traces**
In `backend/src/services/synthesis.rs`, update `trace_article` to collect entries instead of inserting immediately. The approach:
1. Change `trace_article` to build and return an `ArticleHistoryEntry` instead of inserting:
```rust
fn build_trace_entry(
user_id: Uuid,
job_id: Uuid,
trace: &ArticleTrace<'_>,
) -> db::article_history::ArticleHistoryEntry {
db::article_history::ArticleHistoryEntry {
user_id,
url: trace.url.to_string(),
url_hash: hash_article_url(trace.url),
title: trace.title.to_string(),
source_type: trace.source_type.to_string(),
source_url: trace.source_url.map(|s| s.to_string()),
category: trace.category.map(|s| s.to_string()),
synthesis_id: trace.synthesis_id,
status: trace.status.to_string(),
scraped_ok: trace.scraped_ok,
job_id,
}
}
```
2. In `run_generation_inner`, add a `Vec<ArticleHistoryEntry>` to collect traces
3. Replace `trace_article(...).await` calls with `pending_traces.push(build_trace_entry(...))`
4. Flush at key points (end of Phase 1, end of Phase 2, final save):
```rust
if !pending_traces.is_empty() {
db::article_history::batch_insert_entries(&state.pool, &pending_traces).await.ok();
pending_traces.clear();
}
```
**Important:** This is a large change touching many call sites. Read the file carefully. The `trace_article` calls inside spawned tasks (JoinSet) cannot easily collect into a shared Vec — for those, keep the individual inserts or collect results after the JoinSet completes.
- [ ] **Step 3: Build and test**
Run: `cd backend && cargo build && cargo test --lib`
Expected: All pass
- [ ] **Step 4: Commit**
```bash
git add backend/src/db/article_history.rs backend/src/services/synthesis.rs
git commit -m "perf: batch article history INSERTs to reduce DB round-trips"
```
Loading…
Cancel
Save