docs: add implementation plan for polish and optimization items

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
3 months ago · 2e87ae854a
parent 825c7212e1
commit 2e87ae854a
1 changed file with 415 additions and 0 deletions
--- a/docs/superpowers/plans/2026-03-26-polish-optimizations.md
+++ b/docs/superpowers/plans/2026-03-26-polish-optimizations.md
@@ -0,0 +1,415 @@
 # Polish & Optimization — Implementation Plan
 > **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
 **Goal:** Implement the 8 polish/optimization items from the code audit: batch INSERTs, Arc clones, LazyLock selectors, createResource standardization, Button component usage, default settings alignment, SESSION_SECRET removal, Arc for master key.
 **Architecture:** All tasks are independent — can be executed in any order. No behavioral changes. Pure optimization and cleanup.
 **Tech Stack:** Rust (sqlx, std::sync::LazyLock), SolidJS, TypeScript
 **Spec:** `docs/superpowers/specs/2026-03-26-polish-optimizations-design.md`
 ---
 ### Task 1: Remove unused `SESSION_SECRET`
 **Files:**
 - Modify: `backend/src/config.rs`
 - Modify: `.env.example`
 The simplest task — pure dead code removal.
 - [ ] **Step 1: Remove `session_secret` from `AppConfig`**
 In `backend/src/config.rs`:
 - Remove `pub session_secret: String` from the `AppConfig` struct
 - Remove `let session_secret = required_var("SESSION_SECRET")?;` from `from_env()`
 - Remove `session_secret,` from the struct literal in `from_env()`
 - Remove the `session_secret` length validation in `validate()`
 - Remove `session_secret: "a".repeat(64),` from all test fixtures
 - Remove the `test_validate_short_session_secret` test
 - [ ] **Step 2: Remove from `.env.example`**
 Remove the `SESSION_SECRET=` line from `.env.example`. Also remove any comment about it.
 - [ ] **Step 3: Build and test**
 Run: `cd backend && cargo build && cargo test --lib`
 Expected: All pass
 - [ ] **Step 4: Commit**
 ```bash
 git add backend/src/config.rs .env.example
 git commit -m "chore: remove unused SESSION_SECRET env var and config field"
 ```
 ---
 ### Task 2: Wrap `AppConfig` master key in `Arc`
 **Files:**
 - Modify: `backend/src/config.rs`
 - [ ] **Step 1: Change field type to `Arc<String>`**
 In `backend/src/config.rs`:
 - Add `use std::sync::Arc;` to imports
 - Change `pub master_encryption_key: String` to `pub master_encryption_key: Arc<String>`
 - In `from_env()`, wrap: `master_encryption_key: Arc::new(master_encryption_key),`
 - In test fixtures, wrap: `master_encryption_key: Arc::new("a".repeat(64)),`
 All call sites pass `&state.config.master_encryption_key`, which still compiles with `Arc<String>`: `&Arc<String>` deref-coerces to `&String` or `&str` wherever a function parameter expects one.
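The deref coercion can be checked in isolation. This is a minimal std-only sketch, not the project's code: `AppConfig` is trimmed to one field, and `key_len`, `fixture_config`, and `key_len_from_config` are hypothetical stand-ins for a real call site that takes the key as `&str`.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for a call site that takes the key by `&str`.
fn key_len(key: &str) -> usize {
    key.len()
}

/// Hypothetical, trimmed-down config holding only the shared key.
struct AppConfig {
    master_encryption_key: Arc<String>,
}

/// Builds a config the same way the test fixtures wrap the key.
fn fixture_config() -> AppConfig {
    AppConfig {
        master_encryption_key: Arc::new("a".repeat(64)),
    }
}

/// `&Arc<String>` coerces to `&str` at the call, so the call site is unchanged.
fn key_len_from_config(config: &AppConfig) -> usize {
    key_len(&config.master_encryption_key)
}
```

Calling `key_len_from_config(&fixture_config())` compiles without any change to the argument expression. A call site that needs an owned `String` would not coerce and would have to be updated by hand.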
 - [ ] **Step 2: Build and test**
 Run: `cd backend && cargo build && cargo test --lib`
 Expected: All pass
 - [ ] **Step 3: Commit**
 ```bash
 git add backend/src/config.rs
 git commit -m "chore: wrap master_encryption_key in Arc to reduce secret copies"
 ```
 ---
 ### Task 3: Cache CSS selectors with `LazyLock`
 **Files:**
 - Modify: `backend/src/services/source_scraper.rs`
 - Modify: `backend/src/services/scraper.rs`
 - [ ] **Step 1: Fix `source_scraper.rs` selectors**
 In `backend/src/services/source_scraper.rs`, add at the top (after imports):
 ```rust
 use std::sync::LazyLock;
 static ANCHOR_SELECTOR: LazyLock<Selector> = LazyLock::new(|| Selector::parse("a[href]").unwrap());
 ```
 Then replace both `Selector::parse("a[href]").unwrap()` calls (lines 80 and 137) with `&ANCHOR_SELECTOR`.
 - [ ] **Step 2: Fix `scraper.rs` selectors**
 In `backend/src/services/scraper.rs`, there are 14 `Selector::parse` calls. Many use `if let Ok(sel) = Selector::parse(...)` which handles parse failure gracefully. For the static selectors with `.unwrap()` (like `"title"`, `"h1"`, `"body"`), convert to `LazyLock`. For selectors inside `if let Ok(...)`, leave as-is since the pattern already handles failure.
 Add at the top:
 ```rust
 use std::sync::LazyLock;
 static SEL_TITLE: LazyLock<Selector> = LazyLock::new(|| Selector::parse("title").unwrap());
 static SEL_H1: LazyLock<Selector> = LazyLock::new(|| Selector::parse("h1").unwrap());
 static SEL_BODY: LazyLock<Selector> = LazyLock::new(|| Selector::parse("body").unwrap());
 ```
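 Read the file carefully before editing. The three statics above are examples: add one `LazyLock` static per `Selector::parse(...).unwrap()` call and replace each call with a reference to its static (e.g. `&SEL_TITLE`). Leave the `if let Ok(sel) = Selector::parse(...)` calls as-is; they already handle failure, and their selectors may vary.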
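To see what `LazyLock` buys here, the following std-only sketch counts how often the initializer runs. It assumes Rust 1.80 or later (where `std::sync::LazyLock` is stable); `parse_selector`, `PARSE_CALLS`, and the helper functions are hypothetical stand-ins, with a `Vec<String>` in place of the real `Selector` type.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::LazyLock;

/// Counts how often the initializer runs, to make the caching visible.
static PARSE_CALLS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for `Selector::parse`: pretend this is expensive.
fn parse_selector(css: &str) -> Vec<String> {
    PARSE_CALLS.fetch_add(1, Ordering::SeqCst);
    css.split_whitespace().map(str::to_owned).collect()
}

/// Parsed on first access, then reused for the lifetime of the process.
static SEL_TITLE: LazyLock<Vec<String>> = LazyLock::new(|| parse_selector("head title"));

/// Dereferences the cached value; safe to call from any thread.
fn title_selector_parts() -> usize {
    SEL_TITLE.len()
}

/// Reports how many times the initializer has run so far.
fn parse_call_count() -> usize {
    PARSE_CALLS.load(Ordering::SeqCst)
}
```

However many times `title_selector_parts()` is called, the initializer runs once. A panic inside the initializer (such as a failing `.unwrap()`) poisons the static, which is why only selectors known to parse belong in a `LazyLock`.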
 - [ ] **Step 3: Build and test**
 Run: `cd backend && cargo build && cargo test --lib`
 Expected: All pass
 - [ ] **Step 4: Commit**
 ```bash
 git add backend/src/services/source_scraper.rs backend/src/services/scraper.rs
 git commit -m "perf: cache CSS selectors with LazyLock to avoid re-parsing"
 ```
 ---
 ### Task 4: Reduce `.clone()` in pipeline with `Arc`
 **Files:**
 - Modify: `backend/src/services/synthesis.rs`
 - [ ] **Step 1: Identify clone targets**
 Read `backend/src/services/synthesis.rs`. In `run_generation_inner`, find values that are cloned into every spawned task but never mutated:
 - `model_research` (String) — cloned ~4 times per batch iteration
 - `classify_schema` (serde_json::Value) — cloned ~2 times per batch
 - `classification_categories` (Vec<String>) — cloned ~2 times per batch
 These should be wrapped in `Arc` at the point of creation, before the batch loops.
 - [ ] **Step 2: Wrap immutable values in `Arc`**
 At the point where these values are first created (before the batch loops), wrap them:
 ```rust
 let model_research = Arc::new(model_research);  // was String
 let classify_schema = Arc::new(classify_schema);  // was serde_json::Value
 let classification_categories = Arc::new(classification_categories);  // was Vec<String>
 ```
 Add `use std::sync::Arc;` if not already imported (it likely is).
 Then in the spawned tasks, change:
 ```rust
 // Before:
 let model = model_research.clone();  // clones String
 // After:
 let model = Arc::clone(&model_research);  // clones Arc pointer
 ```
 For `model` usage inside the task: `call_llm(&model, ...)` still works because `&Arc<String>` deref-coerces to `&str`.
 For `classify_schema`: `&Arc<Value>` deref-coerces to `&Value`. A plain `.clone()` on the `Arc` now only clones the pointer, so if the task body passes an owned `Value` somewhere, clone the inner value explicitly with `(*schema).clone()` (read carefully).
 For `classification_categories`: `&Arc<Vec<String>>` deref-coerces to `&Vec<String>` or `&[String]`. Where the task only reads the categories, replace `.clone()` of the inner vec with `Arc::clone()`.
 **Important:** Only change the variables that are cloned into spawned tasks. Don't change variables that are mutated (like `filled_counts`, `article_scraped`, `source_counts`).
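The pattern can be sketched without the async runtime. This example uses `std::thread` instead of the pipeline's spawned tasks, and `call_llm`, `run_batch`, and `owned_categories` are hypothetical stand-ins written only to show the ownership mechanics.

```rust
use std::sync::Arc;
use std::thread;

/// Hypothetical stand-in for `call_llm`: takes the model name by `&str`.
fn call_llm(model: &str, prompt: &str) -> String {
    format!("{model}:{prompt}")
}

/// Spawns one worker per prompt; each gets a pointer clone, not a String copy.
fn run_batch(model_research: String, prompts: Vec<String>) -> Vec<String> {
    // Wrap once, before the loop that spawns workers.
    let model_research = Arc::new(model_research);
    let handles: Vec<_> = prompts
        .into_iter()
        .map(|prompt| {
            let model = Arc::clone(&model_research);
            // `&model` is `&Arc<String>` and coerces to `&str` at the call.
            thread::spawn(move || call_llm(&model, &prompt))
        })
        .collect();
    // Join in spawn order so results line up with the input prompts.
    handles
        .into_iter()
        .map(|h| h.join().expect("worker panicked"))
        .collect()
}

/// When a callee needs an owned value, clone the inner data explicitly.
fn owned_categories(categories: &Arc<Vec<String>>) -> Vec<String> {
    (**categories).clone()
}
```

`run_batch` allocates the model string once regardless of batch size. `owned_categories` shows the explicit inner clone for the rare place that still needs ownership.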
 - [ ] **Step 3: Build and test**
 Run: `cd backend && cargo build && cargo test --lib`
 Expected: All pass
 - [ ] **Step 4: Commit**
 ```bash
 git add backend/src/services/synthesis.rs
 git commit -m "perf: use Arc for immutable values in pipeline to reduce cloning"
 ```
 ---
 ### Task 5: Align frontend default settings with backend
 **Files:**
 - Modify: `frontend/src/types.ts`
 - [ ] **Step 1: Update `DEFAULT_SETTINGS`**
 In `frontend/src/types.ts`, update `DEFAULT_SETTINGS` to match the backend's `Default for UserSettings` in `backend/src/models/settings.rs`:
 ```typescript
 export const DEFAULT_SETTINGS: UserSettings = {
  theme: 'Intelligence Artificielle',
  max_age_days: 7,
  max_items_per_category: 4,
  max_articles_per_source: 3,
  use_llm_for_source_links: false,
  use_brave_search: false,
  article_history_days: 90,
  batch_size: 5,
  search_agent_behavior: '',  // backend default is empty string
  ai_model: '',
  ai_model_websearch: '',
  ai_provider: '',
  rate_limit_max_requests: null,
  rate_limit_time_window_seconds: null,
  categories: [
    'Annonces majeures',
    'Recherche et innovation',
    'Industrie et entreprises',
    'Secteur public',
    'Opinions et analyses',
  ],
 };
 ```
 Key changes:
 - `search_agent_behavior`: `"Tu peux..."` → `''` (backend defaults to empty)
 - `categories`: update to match backend's 5 categories exactly
 - [ ] **Step 2: TypeScript check**
 Run: `cd frontend && npx tsc --noEmit`
 Expected: No errors
 - [ ] **Step 3: Commit**
 ```bash
 git add frontend/src/types.ts
 git commit -m "chore: align frontend DEFAULT_SETTINGS with backend defaults"
 ```
 ---
 ### Task 6: Standardize frontend data fetching on `createResource`
 **Files:**
 - Modify: Multiple frontend pages
 - [ ] **Step 1: Identify pages to convert**
 Read these pages and check which use `onMount` + `createSignal` for data loading:
 - `frontend/src/pages/Home.tsx`
 - `frontend/src/pages/ArticleHistory.tsx`
 - `frontend/src/pages/SynthesisDetail.tsx`
 - `frontend/src/pages/LlmLogs.tsx`
 For each, the pattern to look for is:
 ```tsx
 const [data, setData] = createSignal<T[]>([]);
 const [loading, setLoading] = createSignal(true);
 onMount(async () => {
  try { setData(await api.fetch()); }
  catch { setError(...); }
  finally { setLoading(false); }
 });
 ```
 - [ ] **Step 2: Convert to `createResource`**
 Replace with:
 ```tsx
 const [data] = createResource(() => api.fetch());
 ```
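 Then use `data.loading` instead of `loading()` and `data.error` for the error state. `data()` is still the accessor, but it returns `undefined` until the fetch resolves (the old signal started as `[]`), so guard for that or pass `initialValue` in the `createResource` options.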
 Only convert pages where it's a clean substitution. Skip pages with complex conditional fetching, SSE, or mutation-heavy patterns.
 - [ ] **Step 3: TypeScript check**
 Run: `cd frontend && npx tsc --noEmit`
 Expected: No errors
 - [ ] **Step 4: Commit**
 ```bash
 git add frontend/src/pages/
 git commit -m "refactor: standardize data fetching on createResource"
 ```
 ---
 ### Task 7: Use existing Button component
 **Files:**
 - Modify: Multiple frontend pages
 - [ ] **Step 1: Read the Button component**
 Read `frontend/src/components/ui/Button.tsx` to understand its API (props, variants, sizes).
 - [ ] **Step 2: Find and replace obvious inline buttons**
 Search for inline buttons with duplicated Tailwind classes across pages. Replace clear-cut cases where the inline button matches the `Button` component's API. Focus on primary action buttons (save, submit, generate) — don't force icon buttons or toggle buttons into the component.
 - [ ] **Step 3: TypeScript check**
 Run: `cd frontend && npx tsc --noEmit`
 Expected: No errors
 - [ ] **Step 4: Commit**
 ```bash
 git add frontend/src/pages/ frontend/src/components/
 git commit -m "refactor: use existing Button component to reduce inline Tailwind duplication"
 ```
 ---
 ### Task 8: Batch article history INSERTs
 **Files:**
 - Modify: `backend/src/db/article_history.rs`
 - Modify: `backend/src/services/synthesis.rs`
 - [ ] **Step 1: Add `batch_insert_entries` in `article_history.rs`**
 Add a batch insert function using PostgreSQL `unnest()`:
 ```rust
 /// Insert multiple article history entries in a single query.
 pub async fn batch_insert_entries(pool: &PgPool, entries: &[ArticleHistoryEntry]) -> Result<(), AppError> {
    if entries.is_empty() {
        return Ok(());
    }
    let user_ids: Vec<Uuid> = entries.iter().map(|e| e.user_id).collect();
    let urls: Vec<&str> = entries.iter().map(|e| e.url.as_str()).collect();
    let url_hashes: Vec<&str> = entries.iter().map(|e| e.url_hash.as_str()).collect();
    let titles: Vec<&str> = entries.iter().map(|e| e.title.as_str()).collect();
    let source_types: Vec<&str> = entries.iter().map(|e| e.source_type.as_str()).collect();
    let source_urls: Vec<Option<&str>> = entries.iter().map(|e| e.source_url.as_deref()).collect();
    let categories: Vec<Option<&str>> = entries.iter().map(|e| e.category.as_deref()).collect();
    let synthesis_ids: Vec<Option<Uuid>> = entries.iter().map(|e| e.synthesis_id).collect();
    let statuses: Vec<&str> = entries.iter().map(|e| e.status.as_str()).collect();
    let scraped_oks: Vec<bool> = entries.iter().map(|e| e.scraped_ok).collect();
    let job_ids: Vec<Uuid> = entries.iter().map(|e| e.job_id).collect();
    sqlx::query(
        r#"
        INSERT INTO article_history (user_id, url, url_hash, title, source_type, source_url, category, synthesis_id, status, scraped_ok, job_id)
        SELECT * FROM unnest($1::uuid[], $2::text[], $3::text[], $4::text[], $5::text[], $6::text[], $7::text[], $8::uuid[], $9::text[], $10::bool[], $11::uuid[])
        "#,
    )
    .bind(&user_ids)
    .bind(&urls)
    .bind(&url_hashes)
    .bind(&titles)
    .bind(&source_types)
    .bind(&source_urls)
    .bind(&categories)
    .bind(&synthesis_ids)
    .bind(&statuses)
    .bind(&scraped_oks)
    .bind(&job_ids)
    .execute(pool)
    .await?;
    Ok(())
 }
 ```
 - [ ] **Step 2: Update `synthesis.rs` to batch traces**
 In `backend/src/services/synthesis.rs`, update `trace_article` to collect entries instead of inserting immediately. The approach:
 1. Add `build_trace_entry`, a synchronous counterpart to `trace_article` that builds and returns an `ArticleHistoryEntry` instead of inserting:
 ```rust
 fn build_trace_entry(
    user_id: Uuid,
    job_id: Uuid,
    trace: &ArticleTrace<'_>,
 ) -> db::article_history::ArticleHistoryEntry {
    db::article_history::ArticleHistoryEntry {
        user_id,
        url: trace.url.to_string(),
        url_hash: hash_article_url(trace.url),
        title: trace.title.to_string(),
        source_type: trace.source_type.to_string(),
        source_url: trace.source_url.map(|s| s.to_string()),
        category: trace.category.map(|s| s.to_string()),
        synthesis_id: trace.synthesis_id,
        status: trace.status.to_string(),
        scraped_ok: trace.scraped_ok,
        job_id,
    }
 }
 ```
 2. In `run_generation_inner`, add a `Vec<ArticleHistoryEntry>` to collect traces
 3. Replace `trace_article(...).await` calls with `pending_traces.push(build_trace_entry(...))`
 4. Flush at key points (end of Phase 1, end of Phase 2, final save):
 ```rust
 if !pending_traces.is_empty() {
    // `.ok()` discards the Result: a failed flush is ignored and does not abort generation.
    db::article_history::batch_insert_entries(&state.pool, &pending_traces).await.ok();
    pending_traces.clear();
 }
 ```
 **Important:** This is a large change touching many call sites. Read the file carefully. The `trace_article` calls inside spawned tasks (JoinSet) cannot easily collect into a shared Vec — for those, keep the individual inserts or collect results after the JoinSet completes.
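The "collect results after the JoinSet completes" option can be sketched as follows. This is a std-only illustration under stated assumptions: `std::thread` handles stand in for the `JoinSet`, `TraceEntry` is a trimmed-down hypothetical entry, and `flush` stands in for `batch_insert_entries` by appending to an in-memory sink instead of touching the database.

```rust
use std::thread;

/// Hypothetical, trimmed-down history entry (the real struct has more fields).
#[derive(Debug, Clone, PartialEq)]
struct TraceEntry {
    url: String,
    status: String,
}

/// Hypothetical stand-in for `batch_insert_entries`: returns rows "written".
fn flush(pending: &mut Vec<TraceEntry>, sink: &mut Vec<TraceEntry>) -> usize {
    let n = pending.len();
    sink.append(pending); // moves everything out and leaves `pending` empty
    n
}

/// Each worker returns its entry instead of inserting; the caller batches them.
fn scrape_all(urls: Vec<String>, sink: &mut Vec<TraceEntry>) -> usize {
    let handles: Vec<_> = urls
        .into_iter()
        .map(|url| {
            thread::spawn(move || TraceEntry {
                url,
                status: "scraped".to_string(),
            })
        })
        .collect();

    // Collect on the parent side, so no shared mutable Vec is needed.
    let mut pending = Vec::new();
    for handle in handles {
        if let Ok(entry) = handle.join() {
            pending.push(entry);
        }
    }
    // One flush for the whole batch instead of one insert per worker.
    flush(&mut pending, sink)
}
```

Returning the entry from the worker keeps the workers free of locks. The trade-off is that entries from a batch are only written once every worker has finished, so a crash mid-batch loses that batch's traces.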
 - [ ] **Step 3: Build and test**
 Run: `cd backend && cargo build && cargo test --lib`
 Expected: All pass
 - [ ] **Step 4: Commit**
 ```bash
 git add backend/src/db/article_history.rs backend/src/services/synthesis.rs
 git commit -m "perf: batch article history INSERTs to reduce DB round-trips"
 ```