From aebd436a91cddebd3483862583389f658f32e5ad Mon Sep 17 00:00:00 2001
From: oabrivard
Date: Tue, 24 Mar 2026 16:56:41 +0100
Subject: [PATCH] docs: add spec for Autre fill-up to 75% synthesis target

---
 .../specs/2026-03-24-autre-fillup-design.md | 95 +++++++++++++++++++
 1 file changed, 95 insertions(+)
 create mode 100644 docs/superpowers/specs/2026-03-24-autre-fillup-design.md

diff --git a/docs/superpowers/specs/2026-03-24-autre-fillup-design.md b/docs/superpowers/specs/2026-03-24-autre-fillup-design.md
new file mode 100644
index 0000000..2e046e6
--- /dev/null
+++ b/docs/superpowers/specs/2026-03-24-autre-fillup-design.md
@@ -0,0 +1,95 @@
+# Design: "Autre" Fill-Up to Reach 75% Synthesis Target
+
+**Date**: 2026-03-24
+**Scope**: When total article count is low, expand "Autre" category to bring synthesis to 75% of maximum capacity
+
+---
+
+## Context
+
+After the two-phase pipeline (personalized sources + web search), article drops (history filtering, validation failures, scrape errors) can leave syntheses with few articles. The "Autre" category currently caps at `max_items_per_category` like other categories, meaning overflow articles are silently dropped even when the synthesis is under-filled.
+
+## Approach
+
+After both phases complete, check whether the total article count is below 75% of the maximum. If so, expand "Autre" capacity and fill it from overflow articles that were dropped during classification.
+
+## Target Calculation
+
+- Maximum articles = `categories.len() × max_items_per_category` (user categories only)
+- Target = `(0.75 × maximum).ceil()` as `usize`
+- Shortfall = `target.saturating_sub(current_total)` (saturating, so the subtraction cannot underflow when the total already exceeds the target)
+- If shortfall > 0, add overflow articles to "Autre" up to the shortfall
+
+**Why exclude "Autre" from the maximum:** The goal is to ensure user-defined categories are adequately filled. "Autre" is the overflow bucket, so it should not inflate the target. If user categories are well-filled, no fill-up is needed even if "Autre" is empty.
+
+Example: 4 categories × 4 items = 16 max. Target = 12. If user categories have 8 articles and "Autre" has 2 (total 10), shortfall = 2. "Autre" accepts 2 more overflow articles.
+
+## Mechanism
+
+### 1. Collect overflow during classification
+
+Modify `parse_classification_response` to return a second value: a `Vec` of overflow articles, that is, articles that were dropped because both their target category AND "Autre" were full.
+
+Current signature:
+```rust
+fn parse_classification_response(...) -> HashMap>
+```
+
+New signature:
+```rust
+fn parse_classification_response(...) -> (HashMap>, Vec)
+```
+
+### 2. Accumulate overflow across phases
+
+In `run_generation_inner`, collect overflow from both Phase 1 and Phase 2 classification calls into a single `all_overflow: Vec`.
+
+### 3. Post-classification fill-up
+
+After both phases, before the rewrite pass:
+
+```
+const SYNTHESIS_MIN_FILL_RATIO: f64 = 0.75;
+
+total = sum of all articles in all_scraped
+max_articles = categories.len() * max_items_per_category
+target = (SYNTHESIS_MIN_FILL_RATIO * max_articles as f64).ceil() as usize
+shortfall = target.saturating_sub(total)
+
+if shortfall > 0 and all_overflow is non-empty:
+    // Filter overflow against max_articles_per_source (source diversity)
+    for each overflow article:
+        count domain occurrences in all_scraped
+        skip if domain already at max_articles_per_source
+    take up to shortfall valid overflow articles
+    add them to all_scraped["category_autre"]
+```
+
+**Source diversity enforcement:** Overflow articles added back to "Autre" must respect the `max_articles_per_source` limit. Count existing domain occurrences across all categories in `all_scraped`, and only add an overflow article if its domain is still under the limit.
+
+## Hardcoded 75%
+
+The 75% target is hardcoded as a constant `SYNTHESIS_MIN_FILL_RATIO: f64 = 0.75` in `synthesis.rs`. No user setting is needed; this is an internal quality threshold.
+
+## Best-effort
+
+If there aren't enough overflow articles to reach 75%, the synthesis proceeds with whatever it has. No error is raised.
+
+## Files to Modify
+
+- **Modify:** `backend/src/services/synthesis.rs`:
+  - Add `SYNTHESIS_MIN_FILL_RATIO` constant
+  - Modify `parse_classification_response` signature and body to collect and return overflow
+  - Update 2 production call sites to destructure the tuple
+  - Update 5 existing classification unit tests for new return type
+  - Add fill-up logic in `run_generation_inner` between Phase 2 and rewrite pass
+  - Add unit tests for overflow collection and fill-up calculation
+
+## What Does NOT Change
+
+- `max_items_per_category`: still the cap for user categories
+- Classification prompt: unchanged
+- Rewrite pass: sees the final article set including expanded "Autre"
+- Frontend: no changes
+- Database: no changes
+- No new settings
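
Reviewer sketch (not part of the patch): the target calculation and fill-up step described in the spec could look roughly like the Rust below. This is a minimal illustration under stated assumptions, not the code in `synthesis.rs`. The `Article` struct, its `domain` field, the function names `compute_shortfall` and `fill_autre`, and the `autre_key` parameter are hypothetical and introduced only for this example; `SYNTHESIS_MIN_FILL_RATIO`, `max_items_per_category`, `max_articles_per_source`, `all_scraped`, and `all_overflow` are the names used in the spec.

```rust
use std::collections::HashMap;

/// Internal quality threshold named in the spec.
const SYNTHESIS_MIN_FILL_RATIO: f64 = 0.75;

/// Hypothetical stand-in for the real article type (assumption).
#[derive(Debug, Clone)]
struct Article {
    domain: String,
}

/// Number of articles still needed to reach the fill target.
/// `num_categories` counts user categories only, excluding "Autre".
fn compute_shortfall(
    num_categories: usize,
    max_items_per_category: usize,
    current_total: usize,
) -> usize {
    let max_articles = num_categories * max_items_per_category;
    let target = (SYNTHESIS_MIN_FILL_RATIO * max_articles as f64).ceil() as usize;
    // Saturating: yields 0 when the total already meets or exceeds the target.
    target.saturating_sub(current_total)
}

/// Moves up to `shortfall` overflow articles into the "Autre" bucket,
/// skipping any whose domain is already at `max_articles_per_source`.
/// Returns the number of articles added (best-effort, may be fewer).
fn fill_autre(
    all_scraped: &mut HashMap<String, Vec<Article>>,
    all_overflow: Vec<Article>,
    shortfall: usize,
    max_articles_per_source: usize,
    autre_key: &str,
) -> usize {
    // Count existing domain occurrences across every category.
    let mut per_domain: HashMap<String, usize> = HashMap::new();
    for article in all_scraped.values().flatten() {
        *per_domain.entry(article.domain.clone()).or_insert(0) += 1;
    }

    let mut added = 0;
    for article in all_overflow {
        if added >= shortfall {
            break;
        }
        let count = per_domain.entry(article.domain.clone()).or_insert(0);
        if *count >= max_articles_per_source {
            continue; // source diversity limit reached for this domain
        }
        // Accepted overflow articles also count toward the per-source limit.
        *count += 1;
        all_scraped
            .entry(autre_key.to_string())
            .or_default()
            .push(article);
        added += 1;
    }
    added
}
```

With the spec's own example (4 categories × 4 items, 10 articles so far), `compute_shortfall(4, 4, 10)` returns 2, so `fill_autre` would accept at most two overflow articles. Counting accepted overflow articles toward the per-domain tally is a design choice of this sketch; it keeps a single domain from dominating "Autre" during the fill-up itself.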