Design: "Autre" Fill-Up to Reach 75% Synthesis Target

Date: 2026-03-24

Scope: When the total article count is low, expand the "Autre" category to bring the synthesis up to 75% of maximum capacity.


Context

After the two-phase pipeline (personalized sources + web search), article drops (history filtering, validation failures, scrape errors) can leave syntheses with few articles. The "Autre" category currently caps at max_items_per_category like other categories, meaning overflow articles are silently dropped even when the synthesis is under-filled.

Approach

After both phases complete, check if the total article count is below 75% of the maximum. If so, expand "Autre" capacity and fill from overflow articles that were dropped during classification.

Target Calculation

  • Maximum articles = categories.len() × max_items_per_category (user categories only)
  • Target = (0.75 × maximum).ceil() as usize
  • Shortfall = target.saturating_sub(current_total) (saturating so the result is 0, rather than an underflow, when the total already meets or exceeds the target)
  • If shortfall > 0, add overflow articles to "Autre" up to the shortfall

Why exclude "Autre" from the maximum: The goal is to ensure user-defined categories are adequately filled. "Autre" is the overflow bucket — it should not inflate the target. If user categories are well-filled, no fill-up is needed even if "Autre" is empty.

Example: 4 categories × 4 items = 16 max. Target = 12. If user categories have 8 articles and "Autre" has 2 (total 10), shortfall = 2. "Autre" accepts 2 more overflow articles.
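The calculation above can be sketched as a small pure function. This is a minimal illustration, not the code in synthesis.rs: the function name autre_shortfall and its parameter names are hypothetical, and category_count is assumed to count user categories only.

```rust
/// Minimum fraction of user-category capacity the synthesis should reach.
const SYNTHESIS_MIN_FILL_RATIO: f64 = 0.75;

/// Returns how many extra overflow articles "Autre" may accept.
/// `category_count` is assumed to exclude "Autre" (user categories only).
/// `current_total` counts every article already kept, including "Autre".
fn autre_shortfall(
    category_count: usize,
    max_items_per_category: usize,
    current_total: usize,
) -> usize {
    let max_articles = category_count * max_items_per_category;
    // Round up so a fractional target (e.g. 6.75) still requires the next whole article.
    let target = (SYNTHESIS_MIN_FILL_RATIO * max_articles as f64).ceil() as usize;
    // Saturating: yields 0 when the synthesis already meets or exceeds the target.
    target.saturating_sub(current_total)
}
```

With the figures from the example, autre_shortfall(4, 4, 10) evaluates to 2, and any total of 12 or more gives 0.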

Mechanism

1. Collect overflow during classification

Modify parse_classification_response to return a second value: Vec<ScrapedNewsItem> of overflow articles — articles that were dropped because both their target category AND "Autre" were full.

Current signature:

fn parse_classification_response(...) -> HashMap<String, Vec<ScrapedNewsItem>>

New signature:

fn parse_classification_response(...) -> (HashMap<String, Vec<ScrapedNewsItem>>, Vec<ScrapedNewsItem>)
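One way the overflow could be collected is sketched below. Everything here is an assumption for illustration: the real parse_classification_response takes arguments this spec elides, so the sketch uses a hypothetical helper place_with_overflow that receives already-parsed (category, article) pairs, a simplified ScrapedNewsItem with a single field, and "category_autre" as the map key for "Autre". It also assumes that an article whose target category is full falls back to "Autre" before being treated as overflow.

```rust
use std::collections::HashMap;

/// Simplified stand-in for the real struct (assumption: the real one has more fields).
#[derive(Debug, Clone, PartialEq)]
struct ScrapedNewsItem {
    title: String,
}

/// Assumed map key for the "Autre" bucket.
const AUTRE_KEY: &str = "category_autre";

/// Hypothetical helper: places each article and returns (placed, overflow).
fn place_with_overflow(
    assignments: Vec<(String, ScrapedNewsItem)>,
    max_items_per_category: usize,
) -> (HashMap<String, Vec<ScrapedNewsItem>>, Vec<ScrapedNewsItem>) {
    let mut by_category: HashMap<String, Vec<ScrapedNewsItem>> = HashMap::new();
    let mut overflow = Vec::new();

    for (category, item) in assignments {
        // First choice: the category chosen by classification.
        let target_len = by_category.get(&category).map_or(0, Vec::len);
        if target_len < max_items_per_category {
            by_category.entry(category).or_default().push(item);
            continue;
        }
        // Fallback: "Autre", if it still has room.
        let autre_len = by_category.get(AUTRE_KEY).map_or(0, Vec::len);
        if autre_len < max_items_per_category {
            by_category.entry(AUTRE_KEY.to_string()).or_default().push(item);
        } else {
            // Both buckets are full: keep the article instead of dropping it silently.
            overflow.push(item);
        }
    }
    (by_category, overflow)
}
```

Returning the overflow as a second tuple element keeps the existing map shape intact, so call sites only need to destructure the result.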

2. Accumulate overflow across phases

In run_generation_inner, collect overflow from both Phase 1 and Phase 2 classification calls into a single all_overflow: Vec<ScrapedNewsItem>.

3. Post-classification fill-up

After both phases, before the rewrite pass:

const SYNTHESIS_MIN_FILL_RATIO: f64 = 0.75;

total = sum of all articles in all_scraped
max_articles = categories.len() * max_items_per_category
target = (SYNTHESIS_MIN_FILL_RATIO * max_articles as f64).ceil() as usize
shortfall = target.saturating_sub(total)

if shortfall > 0 and all_overflow is non-empty:
    // Filter overflow against max_articles_per_source (source diversity)
    for each overflow article:
        count domain occurrences in all_scraped
        skip if domain already at max_articles_per_source
    take up to shortfall valid overflow articles
    add them to all_scraped["category_autre"]

Source diversity enforcement: Overflow articles added back to "Autre" must respect the max_articles_per_source limit. Count existing domain occurrences across all categories in all_scraped, and only add an overflow article if its domain is still under the limit.
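The fill-up step with source diversity might look roughly like the following. This is a sketch under stated assumptions rather than the actual implementation: fill_autre is a hypothetical helper, ScrapedNewsItem is reduced to two fields, and the domain field is assumed (the real code may derive the domain from a URL). The sketch also increments the per-domain count as overflow articles are added, so that several overflow articles from one source cannot exceed the limit together.

```rust
use std::collections::HashMap;

/// Simplified stand-in for the real struct.
#[derive(Debug, Clone, PartialEq)]
struct ScrapedNewsItem {
    title: String,
    domain: String, // assumed field, used for the per-source limit
}

/// Assumed map key for the "Autre" bucket.
const AUTRE_KEY: &str = "category_autre";

/// Hypothetical helper: moves up to `shortfall` overflow articles into "Autre"
/// and returns how many were added. Best-effort: it never fails.
fn fill_autre(
    all_scraped: &mut HashMap<String, Vec<ScrapedNewsItem>>,
    all_overflow: Vec<ScrapedNewsItem>,
    shortfall: usize,
    max_articles_per_source: usize,
) -> usize {
    if shortfall == 0 || all_overflow.is_empty() {
        return 0;
    }
    // Count existing articles per domain across all categories.
    let mut per_domain: HashMap<String, usize> = HashMap::new();
    for item in all_scraped.values().flatten() {
        *per_domain.entry(item.domain.clone()).or_insert(0) += 1;
    }

    let mut added = 0;
    for item in all_overflow {
        if added == shortfall {
            break;
        }
        let count = per_domain.entry(item.domain.clone()).or_insert(0);
        if *count >= max_articles_per_source {
            continue; // this source is already at its limit
        }
        *count += 1;
        all_scraped.entry(AUTRE_KEY.to_string()).or_default().push(item);
        added += 1;
    }
    added
}
```

If every remaining overflow article comes from a source that is already at its limit, the function adds nothing and returns 0, and the synthesis proceeds with what it has.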

Hardcoded 75%

The 75% target is hardcoded as a constant SYNTHESIS_MIN_FILL_RATIO: f64 = 0.75 in synthesis.rs. No user setting needed — this is an internal quality threshold.

Best-effort

If there aren't enough overflow articles to reach 75%, the synthesis proceeds with whatever it has. No error is raised.

Files to Modify

  • Modify: backend/src/services/synthesis.rs:
    • Add SYNTHESIS_MIN_FILL_RATIO constant
    • Modify parse_classification_response signature and body to collect and return overflow
    • Update 2 production call sites to destructure the tuple
    • Update 5 existing classification unit tests for new return type
    • Add fill-up logic in run_generation_inner between Phase 2 and rewrite pass
    • Add unit tests for overflow collection and fill-up calculation

What Does NOT Change

  • max_items_per_category — still the cap for user categories
  • Classification prompt — unchanged
  • Rewrite pass — sees the final article set including expanded "Autre"
  • Frontend — no changes
  • Database — no changes
  • No new settings