# Design: "Autre" Fill-Up to Reach 75% Synthesis Target
**Date**: 2026-03-24
**Scope**: When total article count is low, expand "Autre" category to bring synthesis to 75% of maximum capacity
---
## Context
After the two-phase pipeline (personalized sources + web search), article drops (history filtering, validation failures, scrape errors) can leave syntheses with few articles. The "Autre" category currently caps at `max_items_per_category` like other categories, meaning overflow articles are silently dropped even when the synthesis is under-filled.
## Approach
After both phases complete, check if the total article count is below 75% of the maximum. If so, expand "Autre" capacity and fill from overflow articles that were dropped during classification.
## Target Calculation
- Maximum articles = `categories.len() × max_items_per_category` (user categories only)
- Target = `(0.75 × maximum).ceil()` as `usize`
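- Shortfall = `target.saturating_sub(current_total)`, where `current_total` counts every article including "Autre" (saturating so the result is 0 rather than an integer underflow when the total already meets or exceeds the target)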
- If shortfall > 0, add overflow articles to "Autre" up to the shortfall
**Why exclude "Autre" from the maximum:** The goal is to ensure user-defined categories are adequately filled. "Autre" is the overflow bucket — it should not inflate the target. If user categories are well-filled, no fill-up is needed even if "Autre" is empty.
Example: 4 categories × 4 items = 16 max. Target = 12. If user categories have 8 articles and "Autre" has 2 (total 10), shortfall = 2. "Autre" accepts 2 more overflow articles.
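The calculation above can be sketched as a small helper. This is an illustrative sketch only: the function name `fill_up_shortfall` and its parameter names are assumptions, not names taken from `synthesis.rs`.

```rust
/// Minimum fill ratio named in this design.
const SYNTHESIS_MIN_FILL_RATIO: f64 = 0.75;

/// Returns how many extra articles "Autre" may accept.
///
/// `user_category_count` excludes "Autre"; `current_total` includes it.
/// (Hypothetical helper, written only to illustrate the arithmetic.)
fn fill_up_shortfall(
    user_category_count: usize,
    max_items_per_category: usize,
    current_total: usize,
) -> usize {
    // Maximum is computed over user categories only.
    let max_articles = user_category_count * max_items_per_category;
    // Round up so a fractional target still demands a whole article.
    let target = (SYNTHESIS_MIN_FILL_RATIO * max_articles as f64).ceil() as usize;
    // Saturate at 0 when the synthesis is already at or above the target.
    target.saturating_sub(current_total)
}
```

With the numbers from the example, `fill_up_shortfall(4, 4, 10)` yields 2, and any total of 12 or more yields 0.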
## Mechanism
### 1. Collect overflow during classification
Modify `parse_classification_response` to return a second value: `Vec<ScrapedNewsItem>` of overflow articles — articles that were dropped because both their target category AND "Autre" were full.
Current signature:
```rust
fn parse_classification_response(...) -> HashMap<String, Vec<ScrapedNewsItem>>
```
New signature:
```rust
fn parse_classification_response(...) -> (HashMap<String, Vec<ScrapedNewsItem>>, Vec<ScrapedNewsItem>)
```
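A minimal sketch of how the overflow list could be built, assuming a simplified placement loop over already-classified articles. `Item`, `place_items`, `has_room`, and `AUTRE_KEY` are hypothetical names used for illustration; the real `parse_classification_response` takes arguments that this spec elides, and the sketch assumes "Autre" shares the same `max_items_per_category` cap during classification.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for `ScrapedNewsItem`.
#[derive(Debug, Clone, PartialEq)]
struct Item {
    url: String,
}

/// Assumed map key for the "Autre" category.
const AUTRE_KEY: &str = "category_autre";

/// True if the category identified by `key` holds fewer than `cap` items.
fn has_room(map: &HashMap<String, Vec<Item>>, key: &str, cap: usize) -> bool {
    map.get(key).map_or(0, Vec::len) < cap
}

/// Places each (category, item) pair and returns the per-category map
/// together with the articles that fit nowhere.
fn place_items(
    assignments: Vec<(String, Item)>,
    max_items_per_category: usize,
) -> (HashMap<String, Vec<Item>>, Vec<Item>) {
    let mut by_category: HashMap<String, Vec<Item>> = HashMap::new();
    let mut overflow = Vec::new();

    for (category, item) in assignments {
        if has_room(&by_category, category.as_str(), max_items_per_category) {
            // The target category still has space.
            by_category.entry(category).or_default().push(item);
        } else if has_room(&by_category, AUTRE_KEY, max_items_per_category) {
            // Target is full: fall back to "Autre".
            by_category
                .entry(AUTRE_KEY.to_string())
                .or_default()
                .push(item);
        } else {
            // Both are full: keep the article instead of dropping it.
            overflow.push(item);
        }
    }
    (by_category, overflow)
}
```

Returning the overflow as a second tuple element keeps the existing map shape intact, so callers that only need the map can destructure and ignore the second value.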
### 2. Accumulate overflow across phases
In `run_generation_inner`, collect overflow from both Phase 1 and Phase 2 classification calls into a single `all_overflow: Vec<ScrapedNewsItem>`.
### 3. Post-classification fill-up
After both phases, before the rewrite pass:
```
const SYNTHESIS_MIN_FILL_RATIO: f64 = 0.75;

total = sum of all articles in all_scraped
max_articles = categories.len() * max_items_per_category
target = (SYNTHESIS_MIN_FILL_RATIO * max_articles as f64).ceil() as usize
shortfall = target.saturating_sub(total)

if shortfall > 0 and all_overflow is non-empty:
    // Filter overflow against max_articles_per_source (source diversity)
    for each overflow article:
        count domain occurrences in all_scraped
        skip if domain already at max_articles_per_source
    take up to shortfall valid overflow articles
    add them to all_scraped["category_autre"]
```
**Source diversity enforcement:** Overflow articles added back to "Autre" must respect the `max_articles_per_source` limit. Count existing domain occurrences across all categories in `all_scraped`, and only add an overflow article if its domain is still under the limit.
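The fill-up step, including the source diversity check, might look roughly like the following. This is a sketch under stated assumptions: `Item` is a hypothetical stand-in for `ScrapedNewsItem` with the source domain already extracted into a `domain` field, and `fill_up_autre` is an illustrative name rather than an existing function.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for `ScrapedNewsItem`; assumes the domain is pre-extracted.
#[derive(Debug, Clone, PartialEq)]
struct Item {
    domain: String,
}

const SYNTHESIS_MIN_FILL_RATIO: f64 = 0.75;
const AUTRE_KEY: &str = "category_autre";

/// Moves overflow articles into "Autre" until the fill target is reached
/// or no eligible overflow remains (best-effort, never errors).
fn fill_up_autre(
    all_scraped: &mut HashMap<String, Vec<Item>>,
    all_overflow: Vec<Item>,
    user_category_count: usize,
    max_items_per_category: usize,
    max_articles_per_source: usize,
) {
    let total: usize = all_scraped.values().map(Vec::len).sum();
    let max_articles = user_category_count * max_items_per_category;
    let target = (SYNTHESIS_MIN_FILL_RATIO * max_articles as f64).ceil() as usize;
    let mut shortfall = target.saturating_sub(total);

    // Count how many articles each domain already has across all categories.
    let mut per_domain: HashMap<String, usize> = HashMap::new();
    for item in all_scraped.values().flatten() {
        *per_domain.entry(item.domain.clone()).or_insert(0) += 1;
    }

    for item in all_overflow {
        if shortfall == 0 {
            break; // target reached
        }
        let count = per_domain.entry(item.domain.clone()).or_insert(0);
        if *count >= max_articles_per_source {
            continue; // source diversity: this domain is already at its limit
        }
        *count += 1;
        all_scraped
            .entry(AUTRE_KEY.to_string())
            .or_default()
            .push(item);
        shortfall -= 1;
    }
}
```

The sketch updates the per-domain count as each overflow article is accepted, so articles added during fill-up also count toward `max_articles_per_source`. Whether the real implementation counts incrementally or only against the pre-fill-up state is a design choice this spec leaves open.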
## Hardcoded 75%
The 75% target is hardcoded as a constant `SYNTHESIS_MIN_FILL_RATIO: f64 = 0.75` in `synthesis.rs`. No user setting needed — this is an internal quality threshold.
## Best-effort
If there aren't enough overflow articles to reach 75%, the synthesis proceeds with whatever it has. No error is raised.
## Files to Modify
- **Modify:** `backend/src/services/synthesis.rs`:
  - Add `SYNTHESIS_MIN_FILL_RATIO` constant
  - Modify `parse_classification_response` signature and body to collect and return overflow
  - Update 2 production call sites to destructure the tuple
  - Update 5 existing classification unit tests for new return type
  - Add fill-up logic in `run_generation_inner` between Phase 2 and rewrite pass
  - Add unit tests for overflow collection and fill-up calculation
## What Does NOT Change
- `max_items_per_category` — still the cap for user categories
- Classification prompt — unchanged
- Rewrite pass — sees the final article set including expanded "Autre"
- Frontend — no changes
- Database — no changes
- No new settings