docs: add spec for pipeline tweaks (parallel extraction, shuffle, clear history)
parent 48957470ed
commit 2d623c6ced
@@ -0,0 +1,44 @@
# Design: Pipeline Tweaks (Parallel Extraction, Shuffle, Clear History)

**Date**: 2026-03-25
**Scope**: 4 focused improvements to the synthesis pipeline and article history UI

---
## 1. Remove max source limit + extract 15 links

Currently, `max_sources = rotated_sources.len().min(10)` caps processing at 10 sources, and `max_links = 10` caps extraction at 10 links per source.

Change to:

- Process ALL user sources (no `.min(10)` cap)
- Extract 15 links per source (`max_links = 15`)
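
The two changes above can be sketched as follows. This is a minimal illustration, not the code in `synthesis.rs`: the helper names `max_sources` and `cap_links` are hypothetical, and where the link limit is actually applied in the pipeline is an assumption.

```rust
/// Per-source link limit proposed in this spec (previously 10).
const MAX_LINKS: usize = 15;

/// Number of sources to process.
/// Before this change: `rotated_sources.len().min(10)`.
/// After: every source the user has configured.
fn max_sources(rotated_sources: &[String]) -> usize {
    rotated_sources.len()
}

/// Keeps at most `MAX_LINKS` links extracted from a single source page.
/// (Hypothetical helper; the real code may pass the limit to the extractor.)
fn cap_links(mut links: Vec<String>) -> Vec<String> {
    links.truncate(MAX_LINKS);
    links
}
```

With 25 configured sources, `max_sources` now yields 25 rather than 10, so the total number of candidate links grows with the number of sources. That growth is part of the motivation for the parallel extraction in the next section.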
## 2. Parallel source extraction (concurrency 5)

Currently, source pages are scraped sequentially in a `for` loop. Switch to a `JoinSet` with at most 5 extractions in flight, following the same pattern as article scraping.

Both `extract_article_links` and `extract_article_links_with_llm` are async and return `Result<Vec<String>>`. The parallel loop spawns one task per source and collects the results as tasks finish.
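
The bound of 5 can be illustrated without pulling in an async runtime. The sketch below is not the `JoinSet` loop itself: it swaps in standard-library scoped threads with a fixed pool of five workers, which keeps the same invariant (never more than 5 extractions running). `extract_links_stub` and `extract_all` are hypothetical names, and the stub stands in for the real async extractors.

```rust
use std::sync::Mutex;
use std::thread;

/// Upper bound on simultaneous extractions, as proposed in this spec.
const MAX_CONCURRENT: usize = 5;

/// Hypothetical stand-in for `extract_article_links`; the real function is
/// async and fetches the source page.
fn extract_links_stub(source: &str) -> Result<Vec<String>, String> {
    if source.is_empty() {
        return Err("empty source URL".to_string());
    }
    Ok(vec![format!("{source}/article-1"), format!("{source}/article-2")])
}

/// Runs the extractor over all sources with at most `MAX_CONCURRENT`
/// extractions in flight. Returns (source, result) pairs in completion order.
fn extract_all(sources: &[String]) -> Vec<(String, Result<Vec<String>, String>)> {
    // Shared work queue: each worker pulls the next unprocessed source.
    let queue = Mutex::new(sources.iter());
    let results = Mutex::new(Vec::with_capacity(sources.len()));
    let workers = MAX_CONCURRENT.min(sources.len());

    thread::scope(|scope| {
        for _ in 0..workers {
            scope.spawn(|| loop {
                // The queue lock is released before the extraction runs.
                let next = queue.lock().unwrap().next();
                let Some(source) = next else { break };
                let outcome = extract_links_stub(source);
                results.lock().unwrap().push((source.clone(), outcome));
            });
        }
    });

    results.into_inner().unwrap()
}
```

A failed extraction is recorded alongside the successful ones here, so one bad source does not abort the rest. Whether the pipeline logs, skips, or retries such failures is not specified above.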
## 3. Shuffle candidates after dedup/history filter

After deduplication and history filtering, and before url→source tracking, shuffle `candidate_urls` using `rand::thread_rng()`. This interleaves articles from different sources, so they are no longer processed source by source.

The `rand` crate is already a dependency.
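
To show what the shuffle does to ordering, here is a dependency-free sketch. It is not the `rand`-based call the spec proposes: it hand-rolls a Fisher-Yates shuffle driven by a small xorshift generator, and both `XorShift64` and `shuffle_in_place` are hypothetical names introduced for illustration.

```rust
/// Tiny xorshift pseudo-random generator, used only to keep this sketch
/// free of external crates. Not suitable for anything security-sensitive.
struct XorShift64(u64);

impl XorShift64 {
    fn new(seed: u64) -> Self {
        // xorshift must not start from zero, so substitute a fixed constant.
        XorShift64(if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed })
    }

    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Fisher-Yates shuffle: walks from the end of the slice and swaps each
/// position with a randomly chosen position at or before it.
fn shuffle_in_place<T>(items: &mut [T], rng: &mut XorShift64) {
    for i in (1..items.len()).rev() {
        // The modulo introduces a negligible bias for lists of this size.
        let j = (rng.next_u64() % (i as u64 + 1)) as usize;
        items.swap(i, j);
    }
}
```

The shuffle only reorders `candidate_urls`; it never adds or drops an entry, so the dedup and history filters that ran before it still hold. In the pipeline itself, the `rand` crate's slice shuffle with `rand::thread_rng()` would replace both helpers.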
## 4. Clear history button

New API endpoint: `DELETE /api/v1/article-history`, which deletes ALL `article_history` entries for the authenticated user.

New DB function: `delete_all_for_user(pool, user_id)`.

Frontend: an "Effacer l'historique" ("Clear history") button on the ArticleHistory page, with a confirmation dialog.
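
The contract of the new DB function is that it removes every history row owned by one user and leaves other users' rows untouched. The sketch below models that contract in memory only. The row type, its fields, and the name `delete_all_for_user_in_memory` are hypothetical; the real function takes a database pool, and its return type is not specified above.

```rust
/// Hypothetical in-memory row; the real table's columns are not given here.
#[derive(Debug, Clone, PartialEq)]
struct ArticleHistoryEntry {
    user_id: u64,
    url: String,
}

/// Removes every entry owned by `user_id` and returns how many were removed.
/// Entries belonging to other users are kept in their original order.
fn delete_all_for_user_in_memory(
    entries: &mut Vec<ArticleHistoryEntry>,
    user_id: u64,
) -> usize {
    let before = entries.len();
    entries.retain(|entry| entry.user_id != user_id);
    before - entries.len()
}
```

Calling it a second time for the same user removes nothing and returns 0, which matches the idempotent behavior usually expected of an HTTP `DELETE`.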
## Files to Modify

- **Modify:** `backend/src/services/synthesis.rs`: remove the source cap, raise the link limit to 15, parallelize extraction with `JoinSet`, add the shuffle
- **Modify:** `backend/src/db/article_history.rs`: add `delete_all_for_user`
- **Create:** handler for the DELETE endpoint (added to the existing `article_history.rs` handler)
- **Modify:** `backend/src/router.rs`: add the DELETE route
- **Modify:** `frontend/src/api/articleHistory.ts`: add a `clearAll` method
- **Modify:** `frontend/src/pages/ArticleHistory.tsx`: add the clear button with confirmation
- **Modify:** `frontend/src/i18n/fr.ts`: add labels