diff --git a/docs/superpowers/plans/2026-04-03-rss-feed-integration.md b/docs/superpowers/plans/2026-04-03-rss-feed-integration.md
new file mode 100644
index 0000000..823c5ef
--- /dev/null
+++ b/docs/superpowers/plans/2026-04-03-rss-feed-integration.md
@@ -0,0 +1,1134 @@
+# RSS Feed Integration Implementation Plan
+
+> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
+
+**Goal:** Add RSS/Atom feed support to personalized sources so the synthesis pipeline discovers articles via feeds first (sorted by recency), falling back to HTML extraction when no feed is found or it yields fewer than 3 links.
+
+**Architecture:** New `feed_parser` service handles feed discovery, parsing, and caching. The Phase 1 pipeline calls it before the existing `source_scraper`. Two new nullable columns on `sources` persist discovered feed URLs with a 30-day re-discovery cycle.
+
+**Tech Stack:** Rust, `feed-rs` crate (RSS/Atom/JSON Feed parsing), `scraper` crate (HTML `<link>` discovery), `reqwest` (HTTP), `sqlx` (Postgres), `wiremock` (test mocks)
+
+---
+
+### Task 1: Add `feed-rs` dependency
+
+**Files:**
+- Modify: `backend/Cargo.toml`
+
+- [ ] **Step 1: Add feed-rs to dependencies**
+
+In `backend/Cargo.toml`, add after the `scraper` line in `[dependencies]`:
+
+```toml
+# RSS/Atom feed parsing
+feed-rs = "2"
+```
+
+- [ ] **Step 2: Verify it compiles**
+
+Run: `cd backend && cargo check`
+Expected: compiles with no errors
+
+- [ ] **Step 3: Commit**
+
+```bash
+git add backend/Cargo.toml
+git commit -m "deps: add feed-rs crate for RSS/Atom feed parsing"
+```
+
+---
+
+### Task 2: Database migration — add RSS columns to sources
+
+**Files:**
+- Create: `backend/migrations/20260403000031_add_source_rss_fields.sql`
+- Modify: `backend/src/models/source.rs`
+- Modify: `backend/src/db/sources.rs`
+
+- [ ] **Step 1: Create the migration file**
+
+Create `backend/migrations/20260403000031_add_source_rss_fields.sql`:
+
+```sql
+ALTER TABLE sources ADD COLUMN rss_url TEXT;
+ALTER TABLE sources ADD COLUMN rss_discovered_at TIMESTAMPTZ;
+```
+
+- [ ] **Step 2: Add fields to the Source struct**
+
+In `backend/src/models/source.rs`, add two fields to the `Source` struct after `is_preferred`:
+
+```rust
+pub struct Source {
+ pub id: Uuid,
+ pub user_id: Uuid,
+ pub title: String,
+ pub url: String,
+ pub theme_id: Option<Uuid>,
+ pub is_preferred: bool,
+ pub rss_url: Option<String>,
+ pub rss_discovered_at: Option<DateTime<Utc>>,
+ pub created_at: DateTime<Utc>,
+}
+```
+
+- [ ] **Step 3: Update all SQL SELECT queries in `db/sources.rs`**
+
+Every `SELECT` in `backend/src/db/sources.rs` that uses `query_as::<_, Source>` needs the two new columns. Update the column lists from:
+
+```sql
+SELECT id, user_id, title, url, theme_id, is_preferred, created_at
+```
+
+to:
+
+```sql
+SELECT id, user_id, title, url, theme_id, is_preferred, rss_url, rss_discovered_at, created_at
+```
+
+This applies to:
+- `list_for_user` (2 queries: with and without theme_id filter)
+- `create` (the RETURNING clause)
+- `bulk_create` (the RETURNING clause)
+
+- [ ] **Step 4: Add `update_source_rss` function to `db/sources.rs`**
+
+Append to `backend/src/db/sources.rs`:
+
+```rust
+/// Update the cached RSS feed URL and discovery timestamp for a source.
+///
+/// Called during synthesis generation when a feed is discovered or re-verified.
+/// Pass `rss_url = None` to clear a previously cached feed (e.g., feed no longer exists).
+pub async fn update_source_rss(
+ pool: &PgPool,
+ source_id: Uuid,
+ rss_url: Option<&str>,
+ rss_discovered_at: Option<DateTime<Utc>>,
+) -> Result<(), AppError> {
+ sqlx::query(
+ "UPDATE sources SET rss_url = $1, rss_discovered_at = $2 WHERE id = $3",
+ )
+ .bind(rss_url)
+ .bind(rss_discovered_at)
+ .bind(source_id)
+ .execute(pool)
+ .await?;
+
+ Ok(())
+}
+```
+
+Add `use chrono::{DateTime, Utc};` to the imports at the top of `db/sources.rs`.
+
+- [ ] **Step 5: Verify it compiles**
+
+Run: `cd backend && cargo check`
+Expected: compiles (migration will run at startup)
+
+- [ ] **Step 6: Commit**
+
+```bash
+git add backend/migrations/20260403000031_add_source_rss_fields.sql backend/src/models/source.rs backend/src/db/sources.rs
+git commit -m "feat: add rss_url and rss_discovered_at columns to sources"
+```
+
+---
+
+### Task 3: Create `feed_parser` service — `parse_feed` function
+
+**Files:**
+- Create: `backend/src/services/feed_parser.rs`
+- Modify: `backend/src/services/mod.rs`
+
+- [ ] **Step 1: Write `parse_feed` together with its tests**
+
+Create `backend/src/services/feed_parser.rs` with the shared types, the `parse_feed` implementation, and the test module:
+
+```rust
+//! RSS/Atom feed parser service.
+//!
+//! Discovers and parses RSS/Atom feeds from source URLs.
+//! Used in Phase 1 of the generation pipeline to extract article links
+//! sorted by publication date (newest first), before falling back
+//! to the HTML-based source_scraper.
+
+use chrono::{DateTime, Utc};
+use url::Url;
+
+use crate::errors::AppError;
+
+/// A single entry extracted from an RSS/Atom feed.
+#[derive(Debug, Clone)]
+pub struct FeedEntry {
+ pub url: String,
+ pub title: String,
+ pub published_date: Option<DateTime<Utc>>,
+}
+
+/// Result of attempting to detect and parse a feed for a source.
+pub enum FeedResult {
+ /// Feed found and parsed successfully.
+ Found {
+ feed_url: String,
+ entries: Vec<FeedEntry>,
+ },
+ /// No feed discovered or feed invalid.
+ NotFound,
+}
+
+/// Minimum number of feed entries to consider the feed useful.
+/// Below this threshold, the pipeline falls back to HTML extraction.
+pub const MIN_FEED_ENTRIES: usize = 3;
+
+/// Number of days before a cached feed URL is re-verified.
+pub const REDISCOVERY_DAYS: i64 = 30;
+
+/// Parse an RSS/Atom feed URL and return entries sorted by date (newest first).
+///
+/// Uses the `feed-rs` crate which handles RSS 1.0, RSS 2.0, Atom, and JSON Feed.
+/// Entries without a published date are placed last.
+pub async fn parse_feed(
+ http_client: &reqwest::Client,
+ feed_url: &str,
+ max_links: usize,
+) -> Result<Vec<FeedEntry>, AppError> {
+ let parsed_url = Url::parse(feed_url)
+ .map_err(|e| AppError::BadRequest(format!("Invalid feed URL: {}", e)))?;
+
+ if let Err(e) = crate::services::scraper::check_ssrf(&parsed_url).await {
+ tracing::warn!(url = feed_url, error = %e, "Feed URL failed SSRF check");
+ return Ok(Vec::new());
+ }
+
+ let response = http_client
+ .get(feed_url)
+ .send()
+ .await
+ .map_err(|e| {
+ tracing::warn!(url = feed_url, error = %e, "Failed to fetch feed");
+ AppError::Internal(anyhow::anyhow!("Failed to fetch feed"))
+ })?;
+
+ if !response.status().is_success() {
+ tracing::warn!(url = feed_url, status = %response.status(), "Feed returned non-200");
+ return Ok(Vec::new());
+ }
+
+ let body = response.bytes().await.map_err(|e| {
+ AppError::Internal(anyhow::anyhow!("Failed to read feed body: {}", e))
+ })?;
+
+ let feed = feed_rs::parser::parse(&body[..]).map_err(|e| {
+ tracing::warn!(url = feed_url, error = %e, "Failed to parse feed");
+ AppError::Internal(anyhow::anyhow!("Failed to parse feed: {}", e))
+ })?;
+
+ let mut entries: Vec<FeedEntry> = feed
+ .entries
+ .into_iter()
+ .filter_map(|entry| {
+ // Get the article URL: prefer links, fall back to id if it looks like a URL
+ let url = entry
+ .links
+ .first()
+ .map(|l| l.href.clone())
+ .or_else(|| {
+ if entry.id.starts_with("http://") || entry.id.starts_with("https://") {
+ Some(entry.id.clone())
+ } else {
+ None
+ }
+ })?;
+
+ let title = entry
+ .title
+ .map(|t| t.content)
+ .unwrap_or_default();
+
+ let published_date = entry
+ .published
+ .or(entry.updated);
+
+ Some(FeedEntry {
+ url,
+ title,
+ published_date,
+ })
+ })
+ .collect();
+
+ // Sort by published_date descending (newest first), entries without dates last
+ entries.sort_by(|a, b| {
+ match (&a.published_date, &b.published_date) {
+ (Some(da), Some(db)) => db.cmp(da),
+ (Some(_), None) => std::cmp::Ordering::Less,
+ (None, Some(_)) => std::cmp::Ordering::Greater,
+ (None, None) => std::cmp::Ordering::Equal,
+ }
+ });
+
+ entries.truncate(max_links);
+
+ Ok(entries)
+}
+
+#[cfg(test)]
+mod tests {
+ use super::*;
+ use wiremock::{Mock, MockServer, ResponseTemplate};
+ use wiremock::matchers::method;
+
+ #[tokio::test]
+ async fn parse_feed_rss2() {
+ let server = MockServer::start().await;
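+ let rss_body = r#"<?xml version="1.0" encoding="UTF-8"?>
+<rss version="2.0">
+<channel>
+ <title>Test Blog</title>
+ <item>
+ <title>Article 1</title>
+ <link>https://example.com/article-1</link>
+ <pubDate>Fri, 03 Apr 2026 10:00:00 GMT</pubDate>
+ </item>
+ <item>
+ <title>Article 2</title>
+ <link>https://example.com/article-2</link>
+ <pubDate>Thu, 02 Apr 2026 10:00:00 GMT</pubDate>
+ </item>
+ <item>
+ <title>Article 3</title>
+ <link>https://example.com/article-3</link>
+ <pubDate>Wed, 01 Apr 2026 10:00:00 GMT</pubDate>
+ </item>
+</channel>
+</rss>"#;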
+
+ Mock::given(method("GET"))
+ .respond_with(ResponseTemplate::new(200).set_body_raw(rss_body, "application/rss+xml"))
+ .mount(&server)
+ .await;
+
+ let client = reqwest::Client::new();
+ let entries = parse_feed(&client, &server.uri(), 10).await.unwrap();
+
+ assert_eq!(entries.len(), 3);
+ assert_eq!(entries[0].title, "Article 1");
+ assert_eq!(entries[0].url, "https://example.com/article-1");
+ assert!(entries[0].published_date > entries[1].published_date);
+ assert!(entries[1].published_date > entries[2].published_date);
+ }
+
+ #[tokio::test]
+ async fn parse_feed_atom() {
+ let server = MockServer::start().await;
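+ let atom_body = r#"<?xml version="1.0" encoding="UTF-8"?>
+<feed xmlns="http://www.w3.org/2005/Atom">
+ <title>Test Feed</title>
+ <entry>
+ <title>Atom Article</title>
+ <link href="https://example.com/atom-1"/>
+ <updated>2026-04-03T12:00:00Z</updated>
+ </entry>
+</feed>"#;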
+
+ Mock::given(method("GET"))
+ .respond_with(ResponseTemplate::new(200).set_body_raw(atom_body, "application/atom+xml"))
+ .mount(&server)
+ .await;
+
+ let client = reqwest::Client::new();
+ let entries = parse_feed(&client, &server.uri(), 10).await.unwrap();
+
+ assert_eq!(entries.len(), 1);
+ assert_eq!(entries[0].title, "Atom Article");
+ assert_eq!(entries[0].url, "https://example.com/atom-1");
+ assert!(entries[0].published_date.is_some());
+ }
+
+ #[tokio::test]
+ async fn parse_feed_respects_max_links() {
+ let server = MockServer::start().await;
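+ let rss_body = r#"<?xml version="1.0"?>
+<rss version="2.0">
+<channel>
+ <title>Test</title>
+ <item><title>A1</title><link>https://example.com/1</link><pubDate>Fri, 03 Apr 2026 10:00:00 GMT</pubDate></item>
+ <item><title>A2</title><link>https://example.com/2</link><pubDate>Thu, 02 Apr 2026 10:00:00 GMT</pubDate></item>
+ <item><title>A3</title><link>https://example.com/3</link><pubDate>Wed, 01 Apr 2026 10:00:00 GMT</pubDate></item>
+</channel>
+</rss>"#;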
+
+ Mock::given(method("GET"))
+ .respond_with(ResponseTemplate::new(200).set_body_raw(rss_body, "application/rss+xml"))
+ .mount(&server)
+ .await;
+
+ let client = reqwest::Client::new();
+ let entries = parse_feed(&client, &server.uri(), 2).await.unwrap();
+
+ assert_eq!(entries.len(), 2);
+ assert_eq!(entries[0].url, "https://example.com/1"); // newest first
+ }
+
+ #[tokio::test]
+ async fn parse_feed_entries_without_dates_come_last() {
+ let server = MockServer::start().await;
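+ let rss_body = r#"<?xml version="1.0"?>
+<rss version="2.0">
+<channel>
+ <title>Test</title>
+ <item><title>No date</title><link>https://example.com/no-date</link></item>
+ <item><title>Has date</title><link>https://example.com/has-date</link><pubDate>Fri, 03 Apr 2026 10:00:00 GMT</pubDate></item>
+</channel>
+</rss>"#;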
+
+ Mock::given(method("GET"))
+ .respond_with(ResponseTemplate::new(200).set_body_raw(rss_body, "application/rss+xml"))
+ .mount(&server)
+ .await;
+
+ let client = reqwest::Client::new();
+ let entries = parse_feed(&client, &server.uri(), 10).await.unwrap();
+
+ assert_eq!(entries.len(), 2);
+ assert_eq!(entries[0].url, "https://example.com/has-date");
+ assert_eq!(entries[1].url, "https://example.com/no-date");
+ }
+
+ #[tokio::test]
+ async fn parse_feed_404_returns_empty() {
+ let server = MockServer::start().await;
+
+ Mock::given(method("GET"))
+ .respond_with(ResponseTemplate::new(404))
+ .mount(&server)
+ .await;
+
+ let client = reqwest::Client::new();
+ let entries = parse_feed(&client, &server.uri(), 10).await.unwrap();
+ assert!(entries.is_empty());
+ }
+
+ #[tokio::test]
+ async fn parse_feed_invalid_xml_returns_error() {
+ let server = MockServer::start().await;
+
+ Mock::given(method("GET"))
+ .respond_with(ResponseTemplate::new(200).set_body_string("not xml at all"))
+ .mount(&server)
+ .await;
+
+ let client = reqwest::Client::new();
+ let result = parse_feed(&client, &server.uri(), 10).await;
+ assert!(result.is_err());
+ }
+}
+```
+
+- [ ] **Step 2: Register the module in `services/mod.rs`**
+
+In `backend/src/services/mod.rs`, add after the `export` line:
+
+```rust
+pub mod feed_parser;
+```
+
+- [ ] **Step 3: Run tests to verify they pass**
+
+Run: `cd backend && cargo test --lib feed_parser -- --nocapture`
+Expected: all 6 tests pass
+
+- [ ] **Step 4: Commit**
+
+```bash
+git add backend/src/services/feed_parser.rs backend/src/services/mod.rs
+git commit -m "feat: add feed_parser service with parse_feed function and tests"
+```
+
+---
+
+### Task 4: Add `discover_feed` function
+
+**Files:**
+- Modify: `backend/src/services/feed_parser.rs`
+
+- [ ] **Step 1: Write failing tests for `discover_feed`**
+
+Add these tests to the `mod tests` block in `backend/src/services/feed_parser.rs`:
+
+```rust
+ #[tokio::test]
+ async fn discover_feed_from_link_rss() {
+ let server = MockServer::start().await;
+ let html = format!(
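+ r#"<html><head>
+ <link rel="alternate" type="application/rss+xml" href="{}/feed.xml">
+ </head><body></body></html>"#,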
+ server.uri()
+ );
+
+ Mock::given(method("GET"))
+ .respond_with(ResponseTemplate::new(200).set_body_raw(html, "text/html"))
+ .mount(&server)
+ .await;
+
+ let client = reqwest::Client::new();
+ let result = discover_feed(&client, &server.uri()).await;
+
+ assert!(result.is_some());
+ assert!(result.unwrap().contains("/feed.xml"));
+ }
+
+ #[tokio::test]
+ async fn discover_feed_from_link_atom() {
+ let server = MockServer::start().await;
+ let html = format!(
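+ r#"<html><head>
+ <link rel="alternate" type="application/atom+xml" href="{}/atom.xml">
+ </head><body></body></html>"#,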
+ server.uri()
+ );
+
+ Mock::given(method("GET"))
+ .respond_with(ResponseTemplate::new(200).set_body_raw(html, "text/html"))
+ .mount(&server)
+ .await;
+
+ let client = reqwest::Client::new();
+ let result = discover_feed(&client, &server.uri()).await;
+
+ assert!(result.is_some());
+ assert!(result.unwrap().contains("/atom.xml"));
+ }
+
+ #[tokio::test]
+ async fn discover_feed_direct_rss_url() {
+ let server = MockServer::start().await;
+ let rss_body = r#"<rss version="2.0"><channel><title>T</title></channel></rss>"#;
+
+ Mock::given(method("GET"))
+ .respond_with(
+ ResponseTemplate::new(200)
+ .set_body_raw(rss_body, "application/rss+xml")
+ )
+ .mount(&server)
+ .await;
+
+ let client = reqwest::Client::new();
+ let result = discover_feed(&client, &server.uri()).await;
+
+ assert!(result.is_some());
+ assert_eq!(result.unwrap(), server.uri());
+ }
+
+ #[tokio::test]
+ async fn discover_feed_no_feed_found() {
+ let server = MockServer::start().await;
+ let html = "<html><head><title>No feed</title></head><body></body></html>";
+
+ Mock::given(method("GET"))
+ .respond_with(ResponseTemplate::new(200).set_body_raw(html, "text/html"))
+ .mount(&server)
+ .await;
+
+ let client = reqwest::Client::new();
+ let result = discover_feed(&client, &server.uri()).await;
+
+ assert!(result.is_none());
+ }
+
+ #[tokio::test]
+ async fn discover_feed_resolves_relative_href() {
+ let server = MockServer::start().await;
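+ let html = r#"<html><head>
+ <link rel="alternate" type="application/rss+xml" href="/feed.xml">
+ </head><body></body></html>"#;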
+
+ Mock::given(method("GET"))
+ .respond_with(ResponseTemplate::new(200).set_body_raw(html, "text/html"))
+ .mount(&server)
+ .await;
+
+ let client = reqwest::Client::new();
+ let result = discover_feed(&client, &server.uri()).await;
+
+ assert!(result.is_some());
+ let feed_url = result.unwrap();
+ assert!(feed_url.starts_with(&server.uri()));
+ assert!(feed_url.ends_with("/feed.xml"));
+ }
+```
+
+- [ ] **Step 2: Run tests to verify they fail**
+
+Run: `cd backend && cargo test --lib feed_parser::tests::discover_feed -- --nocapture`
+Expected: compilation error — `discover_feed` not defined
+
+- [ ] **Step 3: Implement `discover_feed`**
+
+Add this function to `backend/src/services/feed_parser.rs`, before the `#[cfg(test)]` block:
+
+```rust
+/// RSS/Atom content types that indicate a direct feed URL.
+const FEED_CONTENT_TYPES: &[&str] = &[
+ "application/rss+xml",
+ "application/atom+xml",
+ "application/xml",
+ "text/xml",
+];
+
+/// Discover an RSS/Atom feed URL from a source URL.
+///
+/// Two detection strategies:
+/// 1. If the URL itself returns an RSS/Atom Content-Type, it is a feed directly.
+/// 2. If the URL returns HTML, look for `<link rel="alternate" type="application/rss+xml">`
+/// or `type="application/atom+xml"` in the `<head>`.
+///
+/// Returns `Some(feed_url)` if a feed is found, `None` otherwise.
+pub async fn discover_feed(
+ http_client: &reqwest::Client,
+ source_url: &str,
+) -> Option<String> {
+ let parsed_url = Url::parse(source_url).ok()?;
+
+ if let Err(e) = crate::services::scraper::check_ssrf(&parsed_url).await {
+ tracing::warn!(url = source_url, error = %e, "Source URL failed SSRF check during feed discovery");
+ return None;
+ }
+
+ let response = http_client
+ .get(source_url)
+ .send()
+ .await
+ .ok()?;
+
+ if !response.status().is_success() {
+ return None;
+ }
+
+ // Check Content-Type for direct feed
+ let content_type = response
+ .headers()
+ .get(reqwest::header::CONTENT_TYPE)
+ .and_then(|v| v.to_str().ok())
+ .unwrap_or("")
+ .to_lowercase();
+
+ if FEED_CONTENT_TYPES.iter().any(|ct| content_type.contains(ct)) {
+ return Some(source_url.to_string());
+ }
+
+ // If HTML, look for <link rel="alternate"> with a feed type
+ if !content_type.contains("text/html") {
+ return None;
+ }
+
+ let body = response.text().await.ok()?;
+ let document = scraper::Html::parse_document(&body);
+
+ let selector = scraper::Selector::parse(r#"link[rel="alternate"]"#).ok()?;
+
+ for element in document.select(&selector) {
+ let link_type = element.value().attr("type").unwrap_or("");
+ if link_type == "application/rss+xml" || link_type == "application/atom+xml" {
+ if let Some(href) = element.value().attr("href") {
+ // Resolve relative URLs against the source URL
+ let resolved = parsed_url.join(href).ok()?;
+ return Some(resolved.to_string());
+ }
+ }
+ }
+
+ None
+}
+```
+
+- [ ] **Step 4: Run tests to verify they pass**
+
+Run: `cd backend && cargo test --lib feed_parser -- --nocapture`
+Expected: all 11 tests pass (6 from Task 3 + 5 new)
+
+- [ ] **Step 5: Commit**
+
+```bash
+git add backend/src/services/feed_parser.rs
+git commit -m "feat: add discover_feed function for RSS/Atom auto-discovery"
+```
+
+---
+
+### Task 5: Add `detect_and_parse_feed` orchestration function
+
+**Files:**
+- Modify: `backend/src/services/feed_parser.rs`
+
+- [ ] **Step 1: Write failing tests for `detect_and_parse_feed`**
+
+Add these tests to the `mod tests` block in `backend/src/services/feed_parser.rs`:
+
+```rust
+ #[tokio::test]
+ async fn detect_and_parse_cached_fresh_feed() {
+ let server = MockServer::start().await;
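+ let rss_body = r#"<?xml version="1.0"?>
+<rss version="2.0"><channel><title>T</title>
+ <item><title>A1</title><link>https://example.com/1</link><pubDate>Fri, 03 Apr 2026 10:00:00 GMT</pubDate></item>
+ <item><title>A2</title><link>https://example.com/2</link><pubDate>Thu, 02 Apr 2026 10:00:00 GMT</pubDate></item>
+ <item><title>A3</title><link>https://example.com/3</link><pubDate>Wed, 01 Apr 2026 10:00:00 GMT</pubDate></item>
+</channel></rss>"#;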
+
+ Mock::given(method("GET"))
+ .respond_with(ResponseTemplate::new(200).set_body_raw(rss_body, "application/rss+xml"))
+ .mount(&server)
+ .await;
+
+ let client = reqwest::Client::new();
+ let result = detect_and_parse_feed(
+ &client,
+ "https://example.com",
+ Some(&server.uri()),
+ Some(Utc::now()), // fresh
+ 10,
+ ).await;
+
+ match result {
+ FeedResult::Found { entries, .. } => assert_eq!(entries.len(), 3),
+ FeedResult::NotFound => panic!("Expected Found"),
+ }
+ }
+
+ #[tokio::test]
+ async fn detect_and_parse_no_cache_discovers_feed() {
+ let server = MockServer::start().await;
+
+ // First request: HTML page with feed link
+ let feed_path = format!("{}/feed.xml", server.uri());
+ let html = format!(
+ r#"<html><head>
+ <link rel="alternate" type="application/rss+xml" href="{}">
+ </head><body></body></html>"#,
+ feed_path
+ );
+
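+ let rss_body = r#"<?xml version="1.0"?>
+<rss version="2.0"><channel><title>T</title>
+ <item><title>A1</title><link>https://example.com/1</link><pubDate>Fri, 03 Apr 2026 10:00:00 GMT</pubDate></item>
+ <item><title>A2</title><link>https://example.com/2</link><pubDate>Thu, 02 Apr 2026 10:00:00 GMT</pubDate></item>
+ <item><title>A3</title><link>https://example.com/3</link><pubDate>Wed, 01 Apr 2026 10:00:00 GMT</pubDate></item>
+</channel></rss>"#;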
+
+ // Mock: source page returns HTML
+ Mock::given(method("GET"))
+ .and(wiremock::matchers::path("/"))
+ .respond_with(ResponseTemplate::new(200).set_body_raw(html, "text/html"))
+ .mount(&server)
+ .await;
+
+ // Mock: feed URL returns RSS
+ Mock::given(method("GET"))
+ .and(wiremock::matchers::path("/feed.xml"))
+ .respond_with(ResponseTemplate::new(200).set_body_raw(rss_body, "application/rss+xml"))
+ .mount(&server)
+ .await;
+
+ let client = reqwest::Client::new();
+ let result = detect_and_parse_feed(
+ &client,
+ &server.uri(),
+ None, // no cache
+ None,
+ 10,
+ ).await;
+
+ match result {
+ FeedResult::Found { feed_url, entries } => {
+ assert!(feed_url.contains("/feed.xml"));
+ assert_eq!(entries.len(), 3);
+ }
+ FeedResult::NotFound => panic!("Expected Found"),
+ }
+ }
+
+ #[tokio::test]
+ async fn detect_and_parse_no_feed_returns_not_found() {
+ let server = MockServer::start().await;
+ let html = "<html><head><title>No feed</title></head><body></body></html>";
+
+ Mock::given(method("GET"))
+ .respond_with(ResponseTemplate::new(200).set_body_raw(html, "text/html"))
+ .mount(&server)
+ .await;
+
+ let client = reqwest::Client::new();
+ let result = detect_and_parse_feed(
+ &client,
+ &server.uri(),
+ None,
+ None,
+ 10,
+ ).await;
+
+ assert!(matches!(result, FeedResult::NotFound));
+ }
+
+ #[tokio::test]
+ async fn detect_and_parse_stale_cache_rediscovers() {
+ let server = MockServer::start().await;
+
+ let feed_path = format!("{}/feed.xml", server.uri());
+ let html = format!(
+ r#"<html><head>
+ <link rel="alternate" type="application/rss+xml" href="{}">
+ </head><body></body></html>"#,
+ feed_path
+ );
+
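+ let rss_body = r#"<?xml version="1.0"?>
+<rss version="2.0"><channel><title>T</title>
+ <item><title>A1</title><link>https://example.com/1</link><pubDate>Fri, 03 Apr 2026 10:00:00 GMT</pubDate></item>
+ <item><title>A2</title><link>https://example.com/2</link><pubDate>Thu, 02 Apr 2026 10:00:00 GMT</pubDate></item>
+ <item><title>A3</title><link>https://example.com/3</link><pubDate>Wed, 01 Apr 2026 10:00:00 GMT</pubDate></item>
+</channel></rss>"#;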
+
+ Mock::given(method("GET"))
+ .and(wiremock::matchers::path("/"))
+ .respond_with(ResponseTemplate::new(200).set_body_raw(html, "text/html"))
+ .mount(&server)
+ .await;
+
+ Mock::given(method("GET"))
+ .and(wiremock::matchers::path("/feed.xml"))
+ .respond_with(ResponseTemplate::new(200).set_body_raw(rss_body, "application/rss+xml"))
+ .mount(&server)
+ .await;
+
+ let client = reqwest::Client::new();
+ let stale_date = Utc::now() - chrono::Duration::days(31);
+ let result = detect_and_parse_feed(
+ &client,
+ &server.uri(),
+ Some("https://old-feed.example.com/rss"), // stale cached URL
+ Some(stale_date),
+ 10,
+ ).await;
+
+ match result {
+ FeedResult::Found { feed_url, entries } => {
+ assert!(feed_url.contains("/feed.xml"), "Should discover new feed URL");
+ assert_eq!(entries.len(), 3);
+ }
+ FeedResult::NotFound => panic!("Expected Found after re-discovery"),
+ }
+ }
+```
+
+- [ ] **Step 2: Run tests to verify they fail**
+
+Run: `cd backend && cargo test --lib feed_parser::tests::detect_and_parse -- --nocapture`
+Expected: compilation error — `detect_and_parse_feed` not defined
+
+- [ ] **Step 3: Implement `detect_and_parse_feed`**
+
+Add this function to `backend/src/services/feed_parser.rs`, before the `#[cfg(test)]` block:
+
+```rust
+/// Detect and parse an RSS/Atom feed for a source URL.
+///
+/// Orchestrates the discovery and parsing logic:
+/// - If `rss_url` is cached and fresh (< 30 days), parse it directly.
+/// - If `rss_url` is cached but stale (>= 30 days), re-discover from `source_url`.
+/// - If no `rss_url` cached, attempt discovery from `source_url`.
+///
+/// Returns `FeedResult::Found` with the feed URL and sorted entries,
+/// or `FeedResult::NotFound` if no feed could be found/parsed.
+pub async fn detect_and_parse_feed(
+ http_client: &reqwest::Client,
+ source_url: &str,
+ rss_url: Option<&str>,
+ rss_discovered_at: Option<DateTime<Utc>>,
+ max_links: usize,
+) -> FeedResult {
+ // Case 1: Cached and fresh — use directly
+ if let Some(cached_url) = rss_url {
+ let is_fresh = rss_discovered_at
+ .map(|d| Utc::now().signed_duration_since(d).num_days() < REDISCOVERY_DAYS)
+ .unwrap_or(false);
+
+ if is_fresh {
+ match parse_feed(http_client, cached_url, max_links).await {
+ Ok(entries) if !entries.is_empty() => {
+ return FeedResult::Found {
+ feed_url: cached_url.to_string(),
+ entries,
+ };
+ }
+ _ => {
+ tracing::warn!(url = cached_url, "Cached feed failed to parse, attempting re-discovery");
+ }
+ }
+ }
+ }
+
+ // Case 2: No cache or stale — discover
+ let discovered = discover_feed(http_client, source_url).await;
+
+ if let Some(feed_url) = discovered {
+ match parse_feed(http_client, &feed_url, max_links).await {
+ Ok(entries) if !entries.is_empty() => {
+ return FeedResult::Found {
+ feed_url,
+ entries,
+ };
+ }
+ Ok(_) => {
+ tracing::info!(url = feed_url, "Discovered feed is empty");
+ }
+ Err(e) => {
+ tracing::warn!(url = feed_url, error = %e, "Discovered feed failed to parse");
+ }
+ }
+ }
+
+ FeedResult::NotFound
+}
+```
+
+- [ ] **Step 4: Run tests to verify they pass**
+
+Run: `cd backend && cargo test --lib feed_parser -- --nocapture`
+Expected: all 15 tests pass
+
+- [ ] **Step 5: Commit**
+
+```bash
+git add backend/src/services/feed_parser.rs
+git commit -m "feat: add detect_and_parse_feed orchestration function"
+```
+
+---
+
+### Task 6: Integrate feed_parser into the Phase 1 pipeline
+
+**Files:**
+- Modify: `backend/src/services/synthesis/mod.rs`
+
+- [ ] **Step 1: Add `feed_parser` import**
+
+In `backend/src/services/synthesis/mod.rs`, add after line 29 (`use crate::services::source_scraper;`):
+
+```rust
+use crate::services::feed_parser;
+```
+
+- [ ] **Step 2: Replace the link extraction in Phase 1 wave processing**
+
+In `backend/src/services/synthesis/mod.rs`, locate the Phase 1 link extraction block (around line 193-224). This is inside the `'wave_loop` where `join_set` spawns tasks calling `source_scraper::extract_article_links`.
+
+Replace the entire block from `let mut wave_urls: Vec<(String, String)> = Vec::new();` through the closing `}` of `while let Some(join_result) = join_set.join_next().await { ... }` (lines 193-224) with:
+
+```rust
+ let mut wave_urls: Vec<(String, String)> = Vec::new();
+ let mut rss_updates: Vec<(Uuid, Option<String>, Option<DateTime<Utc>>)> = Vec::new();
+ {
+ let mut join_set = tokio::task::JoinSet::new();
+ for source in wave_sources {
+ let client = state.http_client.clone();
+ let source_id = source.id;
+ let source_url = source.url.clone();
+ let source_title = source.title.clone();
+ let rss_url = source.rss_url.clone();
+ let rss_discovered_at = source.rss_discovered_at;
+ let max_l = max_links;
+ join_set.spawn(async move {
+ // Try RSS feed first
+ let feed_result = feed_parser::detect_and_parse_feed(
+ &client,
+ &source_url,
+ rss_url.as_deref(),
+ rss_discovered_at,
+ max_l,
+ ).await;
+
+ match feed_result {
+ feed_parser::FeedResult::Found { feed_url, entries }
+ if entries.len() >= feed_parser::MIN_FEED_ENTRIES =>
+ {
+ let links: Vec<String> = entries.into_iter().map(|e| e.url).collect();
+ tracing::info!(
+ source = %source_title,
+ feed = %feed_url,
+ links = links.len(),
+ "Extracted links from RSS feed"
+ );
+ // Signal RSS URL update if it changed
+ let rss_changed = rss_url.as_deref() != Some(feed_url.as_str());
+ let rss_stale = rss_discovered_at
+ .map(|d| Utc::now().signed_duration_since(d).num_days() >= feed_parser::REDISCOVERY_DAYS)
+ .unwrap_or(true);
+ let update = if rss_changed || rss_stale {
+ Some((source_id, Some(feed_url), Some(Utc::now())))
+ } else {
+ None
+ };
+ (source_url, source_title, Ok(links), update)
+ }
+ _ => {
+ // Fallback to HTML extraction
+ let links = source_scraper::extract_article_links(&client, &source_url, max_l).await;
+ // If we had a cached RSS URL but the feed failed or yielded too few entries, clear it
+ let update = if rss_url.is_some() {
+ Some((source_id, None, None))
+ } else {
+ None
+ };
+ (source_url, source_title, links, update)
+ }
+ }
+ });
+ }
+
+ while let Some(join_result) = join_set.join_next().await {
+ if let Ok((source_url, source_title, links_result, rss_update)) = join_result {
+ if let Some(update) = rss_update {
+ rss_updates.push(update);
+ }
+ match links_result {
+ Ok(links) => {
+ tracing::info!(source = %source_title, links = links.len(), "Extracted links from source");
+ for link in links {
+ if seen_urls.insert(link.to_lowercase()) {
+ wave_urls.push((link, source_url.clone()));
+ }
+ }
+ }
+ Err(e) => {
+ tracing::warn!(source = %source_title, error = %e, "Failed to extract links");
+ }
+ }
+ }
+ }
+ }
+
+ // Persist RSS URL updates (best-effort: database errors are ignored)
+ for (source_id, new_rss_url, new_discovered_at) in rss_updates {
+ db::sources::update_source_rss(
+ &state.pool,
+ source_id,
+ new_rss_url.as_deref(),
+ new_discovered_at,
+ ).await.ok();
+ }
+```
+
+- [ ] **Step 3: Verify it compiles**
+
+Run: `cd backend && cargo check`
+Expected: compiles with no errors
+
+- [ ] **Step 4: Run existing unit tests**
+
+Run: `cd backend && cargo test --lib`
+Expected: all tests pass (no regressions)
+
+- [ ] **Step 5: Commit**
+
+```bash
+git add backend/src/services/synthesis/mod.rs
+git commit -m "feat: integrate feed_parser into Phase 1 pipeline with HTML fallback"
+```
+
+---
+
+### Task 7: Add integration test for RSS feed in pipeline
+
+**Files:**
+- Modify: `backend/src/services/feed_parser.rs` (a focused test in the existing `mod tests` block)
+
+- [ ] **Step 1: Write a test that verifies RSS-first behavior end-to-end**
+
+Add this test to the `mod tests` block at the end of `backend/src/services/feed_parser.rs`:
+
+```rust
+ #[tokio::test]
+ async fn full_flow_rss_first_with_html_fallback() {
+ // Source 1: has an RSS feed with 5 articles
+ let server1 = MockServer::start().await;
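+ let rss_body = r#"<?xml version="1.0"?>
+<rss version="2.0"><channel><title>Blog</title>
+ <item><title>A1</title><link>https://blog.example.com/1</link><pubDate>Fri, 03 Apr 2026 10:00:00 GMT</pubDate></item>
+ <item><title>A2</title><link>https://blog.example.com/2</link><pubDate>Thu, 02 Apr 2026 10:00:00 GMT</pubDate></item>
+ <item><title>A3</title><link>https://blog.example.com/3</link><pubDate>Wed, 01 Apr 2026 10:00:00 GMT</pubDate></item>
+ <item><title>A4</title><link>https://blog.example.com/4</link><pubDate>Tue, 31 Mar 2026 10:00:00 GMT</pubDate></item>
+ <item><title>A5</title><link>https://blog.example.com/5</link><pubDate>Mon, 30 Mar 2026 10:00:00 GMT</pubDate></item>
+</channel></rss>"#;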
+
+ Mock::given(method("GET"))
+ .respond_with(ResponseTemplate::new(200).set_body_raw(rss_body, "application/rss+xml"))
+ .mount(&server1)
+ .await;
+
+ let client = reqwest::Client::new();
+
+ // With cached RSS URL (fresh) — should use RSS directly
+ let result = detect_and_parse_feed(
+ &client,
+ "https://blog.example.com",
+ Some(&server1.uri()),
+ Some(Utc::now()),
+ 10,
+ ).await;
+
+ match result {
+ FeedResult::Found { entries, .. } => {
+ assert_eq!(entries.len(), 5);
+ // Verify sorted newest first
+ for i in 0..entries.len() - 1 {
+ if let (Some(a), Some(b)) = (&entries[i].published_date, &entries[i + 1].published_date) {
+ assert!(a >= b, "Entries should be sorted newest first");
+ }
+ }
+ }
+ FeedResult::NotFound => panic!("Expected Found"),
+ }
+
+ // Source 2: no RSS feed, only HTML — should return NotFound
+ let server2 = MockServer::start().await;
+ let html = r#"<html><head><title>No feed</title></head><body>
+ <a href="/article-1">Article 1</a>
+ </body></html>"#;
+
+ Mock::given(method("GET"))
+ .respond_with(ResponseTemplate::new(200).set_body_raw(html, "text/html"))
+ .mount(&server2)
+ .await;
+
+ let result = detect_and_parse_feed(
+ &client,
+ &server2.uri(),
+ None,
+ None,
+ 10,
+ ).await;
+
+ // No feed found — pipeline would fall back to source_scraper
+ assert!(matches!(result, FeedResult::NotFound));
+ }
+```
+
+- [ ] **Step 2: Run all feed_parser tests**
+
+Run: `cd backend && cargo test --lib feed_parser -- --nocapture`
+Expected: all 16 tests pass
+
+- [ ] **Step 3: Run full unit test suite**
+
+Run: `cd backend && cargo test --lib`
+Expected: all tests pass
+
+- [ ] **Step 4: Commit**
+
+```bash
+git add backend/src/services/feed_parser.rs
+git commit -m "test: add end-to-end RSS flow test for feed_parser"
+```