oabrivard 3d790e7ce7 feat: extract article URLs from JSON-LD structured data in source pages
Many modern sites (Hugo, WordPress, Next.js) load articles via JavaScript
but include full article URLs in JSON-LD schema.org markup in the <head>.
The scraper now extracts these first (highest quality), then falls back
to <a href> heuristic extraction. Supports ItemList, BlogPosting,
NewsArticle, @graph arrays, and mainEntity wrappers.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2 months ago
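
The extraction order described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the `Json` enum is a hypothetical stand-in for whatever JSON value type the scraper really parses into, and `collect_urls`, `push_unique`, `has_type`, `obj`, and `s` are made-up names. It covers only the JSON-LD traversal (ItemList, BlogPosting, NewsArticle, `@graph`, `mainEntity`); pulling the `<script type="application/ld+json">` payload out of the page and the `<a href>` fallback are left out.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a parsed JSON value, kept tiny so the sketch
/// needs no external crates. Numbers, booleans and null are omitted.
#[derive(Debug)]
enum Json {
    Str(String),
    Arr(Vec<Json>),
    Obj(BTreeMap<String, Json>),
}

/// Article-like schema.org types named in the commit message.
const ARTICLE_TYPES: [&str; 2] = ["BlogPosting", "NewsArticle"];

/// True if the object's `@type` (a string or an array of strings) matches.
fn has_type(obj: &BTreeMap<String, Json>, wanted: &[&str]) -> bool {
    match obj.get("@type") {
        Some(Json::Str(t)) => wanted.contains(&t.as_str()),
        Some(Json::Arr(ts)) => ts
            .iter()
            .any(|t| matches!(t, Json::Str(s) if wanted.contains(&s.as_str()))),
        _ => false,
    }
}

/// Append `url` unless it was already collected, preserving first-seen order.
fn push_unique(out: &mut Vec<String>, url: &str) {
    if !out.iter().any(|u| u == url) {
        out.push(url.to_string());
    }
}

/// Walk a JSON-LD document and collect article URLs.
fn collect_urls(node: &Json, out: &mut Vec<String>) {
    match node {
        Json::Str(_) => {}
        Json::Arr(items) => {
            for item in items {
                collect_urls(item, out);
            }
        }
        Json::Obj(obj) => {
            // BlogPosting / NewsArticle: take the node's own `url`.
            if has_type(obj, &ARTICLE_TYPES) {
                if let Some(Json::Str(url)) = obj.get("url") {
                    push_unique(out, url);
                }
            }
            // ItemList: each list item may carry `url` directly, or an
            // `item` that is either a URL string or a nested object.
            if has_type(obj, &["ItemList"]) {
                if let Some(Json::Arr(elements)) = obj.get("itemListElement") {
                    for element in elements {
                        if let Json::Obj(li) = element {
                            if let Some(Json::Str(url)) = li.get("url") {
                                push_unique(out, url);
                            }
                            match li.get("item") {
                                Some(Json::Str(url)) => push_unique(out, url),
                                Some(nested) => collect_urls(nested, out),
                                None => {}
                            }
                        }
                    }
                }
            }
            // Wrappers: descend into `@graph` arrays and `mainEntity`.
            for key in &["@graph", "mainEntity"] {
                if let Some(child) = obj.get(*key) {
                    collect_urls(child, out);
                }
            }
        }
    }
}

/// Test-data helpers (hypothetical): build an object / a string value.
fn obj(pairs: Vec<(&str, Json)>) -> Json {
    Json::Obj(pairs.into_iter().map(|(k, v)| (k.to_string(), v)).collect())
}

fn s(v: &str) -> Json {
    Json::Str(v.to_string())
}
```

Calling `collect_urls(&doc, &mut urls)` on a parsed JSON-LD document fills `urls` in first-seen order without duplicates. A caller following the commit's strategy would presumably run the `<a href>` heuristic only when this list comes back empty or too short.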
migrations feat: add theme schedules — model, DB, CRUD handler, routes 3 months ago
src feat: extract article URLs from JSON-LD structured data in source pages 2 months ago
tests fix: return 204 No Content from preferred sources endpoint 2 months ago
Cargo.lock feat: add pipeline integration tests with MockLlmProvider and wiremock 3 months ago
Cargo.toml feat: add pipeline integration tests with MockLlmProvider and wiremock 3 months ago
Dockerfile