Description
Context
wp datamachine links broken currently checks only internal links, via HTTP HEAD requests. But posts also link out to external URLs (blog posts, resources, documentation, tools), and those go stale over time. Outbound links that 404 hurt user experience and can hurt SEO, since search engines may read broken links as a sign of low content quality.
Proposed
Extend the links system to detect broken external outbound links.
wp datamachine links broken --external
Or potentially a separate subcommand like wp datamachine links broken-external — depends on how the existing links broken is structured.
Should:
- Scan post content for all outbound <a> tags pointing to external domains
- HTTP HEAD each unique URL (with configurable timeout, default 5s)
- Flag 404, 410, 5xx, timeouts, and connection refused
- Report: post ID, post title, broken URL, HTTP status, anchor text
- Respect rate limiting per domain (don't hammer a single host)
- Cache results (24hr TTL like the internal link graph) so repeated runs are fast
- --post_id, --category, --limit filters like other link commands
- Table/JSON/CSV output
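To make the scan and flag steps above concrete, here is a rough, language-neutral sketch in Python (the plugin itself would do this in PHP). The names AnchorCollector, external_links, and is_broken are hypothetical and not part of DM. It assumes "external" means the link's host differs from the site's host by exact match, so subdomains such as www would need extra handling.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class AnchorCollector(HTMLParser):
    """Collect (href, anchor text) pairs from <a> tags in post content."""

    def __init__(self):
        super().__init__()
        self.links = []     # finished (href, text) pairs
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def external_links(html, site_host):
    """Return (url, anchor_text) pairs whose host differs from site_host.

    Assumption: exact host comparison; relative, mailto:, and other
    non-HTTP links are skipped.
    """
    parser = AnchorCollector()
    parser.feed(html)
    parser.close()
    result = []
    for href, text in parser.links:
        parts = urlparse(href)
        host = parts.hostname
        if parts.scheme in ("http", "https") and host and host != site_host.lower():
            result.append((href, text))
    return result


def is_broken(status):
    """Flag 404, 410, and 5xx; None stands for a timeout or refused connection."""
    return status is None or status in (404, 410) or 500 <= status <= 599
```

Deduplicating the returned URLs before checking them keeps the request count at one per unique URL, while the (url, anchor_text) pairs still allow the report to list every post and anchor that uses a broken URL.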
Considerations
- External link checking is slow and expensive — hundreds of unique URLs across a content site. Should use the batch system for progress tracking.
- Some sites block HEAD requests — fall back to GET with a range header.
- Rate limiting per domain is critical to avoid getting the site's IP blocked.
- Could reuse the existing link graph cache if it already stores external URLs, or extend it.
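The per-domain rate limiting and the HEAD-to-GET fallback could look roughly like the sketch below, again in Python purely for illustration. DomainThrottle and check_url are hypothetical names, and request(method, url, headers) is an assumed callable that returns an integer status code, or None on timeout or connection refused. Treating 403, 405, and 501 as "HEAD was rejected" is also an assumption; the right set of codes would need tuning against real sites.

```python
import time
from urllib.parse import urlparse


class DomainThrottle:
    """Enforce a minimum delay between requests to the same host."""

    def __init__(self, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = {}  # host -> time of the most recent request

    def wait(self, url):
        host = urlparse(url).hostname or ""
        now = self._clock()
        last = self._last.get(host)
        if last is not None:
            remaining = self.min_interval - (now - last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last[host] = now


def check_url(url, request, throttle):
    """Try HEAD first; retry with a one-byte ranged GET if HEAD is rejected."""
    throttle.wait(url)
    status = request("HEAD", url, {})
    if status in (403, 405, 501):  # assumed signs that the host blocks HEAD
        throttle.wait(url)
        status = request("GET", url, {"Range": "bytes=0-0"})
    return status
```

A server that honors the Range header answers the fallback with 206 Partial Content, which should count as a working link. Servers that ignore the header return 200 with the full body, so the real implementation would still want to stop reading after the status line.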
Why
Every content site accumulates dead outbound links over time. This is one of the most common SEO audit findings and there's no good WordPress-native solution — people use external tools like Screaming Frog or Ahrefs. DM can do it from inside WordPress with zero external dependencies.