System task: Broken external link detection #498

@chubes4

Context

wp datamachine links broken currently checks internal links via HTTP HEAD requests. But posts also link out to external URLs (blog posts, resources, documentation, tools), and those go stale over time. Outbound links that return 404 hurt SEO (Google sees them as a quality signal) and hurt user experience.

Proposed

Extend the links system to detect broken external outbound links.

wp datamachine links broken --external

Or potentially a separate subcommand like wp datamachine links broken-external, depending on how the existing links broken command is structured.

The command should:

  • Scan post content for all outbound <a> tags pointing to external domains
  • Send an HTTP HEAD request to each unique URL (with a configurable timeout, default 5s)
  • Flag 404, 410, 5xx, timeouts, and refused connections
  • Report: post ID, post title, broken URL, HTTP status, anchor text
  • Respect rate limiting per domain (don't hammer a single host)
  • Cache results (24hr TTL like the internal link graph) so repeated runs are fast
  • --post_id, --category, --limit filters like other link commands
  • Table/JSON/CSV output
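
To make the scan and flag steps above concrete, here is a minimal sketch in Python. It is illustration only, not Data Machine's actual PHP code. The names AnchorCollector, external_links, and is_broken are hypothetical, and the sketch assumes "external" means any http(s) link whose host differs from the site's own host (so a www subdomain would count as external unless normalized first).

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class AnchorCollector(HTMLParser):
    """Collects (href, anchor text) pairs from <a> tags. Hypothetical helper."""

    def __init__(self):
        super().__init__()
        self.links = []     # finished (href, text) pairs
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def external_links(html, site_host):
    """Return (url, anchor text) pairs whose host differs from site_host."""
    parser = AnchorCollector()
    parser.feed(html)
    parser.close()
    own_host = site_host.lower()
    found = []
    for href, text in parser.links:
        parts = urlparse(href)
        # Relative links, mailto:, tel:, and same-host links are skipped.
        if parts.scheme in ("http", "https") and parts.hostname and parts.hostname != own_host:
            found.append((href, text))
    return found


def is_broken(status):
    """status is an int HTTP code, or None for a timeout or refused connection."""
    return status is None or status in (404, 410) or 500 <= status <= 599
```

Deduplicating the returned URLs before checking them keeps the number of requests down, while the (url, anchor text) pairs per post are still available for the report.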

Considerations

  • External link checking is slow and expensive: a content site can have hundreds of unique URLs. It should use the batch system for progress tracking.
  • Some sites block HEAD requests, so fall back to GET with a Range header to avoid downloading the full body.
  • Rate limiting per domain is critical to avoid getting the site's IP blocked.
  • Could reuse the existing link graph cache if it already stores external URLs, or extend it.
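
The per-domain rate limiting and the HEAD-to-GET fallback described above could look roughly like the following Python sketch. Everything here is an assumption for illustration: DomainThrottle and check_url are hypothetical names, fetch is a caller-supplied function rather than a real HTTP client API, the one-second interval is arbitrary, and treating 403 (alongside 405 and 501) as a sign that HEAD is blocked is a heuristic, not something the issue specifies.

```python
import time
from urllib.parse import urlparse


class DomainThrottle:
    """Enforces a minimum delay between requests to the same host."""

    def __init__(self, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = {}  # host -> time of the most recent request

    def wait(self, url):
        host = urlparse(url).hostname or ""
        now = self._clock()
        last = self._last.get(host)
        if last is not None:
            remaining = self.min_interval - (now - last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last[host] = now


def check_url(url, fetch, throttle):
    """Return the HTTP status for url, or None on timeout or connection failure.

    fetch(method, url, headers) is a hypothetical callable that returns an
    int status code or raises OSError (which covers timeouts and refusals).
    """
    try:
        throttle.wait(url)
        status = fetch("HEAD", url, {})
        if status in (403, 405, 501):
            # The host appears to reject HEAD; retry with GET, asking for one byte.
            throttle.wait(url)
            status = fetch("GET", url, {"Range": "bytes=0-0"})
        return status
    except OSError:
        return None
```

Injecting the clock, sleep, and fetch functions keeps the logic testable without network access. A server that honors the Range header answers 206, which counts as a working link.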

Why

Every content site accumulates dead outbound links over time. This is one of the most common SEO audit findings and there's no good WordPress-native solution — people use external tools like Screaming Frog or Ahrefs. DM can do it from inside WordPress with zero external dependencies.

Labels: enhancement (New feature or request)