AI powered link checker action #195

@mmcky

I recently used Copilot to do some link checking, and it did a great job of not only testing the links but also updating out-of-date ones to their new locations (for example, old links to referenced documentation).

Can we build a link checker GitHub action that makes use of Copilot or another AI agent to:

  • check all web links across the whole project on a weekly schedule
  • check only the web links in documents that have changed as part of a pull request
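As a rough sketch of the two modes above, the action could choose which documents to scan from the event that triggered the workflow. GitHub Actions exposes the event name in the GITHUB_EVENT_NAME environment variable; the function name select_files and the way the changed-file list is obtained are assumptions for illustration only.

```python
def select_files(event_name, all_files, changed_files):
    """Pick the documents to scan for a given workflow trigger.

    event_name: value of GITHUB_EVENT_NAME, e.g. "schedule" or "pull_request".
    all_files: every document in the project.
    changed_files: documents touched by the pull request (assumed to be
        supplied by an earlier workflow step; hypothetical input).
    """
    if event_name == "pull_request":
        changed = set(changed_files)
        # Keep project order, but only documents the pull request changed.
        return [path for path in all_files if path in changed]
    # Scheduled (weekly) runs and any other trigger scan the whole project.
    return list(all_files)
```

For example, select_files("pull_request", docs, ["lectures/intro.md"]) would return only lectures/intro.md if it is part of docs, while a scheduled run returns every document.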

The link checker could be saved as an action in .github/actions/link-checker.

What to parse:

Our project source files are saved as MyST Markdown files. These include links to websites (HTML), links to other documents in the same project, and links to documents located in other QuantEcon Jupyter Book projects.

Here is an example project: https://github.com/QuantEcon/lecture-python-programming.myst

The source files for the published lectures are in the lectures folder.

It might be easier to check the links in the HTML output, since MyST Markdown links need to be parsed for context. The HTML output in our GitHub Actions workflows is saved in _build/html/
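A minimal sketch of that HTML-based approach, using only the Python standard library. The names LinkExtractor, extract_links, is_external, and collect_external_links are hypothetical, and the default directory simply mirrors the _build/html/ path mentioned above.

```python
from html.parser import HTMLParser
from pathlib import Path


class LinkExtractor(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                self.links.append(value)


def extract_links(html_text):
    """Return all link targets found in one HTML document, in order."""
    parser = LinkExtractor()
    parser.feed(html_text)
    return parser.links


def is_external(link):
    """True for web links; relative links to other pages return False."""
    return link.startswith(("http://", "https://"))


def collect_external_links(build_dir="_build/html"):
    """Map each HTML file under build_dir to the external links it contains."""
    result = {}
    for path in sorted(Path(build_dir).rglob("*.html")):
        text = path.read_text(encoding="utf-8", errors="replace")
        external = [link for link in extract_links(text) if is_external(link)]
        if external:
            result[str(path)] = external
    return result
```

Keeping the file-to-links mapping makes it possible to report which page a broken link came from, which the pull request mode would need in order to limit checks to changed documents.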

HTTP response codes when testing website links:

It would be fine to report the following status codes silently, without failing the action workflow:

  • 403
  • 503

and it would be nice if these codes could be configurable for future flexibility.
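One possible way to make these codes configurable is to accept a comma-separated string (for instance from an action input) and fall back to 403 and 503. This is only a sketch: the names parse_silent_codes, classify, and should_fail, and the idea of a comma-separated input, are assumptions rather than an agreed design.

```python
# Codes that are reported but never fail the workflow, per the request above.
DEFAULT_SILENT_CODES = frozenset({403, 503})


def parse_silent_codes(raw):
    """Parse a string such as "403,503" into a set of status codes.

    An empty or missing value falls back to the defaults.
    """
    if not raw or not raw.strip():
        return DEFAULT_SILENT_CODES
    return frozenset(int(part) for part in raw.split(",") if part.strip())


def classify(status, silent_codes=DEFAULT_SILENT_CODES):
    """Label one HTTP status as "ok", "warn", "redirect", or "fail"."""
    if status in silent_codes:
        return "warn"      # reported, but does not fail the run
    if 200 <= status < 300:
        return "ok"
    if 300 <= status < 400:
        return "redirect"  # candidate for a link update
    return "fail"


def should_fail(statuses, silent_codes=DEFAULT_SILENT_CODES):
    """True if any checked link returned a status that should fail the run."""
    return any(classify(status, silent_codes) == "fail" for status in statuses)
```

With the defaults, a run that only sees 200, 403, and 503 responses passes, while a single 404 fails it.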

Current workflow

Currently we use a program called lychee to check links. However, it only reports the status of each link and does not update old, redirected links to newer, more relevant ones. When a link is redirected by an external server, it would be nice to update the link to the new location. The Copilot suggestions made in this example were great, so an AI-enabled workflow would be preferable.
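The redirect part of this could be sketched without any AI component, assuming the final URL after redirects is an acceptable replacement. The sketch below uses urllib from the standard library, which follows redirects (temporary ones included), so a suggestion should still be reviewed before it is committed. All function names here are hypothetical.

```python
import urllib.request


def resolve_final_url(url, timeout=10.0):
    """Fetch url and return the address reached after following redirects."""
    request = urllib.request.Request(
        url, headers={"User-Agent": "link-checker-sketch"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.geturl()


def suggest_update(original, final):
    """Return final if it differs from original, else None.

    A difference of only a trailing slash is not treated as a change.
    """
    if original.rstrip("/") == final.rstrip("/"):
        return None
    return final


def apply_updates(text, updates):
    """Replace each old URL in a source document with its suggested new URL.

    Plain string replacement is used for brevity; a real implementation
    would need to avoid rewriting URLs that merely start with an old URL.
    """
    for old, new in updates.items():
        text = text.replace(old, new)
    return text
```

A workflow built on this could collect the suggestions into a pull request rather than editing files directly, so that a person (or an AI agent) can judge whether each new location is really the more relevant one.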

An example of a current link checker workflow is https://github.com/QuantEcon/lecture-python-programming.myst/blob/main/.github/workflows/linkcheck.yml

but removing the dependency on lychee and peter-evans/create-issue-from-file would be preferable.
