From 464d319ad3b02cf7f8da16e7e506c9b98df25482 Mon Sep 17 00:00:00 2001
From: Abdul Basit Tonmoy
Date: Fri, 3 Jul 2026 20:59:22 -0700
Subject: [PATCH 1/2] ci: add a markdown link + badge checker for README and docs/

The README and docs/ carry many links, shields.io badges, and raw-GitHub
asset URLs. Dead links and dead badges are a common rot in public repos and
there was no guard, so a moved file or renamed asset would silently 404 for
visitors.

Add a links workflow that runs lychee over README.md and docs/**/*.md:

- Triggers on PRs that touch a *.md file (plus the checker's own config) and
  on a weekly cron, with workflow_dispatch for manual runs.
- Hard 404s fail the job. Flaky or non-fetchable hosts are tolerated via
  .lycheeignore: localhost examples, shields.io / hub.docker.com /
  star-history badge hosts, and the interactive "ask an assistant" deep links
  (chatgpt.com / claude.ai / gemini.google.com) that block bots. Real content
  links (github.com, raw.githubusercontent.com, npm, pypi, the detector
  sites) stay checked.
- Uses GITHUB_TOKEN so the many github.com links don't hit the anonymous API
  rate limit.

Document the workflow and the .lycheeignore escape hatch in CONTRIBUTING.md.

Closes #13
---
 .github/workflows/links.yml | 37 +++++++++++++++++++++++++++++++++++++
 .lycheeignore               | 18 ++++++++++++++++++
 CONTRIBUTING.md             |  5 +++++
 3 files changed, 60 insertions(+)
 create mode 100644 .github/workflows/links.yml
 create mode 100644 .lycheeignore

diff --git a/.github/workflows/links.yml b/.github/workflows/links.yml
new file mode 100644
index 0000000..d1aa069
--- /dev/null
+++ b/.github/workflows/links.yml
@@ -0,0 +1,37 @@
+name: links
+
+# Dead links and dead badges are a common rot in public repos. This verifies every
+# markdown link + image/badge URL in README.md and docs/ still resolves, so a moved
+# file or renamed release asset can't silently 404 for visitors.
+on:
+  pull_request:
+    paths:
+      - "**/*.md"
+      - ".lycheeignore"
+      - ".github/workflows/links.yml"
+  schedule:
+    - cron: "0 6 * * 1" # Mondays 06:00 UTC — catches external rot between PRs
+  workflow_dispatch:
+
+permissions:
+  contents: read
+
+concurrency:
+  group: links-${{ github.ref }}
+  cancel-in-progress: true
+
+jobs:
+  linkcheck:
+    name: markdown links + badges
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - name: Check links
+        uses: lycheeverse/lychee-action@v2
+        with:
+          # Hard 404s fail the job; flaky/rate-limited hosts are tolerated via .lycheeignore.
+          args: --no-progress --max-retries 3 README.md "docs/**/*.md"
+          fail: true
+        env:
+          # Authenticated github.com requests avoid the low anonymous API rate limit.
+          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
diff --git a/.lycheeignore b/.lycheeignore
new file mode 100644
index 0000000..dfafdc8
--- /dev/null
+++ b/.lycheeignore
@@ -0,0 +1,18 @@
+# Hosts lychee should not check. One regex per line, matched against the whole URL.
+# Keep this list to genuinely-flaky or non-fetchable endpoints — real content links
+# (github.com, raw.githubusercontent.com, npm, pypi, detector sites) stay checked so
+# a moved file or renamed asset is caught.
+
+# Local dev endpoints that appear in code examples, not real links.
+^https?://localhost
+^https?://127\.0\.0\.1
+
+# Badge / image hosts that rate-limit CI crawlers; a stale badge is cosmetic, not a 404.
+img\.shields\.io
+hub\.docker\.com
+star-history\.com
+
+# "Ask an assistant" deep links — interactive endpoints that block bots or require login.
+^https://chatgpt\.com
+^https://claude\.ai
+^https://gemini\.google\.com
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index 660e87a..5146a3e 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -59,6 +59,11 @@ Optionally install the git hooks so they run automatically:
 pip install pre-commit && pre-commit install
 ```
 
+Markdown links and badges in `README.md` and `docs/` are verified by the **links** workflow
+([lychee](https://github.com/lycheeverse/lychee-action)) — on PRs that touch a `*.md` file and on
+a weekly schedule. A hard 404 fails the job. If an external host merely rate-limits the CI crawler,
+add it to [`.lycheeignore`](.lycheeignore) rather than leaving the check red.
+
 ## Submitting a change
 
 1. **Open an issue first** for anything beyond a typo, so we can agree on the surface and approach.

From 04131685941ab0f26ef5336150fafc8dd1f41b6b Mon Sep 17 00:00:00 2001
From: Abdul Basit Tonmoy
Date: Fri, 3 Jul 2026 21:01:36 -0700
Subject: [PATCH 2/2] ci: ignore npmjs.com in the link checker (403s non-browser crawlers)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The first run failed only on https://www.npmjs.com/package/tilion-fortress,
which npm's registry answers with 403 Forbidden for non-browser clients as
bot protection, so the package page is not fetchable from CI.

Add npmjs.com to .lycheeignore, the same tolerance already applied to the
badge hosts. PyPI serves crawlers fine and stays checked.
---
 .lycheeignore | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/.lycheeignore b/.lycheeignore
index dfafdc8..3b964ed 100644
--- a/.lycheeignore
+++ b/.lycheeignore
@@ -12,6 +12,10 @@ img\.shields\.io
 hub\.docker\.com
 star-history\.com
 
+# npm's registry returns 403 Forbidden to non-browser crawlers (bot protection), so the
+# package page is not fetchable from CI. PyPI, by contrast, is checked normally.
+npmjs\.com
+
 # "Ask an assistant" deep links — interactive endpoints that block bots or require login.
 ^https://chatgpt\.com
 ^https://claude\.ai
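
Reviewer note: the ignore list in this series mixes anchored patterns (`^https://...`) with unanchored ones (`npmjs\.com`). The sketch below is only an approximation of how such patterns behave: lychee is a Rust tool with its own regex engine, and `is_ignored` is a hypothetical helper that stands in for it using Python's `re.search`. The patterns themselves are copied from `.lycheeignore` as it stands after PATCH 2/2.

```python
import re

# Patterns copied from .lycheeignore (state after PATCH 2/2).
IGNORE_PATTERNS = [
    r"^https?://localhost",
    r"^https?://127\.0\.0\.1",
    r"img\.shields\.io",
    r"hub\.docker\.com",
    r"star-history\.com",
    r"npmjs\.com",
    r"^https://chatgpt\.com",
    r"^https://claude\.ai",
    r"^https://gemini\.google\.com",
]


def is_ignored(url: str) -> bool:
    """Hypothetical helper: True if any pattern matches somewhere in the URL.

    This assumes search semantics (a match anywhere in the string), which is
    an approximation of the checker's behavior, not its actual implementation.
    """
    return any(re.search(pattern, url) for pattern in IGNORE_PATTERNS)


if __name__ == "__main__":
    samples = [
        "https://www.npmjs.com/package/tilion-fortress",
        "https://pypi.org/project/example/",
        "http://localhost:8080/health",
        "https://example.com/?ref=npmjs.com",
    ]
    for url in samples:
        print(f"{url} -> ignored={is_ignored(url)}")
```

Under these assumptions, an unanchored pattern such as `npmjs\.com` also skips any URL that merely contains that text (for example in a query string), whereas the `^https://` entries only skip URLs on those hosts. Anchoring the host patterns would be the stricter choice if that ever matters.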