Markup Link Extractor

License: AGPL-3.0-or-later

In cooperation with FabCity Hamburg and Open Source Ecology Germany.

Extracts links and/or anchors from markup files. Currently, Markdown (md) and HTML files are supported. The main intended purpose of the Markup Link Extractor is to extract links from a set of files, so that they can then be checked for validity with a separate tool, e.g. the Markdown Link Checker. Together, two such tools can be integrated into your CI pipeline to warn about broken links in your markup docs.

Features

  • Extracts links from Markdown (md) and HTML files.
  • Extracts anchors from Markdown (md) and HTML files.
    Anchors are parts of a file that can be linked to by appending the part's identifier/name to the file path/URL after a # (hash);
    e.g. https://www.example.com/some-dir/some-file.html#sub-section
  • Supports HTML links and plain URLs in Markdown files
  • Command line interface following the first item of the UNIX philosophy: "Make each program do one thing well".
    -> Therefore, this tool neither scans for markup files nor checks the links itself.
  • Easy CI pipeline integration
  • Very fast execution using async
  • Operates offline, accessing only files on the local file-system
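To make the anchor notation from the feature list concrete, here is a minimal shell sketch that splits a link into its target and its anchor at the first # character. The helper name split_link is hypothetical and not part of mle; it only illustrates the path#anchor convention and says nothing about how mle parses links internally.

```shell
# split_link is a hypothetical helper for illustration, not part of mle.
# It prints the target (file path or URL) on the first line
# and the anchor (possibly empty) on the second line.
split_link() {
    link="$1"
    case "$link" in
        *'#'*)
            target="${link%%#*}"   # everything before the first '#'
            anchor="${link#*#}"    # everything after the first '#'
            ;;
        *)
            target="$link"         # no '#': the link has no anchor
            anchor=""
            ;;
    esac
    printf '%s\n%s\n' "$target" "$anchor"
}

split_link 'https://www.example.com/some-dir/some-file.html#sub-section'
```

For the example URL, the target is the part up to some-file.html and the anchor is sub-section.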

Install Locally

There are different ways to install and use mle.

Cargo

Use Rust's package manager, cargo, to install mle from crates.io:

cargo install mle

Download Binaries

To download a precompiled binary of mle, go to the GitHub releases page and pick the build for x86_64-unknown-linux-gnu or x86_64-apple-darwin.

CI Pipeline Integration

GitHub Actions

Use mle in GitHub workflows through the GitHub Action from the Marketplace.

- name: Markup Link Extractor (mle)
  uses: hoijui/mle@v0.14.3

Pass command line arguments to mle through the with argument:

- name: Markup Link Extractor (mle)
  uses: hoijui/mle@v0.14.3
  with:
    args: ./README.md

Binary

To integrate mle into a CI pipeline running in a Linux x86_64 environment, you can add the following commands to download the tool:

curl -L https://github.com/hoijui/mle/releases/download/v0.14.3/mle -o mle
chmod +x mle

For an example, take a look at the ntest repo, which uses mle in its CI pipeline.

Docker

Use the mle Docker image from Docker Hub.

Usage

Once you have mle installed, it can be called from the command line. The following call will extract all links in markup files found under the current folder (including sub-directories):

mle ./**.{html,md}

This extracts links from all git-tracked Markdown and HTML files, except those whose paths match README or LICENSE, and writes the result to stdout in CSV format.

# explicit version
git ls-files **.{html,md} -z \
    | grep --null-data --invert-match --ignore-case --regexp README --regexp LICENSE \
    | xargs -0 mle --result-format csv
# same in short form
git ls-files **.{html,md} -z | grep -z -v -i -e README -e LICENSE | xargs -0 mle --result-format csv

Here we write the list of files to a file first, and then pass that file to mle. This is useful when the list of files is used multiple times, or when it is very large and might exceed the shell's limit for arguments.

git ls-files **.{html,md} -z | tr '\0' '\n' > /tmp/link-check_files.csv
mle --markup-files-list /tmp/link-check_files.csv
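Glob patterns such as ** behave differently across shells, and not every directory is a git repository. As a hedged alternative, the following sketch builds the same kind of newline-separated file list with find. The helper name list_markup_files and the output path are assumptions made for illustration; only the --markup-files-list option is taken from the example above.

```shell
# list_markup_files is a hypothetical helper, not part of mle.
# It writes all *.md and *.html files found below a directory
# (including sub-directories) to a newline-separated list file.
# Note: like the list above, this cannot represent file names
# that themselves contain newlines.
list_markup_files() {
    dir="$1"
    out="$2"
    find "$dir" -type f \( -name '*.md' -o -name '*.html' \) | sort > "$out"
}

# Usage sketch: only runs if mle is actually installed.
if command -v mle >/dev/null 2>&1; then
    list_markup_files . /tmp/link-check_files.txt
    mle --markup-files-list /tmp/link-check_files.txt
fi
```

Unlike git ls-files, find also picks up untracked and ignored files, so the resulting list may be longer than the git-based one.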

Call mle with the --help flag to display all available cli arguments:

mle --help

Funding

This project was funded by the European Regional Development Fund (ERDF) in the context of the INTERFACER Project, from July 2022 (fork from mlc/project start) until March 2023.
