HTMLCut — repeatable HTML extraction from files, URLs, and stdin

HTMLCut extracts a specific value or fragment from an HTML file, a web page, or stdin. Use a CSS selector when the content is in the parsed document, or use literal and regex boundaries when you need to cut raw source text.
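The difference between the two approaches can be illustrated with a minimal Python sketch that uses only the standard library. It is not HTMLCut's implementation: the sample markup, the `MoreLinkParser` class, and the helper names `select_more_links`, `slice_between`, and `slice_regex` are all hypothetical. The first helper works on the parsed document and behaves roughly like the selector `article a.more`; the other two cut the raw source text and never parse it.

```python
import re
from html.parser import HTMLParser

# Hypothetical sample input, invented for this sketch.
HTML = '<article><p>Intro</p><a class="more" href="/posts/42">Read more</a></article>'


class MoreLinkParser(HTMLParser):
    """Collect href values from <a class="more"> elements inside <article>."""

    def __init__(self):
        super().__init__()
        self.article_depth = 0
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "article":
            self.article_depth += 1
        elif tag == "a" and self.article_depth > 0:
            classes = (attr_map.get("class") or "").split()
            if "more" in classes and attr_map.get("href") is not None:
                self.hrefs.append(attr_map["href"])

    def handle_endtag(self, tag):
        if tag == "article" and self.article_depth > 0:
            self.article_depth -= 1


def select_more_links(html):
    """Parsed-document approach: walk the element structure."""
    parser = MoreLinkParser()
    parser.feed(html)
    parser.close()
    return parser.hrefs


def slice_between(source, start, end):
    """Raw-text approach: return the text between two literal boundaries."""
    begin = source.find(start)
    if begin == -1:
        return None
    begin += len(start)
    stop = source.find(end, begin)
    if stop == -1:
        return None
    return source[begin:stop]


def slice_regex(source, start_pattern, end_pattern):
    """Raw-text approach: return the text between two regex boundaries."""
    start_match = re.search(start_pattern, source)
    if start_match is None:
        return None
    end_match = re.compile(end_pattern).search(source, start_match.end())
    if end_match is None:
        return None
    return source[start_match.end():end_match.start()]
```

The selector-style helper depends on the document structure, so it keeps working if attribute order or whitespace changes. The slicing helpers depend only on the exact source text, which is useful when the target is not cleanly addressable as an element.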

You can save an extraction definition as a request file and rerun it later without restating the selector, slice boundaries, or output settings.

Extract text, links, attributes, HTML fragments, or structured match data
Cut raw source text between literal strings or regex boundaries
Preview a source or an extraction before committing to final output
Save reusable request files and replay them unchanged
Write outputs or forensic bundles to disk

Save and Reuse an Extraction

htmlcut select ./page.html \
  --css 'article a.more' \
  --value attribute \
  --attribute href \
  --emit-request-file ./article-link.request.json \
  --overwrite

htmlcut select --request-file ./article-link.request.json

The first command writes a reusable extraction definition. The second command reruns that saved definition, so you get the same selector and output settings without repeating the inline flags.
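For scripted use, the same two steps could be wrapped from Python. This is a sketch under stated assumptions: it uses only the subcommand and flags shown in the commands above, it assumes `htmlcut` is on `PATH`, and it assumes the extracted value is written to standard output. The helper names `save_command`, `replay_command`, and `run` are hypothetical.

```python
import shutil
import subprocess


def save_command(source, css, attribute, request_path):
    """Build the argv that extracts an attribute and saves the request file."""
    return [
        "htmlcut", "select", source,
        "--css", css,
        "--value", "attribute",
        "--attribute", attribute,
        "--emit-request-file", request_path,
        "--overwrite",
    ]


def replay_command(request_path):
    """Build the argv that reruns a saved request file unchanged."""
    return ["htmlcut", "select", "--request-file", request_path]


def run(argv):
    """Run a command and return its standard output as text.

    Assumption: the tool prints the extracted value to stdout.
    Raises FileNotFoundError if the executable is not on PATH and
    subprocess.CalledProcessError if it exits with a nonzero status.
    """
    if shutil.which(argv[0]) is None:
        raise FileNotFoundError(f"{argv[0]} is not on PATH")
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout
```

A caller would run `run(save_command("./page.html", "article a.more", "href", "./article-link.request.json"))` once, then call `run(replay_command("./article-link.request.json"))` on later runs. Passing the arguments as a list avoids shell quoting problems with selectors that contain spaces.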

Documentation Index

The complete index of Markdown documentation under docs/ lives in docs/README.md.

Legal

HTMLCut is released under the MIT License. The remaining legal files are NOTICE and PATENTS.md.
