HTMLCut extracts a specific value or fragment from an HTML file, a web page, or stdin. Use a CSS selector when the content is in the parsed document, or use literal and regex boundaries when you need to cut raw source text.
You can save an extraction definition as a request file and rerun it later without restating the selector, slice boundaries, or output settings.
- Extract text, links, attributes, HTML fragments, or structured match data
- Cut raw source text between literal strings or regex boundaries
- Preview a source or an extraction before committing to final output
- Save reusable request files and replay them unchanged
- Write outputs or forensic bundles to disk
htmlcut select ./page.html \
  --css 'article a.more' \
  --value attribute \
  --attribute href \
  --emit-request-file ./article-link.request.json \
  --overwrite
htmlcut select --request-file ./article-link.request.json

The first command writes a reusable extraction definition. The second command reruns that saved definition, so you get the same selector and output settings without repeating the inline flags.
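If you replay saved requests from a script, a small wrapper can fail early when the request file is missing. The following is a minimal sketch and not part of HTMLCut: the `replay_request` function and the `HTMLCUT_BIN` variable are hypothetical names introduced here for illustration, and the only HTMLCut invocation it relies on is the `select --request-file` form shown above.

```shell
#!/bin/sh
# Hypothetical wrapper around the replay command; not shipped with HTMLCut.
# HTMLCUT_BIN is an assumed override for the executable name (defaults to "htmlcut").
HTMLCUT_BIN="${HTMLCUT_BIN:-htmlcut}"

replay_request() {
    request_file="$1"

    # Fail early with a clear message if the saved request file does not exist.
    if [ ! -f "$request_file" ]; then
        echo "request file not found: $request_file" >&2
        return 1
    fi

    # Same invocation as the replay command above; no inline flags are repeated.
    "$HTMLCUT_BIN" select --request-file "$request_file"
}
```

With the function loaded into your shell, `replay_request ./article-link.request.json` would rerun the saved definition. The existence check lives in the wrapper so that a mistyped path produces one short message on stderr and a nonzero return status.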
The complete index of Markdown documentation under docs/ lives in docs/README.md.
HTMLCut is released under the MIT License. See NOTICE and PATENTS for the remaining legal files.