resoltico/HTMLCut

HTMLCut — repeatable HTML extraction from files, URLs, and stdin

HTMLCut extracts a specific value or fragment from an HTML file, a web page, or stdin. Use a CSS selector when the content is in the parsed document, or use literal and regex boundaries when you need to cut raw source text.

You can save an extraction definition as a request file and rerun it later without restating the selector, slice boundaries, or output settings.

  • Extract text, links, attributes, HTML fragments, or structured match data
  • Cut raw source text between literal strings or regex boundaries
  • Preview a source or an extraction before committing to final output
  • Save reusable request files and replay them unchanged
  • Write outputs or forensic bundles to disk

Save and Reuse an Extraction

htmlcut select ./page.html \
  --css 'article a.more' \
  --value attribute \
  --attribute href \
  --emit-request-file ./article-link.request.json \
  --overwrite

htmlcut select --request-file ./article-link.request.json

The first command writes a reusable extraction definition. The second command reruns that saved definition, so you get the same selector and output settings without repeating the inline flags.
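The two commands above can be combined into a small wrapper that creates the request file on the first run and replays it afterwards. This is only a sketch: it uses just the `htmlcut` flags shown above, while the function name `replay_or_create`, its arguments, and the "create only if missing" policy are assumptions made for illustration, not part of HTMLCut.

```shell
#!/bin/sh
# Sketch only: replay_or_create and its arguments are hypothetical helpers.
# The htmlcut invocations are copied from the example above.

replay_or_create() {
  source_file=$1    # HTML input, e.g. ./page.html
  request_file=$2   # saved definition, e.g. ./article-link.request.json

  if [ ! -f "$request_file" ]; then
    # No saved definition yet: run the inline extraction and save it.
    htmlcut select "$source_file" \
      --css 'article a.more' \
      --value attribute \
      --attribute href \
      --emit-request-file "$request_file" \
      --overwrite
  else
    # A saved definition exists: replay it without restating any flags.
    htmlcut select --request-file "$request_file"
  fi
}
```

Calling `replay_or_create ./page.html ./article-link.request.json` twice would run the inline form once and the replay form on the second call, assuming the first call leaves the request file on disk.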

Documentation Index

The complete index of Markdown documentation under docs/ lives in docs/README.md.

Legal

HTMLCut is released under the MIT License. The NOTICE and PATENTS files contain the remaining legal terms.

About

Extract and inspect HTML from files, HTTP(S) URLs, or stdin with CSS selectors, literal or regex slicing, reusable request files, and JSON reports.
