-
Notifications
You must be signed in to change notification settings - Fork 0
html2text
Matthew Harris edited this page Jan 21, 2016
·
2 revisions
html2text is a Python script that converts a page of HTML into clean, easy-to-read plain ASCII text. Better yet, that ASCII also happens to be valid Markdown (a text-to-HTML format).
> python html2text.py -h
Usage: html2text.py [(filename|url) [encoding]]
Options:
--version show program's version number and exit
-h, --help show this help message and exit
--ignore-emphasis don't include any formatting for emphasis
--ignore-links don't include any formatting for links
--ignore-images don't include any formatting for images
-g, --google-doc convert an html-exported Google Document
-d, --dash-unordered-list
use a dash rather than a star for unordered list items
-e, --asterisk-emphasis
use an asterisk rather than an underscore for
emphasized text
-b BODY_WIDTH, --body-width=BODY_WIDTH
number of characters per output line, 0 for no wrap
-i LIST_INDENT, --google-list-indent=LIST_INDENT
number of pixels Google indents nested lists
-s, --hide-strikethrough
hide strike-through text. only relevant when -g is
specified as well
--escape-all Escape all special characters. Output is less
readable, but avoids corner case formatting issues.