Releases: QuartzUnit/markgrab
v0.1.2
v0.1.1
Initial Public Release
Universal web content extraction — any URL to LLM-ready markdown.
Highlights
- HTML: BeautifulSoup + content density filtering (removes nav, sidebar, ads)
- YouTube: Transcript extraction with timestamps and multi-language support
- PDF: Text extraction with page structure (pdfplumber)
- DOCX: Paragraph and heading extraction (python-docx)
- Auto-fallback: httpx first, Playwright for JS-heavy pages
- Async-first: Built on httpx and Playwright async APIs
- CLI: markgrab <url> with markdown/text/JSON output
- Anti-bot stealth: Opt-in Playwright stealth scripts
- 114 unit tests, all passing
- MIT licensed
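The auto-fallback highlight (a fast HTTP fetch first, a browser only for JS-heavy pages) can be illustrated with a minimal sketch. This is not markgrab's actual implementation: the names `looks_js_rendered` and `fetch_with_fallback`, the 200-character threshold, and the "little visible text means JS-rendered" heuristic are all assumptions made for illustration. The two fetchers are passed in as plain async callables so the sketch runs with the standard library alone.

```python
import asyncio
import re
from typing import Awaitable, Callable, Optional, Tuple

# Hypothetical type alias: any async function that maps a URL to HTML text.
Fetcher = Callable[[str], Awaitable[str]]


def looks_js_rendered(html: str, min_text_chars: int = 200) -> bool:
    """Heuristic (an assumption, not markgrab's rule): a page with very
    little visible text is probably an empty shell filled in by JavaScript."""
    # Drop <script>/<style> bodies, then strip the remaining tags.
    without_scripts = re.sub(r"(?is)<(script|style)\b.*?</\1>", " ", html)
    text = re.sub(r"(?s)<[^>]+>", " ", without_scripts)
    visible = " ".join(text.split())
    return len(visible) < min_text_chars


async def fetch_with_fallback(
    url: str,
    fast_fetch: Fetcher,
    browser_fetch: Fetcher,
    min_text_chars: int = 200,
) -> Tuple[str, str]:
    """Try the cheap fetcher first; use the browser fetcher only if the
    cheap one fails or returns a page that looks JS-rendered.

    Returns (html, source) where source is "fast" or "browser".
    """
    html: Optional[str]
    try:
        html = await fast_fetch(url)
    except Exception:
        # Any failure of the fast path (network error, bad status raised
        # by the caller's fetcher, ...) triggers the fallback.
        html = None

    if html is not None and not looks_js_rendered(html, min_text_chars):
        return html, "fast"
    return await browser_fetch(url), "browser"
```

In a real setup, `fast_fetch` might wrap an `httpx.AsyncClient` GET request and `browser_fetch` might load the page with Playwright's async API and return `page.content()`. Keeping both behind the same callable signature means the fallback decision can be tested without network access.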
Install
pip install markgrab
Python 3.11+ required. See README for details.