Releases: QuartzUnit/markgrab

v0.1.2

17 Mar 01:48

Changes

  • Add MCP server module (markgrab-mcp) with two tools: extract_url and extract_multiple
  • Add [mcp] optional dependency (pip install "markgrab[mcp]")
  • Register on the official MCP registry (io.github.ArkNill/markgrab)
  • 114 tests, all passing

v0.1.1

16 Mar 23:57

Initial Public Release

Universal web content extraction: any URL to LLM-ready markdown.

Highlights

  • HTML: BeautifulSoup + content density filtering (removes nav, sidebar, ads)
  • YouTube: Transcript extraction with timestamps and multi-language support
  • PDF: Text extraction with page structure (pdfplumber)
  • DOCX: Paragraph and heading extraction (python-docx)
  • Auto-fallback: httpx first, Playwright for JS-heavy pages
  • Async-first: Built on httpx and Playwright async APIs
  • CLI: markgrab <url> with markdown/text/JSON output
  • Anti-bot stealth: Opt-in Playwright stealth scripts
  • 114 unit tests, all passing
  • MIT licensed
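
The auto-fallback highlight above (plain HTTP first, a browser only for JS-heavy pages) can be illustrated with a small sketch. This is not markgrab's code or API: the function names, the 200-character threshold, and the "too little visible text means JS-heavy" heuristic are all assumptions made for illustration. The fetchers are passed in as plain callables so the sketch runs without httpx or Playwright, and it is synchronous for brevity even though the library itself is described as async-first.

```python
from html.parser import HTMLParser
from typing import Callable, Tuple

# Hypothetical threshold (not from markgrab): pages with less visible text
# than this are assumed to need JavaScript rendering.
MIN_TEXT_CHARS = 200

_SKIPPED_TAGS = ("script", "style", "noscript")


class _TextCounter(HTMLParser):
    """Counts characters of text that sit outside script/style/noscript."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chars = 0

    def handle_starttag(self, tag, attrs):
        if tag in _SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in _SKIPPED_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chars += len(data.strip())


def visible_text_length(html: str) -> int:
    """Rough estimate of how much readable text the raw HTML contains."""
    counter = _TextCounter()
    counter.feed(html)
    return counter.chars


def fetch_with_fallback(
    url: str,
    fast_fetch: Callable[[str], str],
    browser_fetch: Callable[[str], str],
    min_chars: int = MIN_TEXT_CHARS,
) -> Tuple[str, str]:
    """Try the cheap fetcher first; use the browser fetcher if it fails
    or returns a page with too little visible text."""
    try:
        html = fast_fetch(url)
    except Exception:
        # Any failure of the cheap path triggers the fallback in this sketch.
        return browser_fetch(url), "browser"
    if visible_text_length(html) >= min_chars:
        return html, "fast"
    return browser_fetch(url), "browser"
```

In real use, fast_fetch would wrap an HTTP client call and browser_fetch a headless-browser render. Keeping them as parameters makes the decision logic easy to test in isolation.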

Install

pip install markgrab

Python 3.11+ required. See README for details.