I needed a super simple tool to crawl a website (or the links in an llms.txt) into a single formatted markdown file (without headers, navigation, etc.) to add to Claude or ChatGPT project documents.
I couldn't find an easy solution. There are some web-based tools with a few free credits, but if you are already paying for an LLM API, why also pay someone else?
Thanks to uv, you can easily run it from anywhere without installing anything:
```bash
uvx url2llm \
  --depth 1 \
  --url "https://modelcontextprotocol.io/llms.txt" \
  --instruction "I need documents related to developing MCP (model context protocol) servers" \
  --provider "gemini/gemini-2.5-flash-preview-04-17" \
  --api_key ${GEMINI_API_KEY}
```

Then drag `./model-context-protocol-documentation.md` into ChatGPT/Claude!
> [!TIP]
> You can invoke it as a properly installed CLI tool with `url2llm` after running `uv tool install url2llm`.
Alternatively, install it from PyPI:

```bash
pip install url2llm
```
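Once installed by either route, a quick sanity check that the command is on your `PATH` (the `--help` flag is an assumption, not documented above):

```bash
url2llm --help  # assumption: standard --help flag; otherwise run with the flags from the quickstart
```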
The script uses Crawl4AI:

- For each URL reached during the crawl, the script produces a markdown file.
- It then asks the LLM to extract from each page only the content relevant to the given instruction.
- Finally, it merges all pages into one and saves the merged file.
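If you want to inspect the intermediate per-page files from the first two steps, re-run the quickstart command with `--keep_pages` (documented in the tips below):

```bash
uvx url2llm \
  --depth 1 \
  --url "https://modelcontextprotocol.io/llms.txt" \
  --instruction "I need documents related to developing MCP (model context protocol) servers" \
  --provider "gemini/gemini-2.5-flash-preview-04-17" \
  --api_key ${GEMINI_API_KEY} \
  --keep_pages True
```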
- To use another LLM provider, just change `--provider` to e.g. `openai/gpt-4o` (see the combined example after this list)
- Always set `--api_key`; it is not always inferred correctly from env vars
- Provide a clear goal to `--instruction`. This will guide the LLM to filter out irrelevant pages.
- Recommended `--depth` (default = `2`): `2` or `1` for a normal website, `1` for an llms.txt
- Provide `--output_dir` to change where files are saved (default = `.`)
- If you need the individual pages, use `--keep_pages True` (default = `False`)
- You can set the concurrency with `--concurrency` (default = `16`)
- The script deletes files shorter than `--min_chars` (default = `1000`)
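Putting those flags together, a run against a regular website might look like the sketch below (the URL, instruction, and env var are placeholders; substitute your own):

```bash
uvx url2llm \
  --depth 2 \
  --url "https://docs.crawl4ai.com/" \
  --instruction "I need documents about configuring and running crawls" \
  --provider "openai/gpt-4o" \
  --api_key ${OPENAI_API_KEY} \
  --output_dir ./docs \
  --concurrency 8 \
  --min_chars 500
```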
> [!CAUTION]
> If you need to do more complex stuff, use Crawl4AI directly and build it yourself: https://docs.crawl4ai.com/