url2llm

The easiest way to crawl a website and produce LLM-ready markdown files, paying only your LLM provider via API.

I needed a super simple tool to crawl a website (or the links in an llms.txt) into a clean markdown file (without headers, navigation, etc.) that I could add to Claude or ChatGPT project documents.

I couldn't find an easy solution: there are a few web-based tools with limited free credits, but if you are already paying for an LLM API, why pay someone else as well?

Quickstart

With uv (recommended):

Thanks to uv, you can easily run it from anywhere without installing anything:

uvx url2llm \
   --depth 1 \
   --url "https://modelcontextprotocol.io/llms.txt" \
   --instruction "I need documents related to developing MCP (model context protocol) servers" \
   --provider "gemini/gemini-2.5-flash-preview-04-17" \
   --api_key ${GEMINI_API_KEY}

Then drag ./model-context-protocol-documentation.md into ChatGPT/Claude!

Tip

After running uv tool install url2llm, you can invoke it directly as url2llm, like a properly installed CLI tool.
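For example, installing it once and then running the same command as in the Quickstart:

uv tool install url2llm

url2llm \
   --depth 1 \
   --url "https://modelcontextprotocol.io/llms.txt" \
   --instruction "I need documents related to developing MCP (model context protocol) servers" \
   --provider "gemini/gemini-2.5-flash-preview-04-17" \
   --api_key ${GEMINI_API_KEY}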

With pip (alternative):

pip install url2llm

What it does

The script uses Crawl4AI:

  1. For each URL in the crawl, the script produces a markdown file (these per-page files can be kept, as shown in the example below).
  2. It then asks the LLM to extract from each page only the content relevant to the given instruction.
  3. Finally, it merges all pages into one and saves the merged file.
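For example, a run that also keeps the intermediate per-page markdown from steps 1–2 alongside the merged file from step 3, using the --keep_pages and --output_dir options documented below (the output directory here is just a placeholder):

url2llm \
   --depth 1 \
   --url "https://modelcontextprotocol.io/llms.txt" \
   --instruction "I need documents related to developing MCP (model context protocol) servers" \
   --provider "gemini/gemini-2.5-flash-preview-04-17" \
   --api_key ${GEMINI_API_KEY} \
   --keep_pages True \
   --output_dir ./mcp-docs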

Command args and hints

  • To use another LLM provider, just change --provider to e.g. openai/gpt-4o (see the combined example after this list)
    • Always set --api_key; it is not always inferred correctly from env vars
  • Provide a clear goal to --instruction. This will guide the LLM to filter out irrelevant pages.
  • Recommended depth (default = 2):
    • 2 or 1 for a normal website
    • 1 for an llms.txt
  • Provide --output_dir to change where files are saved (default = .)
  • If you need the individual pages, use --keep_pages True (default = False)
  • You can specify the concurrency with --concurrency (default = 16)
  • The script deletes files shorter than --min_chars characters (default = 1000)
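Putting a few of these together, here is a hypothetical crawl of a regular documentation site with OpenAI as the provider (the URL, instruction, and output directory below are placeholders; all flags are documented above):

url2llm \
   --depth 2 \
   --url "https://example.com/docs" \
   --instruction "I need the developer documentation for the public API" \
   --provider "openai/gpt-4o" \
   --api_key ${OPENAI_API_KEY} \
   --output_dir ./docs-md \
   --concurrency 8 \
   --min_chars 500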

Caution

If you need to do anything more complex, use Crawl4AI directly and build it yourself: https://docs.crawl4ai.com/
