HashIndex ⚡️

Ultra-fast, LLM-optimized document indexing in Python.

Built by the team at Pardus AI – The fastest AI Data Analysis Platform.

HashIndex is the core indexing engine we use at Pardus AI to process 50MB+ CSVs and PDFs in seconds. We are open-sourcing our Python implementation so you can build better RAG pipelines without the bloat of LangChain.

Want to analyze documents without coding? Try our no-code platform: Pardus AI Dashboard (Free for huge files).

Installation

# Clone the repository
git clone https://github.com/JasonHonKL/HashIndex.git
cd HashIndex

# Install with uv (recommended - faster and more reliable)
uv venv                    # Create virtual environment
uv sync                    # Install dependencies and package in editable mode
source .venv/bin/activate  # Activate the virtual environment (Linux/Mac)
# or
.venv\Scripts\activate     # Activate the virtual environment (Windows)

# Alternatively, install with pip
pip install -e .

Usage

As a Python Library

from hashindex import index_pdf, query_index, HashIndex

# Index a PDF document
index = index_pdf("document.pdf")

# Save the index
index.save("document.index.json")

# Load an existing index
index = HashIndex.load("document.index.json")

# Query the index
answer = query_index(index, "What is the main conclusion?")
print(answer)
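
Re-indexing a large PDF on every run is wasteful, so a common pattern is to load a saved index when one exists and build it only otherwise. The helper below is a minimal sketch of that pattern, not part of the HashIndex API: `load_or_build` is a hypothetical name, and it only assumes that the built index exposes `.save(path)` as in the example above. The build and load steps are passed in as callables so the helper does not depend on any particular signature.

```python
from pathlib import Path
from typing import Any, Callable


def load_or_build(
    index_path: str,
    build: Callable[[], Any],
    load: Callable[[str], Any],
) -> Any:
    """Return the saved index at index_path, or build and save a new one.

    Assumption: the object returned by build() has a .save(path) method.
    """
    if Path(index_path).exists():
        # A saved index is already on disk, so skip the expensive build step.
        return load(index_path)
    index = build()
    index.save(index_path)
    return index


# Hypothetical usage with the functions shown above:
# from hashindex import index_pdf, HashIndex
# index = load_or_build(
#     "document.index.json",
#     build=lambda: index_pdf("document.pdf"),
#     load=HashIndex.load,
# )
```

Note that this sketch does not detect a changed source document; delete the saved index file to force a rebuild.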

Using the CLI

cp .env.example .env

Then edit the config in .env. We support almost all LLM APIs.

Advanced Usage

from hashindex import HashIndex, Model, ListKeys, GetSummary, GetContent

# Create a custom model
model = Model(model="anthropic/claude-3.5-sonnet")

# Work with index objects directly
index = HashIndex()
# ... customize indexing logic ...

# Use verbose=False for silent operation
from hashindex import index_pdf, query_index
index = index_pdf("document.pdf", verbose=False)
answer = query_index(index, "Your question", verbose=False)

# Access pages directly
for key, obj in index.PageTable.items():
    print(f"{key}: {obj.summary}")

Comparative Analysis

HashIndex outperforms standard retrieval approaches on specific long-context narrative tasks, where causality matters more than keyword matching.

| Method | Topology | Context Management | Robustness (Unstructured Data) | Latency |
| --- | --- | --- | --- | --- |
| Vector RAG | Disconnected Chunks | Additive (FIFO overflow) | High | Low (O(1)) |
| PageIndex | Hierarchical Tree | Path-Dependent | Low (Requires Clean Headers) | High (O(log n)) |
| RAPTOR | Recursive Tree | Cluster-Based | Medium | Medium |
| HashIndex (Ours) | Hash Table | Dynamic Pruning (Agent-led) | High (Mechanical Split) | Medium-Low |

By treating document chunks as hash table entries rather than vector embeddings, HashIndex aims to avoid the 'Lost in the Middle' phenomenon common in vector-search pipelines, where relevant material buried in the middle of a long context is overlooked.
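
To make the hash-table idea concrete, here is a toy sketch in plain Python. It is an illustration under stated assumptions, not HashIndex's actual implementation: the text is split mechanically into fixed-size chunks, each chunk is stored under a short hash key with a summary, and an agent-style caller lists keys, reads summaries, loads only the content it needs, and prunes the rest. All names here (`Page`, `build_page_table`, `list_keys`, `get_summary`, `get_content`, `prune`) are hypothetical and only loosely mirror the `ListKeys`, `GetSummary`, and `GetContent` imports shown in Advanced Usage. The "summary" is just the first 40 characters of the chunk, where a real system would presumably use an LLM.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Page:
    summary: str
    content: str


def mechanical_split(text: str, size: int) -> list[str]:
    """Split text into fixed-size chunks without relying on headers."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def build_page_table(text: str, size: int = 200) -> dict[str, Page]:
    """Store each chunk under a short hash key derived from its position and text."""
    table: dict[str, Page] = {}
    for i, chunk in enumerate(mechanical_split(text, size)):
        key = hashlib.sha1(f"{i}:{chunk}".encode("utf-8")).hexdigest()[:8]
        # Placeholder summary: first 40 characters (stand-in for an LLM summary).
        table[key] = Page(summary=chunk[:40], content=chunk)
    return table


def list_keys(table: dict[str, Page]) -> list[str]:
    """Return all keys in insertion (document) order."""
    return list(table)


def get_summary(table: dict[str, Page], key: str) -> str:
    """Cheap lookup: the short summary for one entry."""
    return table[key].summary


def get_content(table: dict[str, Page], key: str) -> str:
    """Full lookup: the complete chunk text for one entry."""
    return table[key].content


def prune(context: dict[str, str], keep: set[str]) -> dict[str, str]:
    """Drop loaded pages that the caller no longer needs in context."""
    return {k: v for k, v in context.items() if k in keep}
```

In this sketch the caller decides which entries stay in context, which is the sense in which pruning is "dynamic": nothing is evicted by position or arrival order.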

Citation

If you use HashIndex in your research or project, please cite it as follows:

@software{HashIndex2026,
  author = {Hon, Jason and Pardus AI Team},
  title = {HashIndex: LLM-optimized Document Indexing without vector search},
  year = {2026},
  publisher = {Pardus AI}
}
