A Retrieval-Augmented Generation (RAG) system for querying data, powered by LangChain, Qdrant, and Anthropic.
- Install Dependencies:

  ```bash
  uv venv && source .venv/bin/activate
  uv sync
  ```

- Setup Environment: Create a `.env` file and provide keys and values:

  ```bash
  cp .env.example .env
  ```

  ```
  ANTHROPIC_AUTH_TOKEN=your_anthropic_api_key
  ```
- Run Qdrant: Start a Qdrant instance.

  ```bash
  docker compose up -d
  ```
- Pull Ollama Models:

  ```bash
  ollama pull qllama/bge-small-en-v1.5  # Embedding model
  ollama pull bona/bge-reranker-v2-m3   # Reranking model
  # (Optional) Open-source LLM for inference (e.g. llama3.2:3b, qwen3:1.7b)
  ollama pull llama3.2:3b
  ```
- Initialize & Chat:

  ```bash
  python setup.py  # index the script data into the Qdrant DB
  python main.py   # start chatting
  ```
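Before documents can be embedded, the indexing step typically splits the source data into overlapping chunks. A minimal sketch of such a chunker (the function name and parameters here are illustrative, not the repo's actual code):

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping character windows for embedding."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    step = chunk_size - overlap  # advance by chunk_size minus the overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # last window already reached the end of the text
    return chunks
```

Overlap keeps sentences that straddle a chunk boundary retrievable from both sides.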
- Knowledge Base: Context-aware retrieval from provided data.
- Semantic Search: High-performance vector search using Qdrant.
- Reranking: Documents retrieved from the vector DB are reranked against the query for higher-relevance context.
- Streaming Responses: Real-time AI response streaming for an interactive CLI experience.
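The retrieve-then-rerank flow above can be sketched in plain Python. First-stage retrieval ranks by cosine similarity over precomputed vectors; the second stage rescores the candidates against the query text. Note the `score` function here is a lexical-overlap stand-in for the actual cross-encoder reranker, and all names are illustrative:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query_vec: list[float], index: list[tuple[str, list[float]]],
             top_k: int = 4) -> list[str]:
    """First stage: top-k documents by vector similarity (stands in for Qdrant)."""
    ranked = sorted(index, key=lambda d: cosine(query_vec, d[1]), reverse=True)
    return [doc for doc, _ in ranked[:top_k]]

def rerank(query: str, docs: list[str], top_n: int = 2) -> list[str]:
    """Second stage: rescore candidates against the raw query text."""
    q = set(query.lower().split())
    def score(doc: str) -> float:
        # Jaccard word overlap -- a toy substitute for a reranker model.
        d = set(doc.lower().split())
        return len(q & d) / (len(q | d) or 1)
    return sorted(docs, key=score, reverse=True)[:top_n]
```

The two stages trade off differently: vector search is fast over the whole collection, while the reranker is slower but sees the query and document together, so running it only on the top-k keeps latency low.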
| Variable | Description | Default |
|---|---|---|
| `EMBED_MODEL` | Embedding model for vectorization | `qllama/bge-small-en-v1.5` |
| `RERANKER_MODEL` | Reranking model for retrieved docs | `bona/bge-reranker-v2-m3` |
| `LLM_MODEL` | LLM for generating responses | `claude-sonnet-4-6` |
| `COLLECTION_NAME` | Qdrant collection name | `documents` |
| `VECTOR_SIZE` | Vector dimensions | `384` |
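A minimal sketch of how these variables might be read with the table's defaults (the `CONFIG` dict is illustrative, not the repo's actual code; values come from the environment, e.g. loaded from `.env`):

```python
import os

# Defaults mirror the configuration table; any value can be
# overridden via the environment (e.g. from the .env file).
CONFIG = {
    "EMBED_MODEL": os.getenv("EMBED_MODEL", "qllama/bge-small-en-v1.5"),
    "RERANKER_MODEL": os.getenv("RERANKER_MODEL", "bona/bge-reranker-v2-m3"),
    "LLM_MODEL": os.getenv("LLM_MODEL", "claude-sonnet-4-6"),
    "COLLECTION_NAME": os.getenv("COLLECTION_NAME", "documents"),
    "VECTOR_SIZE": int(os.getenv("VECTOR_SIZE", "384")),  # must match the embedding model
}
```

`VECTOR_SIZE` must agree with the embedding model's output dimension (384 for `bge-small-en-v1.5`), or Qdrant will reject the vectors at upsert time.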
MIT