
Enhanced RAG System

A Retrieval-Augmented Generation (RAG) system for querying custom knowledge bases with semantic search, reranking, and streaming LLM responses, built on LangChain, Qdrant, and Anthropic.

Quick Start

  1. Install Dependencies:

    uv venv && source .venv/bin/activate
    uv sync
  2. Set Up the Environment: Copy the example file to .env and fill in your keys:

    cp .env.example .env
    
    ANTHROPIC_AUTH_TOKEN=your_anthropic_api_key
  3. Run Qdrant: Start a Qdrant instance.

    docker compose up -d
  4. Pull Ollama Models:

    ollama pull qllama/bge-small-en-v1.5  # Embedding model
    ollama pull bona/bge-reranker-v2-m3   # Reranking model
    
    # (Optional) Pull an open-source model for LLM inference (e.g. llama3.2:3b, qwen3:1.7b)
    ollama pull llama3.2:3b
  5. Initialize & Chat:

    python setup.py  # index the source data into the Qdrant collection
    python main.py
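The steps above boil down to an index-then-search loop: embed each document, store the vectors, then embed the query and return the nearest document. A minimal stdlib-only sketch of that flow is below; the hashed bag-of-words `embed` function is an illustrative stand-in for the real bge-small-en-v1.5 embeddings, and the in-memory list stands in for the Qdrant collection.

```python
# Toy sketch of the retrieve step that setup.py/main.py perform against
# Qdrant. Embeddings here are a hashed bag-of-words placeholder, NOT the
# real bge-small-en-v1.5 vectors.
import hashlib
import math

VECTOR_SIZE = 384  # matches the documented VECTOR_SIZE default


def embed(text: str) -> list[float]:
    """Deterministic bag-of-words hashing embedding (illustrative only)."""
    vec = [0.0] * VECTOR_SIZE
    for token in text.lower().split():
        idx = int(hashlib.md5(token.encode()).hexdigest(), 16) % VECTOR_SIZE
        vec[idx] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def cosine(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


# "Index" the documents (setup.py does this against the Qdrant collection).
docs = ["Qdrant is a vector database", "LangChain orchestrates LLM pipelines"]
index = [(doc, embed(doc)) for doc in docs]

# "Query" (main.py embeds the question and searches the collection).
query_vec = embed("what is a vector database")
best_doc, _ = max(index, key=lambda pair: cosine(query_vec, pair[1]))
print(best_doc)  # → Qdrant is a vector database
```

In the real pipeline the nearest candidates then go through the reranker before the LLM sees them; this sketch stops at retrieval.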

Features

  • Knowledge Base: Context-aware retrieval from provided data.
  • Semantic Search: High-performance vector search using Qdrant.
  • Reranking: Documents retrieved from the vector DB are re-scored against the query for higher-precision results.
  • Streaming Responses: Real-time AI response streaming for an interactive CLI experience.

Configuration

Variable          Description                                    Default
----------------  ---------------------------------------------  ------------------------
EMBED_MODEL       Embedding model for vectorization              qllama/bge-small-en-v1.5
RERANKER_MODEL    Reranking model for retrieved documents        bona/bge-reranker-v2-m3
LLM_MODEL         LLM for generating responses                   claude-sonnet-4-6
COLLECTION_NAME   Qdrant collection name                         documents
VECTOR_SIZE       Embedding vector dimensions                    384
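One plausible way these variables are consumed, shown as a hedged sketch: read each from the environment (populated from .env) and fall back to the documented default. The variable names and defaults match the table; the `config` dict itself is illustrative, not the project's actual loading code.

```python
# Hypothetical config loader: environment values (from .env) override the
# documented defaults. Names and defaults come from the table above.
import os

config = {
    "EMBED_MODEL": os.getenv("EMBED_MODEL", "qllama/bge-small-en-v1.5"),
    "RERANKER_MODEL": os.getenv("RERANKER_MODEL", "bona/bge-reranker-v2-m3"),
    "LLM_MODEL": os.getenv("LLM_MODEL", "claude-sonnet-4-6"),
    "COLLECTION_NAME": os.getenv("COLLECTION_NAME", "documents"),
    "VECTOR_SIZE": int(os.getenv("VECTOR_SIZE", "384")),
}
print(config["EMBED_MODEL"])
```

Note that VECTOR_SIZE must match the output dimension of EMBED_MODEL (bge-small-en-v1.5 produces 384-dimensional vectors), so changing the embedding model usually means changing VECTOR_SIZE and re-indexing.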

License

MIT
