
🤖 rag-agent

A pure LangGraph RAG (Retrieval Augmented Generation) agent that lets you ask questions about your own documents in natural language. Index PDFs, text files, markdown, Python files, and Wikipedia articles — then chat with them in a beautiful Streamlit interface or terminal.

Built from scratch with LangGraph, LangChain, Chroma, and HuggingFace embeddings. No hallucination — every answer is grounded in your documents with exact citations.


✨ Features

  • Multi-format indexing — PDF, TXT, MD, Python files, Wikipedia articles
  • Semantic search — finds content by meaning, not just keywords
  • Grounded answers — Claude answers ONLY from your documents, never from training data
  • Exact citations — every answer cites its source document and page number
  • Similarity scores — see how relevant each chunk is as a visual progress bar
  • Conversational memory — remembers previous questions for natural follow-ups
  • Question reformulation — automatically makes ambiguous follow-up questions self-contained
  • Quality gates — refuses to answer if no relevant content is found (no hallucination)
  • Two interfaces — Streamlit web UI or terminal
  • Model agnostic — switch any LLM via .env, zero code changes
  • Free embeddings — HuggingFace runs locally, no API cost for indexing

πŸ—οΈ Architecture

Two independent pipelines

Indexing pipeline — runs once per document:

Document (PDF/TXT/MD/PY/Wiki)
  ↓
loader node        extracts raw text
  ↓
chunker node       splits into 400-word chunks with 50-word overlap
  ↓
store node         embeds with HuggingFace + saves to Chroma
  ↓
Chroma DB          persists to disk — survives restarts

Query pipeline — runs every question:

Your question + conversation history
  ↓
reformulator node  makes ambiguous questions self-contained
  ↓
retriever node     finds top 5 chunks by semantic similarity
  ↓
quality check      similarity score > 0.5?
  ↓                         ↓
answer node        no_answer node (honest fallback)
  ↓
display            answer + citations + similarity scores
  ↓
memory node        saves Q&A for next question's context

πŸ“ Project Structure

rag-agent/
│
├── .env                   # API keys + model config
├── requirements.txt       # all dependencies
│
├── config.py              # model agnostic LLM + embedder getter
├── state.py               # IndexState + QueryState TypedDicts
├── nodes.py               # all node functions (pure LangChain)
├── edges.py               # routing and conditional logic
├── graph.py               # two StateGraphs assembled
│
├── streamlit_app.py       # Streamlit web interface
├── run.py                 # terminal interface
│
├── documents/             # add your files here (gitignored)
└── chroma_db/             # auto-created, persists between runs

βš™οΈ Setup

Prerequisites

  • Python 3.11 or 3.12
  • pyenv recommended for version management

1. Clone the repository

git clone https://github.com/yourusername/rag-agent.git
cd rag-agent

2. Set up Python version

pyenv install 3.12
pyenv local 3.12

3. Create virtual environment

python3 -m venv venv
source venv/bin/activate  # Mac/Linux
venv\Scripts\activate     # Windows

4. Install dependencies

pip install --upgrade pip
pip install -r requirements.txt

5. Configure environment variables

Create a .env file in the project root:

# API Keys
ANTHROPIC_API_KEY=your_anthropic_key_here
GROQ_API_KEY=your_groq_key_here

# Models — change freely, zero code changes needed
LLM_REFORMULATOR=groq/llama-3.3-70b-versatile
LLM_ANSWER=anthropic/claude-sonnet-4-6

# Embedding model (free, runs locally)
EMBEDDING_MODEL=all-MiniLM-L6-v2

# Chroma settings
CHROMA_PATH=./chroma_db
CHROMA_COLLECTION=rag_agent

# Retrieval settings
RETRIEVAL_TOP_K=5
QUALITY_THRESHOLD=0.5

# Chunk settings
CHUNK_SIZE=400
CHUNK_OVERLAP=50

6. Create documents folder

mkdir documents

Add any PDF, TXT, MD, or Python files you want to query.


🚀 Running

Option 1 — Streamlit web interface (recommended)

streamlit run streamlit_app.py

Opens automatically at http://localhost:8501

Option 2 — Terminal interface

python3 run.py

Available commands:

index <path>          Index a file
                      e.g. index documents/report.pdf

index wiki:<topic>    Index a Wikipedia article
                      e.g. index wiki:Machine Learning

ask <question>        Ask a question
                      e.g. ask What is backpropagation?

list                  Show all indexed documents
clear                 Clear conversation history
help                  Show all commands
exit                  Quit
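The command loop in run.py presumably splits each input line into a verb and an argument before dispatching. A minimal sketch (parse_command is an illustrative name, not necessarily the repo's):

```python
def parse_command(line: str) -> tuple[str, str]:
    """Split a terminal input line into (command, argument)."""
    cmd, _, arg = line.strip().partition(" ")
    return cmd.lower(), arg.strip()
```

So "index wiki:Machine Learning" yields the verb "index" with everything after the first space, including internal spaces, kept intact as the argument.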

🖥️ Streamlit Interface

The web interface has two panels:

Sidebar:

  • Drag and drop file upload OR type file path/wiki topic
  • Document library showing all indexed files with chunk counts
  • Delete individual documents
  • Session stats — total tokens used and estimated cost

Main area:

  • ChatGPT-style conversation interface
  • Streaming responses
  • Expandable metadata below each answer:
    • Model used
    • Token count and cost estimate
    • Relevance scores as visual progress bars
    • Source document and page number
    • Relevant chunk text previews

πŸ” Supported Document Types

Type        Extension    How indexed
PDF         .pdf         pdfplumber — preserves page numbers
Text        .txt         direct read
Markdown    .md          direct read
Python      .py          direct read — query your codebase
Wikipedia   wiki:Topic   live API fetch
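The loader node's dispatch over these types can be sketched as a small lookup (pick_loader is an illustrative name; the real loader in nodes.py may be structured differently):

```python
from pathlib import Path

def pick_loader(source: str) -> str:
    """Map a source string to the loader that handles it."""
    if source.startswith("wiki:"):
        return "wikipedia"          # live API fetch
    suffix = Path(source).suffix.lower()
    return {
        ".pdf": "pdfplumber",       # preserves page numbers
        ".txt": "text",
        ".md": "text",
        ".py": "text",
    }.get(suffix, "unsupported")
```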

🧠 How It Works

Why RAG instead of fine-tuning?

Fine-tuning:  expensive, slow, fixed knowledge
RAG:          free to index, instant updates, always current

Why chunking with overlap?

Documents are split into ~400-word chunks with a 50-word overlap. The overlap ensures that important context at chunk boundaries is never lost.
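The chunker node's strategy can be sketched in a few lines. This is a simplified word-based version for illustration; the actual chunker in nodes.py may differ:

```python
def chunk_words(text: str, size: int = 400, overlap: int = 50) -> list[str]:
    """Split text into word chunks; consecutive chunks share `overlap` words.

    Assumes size > overlap so the window always advances.
    """
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # last chunk reached the end of the document
    return chunks
```

With the defaults, a sentence split across a chunk boundary still appears whole in at least one chunk, because each chunk repeats the last 50 words of its predecessor.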

Why question reformulation?

Follow-up questions like "Can you give an example?" are ambiguous without context. The reformulator uses conversation history to rewrite them as self-contained questions before searching Chroma — dramatically improving retrieval accuracy.
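A hypothetical shape for the reformulation prompt (the actual wording lives in nodes.py; the template text and function names here are assumptions):

```python
# Illustrative reformulation prompt builder; not the repo's exact prompt.
REFORMULATE_TEMPLATE = (
    "Given the conversation so far:\n{history}\n\n"
    "Rewrite the follow-up question so it is fully self-contained, "
    "keeping the user's intent unchanged.\n"
    "Question: {question}"
)

def build_reformulation_prompt(history: list[tuple[str, str]], question: str) -> str:
    """Render Q&A history plus the new question into one prompt string."""
    lines = "\n".join(f"Q: {q}\nA: {a}" for q, a in history)
    return REFORMULATE_TEMPLATE.format(history=lines, question=question)
```

The reformulated question, not the raw one, is what gets embedded and sent to Chroma, so the retrieval step never sees a context-free "Can you give an example?".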

Why quality gates?

If no relevant chunks are found (similarity < 0.5), the agent refuses to answer rather than hallucinating from Claude's training data. This is the key safety mechanism in production RAG systems.

Why HuggingFace for embeddings?

HuggingFace all-MiniLM-L6-v2 runs locally — no API calls, no cost, no rate limits. The same model must be used for both indexing and querying, otherwise the vector spaces don't align.
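Under the hood, "semantic similarity" is a distance between embedding vectors: all-MiniLM-L6-v2 maps each text to a 384-dimensional vector, and similar meanings point in similar directions. A common measure is cosine similarity (whether Chroma uses cosine or L2 distance depends on how the collection is configured):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors: 1.0 = same direction, 0.0 = orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)
```

This is why mixing embedding models breaks retrieval: two models place the same sentence at unrelated points, so their cosine similarities are meaningless across models.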


💰 Cost Per Question

Component       Model                 Cost
Reformulation   Groq Llama 3.3        $0.000 (free)
Retrieval       Chroma (local math)   $0.000
Answer          Claude Sonnet         ~$0.013
Total                                 ~$0.013

Indexing cost: $0.000 — HuggingFace embeddings run locally.

Switch LLM_ANSWER to anthropic/claude-haiku-4-5-20251001 for ~$0.001 per question during development.


🔑 Getting API Keys

Service              URL                     Free tier
Anthropic (Claude)   console.anthropic.com   Pay as you go
Groq                 console.groq.com        30,000 tokens/min free

βš™οΈ Configuration Reference

All settings live in .env — no code changes needed:

# Switch answer model
LLM_ANSWER=anthropic/claude-haiku-4-5-20251001   # cheaper
LLM_ANSWER=anthropic/claude-sonnet-4-6            # smarter

# Tune retrieval
RETRIEVAL_TOP_K=3       # faster, cheaper
RETRIEVAL_TOP_K=5       # default
RETRIEVAL_TOP_K=8       # more context

# Tune quality threshold
QUALITY_THRESHOLD=0.3   # more permissive
QUALITY_THRESHOLD=0.5   # default
QUALITY_THRESHOLD=0.7   # stricter

# Tune chunk size
CHUNK_SIZE=200           # smaller, more precise
CHUNK_SIZE=400           # default
CHUNK_SIZE=600           # larger, more context

πŸ› Common Issues

Unable to infer model provider

Pass model and provider separately in config.py:

return init_chat_model(model=model, model_provider=provider)
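One hypothetical shape for that getter (split_model_spec and get_llm are illustrative names, not necessarily what config.py uses), splitting the provider/model strings from .env before calling init_chat_model:

```python
import os

def split_model_spec(spec: str) -> tuple[str, str]:
    """Split a 'provider/model' string from .env into (provider, model)."""
    provider, _, model = spec.partition("/")
    return provider, model

def get_llm(env_var: str):
    """Build a chat model from an env var like LLM_ANSWER."""
    # Lazy import keeps the string parsing above dependency-free.
    from langchain.chat_models import init_chat_model
    provider, model = split_model_spec(os.environ[env_var])
    return init_chat_model(model=model, model_provider=provider)
```

Passing model_provider explicitly sidesteps the provider-inference error entirely, whatever model string is configured.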

Similarity scores always low

Make sure the same EMBEDDING_MODEL was used for indexing and querying (re-index after changing it), or try lowering QUALITY_THRESHOLD to 0.3.

PDF text extraction empty

Some PDFs are scanned images — pdfplumber can't extract text from images. Run an OCR tool first to convert them to text.

Chroma collection errors

Delete the chroma_db/ folder and re-index:

rm -rf chroma_db/

Python version errors

Use Python 3.11 or 3.12 — not 3.14:

pyenv install 3.12
pyenv local 3.12

πŸ—ΊοΈ Roadmap

  • OCR support for scanned PDFs
  • DOCX and CSV support
  • Re-index on document change detection
  • Conversation export to markdown
  • Multi-collection support (separate knowledge bases)
  • Deploy to cloud (Railway / Render)

🧪 What I Learned Building This

  • LangGraph StateGraph with two independent pipelines
  • Vector databases and semantic search with Chroma
  • How embeddings capture meaning as numbers
  • Why chunking strategy matters for retrieval quality
  • Question reformulation for conversational RAG
  • Grounding LLM answers in documents via prompt design
  • Quality gates to prevent hallucination
  • Token management and cost optimisation in RAG
  • Building production-grade Streamlit interfaces

📄 License

MIT License — free to use, modify and distribute.


πŸ™ Acknowledgements
