A pure LangGraph RAG (Retrieval-Augmented Generation) agent that lets you ask questions about your own documents in natural language. Index PDFs, text files, markdown, Python files, and Wikipedia articles, then chat with them in a beautiful Streamlit interface or the terminal.

Built from scratch with LangGraph, LangChain, Chroma, and HuggingFace embeddings. No hallucination: every answer is grounded in your documents with exact citations.
- Multi-format indexing – PDF, TXT, MD, Python files, Wikipedia articles
- Semantic search – finds content by meaning, not just keywords
- Grounded answers – Claude answers ONLY from your documents, never from training data
- Exact citations – every answer cites the source document and page number
- Similarity scores – see how relevant each chunk is as a visual progress bar
- Conversational memory – remembers previous questions for natural follow-ups
- Question reformulation – automatically makes ambiguous follow-up questions self-contained
- Quality gates – refuses to answer if no relevant content is found (no hallucination)
- Two interfaces – Streamlit web UI or terminal
- Model agnostic – switch any LLM via `.env`, zero code changes
- Free embeddings – HuggingFace runs locally, no API cost for indexing
Indexing pipeline – runs once per document:

```
Document (PDF/TXT/MD/PY/Wiki)
        ↓
loader node – extracts raw text
        ↓
chunker node – splits into 400-word chunks with 50-word overlap
        ↓
store node – embeds with HuggingFace + saves to Chroma
        ↓
Chroma DB – persists to disk, survives restarts
```
Query pipeline – runs on every question:

```
Your question + conversation history
        ↓
reformulator node – makes ambiguous questions self-contained
        ↓
retriever node – finds top 5 chunks by semantic similarity
        ↓
quality check – similarity score > 0.5?
     ↓ yes               ↓ no
answer node         no_answer node (honest fallback)
        ↓
display answer + citations + similarity scores
        ↓
memory node – saves Q&A for next question's context
```
```
rag-agent/
│
├── .env               # API keys + model config
├── requirements.txt   # all dependencies
│
├── config.py          # model-agnostic LLM + embedder getter
├── state.py           # IndexState + QueryState TypedDicts
├── nodes.py           # all node functions (pure LangChain)
├── edges.py           # routing and conditional logic
├── graph.py           # two StateGraphs assembled
│
├── streamlit_app.py   # Streamlit web interface
├── run.py             # terminal interface
│
├── documents/         # add your files here (gitignored)
└── chroma_db/         # auto-created, persists between runs
```
- Python 3.11 or 3.12
- pyenv recommended for version management
```bash
git clone https://github.com/yourusername/rag-agent.git
cd rag-agent

pyenv install 3.12
pyenv local 3.12

python3 -m venv venv
source venv/bin/activate   # Mac/Linux
venv\Scripts\activate      # Windows

pip install --upgrade pip
pip install -r requirements.txt
```

Create a `.env` file in the project root:
```
# API Keys
ANTHROPIC_API_KEY=your_anthropic_key_here
GROQ_API_KEY=your_groq_key_here

# Models – change freely, zero code changes needed
LLM_REFORMULATOR=groq/llama-3.3-70b-versatile
LLM_ANSWER=anthropic/claude-sonnet-4-6

# Embedding model (free, runs locally)
EMBEDDING_MODEL=all-MiniLM-L6-v2

# Chroma settings
CHROMA_PATH=./chroma_db
CHROMA_COLLECTION=rag_agent

# Retrieval settings
RETRIEVAL_TOP_K=5
QUALITY_THRESHOLD=0.5

# Chunk settings
CHUNK_SIZE=400
CHUNK_OVERLAP=50
```

Create a documents folder and add any PDF, TXT, MD, or Python files you want to query:

```bash
mkdir documents
```
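The `provider/model` strings above (e.g. `anthropic/claude-sonnet-4-6`) have to be split before they reach LangChain. A minimal sketch of how a getter in `config.py` might parse them – the function name `parse_model_spec` is illustrative, not the project's actual API:

```python
import os

def parse_model_spec(spec: str) -> tuple[str, str]:
    """Split a 'provider/model' string from .env into (provider, model).

    Only the first '/' separates the two, so model names containing
    slashes would still parse correctly.
    """
    provider, _, model = spec.partition("/")
    return provider, model

# Read the answer model from the environment, with a fallback for illustration
spec = os.getenv("LLM_ANSWER", "anthropic/claude-sonnet-4-6")
provider, model = parse_model_spec(spec)
```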
```bash
streamlit run streamlit_app.py
```

Opens automatically at http://localhost:8501
```bash
python3 run.py
```

Available commands:

```
index <path>        Index a file (e.g. index documents/report.pdf)
index wiki:<topic>  Index a Wikipedia article (e.g. index wiki:Machine Learning)
ask <question>      Ask a question (e.g. ask What is backpropagation?)
list                Show all indexed documents
clear               Clear conversation history
help                Show all commands
exit                Quit
```
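For illustration, the command parsing behind this REPL can be sketched in a few lines (the real `run.py` may differ; `parse_command` is a hypothetical helper):

```python
def parse_command(line: str) -> tuple[str, str]:
    """Split a terminal input line into (command, argument).

    'index wiki:Machine Learning' -> ('index', 'wiki:Machine Learning')
    'list'                        -> ('list', '')
    """
    command, _, arg = line.strip().partition(" ")
    return command.lower(), arg.strip()

print(parse_command("index documents/report.pdf"))
print(parse_command("ask What is backpropagation?"))
```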
The web interface has two panels:
Sidebar:
- Drag and drop file upload OR type file path/wiki topic
- Document library showing all indexed files with chunk counts
- Delete individual documents
- Session stats – total tokens used and estimated cost
Main area:
- ChatGPT-style conversation interface
- Streaming responses
- Expandable metadata below each answer:
- Model used
- Token count and cost estimate
- Relevance scores as visual progress bars
- Source document and page number
- Relevant chunk text previews
| Type | Extension | How indexed |
|---|---|---|
| PDF | `.pdf` | pdfplumber – preserves page numbers |
| Text | `.txt` | direct read |
| Markdown | `.md` | direct read |
| Python | `.py` | direct read – query your codebase |
| Wikipedia | `wiki:Topic` | live API fetch |
- Fine-tuning: expensive, slow, fixed knowledge
- RAG: free to index, instant updates, always current
Documents are split into ~400-word chunks with a 50-word overlap. The overlap ensures important context at chunk boundaries is never lost.
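The chunking step can be sketched in plain Python (a simplification of the real chunker, which also tracks source and page metadata; `chunk_words` is an illustrative name):

```python
def chunk_words(text: str, size: int = 400, overlap: int = 50) -> list[str]:
    """Split text into ~size-word chunks, each sharing `overlap` words
    with its predecessor so sentences at chunk boundaries survive."""
    words = text.split()
    step = size - overlap  # advance 350 words per chunk with the defaults
    chunks = []
    for i in range(0, len(words), step):
        chunks.append(" ".join(words[i:i + size]))
        if i + size >= len(words):  # last chunk reached the end
            break
    return chunks
```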
Follow-up questions like "Can you give an example?" are ambiguous without context. The reformulator uses conversation history to rewrite them as self-contained questions before searching Chroma, dramatically improving retrieval accuracy.
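Conceptually, reformulation is a single LLM call whose prompt bundles the history with the new question. A hedged sketch of that prompt construction (the actual wording in `nodes.py` will differ):

```python
def build_reformulation_messages(history: list[tuple[str, str]],
                                 question: str) -> list[dict]:
    """Build a chat payload asking an LLM to make `question` self-contained.

    `history` is a list of (question, answer) pairs from previous turns.
    """
    transcript = "\n".join(f"Q: {q}\nA: {a}" for q, a in history)
    system = (
        "Rewrite the user's latest question so it is fully self-contained, "
        "using the conversation so far. Return only the rewritten question."
    )
    return [
        {"role": "system", "content": system},
        {"role": "user",
         "content": f"Conversation:\n{transcript}\n\nLatest question: {question}"},
    ]
```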
If no relevant chunks are found (similarity < 0.5), the agent refuses to answer rather than hallucinating from Claude's training data. This is the key safety mechanism in production RAG systems.
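The gate reduces to a tiny routing function, the shape a LangGraph conditional edge expects (a sketch; the real logic lives in `edges.py` and may differ):

```python
def route_after_retrieval(scores: list[float], threshold: float = 0.5) -> str:
    """Return the next node name: 'answer' if any retrieved chunk clears
    the similarity threshold, otherwise 'no_answer' (the honest fallback)."""
    if scores and max(scores) > threshold:
        return "answer"
    return "no_answer"
```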
HuggingFace's all-MiniLM-L6-v2 runs locally: no API calls, no cost, no rate limits. The same model must be used for both indexing and querying; otherwise the vector spaces don't align.
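Since both documents and queries are embedded by the same model, retrieval is just vector geometry. A toy cosine-similarity example (real all-MiniLM-L6-v2 vectors have 384 dimensions, not 3):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy 3-dimensional "embeddings"
query = [0.9, 0.1, 0.0]
close = [0.8, 0.2, 0.1]   # similar meaning -> high score
far   = [0.0, 0.1, 0.9]   # unrelated meaning -> low score
print(cosine_similarity(query, close), cosine_similarity(query, far))
```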
| Component | Model | Cost |
|---|---|---|
| Reformulation | Groq Llama 3.3 | $0.000 (free) |
| Retrieval | Chroma (local math) | $0.000 |
| Answer | Claude Sonnet | ~$0.013 |
| Total | | ~$0.013 |
Indexing cost: $0.000 – HuggingFace embeddings run locally.

Switch `LLM_ANSWER` to `anthropic/claude-haiku-4-5-20251001` for ~$0.001 per question during development.
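The ~$0.013 figure is a back-of-envelope estimate that depends on token counts and current pricing. A sketch of the arithmetic – the token counts and per-million-token prices below are illustrative assumptions, not quoted rates:

```python
def question_cost(input_tokens: int, output_tokens: int,
                  usd_per_m_input: float, usd_per_m_output: float) -> float:
    """Estimate one question's LLM cost from token counts and
    per-million-token prices."""
    return (input_tokens * usd_per_m_input
            + output_tokens * usd_per_m_output) / 1_000_000

# e.g. ~3,000 prompt tokens (question + 5 retrieved chunks + history)
# and ~400 answer tokens, at assumed $3/$15 per million tokens
print(round(question_cost(3_000, 400, 3.0, 15.0), 4))
```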
| Service | URL | Free Tier |
|---|---|---|
| Anthropic (Claude) | console.anthropic.com | Pay as you go |
| Groq | console.groq.com | 30,000 tokens/min free |
All settings live in `.env` – no code changes needed:

```
# Switch answer model
LLM_ANSWER=anthropic/claude-haiku-4-5-20251001   # cheaper
LLM_ANSWER=anthropic/claude-sonnet-4-6           # smarter

# Tune retrieval
RETRIEVAL_TOP_K=3   # faster, cheaper
RETRIEVAL_TOP_K=5   # default
RETRIEVAL_TOP_K=8   # more context

# Tune quality threshold
QUALITY_THRESHOLD=0.3   # more permissive
QUALITY_THRESHOLD=0.5   # default
QUALITY_THRESHOLD=0.7   # stricter

# Tune chunk size
CHUNK_SIZE=200   # smaller, more precise
CHUNK_SIZE=400   # default
CHUNK_SIZE=600   # larger, more context
```

Pass model and provider separately in `config.py`:

```python
return init_chat_model(model=model, model_provider=provider)
```

Check your SERPER_API_KEY if using web search, or try lowering QUALITY_THRESHOLD to 0.3.
Some PDFs are scanned images, and pdfplumber can't extract text from images. Run an OCR tool first to convert them to text.
Delete the chroma_db/ folder and re-index:
```bash
rm -rf chroma_db/
```

Use Python 3.11 or 3.12 – not 3.14:

```bash
pyenv install 3.12
pyenv local 3.12
```

- OCR support for scanned PDFs
- DOCX and CSV support
- Re-index on document change detection
- Conversation export to markdown
- Multi-collection support (separate knowledge bases)
- Deploy to cloud (Railway / Render)
- LangGraph `StateGraph` with two independent pipelines
- Vector databases and semantic search with Chroma
- How embeddings capture meaning as numbers
- Why chunking strategy matters for retrieval quality
- Question reformulation for conversational RAG
- Grounding LLM answers in documents via prompt design
- Quality gates to prevent hallucination
- Token management and cost optimisation in RAG
- Building production-grade Streamlit interfaces
- deep-learner – pure LangGraph learning agent (Month 1)
- learning-agent – CrewAI multi-agent learning system
MIT License – free to use, modify, and distribute.