A pure LangGraph RAG (Retrieval-Augmented Generation) agent that lets you ask questions about your own documents in natural language. Index PDFs, text files, markdown, Python files, and Wikipedia articles, then chat with them in a beautiful Streamlit interface or the terminal.

Built from scratch with LangGraph, LangChain, Chroma, and HuggingFace embeddings. No hallucination: every answer is grounded in your documents with exact citations.
- Multi-format indexing – PDF, TXT, MD, Python files, Wikipedia articles
- Semantic search – finds content by meaning, not just keywords
- Grounded answers – Claude answers ONLY from your documents, never from training data
- Exact citations – every answer cites the source document and page number
- Similarity scores – see how relevant each chunk is as a visual progress bar
- Conversational memory – remembers previous questions for natural follow-ups
- Question reformulation – automatically makes ambiguous follow-up questions self-contained
- Quality gates – refuses to answer if no relevant content is found (no hallucination)
- Two interfaces – Streamlit web UI or terminal
- Model agnostic – switch any LLM via `.env`, zero code changes
- Free embeddings – HuggingFace runs locally, no API cost for indexing
Indexing pipeline – runs once per document:

```
Document (PDF/TXT/MD/PY/Wiki)
        ↓
loader node – extracts raw text
        ↓
chunker node – splits into 400-word chunks with 50-word overlap
        ↓
store node – embeds with HuggingFace + saves to Chroma
        ↓
Chroma DB – persists to disk, survives restarts
```
Query pipeline – runs on every question:

```
Your question + conversation history
        ↓
reformulator node – makes ambiguous questions self-contained
        ↓
retriever node – finds top 5 chunks by semantic similarity
        ↓
quality check – similarity score > 0.5?
     ↓ yes               ↓ no
answer node         no_answer node (honest fallback)
        ↓
display answer + citations + similarity scores
        ↓
memory node – saves Q&A for next question's context
```
```
rag-agent/
│
├── .env               # API keys + model config
├── requirements.txt   # all dependencies
│
├── config.py          # model-agnostic LLM + embedder getter
├── state.py           # IndexState + QueryState TypedDicts
├── nodes.py           # all node functions (pure LangChain)
├── edges.py           # routing and conditional logic
├── graph.py           # two StateGraphs assembled
│
├── streamlit_app.py   # Streamlit web interface
├── run.py             # terminal interface
│
├── documents/         # add your files here (gitignored)
└── chroma_db/         # auto-created, persists between runs
```
- Python 3.11 or 3.12
- pyenv recommended for version management
```bash
git clone https://github.com/yourusername/rag-agent.git
cd rag-agent

pyenv install 3.12
pyenv local 3.12

python3 -m venv venv
source venv/bin/activate   # Mac/Linux
venv\Scripts\activate      # Windows

pip install --upgrade pip
pip install -r requirements.txt
```

Create a `.env` file in the project root:
```
# API Keys
ANTHROPIC_API_KEY=your_anthropic_key_here
GROQ_API_KEY=your_groq_key_here

# Models – change freely, zero code changes needed
LLM_REFORMULATOR=groq/llama-3.3-70b-versatile
LLM_ANSWER=anthropic/claude-sonnet-4-6

# Embedding model (free, runs locally)
EMBEDDING_MODEL=all-MiniLM-L6-v2

# Chroma settings
CHROMA_PATH=./chroma_db
CHROMA_COLLECTION=rag_agent

# Retrieval settings
RETRIEVAL_TOP_K=5
QUALITY_THRESHOLD=0.5

# Chunk settings
CHUNK_SIZE=400
CHUNK_OVERLAP=50
```

Create a documents folder and add any PDF, TXT, MD, or Python files you want to query:

```bash
mkdir documents
```
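The `provider/model` strings above (e.g. `anthropic/claude-sonnet-4-6`) have to be split before they reach LangChain. A minimal sketch of how a getter in `config.py` might parse them – the function name `parse_model_spec` is illustrative, not the project's actual API:

```python
import os

def parse_model_spec(spec: str) -> tuple[str, str]:
    """Split a 'provider/model' string from .env into (provider, model).

    Only the first '/' separates the two, so model names containing
    slashes would still parse correctly.
    """
    provider, _, model = spec.partition("/")
    return provider, model

# Read the answer model from the environment, with a fallback for illustration
spec = os.getenv("LLM_ANSWER", "anthropic/claude-sonnet-4-6")
provider, model = parse_model_spec(spec)
```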
```bash
streamlit run streamlit_app.py
```

Opens automatically at http://localhost:8501
```bash
python3 run.py
```

Available commands:

```
index <path>        Index a file (e.g. index documents/report.pdf)
index wiki:<topic>  Index a Wikipedia article (e.g. index wiki:Machine Learning)
ask <question>      Ask a question (e.g. ask What is backpropagation?)
list                Show all indexed documents
clear               Clear conversation history
help                Show all commands
exit                Quit
```
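For illustration, the command parsing behind this REPL can be sketched in a few lines (the real `run.py` may differ; `parse_command` is a hypothetical helper):

```python
def parse_command(line: str) -> tuple[str, str]:
    """Split a terminal input line into (command, argument).

    'index wiki:Machine Learning' -> ('index', 'wiki:Machine Learning')
    'list'                        -> ('list', '')
    """
    command, _, arg = line.strip().partition(" ")
    return command.lower(), arg.strip()

print(parse_command("index documents/report.pdf"))
print(parse_command("ask What is backpropagation?"))
```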
The web interface has two panels:
Sidebar:
- Drag and drop file upload OR type file path/wiki topic
- Document library showing all indexed files with chunk counts
- Delete individual documents
- Session stats – total tokens used and estimated cost
Main area:
- ChatGPT-style conversation interface
- Streaming responses
- Expandable metadata below each answer:
- Model used
- Token count and cost estimate
- Relevance scores as visual progress bars
- Source document and page number
- Relevant chunk text previews
| Type | Extension | How indexed |
|---|---|---|
| PDF | `.pdf` | pdfplumber – preserves page numbers |
| Text | `.txt` | direct read |
| Markdown | `.md` | direct read |
| Python | `.py` | direct read – query your codebase |
| Wikipedia | `wiki:Topic` | live API fetch |
- Fine-tuning: expensive, slow, fixed knowledge
- RAG: free to index, instant updates, always current
Documents are split into ~400-word chunks with a 50-word overlap. The overlap ensures important context at chunk boundaries is never lost.
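The chunking step can be sketched in plain Python (a simplification of the real chunker, which also tracks source and page metadata; `chunk_words` is an illustrative name):

```python
def chunk_words(text: str, size: int = 400, overlap: int = 50) -> list[str]:
    """Split text into ~size-word chunks, each sharing `overlap` words
    with its predecessor so sentences at chunk boundaries survive."""
    words = text.split()
    step = size - overlap  # advance 350 words per chunk with the defaults
    chunks = []
    for i in range(0, len(words), step):
        chunks.append(" ".join(words[i:i + size]))
        if i + size >= len(words):  # last chunk reached the end
            break
    return chunks
```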
Follow-up questions like "Can you give an example?" are ambiguous without context. The reformulator uses conversation history to rewrite them as self-contained questions before searching Chroma, dramatically improving retrieval accuracy.
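Conceptually, reformulation is a single LLM call whose prompt bundles the history with the new question. A hedged sketch of that prompt construction (the actual wording in `nodes.py` will differ):

```python
def build_reformulation_messages(history: list[tuple[str, str]],
                                 question: str) -> list[dict]:
    """Build a chat payload asking an LLM to make `question` self-contained.

    `history` is a list of (question, answer) pairs from previous turns.
    """
    transcript = "\n".join(f"Q: {q}\nA: {a}" for q, a in history)
    system = (
        "Rewrite the user's latest question so it is fully self-contained, "
        "using the conversation so far. Return only the rewritten question."
    )
    return [
        {"role": "system", "content": system},
        {"role": "user",
         "content": f"Conversation:\n{transcript}\n\nLatest question: {question}"},
    ]
```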
If no relevant chunks are found (similarity < 0.5), the agent refuses to answer rather than hallucinating from Claude's training data. This is the key safety mechanism in production RAG systems.
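The gate reduces to a tiny routing function, the shape a LangGraph conditional edge expects (a sketch; the real logic lives in `edges.py` and may differ):

```python
def route_after_retrieval(scores: list[float], threshold: float = 0.5) -> str:
    """Return the next node name: 'answer' if any retrieved chunk clears
    the similarity threshold, otherwise 'no_answer' (the honest fallback)."""
    if scores and max(scores) > threshold:
        return "answer"
    return "no_answer"
```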
HuggingFace's all-MiniLM-L6-v2 runs locally: no API calls, no cost, no rate limits. The same model must be used for both indexing and querying; otherwise the vector spaces don't align.
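Since both documents and queries are embedded by the same model, retrieval is just vector geometry. A toy cosine-similarity example (real all-MiniLM-L6-v2 vectors have 384 dimensions, not 3):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy 3-dimensional "embeddings"
query = [0.9, 0.1, 0.0]
close = [0.8, 0.2, 0.1]   # similar meaning -> high score
far   = [0.0, 0.1, 0.9]   # unrelated meaning -> low score
print(cosine_similarity(query, close), cosine_similarity(query, far))
```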
| Component | Model | Cost |
|---|---|---|
| Reformulation | Groq Llama 3.3 | $0.000 (free) |
| Retrieval | Chroma (local math) | $0.000 |
| Answer | Claude Sonnet | ~$0.013 |
| Total | | ~$0.013 |
Indexing cost: $0.000 – HuggingFace embeddings run locally.

Switch `LLM_ANSWER` to `anthropic/claude-haiku-4-5-20251001` for ~$0.001 per question during development.
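The ~$0.013 figure is a back-of-envelope estimate that depends on token counts and current pricing. A sketch of the arithmetic – the token counts and per-million-token prices below are illustrative assumptions, not quoted rates:

```python
def question_cost(input_tokens: int, output_tokens: int,
                  usd_per_m_input: float, usd_per_m_output: float) -> float:
    """Estimate one question's LLM cost from token counts and
    per-million-token prices."""
    return (input_tokens * usd_per_m_input
            + output_tokens * usd_per_m_output) / 1_000_000

# e.g. ~3,000 prompt tokens (question + 5 retrieved chunks + history)
# and ~400 answer tokens, at assumed $3/$15 per million tokens
print(round(question_cost(3_000, 400, 3.0, 15.0), 4))
```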
| Service | URL | Free Tier |
|---|---|---|
| Anthropic (Claude) | console.anthropic.com | Pay as you go |
| Groq | console.groq.com | 30,000 tokens/min free |
All settings live in `.env` – no code changes needed:

```
# Switch answer model
LLM_ANSWER=anthropic/claude-haiku-4-5-20251001   # cheaper
LLM_ANSWER=anthropic/claude-sonnet-4-6           # smarter

# Tune retrieval
RETRIEVAL_TOP_K=3   # faster, cheaper
RETRIEVAL_TOP_K=5   # default
RETRIEVAL_TOP_K=8   # more context

# Tune quality threshold
QUALITY_THRESHOLD=0.3   # more permissive
QUALITY_THRESHOLD=0.5   # default
QUALITY_THRESHOLD=0.7   # stricter

# Tune chunk size
CHUNK_SIZE=200   # smaller, more precise
CHUNK_SIZE=400   # default
CHUNK_SIZE=600   # larger, more context
```

Pass model and provider separately in `config.py`:

```python
return init_chat_model(model=model, model_provider=provider)
```

Check your SERPER_API_KEY if using web search, or try lowering QUALITY_THRESHOLD to 0.3.
Some PDFs are scanned images, and pdfplumber can't extract text from images. Run an OCR tool first to convert them to text.
Delete the chroma_db/ folder and re-index:
```bash
rm -rf chroma_db/
```

Use Python 3.11 or 3.12 – not 3.14:

```bash
pyenv install 3.12
pyenv local 3.12
```

- OCR support for scanned PDFs
- DOCX and CSV support
- Re-index on document change detection
- Conversation export to markdown
- Multi-collection support (separate knowledge bases)
- Deploy to cloud (Railway / Render)
- LangGraph `StateGraph` with two independent pipelines
- Vector databases and semantic search with Chroma
- How embeddings capture meaning as numbers
- Why chunking strategy matters for retrieval quality
- Question reformulation for conversational RAG
- Grounding LLM answers in documents via prompt design
- Quality gates to prevent hallucination
- Token management and cost optimisation in RAG
- Building production-grade Streamlit interfaces
- deep-learner – pure LangGraph learning agent (Month 1)
- learning-agent – CrewAI multi-agent learning system
MIT License – free to use, modify, and distribute.