This repository contains:
- `ragent-backend/` – Python FastAPI service
- `ragent-frontend/` – React/Vite UI client
- `docs/` – supplemental markdown for architecture and design decisions

See `docs/architecture.md` for a deep dive and `docs/design_decisions.md` for the rationale behind technical choices.
- Overview
- Architecture
- Project Structure
- Agent Roles & Reasoning Flow
- System Setup
- API Reference
- Configuration Reference
- Deployment Guide
- Limitations & Challenges
RAgent is an AI-powered document knowledge base enabling retrieval-augmented question answering from uploaded PDFs, TXT, CSV, and Excel files.
Core components:
- LangChain – orchestration framework for agents, chains, loaders, and splitters
- RAG (Retrieval-Augmented Generation) – ensures answers are grounded in document content
- Agentic AI – multi-step reasoning using tool-calling agents
- ChromaDB – local persistent vector database for semantic search
- Groq (openai/gpt-oss-120b) – LLM backbone for reasoning and answer generation
- FastAPI – async REST API layer
Users upload documents and query the system in natural language. The system retrieves relevant chunks and generates grounded, citation-backed responses.
```
                     ┌───────────────────┐
                     │   Frontend SPA    │
                     │  (React + Vite)   │
                     └─────────┬─────────┘
                               │
┌──────────────────────────────▼───────────────────────────────┐
│                FastAPI REST API (API gateway)                │
│  POST /documents/upload  GET /documents/   POST /query/rag   │
│  GET /documents/{f}      GET /health       POST /query/agent │
│  DELETE /documents/{f}   GET /documents/{f}/download         │
└──────────┬─────────────────────┬─────────────────────────────┘
           │                     │
    ┌──────▼───────┐    ┌────────▼────────────────────────┐
    │  Ingestion   │    │           Query Layer           │
    │  Pipeline    │    │                                 │
    │              │    │  ┌──────────┐   ┌─────────────┐ │
    │ 1. Load doc  │    │  │   RAG    │   │    Agent    │ │
    │   (LangChain │    │  │  Chain   │   │  Executor   │ │
    │    loaders)  │    │  │  (LCEL)  │   │ (Tool-call) │ │
    │ 2. Chunk     │    │  └────┬─────┘   └──────┬──────┘ │
    │   (RecChar   │    │       │                │        │
    │    Splitter) │    │       └───────┬────────┘        │
    │ 3. Embed &   │    │               │                 │
    │    Store     │    │     ┌─────────▼─────────┐       │
    └──────┬───────┘    │     │  Retrieval Layer  │       │
           │            │     │   (ChromaDB +     │       │
           └────────────┼────►│   HuggingFace     │       │
                        │     │   Embeddings)     │       │
                        │     └─────────┬─────────┘       │
                        │               │                 │
                        │     ┌─────────▼─────────┐       │
                        │     │     LLM Layer     │       │
                        │     │  (GPT-OSS-120B    │       │
                        │     │   via LangChain-  │       │
                        │     │       Groq)       │       │
                        │     └───────────────────┘       │
                        └─────────────────────────────────┘
```
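The Retrieval Layer in the diagram returns the stored chunks whose embeddings are closest to the embedded question. The sketch below is a rough, dependency-free illustration of that idea. It performs a brute-force cosine-similarity top-k search over an in-memory list. It does not show how ChromaDB works internally, because ChromaDB uses its own index and its configured distance metric. `StoredChunk`, `top_k`, and the toy 3-dimensional vectors are hypothetical stand-ins for the real 384-dimensional all-MiniLM-L6-v2 embeddings.

```python
import math
from dataclasses import dataclass


@dataclass
class StoredChunk:
    # Hypothetical record: text, its embedding, and the metadata returned as "sources".
    text: str
    embedding: list[float]
    filename: str
    chunk_index: int


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # avoid division by zero for empty/zero vectors
    return dot / (norm_a * norm_b)


def top_k(query_embedding: list[float], chunks: list[StoredChunk], k: int = 5) -> list[StoredChunk]:
    """Brute-force semantic search: score every chunk and keep the k most similar."""
    ranked = sorted(
        chunks,
        key=lambda c: cosine_similarity(query_embedding, c.embedding),
        reverse=True,
    )
    return ranked[:k]


# Toy 3-d vectors standing in for real sentence embeddings.
store = [
    StoredChunk("Q3 revenue was $4.2M", [0.9, 0.1, 0.0], "report.pdf", 12),
    StoredChunk("Office relocation plan", [0.0, 0.2, 0.9], "report.pdf", 30),
    StoredChunk("Q2 revenue summary", [0.8, 0.3, 0.1], "report.pdf", 7),
]
hits = top_k([1.0, 0.2, 0.0], store, k=2)
print([(h.filename, h.chunk_index) for h in hits])  # → [('report.pdf', 12), ('report.pdf', 7)]
```

In the real pipeline, `k` would presumably come from `TOP_K_RESULTS`. The question must be embedded with the same model used at ingestion time for the similarity scores to be meaningful.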
| Decision | Choice | Reason |
|---|---|---|
| Vector DB | ChromaDB | Local persistent store, no extra infra needed |
| Embeddings | all-MiniLM-L6-v2 (HuggingFace) | Fast, free, runs on CPU |
| LLM | Groq (openai/gpt-oss-120b) | Ultra-fast inference, free tier, OpenAI-compatible |
| Agent type | Tool-calling agent (LangChain) | Native tool-use API, clean intermediate steps |
| Chain style | LCEL (LangChain Expression Language) | Composable, readable, type-safe |
```
ragent-backend/
├── main.py              # FastAPI app, router registration, global error handler
├── config.py            # All settings loaded from .env
├── requirements.txt     # Python dependencies
├── .env.example         # Environment variable template
├── DOCUMENTATION.md     # This file
│
├── app/
│   ├── ingestion/
│   │   ├── loader.py        # LangChain document loaders (PDF/TXT/CSV/Excel)
│   │   └── chunker.py       # RecursiveCharacterTextSplitter wrapper
│   │
│   ├── retrieval/
│   │   └── vector_store.py  # ChromaDB + HuggingFace embeddings (CRUD + search)
│   │
│   ├── rag/
│   │   └── pipeline.py      # LCEL RAG chain (retrieve → prompt → LLM → parse)
│   │
│   ├── agents/
│   │   ├── tools.py         # LangChain @tool definitions (retrieve, rag, list)
│   │   └── rag_agent.py     # AgentExecutor with tool-calling, multi-turn support
│   │
│   ├── routes/
│   │   ├── health.py        # GET /health
│   │   ├── documents.py     # POST/GET/DELETE /documents/
│   │   └── query.py         # POST /query/rag, POST /query/agent
│   │
│   └── utils/
│       ├── logger.py        # Centralised logging
│       └── validators.py    # File & query validation / guardrails
│
├── uploads/             # Uploaded files stored here
└── chroma_db/           # ChromaDB persistence directory
```
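`chunker.py` wraps LangChain's `RecursiveCharacterTextSplitter`. The sketch below is a simplified, dependency-free sliding-window splitter that shows what `CHUNK_SIZE` and `CHUNK_OVERLAP` do. The real splitter prefers to break on paragraph, line, and word separators, while this sketch cuts at fixed character offsets. `split_with_overlap` is a hypothetical helper, not the project's code.

```python
def split_with_overlap(text: str, chunk_size: int = 800, chunk_overlap: int = 150) -> list[str]:
    """Fixed-size character windows; each window starts chunk_size - chunk_overlap after the previous one."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


print(split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=2))  # → ['abcd', 'cdef', 'efgh', 'ghij']
```

Each pair of adjacent chunks shares `chunk_overlap` characters, so a sentence that straddles a cut appears whole in at least one chunk as long as it is shorter than the overlap.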
Agent (`app/agents/rag_agent.py`):

- Role: Plans which tools to call based on the user's question
- Model: `openai/gpt-oss-120b` (via Groq)
- Framework: `create_tool_calling_agent` + `AgentExecutor`
- Max iterations: 6 (safety cap to prevent infinite loops)
| Tool | Purpose |
|---|---|
| `retrieve_documents` | Fetch raw relevant passages from ChromaDB |
| `list_uploaded_documents` | Show which documents are in the knowledge base |
| `fetch_chunks_by_page` | Get all chunks from a particular page |
| `fetch_chunks_by_index` | Get a particular chunk by its index |
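The two lookup tools can be pictured as metadata filters over stored chunks. The sketch below makes several assumptions. Each chunk is assumed to carry `filename`, `page`, and `chunk_index` metadata, and `chunk_index` is assumed to be numbered per document. These are plausible given the `filename`/`chunk_index` fields in the API responses, but the field names and function signatures here are hypothetical. The sketch also uses a plain Python list in place of ChromaDB queries, so it is not the project's `tools.py`.

```python
from typing import Optional

# Hypothetical in-memory stand-in for chunk metadata stored in ChromaDB.
# Assumes chunk_index is numbered per document and page is None for non-paged files.
CHUNKS = [
    {"filename": "report.pdf", "page": 1, "chunk_index": 0, "text": "Executive summary..."},
    {"filename": "report.pdf", "page": 1, "chunk_index": 1, "text": "Key metrics..."},
    {"filename": "report.pdf", "page": 2, "chunk_index": 2, "text": "Q3 revenue was $4.2M..."},
    {"filename": "data.csv", "page": None, "chunk_index": 0, "text": "region,revenue..."},
]


def fetch_chunks_by_page(filename: str, page: int) -> list[dict]:
    """All chunks from one page of one document, in chunk order."""
    hits = [c for c in CHUNKS if c["filename"] == filename and c["page"] == page]
    return sorted(hits, key=lambda c: c["chunk_index"])


def fetch_chunks_by_index(filename: str, chunk_index: int) -> Optional[dict]:
    """The single chunk at chunk_index within a document, or None if absent."""
    for c in CHUNKS:
        if c["filename"] == filename and c["chunk_index"] == chunk_index:
            return c
    return None


print([c["chunk_index"] for c in fetch_chunks_by_page("report.pdf", 1)])  # → [0, 1]
```

Lookups like these can let the agent widen the context around a retrieved chunk, for example by pulling the rest of its page, when an answer spans chunk boundaries.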
```
User Question
      │
      ▼
Agent plans → calls retrieve_documents(query)
      │
      ▼
Agent reviews chunks → calls answer_with_rag(query)
      │   (for complex questions: may call retrieve_documents again)
      ▼
Agent validates answer (is it grounded? complete?)
      │
      ▼
Final answer + steps + tools used → returned to API caller
```
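In the project, LangChain's `AgentExecutor` drives this loop. The dependency-free sketch below illustrates only two documented behaviours: the `max_iterations` cap and the step trace (`tool`, `input`, `output_preview`) returned by `/query/agent`. The scripted `scripted_plan` function stands in for the LLM's tool-calling decisions. It is hypothetical, as are the stub tools, `run_agent`, the 80-character preview length, and the fallback message.

```python
from typing import Callable

MAX_ITERATIONS = 6  # mirrors the documented AgentExecutor cap

# A planner sees the question and the steps so far and returns (action, argument).
Planner = Callable[[str, list[dict]], tuple[str, str]]


def run_agent(question: str, plan: Planner, tools: dict[str, Callable[[str], str]],
              max_iterations: int = MAX_ITERATIONS) -> dict:
    """Call tools until the planner returns ("final", answer) or the cap is reached."""
    steps: list[dict] = []
    answer = "Stopped: iteration limit reached."  # hypothetical fallback message
    for _ in range(max_iterations):
        action, arg = plan(question, steps)
        if action == "final":
            answer = arg
            break
        output = tools[action](arg)
        # Same shape as the "steps" entries in the /query/agent response.
        steps.append({"tool": action, "input": arg, "output_preview": output[:80]})
    return {
        "answer": answer,
        "tools_used": list(dict.fromkeys(s["tool"] for s in steps)),  # unique, in call order
        "steps": steps,
    }


# Stub tools returning canned text (hypothetical).
tools = {
    "retrieve_documents": lambda q: "[Chunk 1 | Source: report.pdf]\nQ3 revenue was $4.2M.",
    "answer_with_rag": lambda q: "Q3 revenue was $4.2M (report.pdf).",
}


def scripted_plan(question: str, steps: list[dict]) -> tuple[str, str]:
    """Stand-in for the LLM: retrieve, then answer with RAG, then finish."""
    if not steps:
        return "retrieve_documents", question
    if len(steps) == 1:
        return "answer_with_rag", question
    return "final", steps[-1]["output_preview"]


result = run_agent("What were the Q3 revenue figures?", scripted_plan, tools)
print(result["tools_used"])  # → ['retrieve_documents', 'answer_with_rag']
```

A planner that never returns `"final"` stops after six tool calls instead of looping forever. That is the failure mode the cap guards against.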
| | `/query/rag` | `/query/agent` |
|---|---|---|
| Reasoning | Fixed 1-step pipeline | Multi-step autonomous |
| Speed | Faster | Slower (more LLM calls) |
| Transparency | Source list | Full step trace |
| Best for | Simple factual queries | Complex multi-hop questions |
Before setting anything up, clone the project to your local machine.
```bash
# 1. Clone / extract the project, then enter the frontend directory
cd ragent-frontend
# 2. Install dependencies
npm install
# 3. Start the Vite dev server
npm run dev
```

Backend prerequisites:

- Python 3.11+
- Groq API key
```bash
# 1. Clone / extract the project, then enter the backend directory
cd ragent-backend
# 2. Create a virtual environment
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
# 3. Install dependencies
pip install -r requirements.txt
# 4. Configure environment
cp .env.example .env
# Edit .env and set GROQ_API_KEY=your_key_here
# 5. Start the server (--reload is for development only)
uvicorn main:app --host 0.0.0.0 --port 8000 --reload
```

Open http://localhost:8000/docs for the interactive Swagger UI.
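With the server running, the query endpoints can also be called from a script. The sketch below uses only the standard library and assumes the default `http://localhost:8000` base URL and the request bodies shown in the API reference. `build_query_request` and `send` are hypothetical helpers, not part of the project.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "http://localhost:8000"  # assumption: the local server started above


def build_query_request(question: str, agent: bool = False,
                        chat_history: Optional[list[dict]] = None) -> urllib.request.Request:
    """Build a POST to /query/rag (default) or /query/agent with a JSON body."""
    path = "/query/agent" if agent else "/query/rag"
    body: dict = {"question": question}
    if agent and chat_history:
        # Only the agent endpoint takes history: [{"role": "human" | "ai", "content": "..."}]
        body["chat_history"] = chat_history
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response (requires a running server)."""
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.load(resp)


# Usage (needs the backend running):
# print(send(build_query_request("What were the Q3 revenue figures?"))["answer"])
```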
`GET /health` returns system status and knowledge-base statistics.

Response

```json
{
  "status": "ok",
  "llm_model": "openai/gpt-oss-120b",
  "embedding_model": "all-MiniLM-L6-v2",
  "knowledge_base": { "total_chunks": 42, "documents": ["report.pdf"] }
}
```

`POST /documents/upload` uploads and ingests a document into the knowledge base.
Request: `multipart/form-data` with the field `file`.

Supported formats: `pdf`, `txt`, `csv`, `xlsx`, `xls`

Response

```json
{
  "message": "Document ingested successfully.",
  "filename": "report.pdf",
  "pages_loaded": 12,
  "chunks_stored": 47
}
```

`GET /documents/` lists all documents in the knowledge base.
Response

```json
{
  "total_chunks": 47,
  "documents": ["report.pdf", "data.csv"]
}
```

`DELETE /documents/{f}` removes a document and all its chunks from the knowledge base.
Response

```json
{
  "message": "Document removed successfully.",
  "filename": "report.pdf",
  "chunks_removed": 47
}
```

`POST /query/rag`: direct RAG query using a fast, fixed pipeline.
Request

```json
{ "question": "What were the Q3 revenue figures?" }
```

Response

```json
{
  "answer": "According to the uploaded report, Q3 revenue was $4.2M...",
  "sources": [{ "filename": "report.pdf", "chunk_index": 12 }]
}
```

`POST /query/agent`: agentic query with autonomous multi-step reasoning.
Request

```json
{
  "question": "Compare the risk factors in the 2023 and 2024 annual reports.",
  "chat_history": [
    { "role": "human", "content": "I uploaded both annual reports." },
    { "role": "ai", "content": "Great, I can see them in the knowledge base." }
  ]
}
```

Response
```json
{
  "answer": "Based on both reports, the 2024 filing highlights...",
  "tools_used": ["retrieve_documents", "answer_with_rag"],
  "steps": [
    {
      "tool": "retrieve_documents",
      "input": "risk factors 2023",
      "output_preview": "[Chunk 1 | Source: 2023_annual.pdf]\nRisk factor..."
    }
  ]
}
```

| Variable | Default | Description |
|---|---|---|
| `GROQ_API_KEY` | (required) | Your Groq API key (free at console.groq.com) |
| `LLM_MODEL` | `openai/gpt-oss-120b` | Groq model to use |
| `EMBEDDING_MODEL` | `all-MiniLM-L6-v2` | Sentence-transformer model |
| `UPLOAD_DIR` | `./uploads` | Where uploaded files are saved |
| `CHROMA_PERSIST_DIR` | `./chroma_db` | ChromaDB persistence directory |
| `CHROMA_COLLECTION_NAME` | `enterprise_docs` | ChromaDB collection name |
| `CHUNK_SIZE` | `800` | Characters per chunk |
| `CHUNK_OVERLAP` | `150` | Characters of overlap between adjacent chunks |
| `TOP_K_RESULTS` | `5` | Chunks retrieved per query |
| `MAX_FILE_SIZE_MB` | `50` | Maximum upload size (MB) |
| `ALLOWED_EXTENSIONS` | `pdf,txt,csv,xlsx,xls` | Permitted file types |
| `DEBUG` | `false` | Enable verbose agent logging |
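`config.py` loads these values from `.env`. The sketch below shows one way to read the documented variables with their defaults and then apply `ALLOWED_EXTENSIONS` and `MAX_FILE_SIZE_MB` to an upload. It reads an environment mapping only and leaves out loading the `.env` file itself (for example with python-dotenv). The `Settings` class, `load_settings`, `validate_upload`, and the boolean spellings accepted for `DEBUG` are hypothetical, not the project's actual code.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    groq_api_key: str
    llm_model: str
    embedding_model: str
    upload_dir: str
    chroma_persist_dir: str
    chroma_collection_name: str
    chunk_size: int
    chunk_overlap: int
    top_k_results: int
    max_file_size_mb: int
    allowed_extensions: frozenset[str]
    debug: bool


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from environment variables, falling back to the documented defaults."""
    api_key = env.get("GROQ_API_KEY", "")
    if not api_key:
        raise RuntimeError("GROQ_API_KEY is required")
    return Settings(
        groq_api_key=api_key,
        llm_model=env.get("LLM_MODEL", "openai/gpt-oss-120b"),
        embedding_model=env.get("EMBEDDING_MODEL", "all-MiniLM-L6-v2"),
        upload_dir=env.get("UPLOAD_DIR", "./uploads"),
        chroma_persist_dir=env.get("CHROMA_PERSIST_DIR", "./chroma_db"),
        chroma_collection_name=env.get("CHROMA_COLLECTION_NAME", "enterprise_docs"),
        chunk_size=int(env.get("CHUNK_SIZE", "800")),
        chunk_overlap=int(env.get("CHUNK_OVERLAP", "150")),
        top_k_results=int(env.get("TOP_K_RESULTS", "5")),
        max_file_size_mb=int(env.get("MAX_FILE_SIZE_MB", "50")),
        allowed_extensions=frozenset(
            ext.strip().lower()
            for ext in env.get("ALLOWED_EXTENSIONS", "pdf,txt,csv,xlsx,xls").split(",")
            if ext.strip()
        ),
        debug=env.get("DEBUG", "false").strip().lower() in {"1", "true", "yes"},
    )


def validate_upload(filename: str, size_bytes: int, settings: Settings) -> None:
    """Raise ValueError if the file's extension or size is not permitted."""
    ext = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    if ext not in settings.allowed_extensions:
        raise ValueError(f"Unsupported file type: .{ext}")
    if size_bytes > settings.max_file_size_mb * 1024 * 1024:
        raise ValueError(f"File exceeds the {settings.max_file_size_mb} MB limit")


settings = load_settings({"GROQ_API_KEY": "test-key"})
validate_upload("report.pdf", 2_000_000, settings)  # passes silently
```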
Development:

```bash
uvicorn main:app --host 0.0.0.0 --port 8000 --reload
```

Production (Gunicorn managing Uvicorn workers):

```bash
pip install gunicorn
gunicorn main:app -k uvicorn.workers.UvicornWorker -w 2 -b 0.0.0.0:8000
```

Docker:

```dockerfile
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 8000
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
```

```bash
docker build -t rag-agent .
docker run -e GROQ_API_KEY=your_key -p 8000:8000 rag-agent
```

Never commit `.env` to source control. Instead, supply secrets through:

- Docker `--env-file` or `-e` flags
- Cloud provider secret managers (AWS Secrets Manager, GCP Secret Manager)
- Platform environment variable settings (Railway, Render, Fly.io)
| Area | Limitation |
|---|---|
| Memory | ChromaDB is local; not suited for multi-server deployments without a shared volume |
| Embeddings | all-MiniLM-L6-v2 is English-optimised; multilingual docs may have lower recall |
| File size | Large documents increase ingestion time significantly; uploads above `MAX_FILE_SIZE_MB` (default 50 MB) are not accepted |
| Tables | Complex multi-column tables in PDFs may not parse cleanly with PyPDF |
| Images | Image content within PDFs (charts, diagrams) is not extracted |
| Context window | Long documents are chunked; cross-chunk reasoning requires agent multi-hop calls |
| Concurrency | Singleton vector store may have race conditions under heavy parallel writes |
- Prompt injection – Mitigated with query validation that rejects known injection patterns.
- Hallucination – System prompt strictly instructs the LLM to only use provided context. Temperature is set to 0.1–0.2 for factual grounding.
- Chunk boundary splitting – Answers sometimes span two chunks. Mitigated with `CHUNK_OVERLAP=150`, so text near a boundary appears in both adjacent chunks; this reduces, but does not eliminate, context lost at boundaries.
- Agent infinite loops – Mitigated by setting `max_iterations=6` on the `AgentExecutor`.
- Cold start latency – The embedding model (~90 MB) is downloaded on first run; subsequent starts use the cached model.
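Pattern-based query validation, as described for prompt injection above, can be as simple as a few sanity checks plus a list of regular expressions. The patterns and the length limit below are illustrative assumptions, not the contents of the project's `validators.py`. A denylist like this only catches known phrasings, so it is best treated as one layer of defence rather than a complete solution.

```python
import re

MAX_QUERY_CHARS = 2000  # hypothetical limit

# Illustrative denylist; real injection attempts vary widely.
INJECTION_PATTERNS = [
    re.compile(r"ignore\s+(all\s+)?(previous|prior|above)\s+instructions", re.IGNORECASE),
    re.compile(r"disregard\s+(the\s+)?system\s+prompt", re.IGNORECASE),
    re.compile(r"reveal\s+(your\s+)?(system\s+prompt|instructions)", re.IGNORECASE),
]


def validate_query(question: str) -> str:
    """Return the stripped question, or raise ValueError if it fails a guardrail."""
    q = question.strip()
    if not q:
        raise ValueError("Question must not be empty")
    if len(q) > MAX_QUERY_CHARS:
        raise ValueError(f"Question exceeds {MAX_QUERY_CHARS} characters")
    for pattern in INJECTION_PATTERNS:
        if pattern.search(q):
            raise ValueError("Question rejected by prompt-injection guardrail")
    return q


print(validate_query("  What were the Q3 revenue figures? "))  # → What were the Q3 revenue figures?
```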
- Add a re-ranking step (cross-encoder) after initial retrieval for higher precision
- Add table extraction and OCR support for scanned PDF documents
- Add authentication (API key or OAuth2) to the FastAPI routes
- Implement query caching to avoid redundant LLM calls for repeated questions
- Add a `/query/stream` endpoint using Server-Sent Events for streaming responses