A Retrieval-Augmented Generation (RAG) system. This API allows users to ingest PDF documents and ask context-aware questions using LLM strategies.
- RAG Pipeline: Ingests, chunks, embeds, and retrieves document context (a query-flow sketch follows this list).
- Multi-Provider Strategy: Implements a Fallback Pattern for high availability:
  - Embedding Models
    - Primary: OpenAI (text-embedding-3-small)
    - Fallback: Gemini (text-embedding-004)
  - LLM Models
    - Primary: OpenAI (GPT-3.5-Turbo)
    - Fallback: Google Gemini (Gemini Pro)
- Vector Persistence: Uses ChromaDB with persistent storage.
- Observability: Colored logging system for improved debugging.
- Clean Architecture: Modular separation of concerns.
- Type Safety: 100% typed code with mypy strict mode.
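To make the pipeline concrete, here is a minimal sketch of the question-answering flow. The `embedder`, `llm`, and their `embed`/`generate` methods are illustrative placeholders, not the project's actual classes; only the ChromaDB `query` call reflects the real client API.

```python
# Minimal sketch of the question-answering flow (illustrative names, not the
# project's actual identifiers): embed the question, retrieve similar chunks
# from ChromaDB, and ask the LLM to answer from that context.

def answer_question(question: str, collection, embedder, llm, k: int = 5) -> str:
    # 1. Embed the user question with the active embedding provider.
    query_vector = embedder.embed(question)

    # 2. Retrieve the k most similar chunks from the vector store.
    results = collection.query(query_embeddings=[query_vector], n_results=k)
    context = "\n\n".join(results["documents"][0])

    # 3. Ask the LLM to answer using only the retrieved context.
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return llm.generate(prompt)
```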
The project structure is inspired by the FastAPI Reference App patterns to ensure scalability.
backend/
├── app/                 # API Layer (FastAPI Routers & Schemas)
├── core/                # Business Logic (Pipelines for Ingestion & RAG)
├── ChromaDB/            # Vector Database Manager
│   └── src/embeddings   # Embedding Strategies (OpenAI / Gemini)
├── LLM/                 # LLM Manager (Strategy Pattern & Factory)
│   ├── prompts/         # YAML Prompts
│   └── src/models       # OpenAI / Gemini Implementations
└── logs/                # Custom Logger
- Docker & Docker Compose (recommended)
- Python 3.10+ (for local execution)
Create a .env file in the root directory; you can use the provided .env.example as a template:
# Keys
OPENAI_API_KEY=
GOOGLE_API_KEY=
# API configuration
API_HOST=localhost
API_PORT=8000
# Chroma DB
CHROMA_DB_PATH=backend/ChromaDB/DB_instance
# Embedding
OPENAI_EMBED_MODEL_NAME=text-embedding-3-small
GOOGLE_EMBED_MODEL=models/text-embedding-004
# LLM
OPENAI_LLM_MODEL_NAME=gpt-3.5-turbo
GOOGLE_LLM_MODEL_NAME=gemini-2.5-flash
# RAG Configs
CHUNK_SIZE=1000
CHUNK_OVERLAP=200
LLM_TEMPERATURE=0.4
RAG_RETRIEVAL_COUNT=5
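CHUNK_SIZE and CHUNK_OVERLAP control how documents are split before embedding. Below is a minimal sketch of fixed-size, character-based chunking with overlap; the actual pipeline may use a library text splitter instead.

```python
def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into fixed-size chunks that overlap, so context is not
    lost at chunk boundaries. Illustrative only."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")

    chunks = []
    step = chunk_size - chunk_overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks
```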
Running with Docker is recommended, as it ensures all system dependencies are correctly installed:
docker-compose up --build
The interactive API documentation (Swagger UI) will be available at: http://localhost:8000/docs
If running directly on the host machine:
pip install -r requirements.txt
python app.py
You can interact with the API via the Swagger UI or using curl.
Uploads and processes a PDF file.
curl -X POST "http://localhost:8000/documents" \
-H "accept: application/json" \
-H "Content-Type: multipart/form-data" \
-F "files=@yourfile.pdf"
Performs semantic search and generates an answer using the LLM.
curl -X POST "http://localhost:8000/question" \
-H "Content-Type: application/json" \
-d '{ "question": "What is the power consumption?" }'
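For scripted access, the same two endpoints can also be called from Python with the requests library; the file name and question below are placeholders.

```python
import requests

BASE_URL = "http://localhost:8000"

# Upload a PDF for ingestion.
with open("yourfile.pdf", "rb") as pdf:
    resp = requests.post(f"{BASE_URL}/documents", files={"files": pdf})
    resp.raise_for_status()
    print(resp.json())

# Ask a question against the indexed documents.
resp = requests.post(
    f"{BASE_URL}/question",
    json={"question": "What is the power consumption?"},
)
resp.raise_for_status()
print(resp.json())
```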
Returns metadata of all indexed files.
curl -X GET "http://localhost:8000/documents" \
-H "Content-Type: application/json"
Removes all chunks associated with a specific file.
curl -X DELETE \
"http://localhost:8000/documents/{file_name}" \
-H "accept: application/json"
Checks the API health status.
curl -X GET \
"http://localhost:8000/health" \
-H "accept: application/json"
Decision: Used ChromaDB in persistent mode via Docker Volumes.
- Why: It eliminates the operational overhead of maintaining external VMs or managed vector services for a standalone challenge while keeping the infrastructure portable and self-contained.
Decision: The Database Coordinator (ChromaManager) is instantiated as a strict Singleton using @lru_cache.
- Why: Since we are running ChromaDB locally, multiple uncoordinated instances trying to write to the same directory could lead to race conditions or database locks. The Singleton pattern ensures a single point of entry for all DB operations.
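A minimal sketch of this pattern, assuming a ChromaManager wrapper class (the class, path, and collection names are illustrative; only the chromadb PersistentClient calls reflect the real library API):

```python
from functools import lru_cache

import chromadb


class ChromaManager:
    """Single coordinator for all ChromaDB operations (illustrative)."""

    def __init__(self, persist_path: str) -> None:
        # PersistentClient writes to the mounted volume, so data survives
        # container restarts.
        self._client = chromadb.PersistentClient(path=persist_path)
        self._collection = self._client.get_or_create_collection("documents")

    @property
    def collection(self):
        return self._collection


@lru_cache(maxsize=1)
def get_chroma_manager() -> ChromaManager:
    # lru_cache guarantees the constructor runs once per process, so every
    # caller shares the same client and there is a single writer to the
    # persistent directory.
    return ChromaManager(persist_path="backend/ChromaDB/DB_instance")
```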
Decision: Implemented Abstract Factories for both Embeddings and LLMs.
- Why: To ensure high availability.
- Embedding: Tries OpenAI first; if it fails (auth/connection error), falls back to the Google Gemini model (text-embedding-004).
- Generation: Tries OpenAI (gpt-3.5-turbo) first; if it fails, falls back to Google Gemini (gemini-pro).
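A minimal sketch of the embedding fallback, with illustrative class names and a generic exception catch (the real implementation builds concrete providers through abstract factories and handles provider-specific errors):

```python
class FallbackEmbedder:
    """Tries the primary provider first and falls back to the secondary on
    failure. Illustrative sketch only."""

    def __init__(self, primary, fallback) -> None:
        self._primary = primary    # e.g. OpenAI text-embedding-3-small
        self._fallback = fallback  # e.g. Gemini text-embedding-004

    def embed(self, texts: list[str]) -> list[list[float]]:
        try:
            return self._primary.embed(texts)
        except Exception:
            # Auth or connection error on the primary provider:
            # switch to the fallback so ingestion keeps working.
            return self._fallback.embed(texts)
```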
Decision: All configuration is managed via pydantic-settings.
- Why: It provides type validation for environment variables.
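A minimal sketch of such a settings class, with field names mirroring the .env keys above (the project's actual class may differ):

```python
from pydantic_settings import BaseSettings, SettingsConfigDict


class Settings(BaseSettings):
    model_config = SettingsConfigDict(env_file=".env", extra="ignore")

    # Provider keys
    openai_api_key: str = ""
    google_api_key: str = ""

    # API configuration
    api_host: str = "localhost"
    api_port: int = 8000

    # RAG configuration
    chunk_size: int = 1000
    chunk_overlap: int = 200
    llm_temperature: float = 0.4
    rag_retrieval_count: int = 5


# Invalid values (e.g. a non-numeric CHUNK_SIZE) raise a validation error at startup.
settings = Settings()
```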
Decision: Maintain a clear separation between the API Interface (Routers), Business Logic (Services/Pipelines), the Data Access Layer, and the LLM Module. A module only knows the interface of another module, not its implementation.
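For example, the RAG pipeline can depend on an abstract LLM interface rather than a concrete provider. A minimal sketch with illustrative names:

```python
from abc import ABC, abstractmethod


class LLMClient(ABC):
    """Interface the business logic depends on; the concrete OpenAI and
    Gemini implementations live in the LLM module (illustrative names)."""

    @abstractmethod
    def generate(self, prompt: str, temperature: float = 0.4) -> str:
        ...


class RAGPipeline:
    def __init__(self, llm: LLMClient) -> None:
        # The pipeline only knows the interface, not which provider backs it.
        self._llm = llm

    def answer(self, prompt: str) -> str:
        return self._llm.generate(prompt)
```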
To ensure code quality, a linting pipeline is included. It runs:
- Isort: Sorts imports.
- Black: Formats code.
- Flake8: Enforces style.
- MyPy: Checks static types.
Run locally via PowerShell:
./lint.ps1
- Async background workers for ingestion
- RAG Evaluation: Ragas / TruLens
- Persistent log files for later comparisons and improvement tracking.
- Custom and General Exception Handlers