A Retrieval-Augmented Generation (RAG) system. This API allows users to ingest PDF documents and ask context-aware questions using LLM strategies.
- RAG Pipeline: Ingests, chunks, embeds, and retrieves document context (a query-flow sketch follows this list).
- Multi-Provider Strategy: Implements a Fallback Pattern for high availability:
  - Embedding Models
    - Primary: OpenAI (text-embedding-3-small)
    - Fallback: Gemini (text-embedding-004)
  - LLM Models
    - Primary: OpenAI (GPT-3.5-Turbo)
    - Fallback: Google Gemini (Gemini Pro)
- Vector Persistence: Uses ChromaDB with persistent storage.
- Observability: Colored logging system for improved debugging.
- Clean Architecture: Modular separation of concerns.
- Type Safety: 100% typed code with mypy strict mode.
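To make the pipeline concrete, here is a minimal sketch of the question-answering flow. The `embedder`, `llm`, and their `embed`/`generate` methods are illustrative placeholders, not the project's actual classes; only the ChromaDB `query` call reflects the real client API.

```python
# Minimal sketch of the question-answering flow (illustrative names, not the
# project's actual identifiers): embed the question, retrieve similar chunks
# from ChromaDB, and ask the LLM to answer from that context.

def answer_question(question: str, collection, embedder, llm, k: int = 5) -> str:
    # 1. Embed the user question with the active embedding provider.
    query_vector = embedder.embed(question)

    # 2. Retrieve the k most similar chunks from the vector store.
    results = collection.query(query_embeddings=[query_vector], n_results=k)
    context = "\n\n".join(results["documents"][0])

    # 3. Ask the LLM to answer using only the retrieved context.
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return llm.generate(prompt)
```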
The project structure is inspired by the FastAPI Reference App patterns to ensure scalability.
backend/
├── app/                 # API Layer (FastAPI Routers & Schemas)
├── core/                # Business Logic (Pipelines for Ingestion & RAG)
├── ChromaDB/            # Vector Database Manager
│   └── src/embeddings   # Embedding Strategies (OpenAI / Gemini)
├── LLM/                 # LLM Manager (Strategy Pattern & Factory)
│   ├── prompts/         # YAML Prompts
│   └── src/models       # OpenAI / Gemini Implementations
└── logs/                # Custom Logger
- Docker & Docker Compose (recommended)
- Python 3.10+ (for local execution)
Create a .env file in the root directory; you can use the provided .env.example as a template:
# Keys
OPENAI_API_KEY=
GOOGLE_API_KEY=
# API configuration
API_HOST=localhost
API_PORT=8000
# Chroma DB
CHROMA_DB_PATH=backend/ChromaDB/DB_instance
# Embedding
OPENAI_EMBED_MODEL_NAME=text-embedding-3-small
GOOGLE_EMBED_MODEL=models/text-embedding-004
# LLM
OPENAI_LLM_MODEL_NAME=gpt-3.5-turbo
GOOGLE_LLM_MODEL_NAME=gemini-2.5-flash
# RAG Configs
CHUNK_SIZE=1000
CHUNK_OVERLAP=200
LLM_TEMPERATURE=0.4
RAG_RETRIEVAL_COUNT=5
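CHUNK_SIZE and CHUNK_OVERLAP control how documents are split before embedding. Below is a minimal sketch of fixed-size, character-based chunking with overlap; the actual pipeline may use a library text splitter instead.

```python
def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into fixed-size chunks that overlap, so context is not
    lost at chunk boundaries. Illustrative only."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")

    chunks = []
    step = chunk_size - chunk_overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks
```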
Running with Docker is recommended, as it ensures all system dependencies are correctly installed:
docker-compose up --build
The interactive API documentation (Swagger UI) will be available at: http://localhost:8000/docs
If running directly on the host machine:
pip install -r requirements.txt
python app.py
You can interact with the API via the Swagger UI or using curl.
Uploads and processes a PDF file.
curl -X POST "http://localhost:8000/documents" \
-H "accept: application/json" \
-H "Content-Type: multipart/form-data" \
-F "files=@yourfile.pdf"
Performs semantic search and generates an answer using the LLM.
curl -X POST "http://localhost:8000/question" \
-H "Content-Type: application/json" \
-d '{ "question": "What is the power consumption?" }'
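For scripted access, the same two endpoints can also be called from Python with the requests library; the file name and question below are placeholders.

```python
import requests

BASE_URL = "http://localhost:8000"

# Upload a PDF for ingestion.
with open("yourfile.pdf", "rb") as pdf:
    resp = requests.post(f"{BASE_URL}/documents", files={"files": pdf})
    resp.raise_for_status()
    print(resp.json())

# Ask a question against the indexed documents.
resp = requests.post(
    f"{BASE_URL}/question",
    json={"question": "What is the power consumption?"},
)
resp.raise_for_status()
print(resp.json())
```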
Returns metadata of all indexed files.
curl -X GET "http://localhost:8000/documents" \
-H "Content-Type: application/json"
Removes all chunks associated with a specific file.
curl -X DELETE \
"http://localhost:8000/documents/{file_name}" \
-H "accept: application/json"
Checks the API health status.
curl -X GET \
"http://localhost:8000/health" \
-H "accept: application/json"
Decision: Used ChromaDB in persistent mode via Docker Volumes.
- Why: It eliminates the operational overhead of maintaining external VMs or managed vector services for a standalone challenge while keeping the infrastructure portable and self-contained.
Decision: The Database Coordinator (ChromaManager) is instantiated as a strict Singleton using @lru_cache.
- Why: Since we are running ChromaDB locally, multiple uncoordinated instances trying to write to the same directory could lead to race conditions or database locks. The Singleton pattern ensures a single point of entry for all DB operations.
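A minimal sketch of this pattern, assuming a ChromaManager wrapper class (the class, path, and collection names are illustrative; only the chromadb PersistentClient calls reflect the real library API):

```python
from functools import lru_cache

import chromadb


class ChromaManager:
    """Single coordinator for all ChromaDB operations (illustrative)."""

    def __init__(self, persist_path: str) -> None:
        # PersistentClient writes to the mounted volume, so data survives
        # container restarts.
        self._client = chromadb.PersistentClient(path=persist_path)
        self._collection = self._client.get_or_create_collection("documents")

    @property
    def collection(self):
        return self._collection


@lru_cache(maxsize=1)
def get_chroma_manager() -> ChromaManager:
    # lru_cache guarantees the constructor runs once per process, so every
    # caller shares the same client and there is a single writer to the
    # persistent directory.
    return ChromaManager(persist_path="backend/ChromaDB/DB_instance")
```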
Decision: Implemented Abstract Factories for both Embeddings and LLMs.
- Why: To ensure high availability.
- Embedding: Tries OpenAI first; if it fails (auth/connection error), falls back to the Google Gemini model (text-embedding-004).
- Generation: Tries OpenAI (gpt-3.5-turbo) first; if it fails, falls back to Google Gemini (gemini-pro).
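A minimal sketch of the embedding fallback, with illustrative class names and a generic exception catch (the real implementation builds concrete providers through abstract factories and handles provider-specific errors):

```python
class FallbackEmbedder:
    """Tries the primary provider first and falls back to the secondary on
    failure. Illustrative sketch only."""

    def __init__(self, primary, fallback) -> None:
        self._primary = primary    # e.g. OpenAI text-embedding-3-small
        self._fallback = fallback  # e.g. Gemini text-embedding-004

    def embed(self, texts: list[str]) -> list[list[float]]:
        try:
            return self._primary.embed(texts)
        except Exception:
            # Auth or connection error on the primary provider:
            # switch to the fallback so ingestion keeps working.
            return self._fallback.embed(texts)
```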
Decision: All configuration is managed via pydantic-settings.
- Why: It provides type validation for environment variables.
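A minimal sketch of such a settings class, with field names mirroring the .env keys above (the project's actual class may differ):

```python
from pydantic_settings import BaseSettings, SettingsConfigDict


class Settings(BaseSettings):
    model_config = SettingsConfigDict(env_file=".env", extra="ignore")

    # Provider keys
    openai_api_key: str = ""
    google_api_key: str = ""

    # API configuration
    api_host: str = "localhost"
    api_port: int = 8000

    # RAG configuration
    chunk_size: int = 1000
    chunk_overlap: int = 200
    llm_temperature: float = 0.4
    rag_retrieval_count: int = 5


# Invalid values (e.g. a non-numeric CHUNK_SIZE) raise a validation error at startup.
settings = Settings()
```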
Decision: Maintain a clear separation between the API Interface (Routers), Business Logic (Services/Pipelines), the Data Access Layer, and the LLM Module. A module only knows the interface of another module, not its implementation.
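For example, the RAG pipeline can depend on an abstract LLM interface rather than a concrete provider. A minimal sketch with illustrative names:

```python
from abc import ABC, abstractmethod


class LLMClient(ABC):
    """Interface the business logic depends on; the concrete OpenAI and
    Gemini implementations live in the LLM module (illustrative names)."""

    @abstractmethod
    def generate(self, prompt: str, temperature: float = 0.4) -> str:
        ...


class RAGPipeline:
    def __init__(self, llm: LLMClient) -> None:
        # The pipeline only knows the interface, not which provider backs it.
        self._llm = llm

    def answer(self, prompt: str) -> str:
        return self._llm.generate(prompt)
```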
To ensure code quality, a linting pipeline is included. It runs:
- Isort: Sorts imports.
- Black: Formats code.
- Flake8: Enforces style.
- MyPy: Checks static types.
Run locally via PowerShell:
./lint.ps1
- Async background workers for ingestion
- RAG Evaluation: Ragas / TruLens
- Persistent log files for later comparisons and improvement tracking.
- Custom and General Exception Handlers