RAGForge – Hybrid Retrieval-Augmented Generation System

RAGForge is a document question-answering system built with FastAPI that retrieves relevant information from uploaded documents and generates answers using an LLM. The system combines lexical search and vector similarity retrieval, followed by reranking and contextual expansion to improve answer accuracy.

The project focuses on implementing a modular retrieval pipeline rather than a simple prompt-based chatbot.

Features

- Multi-format document ingestion
  - PDF
  - DOCX
  - TXT
- Text chunking for document indexing
- Hybrid retrieval pipeline
  - BM25 lexical retrieval
  - FAISS vector similarity search
- Cross-encoder reranking to refine retrieved results
- Sentence window expansion to include surrounding context
- Query expansion for short user queries
- Summary intent detection for queries like:
  - summarize
  - summary
  - overview
- Context-grounded answer generation using OpenAI models
- Latency tracking for:
  - retrieval
  - reranking
  - generation
  - total request time
- Retrieval evaluation module with metrics:
  - Recall@K
  - Mean Reciprocal Rank (MRR)
- Simple web interface for:
  - uploading documents
  - asking questions
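
Two of the features above, summary intent detection and per-stage latency tracking, are simple enough to sketch in a few lines. The snippet below is an illustration only, not the code in this repository: the names `SUMMARY_KEYWORDS`, `is_summary_query`, and `track` are hypothetical, and the `time.sleep` calls stand in for the real retrieval, reranking, and generation stages.

```python
import time
from contextlib import contextmanager

# Keywords taken from the feature list; the matching strategy is an assumption.
SUMMARY_KEYWORDS = ("summarize", "summary", "overview")


def is_summary_query(query: str) -> bool:
    """Return True when the query contains one of the summary keywords."""
    lowered = query.lower()
    return any(keyword in lowered for keyword in SUMMARY_KEYWORDS)


@contextmanager
def track(stage: str, timings: dict):
    """Store the wall-clock duration of a stage, in seconds, under its name."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[stage] = time.perf_counter() - start


timings = {}
with track("total", timings):
    with track("retrieval", timings):
        time.sleep(0.01)  # placeholder for BM25 + FAISS lookup
    with track("reranking", timings):
        time.sleep(0.01)  # placeholder for cross-encoder scoring
    with track("generation", timings):
        time.sleep(0.01)  # placeholder for the LLM call

print(is_summary_query("Give me an overview of chapter 2"))
print({stage: round(seconds, 3) for stage, seconds in timings.items()})
```

A context manager keeps the timing code out of the stage functions themselves, so the same helper can wrap any stage without changing it.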

System Pipeline

Document Upload
      │
      ▼
Document Loader
(PDF / DOCX / TXT)
      │
      ▼
Text Chunking
      │
      ▼
Index Construction
 ├─ BM25 Index
 └─ FAISS Vector Index
      │
      ▼
User Query
      │
      ▼
Query Expansion
      │
      ▼
Hybrid Retrieval
(BM25 + Vector Search)
      │
      ▼
Cross-Encoder Reranking
      │
      ▼
Sentence Window Expansion
      │
      ▼
Context Selection
      │
      ▼
LLM Answer Generation
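
The README does not say how the BM25 and vector results are merged, or how wide the sentence window is. As a rough sketch of those two pipeline stages, the example below assumes reciprocal rank fusion (RRF) for the merge and a window of one sentence on each side of the hit. The function names, the chunk ids, and the constant `k=60` are illustrative choices, not values taken from the project.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of chunk ids into one list, best first.

    Each list contributes 1 / (k + rank) for every id it contains, so ids
    that rank well in both BM25 and vector search float to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda chunk_id: scores[chunk_id], reverse=True)


def expand_window(sentences, hit_index, window=1):
    """Return the matched sentence plus `window` neighbours on each side."""
    start = max(0, hit_index - window)
    end = min(len(sentences), hit_index + window + 1)
    return " ".join(sentences[start:end])


# Hypothetical rankings returned by the two retrievers for one query.
bm25_ranking = ["c3", "c1", "c7"]
vector_ranking = ["c1", "c4", "c3"]

fused = reciprocal_rank_fusion([bm25_ranking, vector_ranking])
print(fused)  # ['c1', 'c3', 'c4', 'c7']

sentences = ["A.", "B.", "C.", "D."]
print(expand_window(sentences, hit_index=2))  # B. C. D.
```

Rank-based fusion avoids having to normalise BM25 scores and cosine similarities onto a common scale; a weighted sum of normalised scores would be an equally reasonable alternative.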

Tech Stack

Backend

FastAPI
Python

Retrieval

FAISS
rank-bm25
sentence-transformers

LLM

OpenAI API

Document Processing

PyPDF
python-docx

Frontend

HTML
CSS

Project Structure

RAGForge
│
├── app
│   ├── ingestion
│   │   ├── loader.py
│   │   ├── chunker.py
│   │   └── index_builder.py
│   │
│   ├── retrieval
│   │   ├── bm25.py
│   │   ├── vector_store.py
│   │   ├── hybrid.py
│   │   └── reranker.py
│   │
│   ├── generation
│   │   ├── answer_generator.py
│   │   └── prompt.py
│   │
│   ├── routes
│   │   ├── upload.py
│   │   └── ask.py
│   │
│   ├── evaluation
│   │   ├── evaluate.py
│   │   └── metrics.py
│   │
│   ├── templates
│   └── static
│
├── data/raw_docs
├── tests
├── run.py
└── requirements.txt
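
For readers new to the ingestion step, here is a minimal sketch of what a chunker such as `app/ingestion/chunker.py` might do. The actual strategy in this project is not documented here, so everything in the example is an assumption: fixed-size word windows, the overlap between neighbouring chunks, and the `chunk_text` name and defaults.

```python
def chunk_text(text, chunk_size=200, overlap=40):
    """Split text into chunks of `chunk_size` words that share `overlap` words.

    Overlap keeps a sentence that straddles a boundary visible in both chunks.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("require chunk_size > 0 and 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


sample = " ".join(f"w{i}" for i in range(10))
for chunk in chunk_text(sample, chunk_size=4, overlap=1):
    print(chunk)
# w0 w1 w2 w3
# w3 w4 w5 w6
# w6 w7 w8 w9
```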

Installation

Clone repository

git clone https://github.com/omprakash0702/RAGForge.git
cd RAGForge

Create virtual environment

python -m venv venv

Activate environment

Windows

venv\Scripts\activate

macOS / Linux

source venv/bin/activate

Install dependencies

pip install -r requirements.txt

Set environment variable

OPENAI_API_KEY=your_api_key

Run the server

uvicorn app.main:app --reload

Open in browser

http://localhost:8000

Evaluation

The project includes a retrieval evaluation module.

Run evaluation:

python app/evaluation/evaluate.py

Metrics used:

Recall@K
Mean Reciprocal Rank (MRR)
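
For reference, both metrics can be computed in a few lines. Recall@K is the fraction of relevant chunks that appear in the top K results, and MRR averages 1/rank of the first relevant result over all queries. The sketch below follows those standard definitions; the function names and input shapes are assumptions and may differ from what `app/evaluation/metrics.py` uses.

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of relevant ids found among the first k retrieved ids."""
    relevant_set = set(relevant)
    if not relevant_set:
        return 0.0
    hits = relevant_set & set(retrieved[:k])
    return len(hits) / len(relevant_set)


def mean_reciprocal_rank(results):
    """Average of 1/rank of the first relevant id over (retrieved, relevant) pairs.

    A query with no relevant id in its retrieved list contributes 0.
    """
    if not results:
        return 0.0
    total = 0.0
    for retrieved, relevant in results:
        relevant_set = set(relevant)
        for rank, doc_id in enumerate(retrieved, start=1):
            if doc_id in relevant_set:
                total += 1.0 / rank
                break
    return total / len(results)


print(recall_at_k(["a", "b", "c"], ["b", "d"], k=2))  # 0.5
print(mean_reciprocal_rank([(["a", "b"], ["b"]), (["x", "y"], ["x"])]))  # 0.75
```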

Current Status

Hybrid retrieval pipeline implemented
Reranking and context expansion implemented
Multi-format document ingestion supported
Local system functioning
Cloud deployment currently being configured

Author

Om Prakash https://github.com/omprakash0702
