
🔎 Multimodal RAG System

A Python-based Retrieval-Augmented Generation pipeline that understands text, images, and tables — powered by OCR, FAISS, and Groq LLaMA 3.



What is this?

Most RAG systems only handle plain text. This one goes further — it ingests PDFs, Word docs, CSVs, Excel files, and even images embedded inside PDFs (via Tesseract OCR). Everything gets chunked, embedded with Google Generative AI, stored in FAISS, and queried through Groq LLaMA 3 for grounded answers with source attribution.

If your answer came from an image, you get the image path back too.


✨ Features

  • 📄 PDF & DOCX text extraction — direct parsing from PDF and Word documents
  • 🖼️ OCR for embedded images — extracts text from images inside PDFs using Tesseract, preserving image paths
  • 📊 Table parsing — CSV & Excel files converted to searchable text via pandas
  • 🔗 Chunking & embedding — powered by Google Generative AI Embeddings
  • ⚡ FAISS vector search — fast similarity-based retrieval
  • 🤖 Groq LLaMA 3 QA — generates grounded responses from retrieved context
  • 🖼️ Image path return — includes the source image path when answers originate from OCR
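The chunking step above can be illustrated with a minimal sliding-window splitter. The real pipeline uses LangChain's text splitters; the chunk size and overlap values here are illustrative assumptions:

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping chunks — a simplified stand-in
    for LangChain's character-based splitters."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break
    return chunks

doc = "word " * 100  # 500 characters of dummy text
chunks = chunk_text(doc, chunk_size=200, overlap=50)
```

The overlap means the tail of one chunk repeats at the head of the next, so a sentence split across a boundary is still retrievable from at least one chunk.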

🏗️ Architecture

 Documents (PDF, DOCX, CSV, Excel)
          │
          ▼
 ┌────────────────────────┐
 │   Content Extraction   │  ← PyMuPDF, python-docx, pandas, Tesseract OCR
 └──────────┬─────────────┘
            ▼
 ┌────────────────────────┐
 │  Chunk + Embed         │  ← Google Generative AI Embeddings + LangChain
 └──────────┬─────────────┘
            ▼
 ┌────────────────────────┐
 │  FAISS Vector Store    │  ← Similarity search & retrieval
 └──────────┬─────────────┘
            ▼
 ┌────────────────────────┐
 │  Groq LLaMA 3          │  ← Answer generation + source attribution
 └────────────────────────┘
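The Content Extraction stage treats tables as text: each row is flattened into "column: value" pairs so tabular data can be chunked and embedded like any document. The repo does this with pandas; this stdlib sketch shows the idea:

```python
import csv
import io

def table_to_text(csv_text: str) -> list[str]:
    """Flatten CSV rows into 'column: value' lines so tabular data
    becomes searchable text. (A simplified stand-in for the pandas
    conversion used in the pipeline.)"""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [" | ".join(f"{k}: {v}" for k, v in row.items()) for row in reader]

rows = table_to_text("name,revenue\nAcme,120\nGlobex,95\n")
# rows → ["name: Acme | revenue: 120", "name: Globex | revenue: 95"]
```

Keeping the column names attached to every value is what lets a later query like "what is Acme's revenue?" match the right chunk.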

🚀 Getting Started

Prerequisites: Python 3.11+, Tesseract OCR, a Google API key, and a Groq API key.

# 1. Clone
git clone https://github.com/AnithaKarre/multimodel_RAG.git
cd multimodel_RAG

# 2. Virtual environment
python -m venv .venv
.venv\Scripts\activate          # Windows
# source .venv/bin/activate     # Linux/macOS

# 3. Install dependencies
pip install -r requirements.txt

# 4. Set Tesseract path in backend/rag-model.py
#    pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"

# 5. Create backend/assets/.env
#    EMBED_MODEL_API_KEY=your_google_api_key
#    GROQ_API_KEY=your_groq_api_key

# 6. Run
python backend/rag-model.py

Drop your documents (PDF, DOCX, CSV, Excel) into backend/assets/ before running.
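Step 5 above expects `EMBED_MODEL_API_KEY` and `GROQ_API_KEY` in `backend/assets/.env`. A minimal loader for that file might look like this (the pipeline may well use `python-dotenv` instead; this is a hedged stdlib sketch of the same behavior):

```python
import os

def load_env(path: str) -> dict[str, str]:
    """Parse simple KEY=value lines from a .env file and export them
    to os.environ — a minimal stand-in for python-dotenv's load_dotenv()."""
    values = {}
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip()
            os.environ.setdefault(key.strip(), value.strip())
    return values

# e.g. load_env("backend/assets/.env")
```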


📂 Project Structure

multimodel_RAG/
├── main.py                # Entry point (extendable for UI/API)
├── requirements.txt       # Python dependencies
├── pyproject.toml         # Project configuration
├── uv.lock                # Lock file for reproducible installs
├── backend/
│   ├── rag-model.py       # Main RAG pipeline
│   ├── app.ipynb          # Jupyter notebook for experiments
│   └── assets/            # Input documents + .env file
└── images/                # Extracted images from PDFs (with OCR text)

🛠️ Tech Stack

Core: Python 3.11+ · LangChain · FAISS · Groq LLaMA 3

Document Processing: PyMuPDF (fitz) · python-docx · pandas · pytesseract + PIL

Embeddings: Google Generative AI Embeddings


🔄 How It Works

1. Load documents from backend/assets/
2. Extract text (PDF/DOCX) + parse tables (CSV/Excel)
3. Extract images from PDFs → OCR with Tesseract → save to images/
4. Chunk all extracted text with LangChain splitters
5. Embed chunks → index in FAISS
6. User asks a question
7. Query is embedded → similarity search in FAISS
8. Top chunks fed to Groq LLaMA 3 → grounded answer returned
9. If answer came from OCR text → image path included in response
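Steps 5–8 can be sketched without the real services: a toy bag-of-words "embedding" plus cosine similarity stands in for Google's embedding model and FAISS (the actual pipeline calls those libraries, so everything below is illustrative):

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' — the pipeline uses
    Google Generative AI Embeddings instead."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) \
         * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Rank chunks by similarity to the query, as FAISS does at scale."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

chunks = [
    "Tesseract extracted this text from an image on page 3",
    "Quarterly revenue table parsed from a CSV file",
    "Introduction chapter of the annual report",
]
top = retrieve("which image mentions page 3 text", chunks, k=1)
```

The top-ranked chunks (plus any image paths attached to OCR-derived chunks) form the context handed to Groq LLaMA 3 in step 8.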

📜 License

MIT — see LICENSE for details.
