VRAG - Vision-RAG Document Retrieval Lab

VRAG is an exploratory document intelligence lab for OCR-free Vision-RAG, page-level retrieval, and multimodal PDF understanding.

The goal is to make document retrieval work on files where text-only RAG breaks down: scanned PDFs, tables, layout-heavy reports, charts, forms, and pages where meaning depends on visual structure.

What VRAG Means

Vision-RAG retrieves and reasons over document pages as visual artifacts, not just extracted text chunks. Instead of treating OCR as the first and most important step, the pipeline can embed rendered pages and retrieve the pages most relevant to a query.

This repo currently contains an exploratory ColQwen2 / ColPali-style notebook export. It is a lab, not a finished benchmark suite.

Why OCR-First RAG Fails

OCR-first pipelines are useful, but they often lose information that matters:

Tables become fragmented text.
Multi-column layouts are read in the wrong order.
Charts and diagrams are underrepresented or ignored.
Form labels can detach from their values.
Scanned or low-quality PDFs introduce extraction noise before retrieval starts.

Vision-RAG keeps the page image in the loop so retrieval can use layout, visual grouping, and text appearance alongside language.

Current Contents

colqwen2_vrag.ipynb - exploratory notebook for ColQwen2-style page retrieval and generation.
colqwen2_vrag.py - Python export of the notebook for easier inspection in GitHub.

Focus Areas

PDF page rendering and page-level retrieval.
OCR-free or OCR-light document retrieval.
ColPali / ColQwen-style late-interaction embeddings.
Retrieval over tables, charts, scanned pages, and visual layouts.
Evaluation notes for when Vision-RAG helps compared with text-only RAG.

Local Notes

The current notebook is GPU-oriented and was originally explored in Colab. Running it locally may require:

Python 3.10+
PyTorch with GPU support
colpali-engine
transformers
pdf2image
Poppler utilities for PDF rendering

The exact environment should be documented before treating this as a reproducible pipeline.

Roadmap

Add an OCR-first vs Vision-RAG comparison note.
Add a small sample PDF page retrieval pipeline.
Add ColPali / ColQwen reading notes with links to the relevant papers and model cards.
Separate reusable retrieval code from notebook experiments.
Add a lightweight evaluation worksheet for retrieval quality.

Status

Exploratory lab / secondary repo. This is not a finished benchmark suite or production system. My main proof-of-work currently lives in Stateframe, PayRail, AI Systems Notes, and my portfolio.

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
README.md		README.md
colqwen2_vrag.ipynb		colqwen2_vrag.ipynb
colqwen2_vrag.py		colqwen2_vrag.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

VRAG - Vision-RAG Document Retrieval Lab

What VRAG Means

Why OCR-First RAG Fails

Current Contents

Focus Areas

Local Notes

Roadmap

Status

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

VRAG - Vision-RAG Document Retrieval Lab

What VRAG Means

Why OCR-First RAG Fails

Current Contents

Focus Areas

Local Notes

Roadmap

Status

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages