This repository implements a Retrieval-Augmented Generation (RAG) pipeline designed to work efficiently with local Small Language Models (SLMs) as well as standard language models.
The complete workflow is demonstrated in rag_demo.ipynb.
The system consists of two phases: offline indexing and online inference.
In the offline indexing phase, documents are processed once and stored for efficient retrieval:
PDF file(s) → Load → Text Chunking → Text Embedding → Store in Vector DB (FAISS)
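The sketch below illustrates this phase end to end. It is not the repository's own code: the pypdf loader, the chunking parameters, the all-MiniLM-L6-v2 embedding model, and the flat L2 FAISS index are illustrative assumptions; the modules under src/ may make different choices.

```python
# Illustrative offline-indexing sketch (library and parameter choices are assumptions).
import json

import faiss
from pypdf import PdfReader
from sentence_transformers import SentenceTransformer

# 1. Load: extract raw text from the PDF.
reader = PdfReader("data/document.pdf")
text = "\n".join(page.extract_text() or "" for page in reader.pages)

# 2. Chunk: split the text into overlapping character windows.
chunk_size, overlap = 500, 100
chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size - overlap)]

# 3. Embed: encode each chunk with a sentence-embedding model.
embedder = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = embedder.encode(chunks, convert_to_numpy=True).astype("float32")

# 4. Store: add the vectors to a FAISS index and persist both index and chunks.
index = faiss.IndexFlatL2(embeddings.shape[1])
index.add(embeddings)
faiss.write_index(index, "index.faiss")
with open("chunks.json", "w", encoding="utf-8") as f:
    json.dump(chunks, f)
```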
In the online inference phase, user queries are answered using retrieved document context:
User Query → Query Embedding → Vector Retrieval (FAISS) → Prompt Construction (Query + Retrieved Chunks) → SLM / LLM → Generated Answer
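A matching sketch of the query path, under the same assumptions as above plus a small instruction-tuned model served through the transformers text-generation pipeline; the model name and prompt template are illustrative, not prescribed by the repository.

```python
# Illustrative online-inference sketch (model choice and prompt template are assumptions).
import json

import faiss
from sentence_transformers import SentenceTransformer
from transformers import pipeline

query = "What is the document about?"

# 1. Embed the query with the same model used during indexing.
embedder = SentenceTransformer("all-MiniLM-L6-v2")
query_vec = embedder.encode([query], convert_to_numpy=True).astype("float32")

# 2. Retrieve the top-k most similar chunks from the FAISS index.
index = faiss.read_index("index.faiss")
with open("chunks.json", encoding="utf-8") as f:
    chunks = json.load(f)
_, ids = index.search(query_vec, 3)
retrieved = [chunks[i] for i in ids[0]]

# 3. Construct the prompt from the query and the retrieved chunks.
context = "\n\n".join(retrieved)
prompt = (
    "Answer the question using only the context below.\n\n"
    f"Context:\n{context}\n\n"
    f"Question: {query}\nAnswer:"
)

# 4. Generate the answer with a small language model.
generator = pipeline("text-generation", model="Qwen/Qwen2.5-0.5B-Instruct")
print(generator(prompt, max_new_tokens=128)[0]["generated_text"])
```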
$ tree
.
├── rag_demo.ipynb
├── data
│   ├── document.md
│   └── document.pdf
├── requirements.txt
└── src
    ├── chunking.py
    ├── document_loader.py
    ├── embedding.py
    ├── lm.py
    ├── prompt_constructor.py
    └── vector_store.py
Note: PDF content is extracted into a Markdown (.md) file rather than plain text (.txt), as Markdown can preserve more structure and be more informative for language models.
See the example: data/document.md
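The repository's own conversion code is not shown here; as one possible approach, the pymupdf4llm package (an assumption, not necessarily what this project uses) can render a PDF directly to Markdown:

```python
# Illustrative PDF-to-Markdown conversion; the project may use a different extractor.
import pymupdf4llm

md_text = pymupdf4llm.to_markdown("data/document.pdf")  # returns a Markdown string
with open("data/document.md", "w", encoding="utf-8") as f:
    f.write(md_text)
```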
pip install -r requirements.txt

Provide an access token to use Hugging Face models.
You may create a token at: https://huggingface.co/settings/tokens
Then run the following command and enter the token:
huggingface-cli login

- Place PDF documents in the data directory.
- Run the offline indexing pipeline to build the vector database.
- Query the system using the online inference pipeline.
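As an alternative to the huggingface-cli login step above (for example, when running rag_demo.ipynb non-interactively), the token can also be supplied from Python; a minimal sketch assuming the huggingface_hub package and an HF_TOKEN environment variable:

```python
# Programmatic alternative to `huggingface-cli login` (HF_TOKEN env var is assumed).
import os

from huggingface_hub import login

login(token=os.environ["HF_TOKEN"])  # omit the argument to be prompted for the token
```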