This repository implements a Retrieval-Augmented Generation (RAG) pipeline designed to work efficiently with local Small Language Models (SLMs) as well as standard language models.
The complete workflow is demonstrated in rag_demo.ipynb.
The system consists of two phases: offline indexing and online inference.
In the offline indexing phase, documents are processed once and stored for efficient retrieval:
PDF file(s) → Load → Text Chunking → Text Embedding → Store in Vector DB (FAISS)
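The sketch below illustrates this phase end to end. It is not the repository's own code: the pypdf loader, the chunking parameters, the all-MiniLM-L6-v2 embedding model, and the flat L2 FAISS index are illustrative assumptions; the modules under src/ may make different choices.

```python
# Illustrative offline-indexing sketch (library and parameter choices are assumptions).
import json

import faiss
from pypdf import PdfReader
from sentence_transformers import SentenceTransformer

# 1. Load: extract raw text from the PDF.
reader = PdfReader("data/document.pdf")
text = "\n".join(page.extract_text() or "" for page in reader.pages)

# 2. Chunk: split the text into overlapping character windows.
chunk_size, overlap = 500, 100
chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size - overlap)]

# 3. Embed: encode each chunk with a sentence-embedding model.
embedder = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = embedder.encode(chunks, convert_to_numpy=True).astype("float32")

# 4. Store: add the vectors to a FAISS index and persist both index and chunks.
index = faiss.IndexFlatL2(embeddings.shape[1])
index.add(embeddings)
faiss.write_index(index, "index.faiss")
with open("chunks.json", "w", encoding="utf-8") as f:
    json.dump(chunks, f)
```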
In the online inference phase, user queries are answered using retrieved document context:
User Query → Query Embedding → Vector Retrieval (FAISS) → Prompt Construction (Query + Retrieved Chunks) → SLM / LLM → Generated Answer
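A matching sketch of the query path, under the same assumptions as above plus a small instruction-tuned model served through the transformers text-generation pipeline; the model name and prompt template are illustrative, not prescribed by the repository.

```python
# Illustrative online-inference sketch (model choice and prompt template are assumptions).
import json

import faiss
from sentence_transformers import SentenceTransformer
from transformers import pipeline

query = "What is the document about?"

# 1. Embed the query with the same model used during indexing.
embedder = SentenceTransformer("all-MiniLM-L6-v2")
query_vec = embedder.encode([query], convert_to_numpy=True).astype("float32")

# 2. Retrieve the top-k most similar chunks from the FAISS index.
index = faiss.read_index("index.faiss")
with open("chunks.json", encoding="utf-8") as f:
    chunks = json.load(f)
_, ids = index.search(query_vec, 3)
retrieved = [chunks[i] for i in ids[0]]

# 3. Construct the prompt from the query and the retrieved chunks.
context = "\n\n".join(retrieved)
prompt = (
    "Answer the question using only the context below.\n\n"
    f"Context:\n{context}\n\n"
    f"Question: {query}\nAnswer:"
)

# 4. Generate the answer with a small language model.
generator = pipeline("text-generation", model="Qwen/Qwen2.5-0.5B-Instruct")
print(generator(prompt, max_new_tokens=128)[0]["generated_text"])
```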
$ tree
.
├── rag_demo.ipynb
├── data
│   ├── document.md
│   └── document.pdf
├── requirements.txt
└── src
    ├── chunking.py
    ├── document_loader.py
    ├── embedding.py
    ├── lm.py
    ├── prompt_constructor.py
    └── vector_store.py
Note: PDF content is extracted into a Markdown (.md) file rather than plain text (.txt), as Markdown can preserve more structure and be more informative for language models.
See the example: data/document.md
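The repository's own conversion code is not shown here; as one possible approach, the pymupdf4llm package (an assumption, not necessarily what this project uses) can render a PDF directly to Markdown:

```python
# Illustrative PDF-to-Markdown conversion; the project may use a different extractor.
import pymupdf4llm

md_text = pymupdf4llm.to_markdown("data/document.pdf")  # returns a Markdown string
with open("data/document.md", "w", encoding="utf-8") as f:
    f.write(md_text)
```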
pip install -r requirements.txt

Provide an access token to use Hugging Face models.
You may create a token at: https://huggingface.co/settings/tokens
Then run the following command and enter the token:
huggingface-cli login

- Place PDF documents in the data directory.
- Run the offline indexing pipeline to build the vector database.
- Query the system using the online inference pipeline.
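As an alternative to the huggingface-cli login step above (for example, when running rag_demo.ipynb non-interactively), the token can also be supplied from Python; a minimal sketch assuming the huggingface_hub package and an HF_TOKEN environment variable:

```python
# Programmatic alternative to `huggingface-cli login` (HF_TOKEN env var is assumed).
import os

from huggingface_hub import login

login(token=os.environ["HF_TOKEN"])  # omit the argument to be prompted for the token
```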