A curated collection of papers, frameworks, tools, and resources on Retrieval-Augmented Generation (RAG).
Maintained for students of the Text Mining and Data Visualization course as a starting point for thesis research.
Retrieval-Augmented Generation is a technique that enhances Large Language Models (LLMs) by grounding their responses in external knowledge retrieved at inference time, reducing hallucinations and enabling domain-specific answers without fine-tuning.
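
The retrieve-then-generate loop can be sketched in a few lines. This toy version uses bag-of-words counts and cosine similarity as a stand-in for a real dense encoder, and stops at prompt construction rather than calling an LLM; all names are illustrative:

```python
from collections import Counter
import math

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; real systems use a dense encoder."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Rank the corpus by similarity to the query, return the top k."""
    q = embed(query)
    ranked = sorted(corpus, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, contexts: list[str]) -> str:
    """The retrieved passages are injected into the LLM prompt."""
    ctx = "\n".join(f"- {c}" for c in contexts)
    return f"Answer using only this context:\n{ctx}\n\nQuestion: {query}"

corpus = [
    "DPR encodes passages with a BERT encoder.",
    "BM25 scores documents by term frequency.",
    "The Transformer uses self-attention.",
]
prompt = build_prompt("How does DPR encode passages?",
                      retrieve("How does DPR encode passages?", corpus))
```

The resulting prompt would then be sent to an LLM; everything the sections below cover (chunking, retrieval strategies, reranking, evaluation) refines one stage of this loop.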
## Contents

- Foundational Papers
- Survey Papers
- Advanced Techniques
- Frameworks and Libraries
- Vector Databases
- Embedding Models
- Tutorials and Guides
- Videos and Talks
- Datasets and Benchmarks
- Contributing
- License

## Foundational Papers

- Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks (2020) - The original RAG paper by Lewis et al. (Meta AI). Introduces the RAG architecture combining a pre-trained seq2seq model with a dense retriever (DPR).
- Dense Passage Retrieval for Open-Domain Question Answering (2020) - DPR — the dense retrieval method that underpins many RAG systems.
- REALM: Retrieval-Augmented Language Model Pre-Training (2020) - Pre-trains a language model jointly with a knowledge retriever.
- Attention Is All You Need (2017) - The Transformer architecture — foundational to all modern LLMs used in RAG.

## Survey Papers

- Retrieval-Augmented Generation for Large Language Models: A Survey (2023) - Comprehensive survey covering Naive RAG, Advanced RAG, and Modular RAG paradigms. Excellent starting point.
- A Survey on RAG Meeting LLMs: Towards Retrieval-Augmented Large Language Models (2024) - Covers the evolution of RA-LLMs, taxonomies, and training strategies.
- Seven Failure Points When Engineering a RAG System (2024) - Practical guide to what can go wrong in RAG pipelines — highly recommended for thesis work.
## Advanced Techniques

### Chunking and Pre-processing

- Unstructured - Pre-processing library for parsing PDFs, HTML, and Word documents into clean chunks.
- Semantic Chunking - Splitting documents based on semantic similarity rather than fixed token windows.
- Hierarchical Indexing - Using summaries at different granularity levels (document → section → paragraph) to improve retrieval precision.
- Parent-Child Chunking - Retrieve small chunks for precision, but pass the parent (larger) chunk to the LLM for context.
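
Parent-child chunking reduces to a small index mapping each child chunk back to its parent. A minimal sketch, using fixed word windows for the children and token overlap as a stand-in for embedding similarity (both are simplifications):

```python
def parent_child_index(docs: dict[str, str], child_size: int = 5):
    """Split each parent doc into small word-window children.
    The index maps child text -> parent id, so retrieval can be precise
    while the LLM still sees the full parent context."""
    index = []
    for parent_id, text in docs.items():
        words = text.split()
        for i in range(0, len(words), child_size):
            child = " ".join(words[i:i + child_size])
            index.append((child, parent_id))
    return index

def retrieve_parent(query: str, index, docs: dict[str, str]) -> str:
    """Match the query against small children, return the large parent."""
    q = set(query.lower().split())
    best_child, parent_id = max(
        index, key=lambda c: len(q & set(c[0].lower().split())))
    return docs[parent_id]  # pass the larger parent chunk to the LLM
```

LangChain's `ParentDocumentRetriever` and LlamaIndex's auto-merging retriever implement this pattern with real embeddings.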
### Retrieval Strategies

| Strategy | Description |
|---|---|
| Dense Retrieval | Encode queries and documents into vector embeddings, retrieve by cosine similarity. |
| Sparse Retrieval (BM25) | Traditional keyword-based retrieval. Still competitive and often used as a baseline. |
| Hybrid Search | Combine dense + sparse retrieval (e.g., via Reciprocal Rank Fusion). Often outperforms either alone. |
| Multi-Query Retrieval | Generate multiple query variations with an LLM and retrieve for each, then merge results. |
| HyDE | Hypothetical Document Embeddings - Generate a hypothetical answer first, then use it as the retrieval query. |
| Contextual Retrieval | Anthropic's approach - Prepend chunk-specific context before embedding to reduce retrieval failures. |
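
The hybrid-search fusion step in the table above is often done with Reciprocal Rank Fusion, which needs only the two ranked lists, not their scores. A self-contained sketch (`k = 60` is the constant used in the original RRF paper and a common default):

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists (e.g. one dense, one BM25) into one.
    Each document scores sum(1 / (k + rank)) over the lists it appears in."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense = ["d3", "d1", "d2"]   # ranked by embedding similarity
sparse = ["d1", "d4", "d3"]  # ranked by BM25
fused = reciprocal_rank_fusion([dense, sparse])  # d1 wins: high in both lists
```

Because RRF only uses ranks, it sidesteps the problem of normalizing cosine similarities against BM25 scores, which live on incompatible scales.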
### Reranking

- Cohere Rerank - Cross-encoder reranking API.
- ColBERT - Late interaction model for efficient and effective reranking.
- bge-reranker - Open-source cross-encoder reranker by BAAI.
- RankLLM - Using LLMs themselves as rerankers via listwise prompting.
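
Listwise reranking in the RankLLM style sends the LLM numbered passages and asks for an ordering such as `[2] > [3] > [1]`; the fragile part is parsing the reply. A sketch where the prompt wording and parsing heuristics are illustrative and the actual LLM call is omitted:

```python
import re

def listwise_prompt(query: str, passages: list[str]) -> str:
    """Build a listwise ranking prompt over numbered passages."""
    numbered = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (f"Rank the passages by relevance to the query.\n"
            f"Query: {query}\n{numbered}\n"
            f"Answer with an ordering like [2] > [1] > [3].")

def parse_ordering(reply: str, passages: list[str]) -> list[str]:
    """Turn an LLM reply such as '[2] > [3] > [1]' back into passages,
    dropping out-of-range or repeated indices defensively."""
    seen, order = set(), []
    for m in re.findall(r"\[(\d+)\]", reply):
        i = int(m) - 1
        if 0 <= i < len(passages) and i not in seen:
            seen.add(i)
            order.append(passages[i])
    # Append anything the model forgot, in original order.
    order += [p for j, p in enumerate(passages) if j not in seen]
    return order
```

The defensive parsing matters in practice: LLM rerankers occasionally repeat, skip, or invent indices, and a pipeline should degrade gracefully rather than crash.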
### Query Transformation

- Query Rewriting - Use an LLM to reformulate the user query for better retrieval.
- Step-Back Prompting - Ask a more abstract question first to retrieve broader context.
- Query Decomposition - Break complex questions into sub-questions, retrieve for each, then synthesize.
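
Query decomposition is mostly a thin orchestration layer: produce sub-questions, retrieve for each, then synthesize. A sketch with the LLM steps stubbed out; the rule-based splitter and the `retrieve`/`synthesize` parameters are placeholders for real LLM and retriever calls:

```python
def decompose(question: str, llm=None) -> list[str]:
    """In a real pipeline an LLM produces the sub-questions; here a
    trivial rule-based stub splits on ' and ' for illustration."""
    if llm is not None:
        return llm(f"Break into sub-questions: {question}").splitlines()
    return [q.strip().rstrip("?") + "?" for q in question.split(" and ")]

def answer_by_decomposition(question: str, retrieve, synthesize) -> str:
    """Retrieve per sub-question, then hand everything to a synthesis step."""
    subs = decompose(question)
    contexts = {sub: retrieve(sub) for sub in subs}  # retrieve per sub-question
    return synthesize(question, contexts)            # final LLM synthesis
```

The payoff is that each sub-question is simple enough to retrieve for directly, which multi-hop questions (see HotpotQA below) otherwise defeat.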
### Evaluation Frameworks

| Framework | Description |
|---|---|
| RAGAS | Reference-free evaluation framework. Metrics: faithfulness, answer relevancy, context precision/recall. |
| TruLens | Evaluation and tracking for LLM apps, including RAG-specific metrics. |
| DeepEval | Unit testing framework for LLM outputs with RAG-aware metrics. |
| ARES | Automated RAG Evaluation System — uses LLM judges with statistical confidence. |
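
To make the table's metrics concrete, here are label-based versions of context precision and recall. Note this is not how RAGAS computes them: RAGAS is reference-free and uses LLM judges, whereas this sketch assumes gold relevance labels purely to show what the metrics measure:

```python
def context_precision(retrieved: list[str], relevant: set[str]) -> float:
    """Fraction of retrieved chunks that are actually relevant."""
    if not retrieved:
        return 0.0
    return sum(1 for c in retrieved if c in relevant) / len(retrieved)

def context_recall(retrieved: list[str], relevant: set[str]) -> float:
    """Fraction of the relevant chunks that were retrieved."""
    if not relevant:
        return 0.0
    return sum(1 for c in relevant if c in retrieved) / len(relevant)
```

Precision penalizes padding the context window with noise; recall penalizes missing the evidence the answer needs. Most retrieval tuning is a trade-off between the two.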
## Frameworks and Libraries

| Framework | Language | Description |
|---|---|---|
| LangChain | Python/JS | The most widely adopted framework for building RAG pipelines. Large ecosystem, many integrations. |
| LlamaIndex | Python | Data framework specifically designed for RAG. Strong focus on indexing and retrieval abstractions. |
| Haystack | Python | Production-ready NLP framework by deepset. Pipeline-based architecture. |
| RAGFlow | Python | Open-source RAG engine with deep document understanding and chunk visualization. |
| Verba | Python | Open-source RAG chatbot powered by Weaviate. Good for quick prototyping. |
| Cognita | Python | Open-source modular RAG framework for production use. |
## Vector Databases

| Database | Type | Notes |
|---|---|---|
| Chroma | Embedded | Lightweight, easy to start with. Good for prototyping and smaller projects. |
| Weaviate | Self-hosted / Cloud | Supports hybrid search natively. GraphQL API. |
| Qdrant | Self-hosted / Cloud | Written in Rust. Excellent filtering and payload support. |
| Milvus | Self-hosted / Cloud | Highly scalable. Used in many production deployments. |
| Pinecone | Cloud-only | Fully managed. Simple API. Popular in industry. |
| FAISS | Library | Meta's similarity search library. Not a database, but extremely fast for local use. |
| pgvector | PostgreSQL Extension | Add vector search to your existing PostgreSQL database. Great if you already use Postgres. |
## Embedding Models

| Model | Provider | Notes |
|---|---|---|
| text-embedding-3-small/large | OpenAI | Strong general-purpose embeddings. large variant has 3072 dimensions. |
| Cohere Embed v3 | Cohere | Supports multiple input types (search_document, search_query). |
| BGE (BAAI) | Open-source | Top-performing open-source embeddings. Available in multiple sizes. |
| E5-Mistral | Open-source | LLM-based embedding model. Strong performance on MTEB benchmark. |
| Nomic Embed | Open-source | Long-context (8192 tokens), fully open-source with open training data. |
| Jina Embeddings | Open-source | Multilingual, supports 8192 token context. Good for non-English corpora. |
Tip: Check the MTEB (Massive Text Embedding Benchmark) Leaderboard for up-to-date embedding model benchmarks.
## Tutorials and Guides

- RAG From Scratch (LangChain) - Series of notebooks covering RAG concepts from basics to advanced patterns.
- Building RAG Applications with LlamaIndex - Official LlamaIndex documentation and conceptual guide.
- Pinecone RAG Learning Center - Well-written introduction to RAG with practical examples.
- Contextual Retrieval Guide - Practical improvements to standard RAG with contextual embeddings and BM25.
## Videos and Talks

- Full Stack RAG App Tutorial (freeCodeCamp) - Video walkthrough of building a complete RAG application.
- But what is RAG? (3Blue1Brown-style explainer) - Visual, intuitive explanation of how RAG works.
- RAG is Dead? Long Live RAG! (Keynote) - Discussion on the future of RAG vs. long-context models.
- Building Production RAG (AI Engineer Summit) - Practical lessons from deploying RAG at scale.
- Advanced RAG Techniques (DeepLearning.AI) - Short course by Andrew Ng's platform.
## Datasets and Benchmarks

| Dataset/Benchmark | Description |
|---|---|
| Natural Questions (NQ) | Google's open-domain QA dataset. Standard benchmark for retrieval systems. |
| HotpotQA | Multi-hop QA requiring reasoning over multiple documents. |
| MS MARCO | Large-scale passage retrieval and QA benchmark. |
| BEIR | Heterogeneous benchmark for zero-shot evaluation of retrieval models across diverse tasks. |
| RGB (Retrieval-Augmented Generation Benchmark) | Specifically designed to evaluate RAG systems on noise robustness, negative rejection, information integration, and counterfactual robustness. |
## Contributing

Contributions are welcome! This is a collaborative resource for students and researchers.
Please follow these guidelines:
- Fork this repository
- Add your resource in the appropriate section
- Follow the existing format: `**[Resource Name](link)** - Brief description.`
- Ensure all links are working and resources are relevant to RAG
- Submit a pull request
## License

This work is dedicated to the public domain under CC0 1.0 Universal.
