README - Master Thesis Code & Evaluation Data

Project Description

This repository contains the source code, data, and evaluation results for the thesis "Retrieval Optimization for RAG-based Chatbots". The project implements and evaluates three retrieval optimization approaches.

Directory Structure

/chroma_db/
The vector databases used by the approaches are not included directly in this repository due to file size limitations.
Instead, the full Chroma vector database used in this project is available for download via Zenodo:
[https://doi.org/10.5281/zenodo.15666607]
- Approach 01 and Approach 02 share the same Chroma database.
- Approach 03 has a separate database due to the parent-child chunk linking.
/data/
Contains the PDF-based directive documents (Weisungen) used as the knowledge base for each approach.
/eval_dataset/
Contains the manually curated evaluation datasets in both JSON and CSV formats.
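Since the evaluation dataset exists in two formats, a small loader can read either one and check that both list the same questions. This is a sketch only: it assumes the JSON file is an array of objects and the CSV file has a header row, and the field name `question` is hypothetical, as the actual column names are not documented here.

```python
import csv
import io
import json
from pathlib import Path


def parse_json_records(text: str) -> list[dict]:
    """Parse a JSON array of evaluation records."""
    data = json.loads(text)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of records")
    return data


def parse_csv_records(text: str) -> list[dict]:
    """Parse CSV text with a header row into a list of dicts (all values are strings)."""
    return list(csv.DictReader(io.StringIO(text)))


def load_records(path: Path) -> list[dict]:
    """Load records from a .json or .csv file, chosen by file extension."""
    text = path.read_text(encoding="utf-8")
    if path.suffix == ".json":
        return parse_json_records(text)
    if path.suffix == ".csv":
        return parse_csv_records(text)
    raise ValueError(f"unsupported file type: {path.suffix}")


def formats_agree(json_rows: list[dict], csv_rows: list[dict], key: str = "question") -> bool:
    """Check that both formats list the same values for `key` in the same order."""
    return [str(r[key]) for r in json_rows] == [r[key] for r in csv_rows]
```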
/local_datastore/
Contains the chunk storage for non-vectorized retrieval:
- /sparse_datastore/ stores the BM25 chunks.
- /parent_store/ stores the parent-child chunks for Approach 03.
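The BM25 chunks in /sparse_datastore/ are scored lexically rather than by embeddings. The following from-scratch sketch shows standard Okapi BM25 over a list of chunk texts; it is an illustration only, since this README does not say whether the thesis code uses a library or which parameter values it sets. The defaults k1 = 1.5 and b = 0.75 and the non-negative IDF variant are common choices, not values taken from config.py.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase word tokenizer; \\w also matches German umlauts and ß."""
    return re.findall(r"\w+", text.lower())


class BM25:
    """Okapi BM25 over a fixed list of text chunks."""

    def __init__(self, chunks: list[str], k1: float = 1.5, b: float = 0.75):
        self.k1, self.b = k1, b
        self.docs = [tokenize(c) for c in chunks]
        self.tfs = [Counter(d) for d in self.docs]
        self.avgdl = sum(len(d) for d in self.docs) / len(self.docs)
        n = len(self.docs)
        # Document frequency: in how many chunks each term occurs
        df = Counter(term for d in self.docs for term in set(d))
        # Non-negative IDF variant (the "+1" inside the log avoids negative weights)
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def score(self, query: str, i: int) -> float:
        """BM25 score of chunk `i` for the query."""
        tf, dl = self.tfs[i], len(self.docs[i])
        s = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            f = tf[term]
            norm = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            s += self.idf[term] * f * (self.k1 + 1) / norm
        return s

    def top_k(self, query: str, k: int = 3) -> list[tuple[int, float]]:
        """Return (chunk index, score) pairs for the k best-scoring chunks."""
        scores = [(i, self.score(query, i)) for i in range(len(self.docs))]
        return sorted(scores, key=lambda x: x[1], reverse=True)[:k]
```

Usage: `BM25(chunks).top_k("Reisekosten", k=3)` returns the indices of the best-matching chunks. Exact keyword matching of this kind tends to help with domain terms and identifiers that embeddings may blur.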
/tests/
Contains the evaluation scripts used to execute and compare the different retrieval approaches.
This folder also includes:
- Log files that contain the overall pass rates for each approach.
- The /result/ subfolder, which contains all evaluation results in both JSON and CSV format, including a combined comparison table summarizing all approaches.
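An overall pass rate is the share of evaluation questions an approach passes. The sketch below computes it per approach and renders a small comparison table as CSV. The record layout (an `approach` name plus a boolean `passed` flag) is a hypothetical simplification; the actual result files may use different fields and pass criteria.

```python
import csv
import io
from collections import defaultdict


def pass_rates(results: list[dict]) -> dict[str, float]:
    """Compute the share of passed test cases per approach (expects boolean `passed`)."""
    totals: dict[str, int] = defaultdict(int)
    passed: dict[str, int] = defaultdict(int)
    for r in results:
        totals[r["approach"]] += 1
        passed[r["approach"]] += int(bool(r["passed"]))
    return {a: passed[a] / totals[a] for a in sorted(totals)}


def comparison_csv(rates: dict[str, float]) -> str:
    """Render pass rates as a CSV comparison table (percent, one decimal place)."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["approach", "pass_rate_percent"])
    for approach, rate in rates.items():
        writer.writerow([approach, f"{rate * 100:.1f}"])
    return buf.getvalue()
```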
rag_demo.py, rag_demo_approach_02.py, rag_demo_approach_03.py
The implementation scripts for the three retrieval approaches (Approach 01, 02 and 03, respectively). These files are located directly in the root directory.
config.py
Configuration file located in the root directory. It manages folder paths and parameters for all approaches and evaluation scripts.
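A central configuration of this kind could look like the following sketch, which derives all folder paths from the repository root and validates a few retrieval parameters. The class names, the parameter names and their default values are hypothetical and not copied from the actual config.py; only the folder names come from the directory structure above.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Paths:
    """Folder layout from this README, resolved against the repository root."""
    root: Path

    @property
    def chroma_db(self) -> Path:
        return self.root / "chroma_db"

    @property
    def data(self) -> Path:
        return self.root / "data"

    @property
    def eval_dataset(self) -> Path:
        return self.root / "eval_dataset"

    @property
    def sparse_datastore(self) -> Path:
        return self.root / "local_datastore" / "sparse_datastore"

    @property
    def parent_store(self) -> Path:
        return self.root / "local_datastore" / "parent_store"


@dataclass(frozen=True)
class RetrievalParams:
    top_k: int = 5                  # hypothetical: number of chunks passed to the LLM
    child_chunk_size: int = 400     # hypothetical: Approach 03 child chunk size
    parent_chunk_size: int = 2000   # hypothetical: Approach 03 parent chunk size

    def __post_init__(self) -> None:
        # Reject settings that would make retrieval meaningless
        if self.top_k < 1:
            raise ValueError("top_k must be at least 1")
        if self.child_chunk_size >= self.parent_chunk_size:
            raise ValueError("child chunks must be smaller than parent chunks")
```

Keeping paths relative to one root lets the approach scripts and the evaluation scripts share a single source of truth for where the datastores live.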

Environment

Python 3.11.6
See requirements.txt for all required dependencies.

Credentials

All credentials, API keys, and sensitive configuration values have been removed prior to submission.

Notes

This code is provided for documentation purposes only.
Execution is not required for thesis evaluation.
