MITRA - Model in the Rear Approach to SafeGuarding LLMs

Introduction
Features
Demo
Technologies Used
- Frontend
- Backend
- Others
Installation
- Clone the Repository
- Navigate to the Project Directory
- Create a Virtual Environment
- Activate the Virtual Environment
- Install Dependencies
- Download Qwen2.5:0.5 from Ollama
Usage
- Activate the Virtual Environment
- Run the Flask Application
Project Structure

Introduction

MITRA is a novel LLM Guardrail method which uses classification LLMs to protect the main LLM from jailbreaks and other harmful prompts.

Publication

A detailed publication on MITRA - Model in the Rear Approach to Safeguarding LLMs is incoming soon. It will cover the methodology, architecture, and results in-depth, providing insights into the novel techniques used to safeguard large language models (LLMs). Stay tuned!

Features

Dynamic Chat Interface: Engage in real-time conversations with MITRA through a user-friendly chat interface.
Safety Checks: Incorporates multiple safety mechanisms, including toxicity detection and jailbreak prevention, to ensure responsible AI interactions.
Auto-Resizing Textarea: Enhances user experience by automatically adjusting the input field based on the message length.
Responsive Design: Optimized for various devices, ensuring seamless usability on desktops, tablets, and mobile phones.
Confetti Feedback: Celebratory visual effects upon successful AI responses to enhance user engagement.
Example Prompts Animation: Rotating example prompts guide users on how to effectively interact with MITRA.
Rate Limiting: Protects the backend from abuse by limiting the number of requests a user can make within specific timeframes.

Demo

Screen.Recording.2025-01-14.at.3.26.34.PM.mov

Technologies Used

Models Used

1. Toxicity Detection

Model: s-nlp/roberta_toxicity_classifier
Details: This is a RoBERTa-based model fine-tuned for toxicity detection. It identifies harmful or offensive content in text prompts with high accuracy.
Reference: Hugging Face Model Card

2. Jailbreak Detection

Model: madhurjindal/Jailbreak-Detector-Large
Details: A large language model trained to detect jailbreak attempts or adversarial prompts aimed at bypassing safety protocols.
Reference: Hugging Face Model Card

3. Embeddings

Model: all-MiniLM-L6-v2
Details: A lightweight SentenceTransformer model optimized for generating semantic embeddings. It balances speed and accuracy, making it suitable for real-time applications.
Reference: Sentence Transformers Documentation

4. Language Model

Model: Qwen2.5:0.5b
Details: A fast and efficient large language model with 0.5 billion parameters, integrated via Langchain Ollama. This model handles safe and contextual AI responses.
Reference: Ollama Documentation

Backend

Python 3.8+: Programming language for backend logic.
Flask: Web framework for handling HTTP requests and serving templates.
Hugging Face Transformers:
- Toxicity Model: s-nlp/roberta_toxicity_classifier (RoBERTa-based classifier fine-tuned for toxicity detection).
- Jailbreak Detection Model: madhurjindal/Jailbreak-Detector-Large (Large model trained to identify jailbreak attempts in prompts).
SentenceTransformers:
- Embeddings Model: all-MiniLM-L6-v2 (Lightweight model optimized for generating semantic embeddings with fast inference).
Langchain Ollama:
- LLM: Qwen2.5:0.5b (Large language model with 0.5 billion parameters, optimized for speed and accuracy in inference).
PyTorch: Deep learning library powering Hugging Face and SentenceTransformers models.
Asyncio: Asynchronous programming for handling multiple concurrent safety checks.
Flask-Limiter: Rate limiting to prevent abuse by restricting the number of API requests (if implemented).

Frontend

HTML5, CSS3, JavaScript (ES6): For the dynamic chat interface.
Font Awesome: For icons used in the frontend.
Google Fonts: For enhanced typography.
Canvas-Confetti: For celebratory visual effects upon successful AI responses.

Others

Git & GitHub: For version control and project collaboration.
Canvas-Confetti: For confetti animations and engaging user feedback.

Installation

Follow these steps to set up the project locally on your machine.

Prerequisites

Python 3.8+ installed on your machine. You can download it from here.
Git installed on your machine. Download from here.
Virtual Environment (Recommended): It's good practice to use a virtual environment to manage dependencies.

Steps

Clone the Repository

git clone https://github.com/AbhayBhandarkar/MITRA.git

Navigate to the Project Directory
```
cd MITRA
```
Create a Virtual Environment
```
python3 -m venv venv
```
Activate the Virtual Environment
- On macOS and Linux:
```
source venv/bin/activate
```
- On Windows:
```
venv\Scripts\activate
```
Install Dependencies
```
pip install -r requirements.txt
```
Download Qwen2.5:0.5 from Ollama
```
ollama serve
ollama pull qwen2.5:0.5b
```

Usage

Activate the Virtual Environment

Ensure that your virtual environment is activated.

source venv/bin/activate  # On macOS and Linux
venv\Scripts\activate     # On Windows

Run the Flask Application

python app.py

The application will start running on http://0.0.0.0:5000/ by default.

Project Structure

MITRA/
├── app.py                 # Main Flask application
├── pipeline.py            # Core pipeline for safety checks and LLM interactions
├── requirements.txt       # Python dependencies
├── static/                # Static files
│   ├── script.js          # Frontend JavaScript logic
│   └── styles.css         # Styling for the web interface
├── templates/             # HTML templates for the Flask app
│   └── index.html         # Main page
├── logs/                  # Log files
│   └── app.log            # Backend log output
├── .gitignore             # Files and directories to ignore in Git
└── README.md              # Project documentation

License

This project is licensed under the Apache License 2.0.
You can view the full license details in the LICENSE file or at the following link: Apache License 2.0.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

MITRA - Model in the Rear Approach to SafeGuarding LLMs

Table of Contents

Introduction

Publication

Features

Demo

Technologies Used

Models Used

1. Toxicity Detection

2. Jailbreak Detection

3. Embeddings

4. Language Model

Backend

Frontend

Others

Installation

Prerequisites

Steps

Usage

Activate the Virtual Environment

Run the Flask Application

The application will start running on http://0.0.0.0:5000/ by default.

Project Structure

License

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 13 Commits
static		static
templates		templates
.gitignore		.gitignore
JBB_Benchmark.png		JBB_Benchmark.png
JBB_Benchmark.py		JBB_Benchmark.py
LICENSE		LICENSE
README.md		README.md
app.py		app.py
benchmark_results.csv		benchmark_results.csv
pipeline.py		pipeline.py
requirements.txt		requirements.txt
zero_shot_filter.py		zero_shot_filter.py

Folders and files

Latest commit

History

Repository files navigation

MITRA - Model in the Rear Approach to SafeGuarding LLMs

Table of Contents

Introduction

Publication

Features

Demo

Technologies Used

Models Used

1. Toxicity Detection

2. Jailbreak Detection

3. Embeddings

4. Language Model

Backend

Frontend

Others

Installation

Prerequisites

Steps

Usage

Activate the Virtual Environment

Run the Flask Application

The application will start running on http://0.0.0.0:5000/ by default.

Project Structure

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages