A RAG pipeline designed to eliminate AI hallucinations in high-stakes healthcare environments.
Grounded entirely in verified clinical transcriptions — the model is forbidden from answering anything that isn't in the knowledge base.
| Metric | Detail |
|---|---|
| 📄 Clinical Records Loaded | 1,000 transcriptions (MTSamples via Kaggle) |
| 🔍 Retrieval | Top-3 semantic similarity matches per query (FAISS) |
| ✂️ Chunk Config | 1,000 tokens, 100-token overlap |
| 🧠 Embedding Model | all-MiniLM-L6-v2 (HuggingFace) |
| 🤖 LLM | gemini-2.5-flash-lite (Google GenAI SDK) |
| 🚫 Hallucination Guard | Prompt-enforced abstention — "I don't know" if context is absent |
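The chunk configuration (1,000 tokens with a 100-token overlap) means each record is cut into sliding windows whose tail repeats at the head of the next chunk, so content split at a boundary is never lost. A minimal stand-in sketch of the idea (the actual pipeline uses LangChain's RecursiveCharacterTextSplitter, which additionally prefers natural separators like paragraph breaks):

```python
def chunk_tokens(tokens, chunk_size=1000, overlap=100):
    """Split a token list into overlapping fixed-size windows (illustrative only)."""
    step = chunk_size - overlap  # advance 900 tokens per chunk
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break
    return chunks

# A 2,500-token record yields 3 chunks; each chunk's last 100 tokens
# reappear as the next chunk's first 100 tokens.
chunks = chunk_tokens(list(range(2500)))
```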
Standard LLMs generate confident-sounding medical answers from training weights — even when they're wrong. In healthcare, a hallucinated drug interaction or fabricated patient detail isn't a UX issue — it's a safety risk.
This system enforces a "Search-then-Summarize" architecture:
- Every query is embedded and matched against a local FAISS vector index of clinical records
- Top-3 matching transcriptions are injected into the prompt as the only permitted context
- The model is explicitly instructed: if the answer isn't in the context, say so — no inference from training weights
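Conceptually, the FAISS similarity search in the first step reduces to a nearest-neighbor lookup over embedding vectors. A pure-Python sketch of that idea (FAISS itself uses optimized index structures; the record names and toy 2-D vectors below are illustrative stand-ins for 384-dimensional MiniLM embeddings):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec, records, k=3):
    """records: list of (doc_id, embedding) pairs. Returns the k closest doc_ids."""
    ranked = sorted(records, key=lambda r: cosine(query_vec, r[1]), reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]

# Toy embeddings: the two allergy-adjacent records rank closest to the query
records = [("rhinitis", [1.0, 0.1]), ("fracture", [0.0, 1.0]),
           ("allergy", [0.9, 0.2]), ("cardiac", [0.3, 0.9])]
matches = top_k([1.0, 0.0], records)
```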
```
User Query
    │
    ▼
HuggingFace Embeddings  ←  all-MiniLM-L6-v2
    │
    ▼
FAISS Similarity Search  →  Top-3 Matching Clinical Records
    │
    ▼
Prompt Assembly:
  "Only use the context provided.
   If the answer isn't there, say you don't know."
    │
    ▼
gemini-2.5-flash-lite  →  Grounded Response (or Abstention)
```
```python
def clinical_assistant(query):
    # Retrieve the top-3 semantically similar clinical records
    search_results = vector_db.similarity_search(query, k=3)

    # Concatenate the retrieved transcriptions into a single context block
    context = ""
    for res in search_results:
        context += f"\n---\n{res.page_content}\n"

    prompt = f"""
    You are an AI Clinical Assistant. Using the provided medical transcriptions, answer the user query.
    Rules:
    1. Only use the context provided.
    2. If the answer isn't there, say you don't know.
    CONTEXT: {context}
    QUERY: {query}
    """

    # vector_db (FAISS index) and client (google-genai client) are initialized at startup
    response = client.models.generate_content(
        model="gemini-2.5-flash-lite", contents=prompt
    )
    return response.text
```

The guardrail is enforced at the prompt level — the model is given no fallback permission to use its training data. In medical AI, a confident wrong answer is worse than no answer.
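Prompt rules ultimately depend on the model obeying them. A cheap extra layer (hypothetical — not part of the pipeline above) is to short-circuit before the LLM call whenever retrieval returns too few records, so an ungrounded query never reaches the model at all:

```python
ABSTENTION = "I do not have enough verified information to answer this."

def guarded_answer(query, search, generate, min_results=1):
    """search(query) -> list of record texts; generate(prompt) -> str.
    Returns a fixed abstention string when retrieval finds too few records,
    without ever invoking the LLM."""
    results = search(query)
    if len(results) < min_results:
        return ABSTENTION
    context = "\n---\n".join(results)
    prompt = (
        "Only use the context provided. "
        "If the answer isn't there, say you don't know.\n"
        f"CONTEXT: {context}\nQUERY: {query}"
    )
    return generate(prompt)

# Stand-in callables: empty retrieval falls back to the abstention string
answer = guarded_answer("broken arm", search=lambda q: [], generate=lambda p: "…")
```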
| Layer | Technology |
|---|---|
| LLM | Google Gemini 2.5 Flash Lite (via google-genai SDK) |
| Vector Database | FAISS (local — sensitive records never leave the machine) |
| Embeddings | HuggingFace all-MiniLM-L6-v2 |
| Data Ingestion | LangChain CSVLoader |
| Chunking | RecursiveCharacterTextSplitter (chunk=1000, overlap=100) |
| Dataset | MTSamples — Kaggle |
Query: "What are the details of the 'Allergic Rhinitis' consultation?"
The system identified the exact patient file from 1,000 records — surfacing precise vitals (BP: 124/78), active medications (Ortho Tri-Cyclen, Allegra), and full symptom history. Zero fabricated details.
Query: "How do I treat a broken arm?" Response: "I do not have enough verified information to answer this."
Fracture treatment protocols were not in the loaded data slice. The prompt-level guardrail blocked the model from falling back on training knowledge — exactly the intended behavior.
- Why FAISS over a hosted vector DB? Fully local — zero network latency, no API costs, and sensitive medical records never leave the machine.
- Why `all-MiniLM-L6-v2`? Strong semantic similarity on short clinical chunks with a minimal memory footprint — purpose-fit for this retrieval task.
- Why `k=3` retrieval? Balances context richness with prompt size. Too few records → missed context. Too many → diluted relevance and higher token costs.
- Why prompt-level abstention? Hard prompt rules are deterministic and interpretable — no threshold calibration needed per domain.
```shell
# 1. Clone
git clone https://github.com/Rahilshah01/medical-rag-intelligence-system.git
cd medical-rag-intelligence-system

# 2. Install
pip install google-genai langchain-community langchain-text-splitters langchain-huggingface faiss-cpu pandas python-dotenv

# 3. Add data — download mtsamples.csv from Kaggle → place in /data/

# 4. Set API key
echo "GEMINI_API_KEY=your_key_here" > .env

# 5. Run
python rag_assistant.py
```

```
medical-rag-intelligence-system/
├── data/
│   └── mtsamples.csv       # Clinical records (download from Kaggle)
├── rag_assistant.py        # Main pipeline
├── .env.example
├── requirements.txt
└── README.md
```
Built by Rahil Shah · MS Data Science @ Stevens Institute of Technology