A RAG pipeline designed to eliminate AI hallucinations in high-stakes healthcare environments.
Grounded entirely in verified clinical transcriptions — the model is forbidden from answering anything that isn't in the knowledge base.
| Metric | Detail |
|---|---|
| 📄 Clinical Records Loaded | 1,000 transcriptions (MTSamples via Kaggle) |
| 🔍 Retrieval | Top-3 semantic similarity matches per query (FAISS) |
| ✂️ Chunk Config | 1,000 tokens, 100-token overlap |
| 🧠 Embedding Model | all-MiniLM-L6-v2 (HuggingFace) |
| 🤖 LLM | gemini-2.5-flash-lite (Google GenAI SDK) |
| 🚫 Hallucination Guard | Prompt-enforced abstention — "I don't know" if context is absent |
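The chunk configuration (1,000 tokens with a 100-token overlap) means each record is cut into sliding windows whose tail repeats at the head of the next chunk, so content split at a boundary is never lost. A minimal stand-in sketch of the idea (the actual pipeline uses LangChain's RecursiveCharacterTextSplitter, which additionally prefers natural separators like paragraph breaks):

```python
def chunk_tokens(tokens, chunk_size=1000, overlap=100):
    """Split a token list into overlapping fixed-size windows (illustrative only)."""
    step = chunk_size - overlap  # advance 900 tokens per chunk
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break
    return chunks

# A 2,500-token record yields 3 chunks; each chunk's last 100 tokens
# reappear as the next chunk's first 100 tokens.
chunks = chunk_tokens(list(range(2500)))
```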
Standard LLMs generate confident-sounding medical answers from training weights — even when they're wrong. In healthcare, a hallucinated drug interaction or fabricated patient detail isn't a UX issue — it's a safety risk.
This system enforces a "Search-then-Summarize" architecture:
- Every query is embedded and matched against a local FAISS vector index of clinical records
- Top-3 matching transcriptions are injected into the prompt as the only permitted context
- The model is explicitly instructed: if the answer isn't in the context, say so — no inference from training weights
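Conceptually, the FAISS similarity search in the first step reduces to a nearest-neighbor lookup over embedding vectors. A pure-Python sketch of that idea (FAISS itself uses optimized index structures; the record names and toy 2-D vectors below are illustrative stand-ins for 384-dimensional MiniLM embeddings):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec, records, k=3):
    """records: list of (doc_id, embedding) pairs. Returns the k closest doc_ids."""
    ranked = sorted(records, key=lambda r: cosine(query_vec, r[1]), reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]

# Toy embeddings: the two allergy-adjacent records rank closest to the query
records = [("rhinitis", [1.0, 0.1]), ("fracture", [0.0, 1.0]),
           ("allergy", [0.9, 0.2]), ("cardiac", [0.3, 0.9])]
matches = top_k([1.0, 0.0], records)
```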
```
User Query
    │
    ▼
HuggingFace Embeddings  ←  all-MiniLM-L6-v2
    │
    ▼
FAISS Similarity Search  →  Top-3 Matching Clinical Records
    │
    ▼
Prompt Assembly:
  "Only use the context provided.
   If the answer isn't there, say you don't know."
    │
    ▼
gemini-2.5-flash-lite  →  Grounded Response (or Abstention)
```
```python
def clinical_assistant(query):
    # Retrieve the top-3 semantically similar clinical records
    search_results = vector_db.similarity_search(query, k=3)

    # Concatenate the retrieved transcriptions into a single context block
    context = ""
    for res in search_results:
        context += f"\n---\n{res.page_content}\n"

    prompt = f"""
    You are an AI Clinical Assistant. Using the provided medical transcriptions, answer the user query.
    Rules:
    1. Only use the context provided.
    2. If the answer isn't there, say you don't know.
    CONTEXT: {context}
    QUERY: {query}
    """

    # vector_db (FAISS index) and client (google-genai client) are initialized at startup
    response = client.models.generate_content(
        model="gemini-2.5-flash-lite", contents=prompt
    )
    return response.text
```

The guardrail is enforced at the prompt level — the model is given no fallback permission to use its training data. In medical AI, a confident wrong answer is worse than no answer.
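Prompt rules ultimately depend on the model obeying them. A cheap extra layer (hypothetical — not part of the pipeline above) is to short-circuit before the LLM call whenever retrieval returns too few records, so an ungrounded query never reaches the model at all:

```python
ABSTENTION = "I do not have enough verified information to answer this."

def guarded_answer(query, search, generate, min_results=1):
    """search(query) -> list of record texts; generate(prompt) -> str.
    Returns a fixed abstention string when retrieval finds too few records,
    without ever invoking the LLM."""
    results = search(query)
    if len(results) < min_results:
        return ABSTENTION
    context = "\n---\n".join(results)
    prompt = (
        "Only use the context provided. "
        "If the answer isn't there, say you don't know.\n"
        f"CONTEXT: {context}\nQUERY: {query}"
    )
    return generate(prompt)

# Stand-in callables: empty retrieval falls back to the abstention string
answer = guarded_answer("broken arm", search=lambda q: [], generate=lambda p: "…")
```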
| Layer | Technology |
|---|---|
| LLM | Google Gemini 2.5 Flash Lite (via google-genai SDK) |
| Vector Database | FAISS (local — sensitive records never leave the machine) |
| Embeddings | HuggingFace all-MiniLM-L6-v2 |
| Data Ingestion | LangChain CSVLoader |
| Chunking | RecursiveCharacterTextSplitter (chunk=1000, overlap=100) |
| Dataset | MTSamples — Kaggle |
Query: "What are the details of the 'Allergic Rhinitis' consultation?"
The system identified the exact patient file from 1,000 records — surfacing precise vitals (BP: 124/78), active medications (Ortho Tri-Cyclen, Allegra), and full symptom history. Zero fabricated details.
Query: "How do I treat a broken arm?" Response: "I do not have enough verified information to answer this."
Fracture treatment protocols were not in the loaded data slice. The prompt-level guardrail blocked the model from falling back on training knowledge — exactly the intended behavior.
- Why FAISS over a hosted vector DB? Fully local — zero network latency, no API costs, and sensitive medical records never leave the machine.
- Why `all-MiniLM-L6-v2`? Strong semantic similarity on short clinical chunks with a minimal memory footprint — purpose-fit for this retrieval task.
- Why `k=3` retrieval? Balances context richness with prompt size. Too few records → missed context. Too many → diluted relevance and higher token costs.
- Why prompt-level abstention? Hard prompt rules are deterministic and interpretable — no threshold calibration needed per domain.
```shell
# 1. Clone
git clone https://github.com/Rahilshah01/medical-rag-intelligence-system.git
cd medical-rag-intelligence-system

# 2. Install
pip install google-genai langchain-community langchain-text-splitters langchain-huggingface faiss-cpu pandas python-dotenv

# 3. Add data — download mtsamples.csv from Kaggle → place in /data/

# 4. Set API key
echo "GEMINI_API_KEY=your_key_here" > .env

# 5. Run
python rag_assistant.py
```

```
medical-rag-intelligence-system/
├── data/
│   └── mtsamples.csv       # Clinical records (download from Kaggle)
├── rag_assistant.py        # Main pipeline
├── .env.example
├── requirements.txt
└── README.md
```
Built by Rahil Shah · MS Data Science @ Stevens Institute of Technology