🗨️ Medical Q&A Chatbot

A simple, user-friendly medical question-answering chatbot built on the MedQuAD dataset from Kaggle. The chatbot uses NLP preprocessing and similarity search to match a user's question against the dataset and return the corresponding answer, which comes from trusted health sources.

This project is built with Streamlit and deployed on Streamlit Community Cloud.

Live app: Medical Q&A Chatbot

Features

Uses MedQuAD dataset for high-quality medical Q&A
Clean, fast, interactive Streamlit interface
Uses sentence embeddings with FAISS (or other ML models) for similarity search
Returns answers taken from the dataset entries most similar to the user's question
Mobile-friendly and easy to use
Deployed online on Streamlit Community Cloud

Tech Stack

Python, with Streamlit for the web UI.
SentenceTransformer / embedding model for semantic search (or FAISS / sklearn similarity).
TensorFlow / Keras model for intent or classification (if used).
NLTK for text preprocessing and lemmatization.
Google Generative AI / Gemini for answer summarization.

Project Structure

app.py – Main Streamlit app that loads the FAISS index, TF‑IDF vectorizer, label encoder, and trained model, and handles the user chat.
training.ipynb – Notebook used to preprocess data and train the chatbot model.
create_index.py – Script to build the FAISS index and related artifacts from the QA dataset.
data/ – Input datasets (e.g., CSV/JSON with questions and answers).
chatbot_model.h5 – Trained Keras/TensorFlow model for intent/answer prediction.
tfidf_vectorizer.pkl – Saved TF‑IDF vectorizer for converting text to vectors.
label_encoder.pkl – Encodes and decodes class labels for answers/intents.
qa_answers.pkl – Serialized mapping from labels/indices to final text answers.
faiss_index.bin – Binary FAISS index file for fast semantic search.
requirements.txt – Python dependencies for the project.
.devcontainer/ – Optional VS Code Dev Container configuration.
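
The .pkl files listed above are written once and then read back by app.py at startup. A minimal sketch of that round trip, assuming plain pickle serialization (the repository may use joblib or another format); `save_artifact`, `load_artifact`, and the sample mapping are hypothetical names used only for illustration:

```python
import pickle
import tempfile
from pathlib import Path


def save_artifact(obj, path):
    """Serialize a Python object to disk with pickle."""
    with open(path, "wb") as f:
        pickle.dump(obj, f)


def load_artifact(path):
    """Read a pickled object back from disk."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Hypothetical label -> answer mapping, standing in for qa_answers.pkl.
answers = {
    0: "Common symptoms include increased thirst and frequent urination.",
    1: "Treatment often combines lifestyle changes and medication.",
}

# Write to a temporary directory so the sketch leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "qa_answers.pkl"
    save_artifact(answers, path)
    restored = load_artifact(path)

print(restored[0])
```

Pickle files can execute arbitrary code when loaded, so only load artifacts that you built yourself or obtained from a trusted source.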

Installation

Clone the repository: git clone https://github.com/kashish334/projects.git
Install dependencies (run from the project directory): pip install -r requirements.txt
Run the Streamlit app: streamlit run app.py

Dataset

This project uses the MedQuAD (Medical Question Answering Dataset) from Kaggle. It contains questions and answers collected from official medical sources like:

NIH
National Cancer Institute
National Heart, Lung, and Blood Institute

How It Works

The user's question is preprocessed and converted into a TF‑IDF vector.
FAISS is used to find similar questions in the dataset, and their answers are retrieved.
The trained neural network / classifier refines the prediction or selects the best answer.
The final answer is displayed in the Streamlit UI, optionally with a similarity score or related questions.
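
The retrieval part of the steps above can be sketched in a few lines. This is an illustration under stated assumptions, not the code in app.py: it uses only the Python standard library, replaces FAISS with a brute-force cosine-similarity scan, and skips the classifier step. `QA_PAIRS`, `tokenize`, `build_idf`, `tfidf_vector`, `cosine`, and `answer` are hypothetical names, and the sample questions and answers are made up for the example.

```python
import math
import re
from collections import Counter

# Hypothetical stand-in for the question/answer pairs loaded from data/.
QA_PAIRS = [
    ("What are the symptoms of diabetes?",
     "Common symptoms include increased thirst and frequent urination."),
    ("How is high blood pressure treated?",
     "Treatment often combines lifestyle changes and medication."),
    ("What causes asthma?",
     "Asthma is linked to genetic and environmental factors."),
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_idf(documents):
    """Compute a smoothed inverse document frequency for every term."""
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(tokenize(doc)))
    return {t: math.log((1 + n_docs) / (1 + df)) + 1.0
            for t, df in doc_freq.items()}


def tfidf_vector(text, idf):
    """Map text to a sparse TF-IDF dict; terms unseen at index time are dropped."""
    counts = Counter(t for t in tokenize(text) if t in idf)
    return {t: c * idf[t] for t, c in counts.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def answer(query, qa_pairs=QA_PAIRS):
    """Return (answer, score) for the stored question most similar to the query."""
    questions = [q for q, _ in qa_pairs]
    idf = build_idf(questions)
    query_vec = tfidf_vector(query, idf)
    scores = [cosine(query_vec, tfidf_vector(q, idf)) for q in questions]
    best = max(range(len(questions)), key=scores.__getitem__)
    return qa_pairs[best][1], scores[best]


best_answer, score = answer("symptoms of diabetes")
print(best_answer, round(score, 3))
```

A score of 0.0 means the query shares no terms with any stored question, which ties in with the first point under Limitations: a real app would likely check the score against a threshold before showing an answer. A FAISS index serves the same purpose as the scan in `answer`, but scales to much larger datasets.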

Limitations

Answers come from a fixed dataset and may not cover all medical conditions.
This chatbot is for educational purposes only and must not be used as a substitute for professional medical advice.
