---
title: Image Caption Generator
emoji: 🖼️
colorFrom: blue
colorTo: green
sdk: streamlit
app_file: app.py
pinned: false
---
An advanced image captioning application that automatically generates highly accurate, descriptive captions for uploaded images using Deep Learning.
This project features a dual-architecture approach:
- Salesforce BLIP (Bootstrapping Language-Image Pre-training): A state-of-the-art transformer model integrated via Hugging Face for maximum accuracy and robust production-level inference.
- Custom CNN-LSTM Encoder-Decoder: A PyTorch encoder-decoder (ResNet-50 encoder, LSTM decoder) built and trained from scratch on the Flickr8k dataset.
It provides a clean, interactive Streamlit web interface for generating captions instantly.
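For a concrete picture of the custom architecture, here is a minimal PyTorch sketch of a ResNet-50 + LSTM encoder-decoder. Class names, layer sizes, and the frozen backbone are illustrative assumptions; the actual definitions live in `model.py`:

```python
import torch
import torch.nn as nn
import torchvision.models as models

class EncoderCNN(nn.Module):
    """ResNet-50 backbone with its classifier head swapped for an embedding layer."""
    def __init__(self, embed_size):
        super().__init__()
        resnet = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
        self.backbone = nn.Sequential(*list(resnet.children())[:-1])  # drop fc head
        self.fc = nn.Linear(resnet.fc.in_features, embed_size)

    def forward(self, images):
        with torch.no_grad():                   # assumption: backbone kept frozen
            features = self.backbone(images)    # (B, 2048, 1, 1)
        return self.fc(features.flatten(1))     # (B, embed_size)

class DecoderLSTM(nn.Module):
    """LSTM that conditions on the image embedding as its first input step."""
    def __init__(self, embed_size, hidden_size, vocab_size):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_size)
        self.lstm = nn.LSTM(embed_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, vocab_size)

    def forward(self, features, captions):
        # Prepend the image embedding to the embedded caption tokens.
        embeddings = torch.cat([features.unsqueeze(1), self.embed(captions)], dim=1)
        hiddens, _ = self.lstm(embeddings)
        return self.fc(hiddens)                 # (B, T+1, vocab_size) logits
```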
Key features:

- High-Accuracy Inference: Uses the robust Salesforce BLIP model for state-of-the-art zero-shot image captioning.
- Custom Model Training Pipeline: Complete end-to-end pipeline for training a CNN-LSTM model on custom datasets.
- Interactive UI: Built with Streamlit for a fast, responsive, and user-friendly experience.
- Hugging Face Spaces Ready: Pre-configured metadata for seamless deployment.
Tech stack:

- Deep Learning Framework: PyTorch
- Models: Salesforce BLIP, ResNet-50, LSTM
- Libraries: Transformers (Hugging Face), Torchvision, NLTK, PIL
- Frontend: Streamlit
Project structure:

```text
├── app.py              # Streamlit web application (uses BLIP for the frontend)
├── blip_inference.py   # BLIP model inference script
├── model.py            # Custom CNN-LSTM architecture (ResNet-50 + LSTM)
├── train.py            # Training script for the custom model
├── dataset.py          # Custom PyTorch Dataset loader for Flickr8k
├── build_vocab.py      # Vocabulary builder for custom training
├── inference.py        # Command-line inference for the custom CNN-LSTM model
├── requirements.txt    # Python dependencies
└── README.md           # Project documentation
```
To set up the project:

- Clone the repository:

  ```bash
  git clone <repository_url>
  cd image-caption-generator
  ```

- Create a virtual environment (optional but recommended):

  ```bash
  python -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate
  ```

- Install the dependencies:

  ```bash
  pip install -r requirements.txt
  ```
The easiest way to use the application is through the interactive Streamlit web interface, which automatically downloads the BLIP Large model on the very first run:

```bash
streamlit run app.py
```

Simply upload an image through the UI to generate an accurate caption!
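Under the hood, the BLIP path reduces to a few Hugging Face Transformers calls. Here is a minimal sketch; the checkpoint name `Salesforce/blip-image-captioning-large` is an assumption based on the "BLIP Large" mention above, and `example.jpg` is a placeholder, so `blip_inference.py` may differ in detail:

```python
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

# Assumed checkpoint ("BLIP Large"); adjust to match blip_inference.py.
CHECKPOINT = "Salesforce/blip-image-captioning-large"
processor = BlipProcessor.from_pretrained(CHECKPOINT)
model = BlipForConditionalGeneration.from_pretrained(CHECKPOINT)

image = Image.open("example.jpg").convert("RGB")      # placeholder image path
inputs = processor(images=image, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=30)
print(processor.decode(output_ids[0], skip_special_tokens=True))
```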
If you wish to train the custom encoder-decoder model from scratch (see the training-loop sketch after this list):

- Download the Flickr8k dataset (images and captions).
- Configure the dataset paths appropriately (see `dataset.py` and `train.py`).
- Run the training script:

  ```bash
  python train.py
  ```

- This process generates a vocabulary dictionary (`vocab.pkl`) and saves the model weights (`caption_model.pth`).
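At its core, such a training script is a standard teacher-forced cross-entropy loop. The sketch below reuses the assumed `EncoderCNN`/`DecoderLSTM` interfaces from the earlier sketch and substitutes a synthetic batch for the real Flickr8k `DataLoader`, so names and hyperparameters will not match `train.py` exactly:

```python
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
# EncoderCNN / DecoderLSTM are the assumed classes sketched earlier; sizes are
# illustrative, and vocab_size would normally come from vocab.pkl.
encoder = EncoderCNN(embed_size=256).to(device)
decoder = DecoderLSTM(embed_size=256, hidden_size=512, vocab_size=5000).to(device)

criterion = nn.CrossEntropyLoss(ignore_index=0)       # assume ID 0 is <pad>
params = list(decoder.parameters()) + list(encoder.fc.parameters())
optimizer = torch.optim.Adam(params, lr=3e-4)

# Synthetic stand-in for a DataLoader over (image, caption) batches.
loader = [(torch.randn(4, 3, 224, 224), torch.randint(1, 5000, (4, 20)))]

for epoch in range(5):
    for images, captions in loader:
        images, captions = images.to(device), captions.to(device)
        features = encoder(images)                    # (B, embed_size)
        outputs = decoder(features, captions[:, :-1]) # teacher forcing
        loss = criterion(outputs.reshape(-1, outputs.size(-1)),
                         captions.reshape(-1))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```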
Once the custom CNN-LSTM model is fully trained, you can use the command-line inference script to generate captions. Generation with a model like this is typically greedy decoding: the image embedding seeds the LSTM, and the most probable token is fed back in until an end token appears, as sketched below.
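A hedged sketch of that loop, reusing the assumed decoder interface from above; `vocab.itos` (an ID-to-word mapping) and the `<end>` token are illustrative assumptions, and `inference.py` may differ:

```python
import torch

@torch.no_grad()
def generate_caption(encoder, decoder, image_tensor, vocab, max_len=20):
    """Greedy decoding with the assumed EncoderCNN/DecoderLSTM interfaces."""
    features = encoder(image_tensor.unsqueeze(0))       # (1, embed_size)
    inputs = features.unsqueeze(1)                      # (1, 1, embed_size)
    states, words = None, []
    for _ in range(max_len):
        hiddens, states = decoder.lstm(inputs, states)  # one LSTM step
        logits = decoder.fc(hiddens.squeeze(1))         # (1, vocab_size)
        token = logits.argmax(dim=-1)                   # most probable token
        word = vocab.itos[token.item()]                 # hypothetical ID->word map
        if word == "<end>":
            break
        words.append(word)
        inputs = decoder.embed(token).unsqueeze(1)      # feed prediction back in
    return " ".join(words)
```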
Run it as:

```bash
python inference.py --image path/to/image.jpg
```

- macOS SSL Verification: If you encounter SSL certificate verification errors on macOS when downloading the pretrained ResNet weights, the code includes a built-in patch (`ssl._create_unverified_context` in `model.py`) that resolves this automatically.
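For reference, that patch is typically a one-liner applied before torchvision fetches the weights; this is a generic sketch of the technique rather than a verbatim excerpt of `model.py`:

```python
import ssl

# macOS workaround: serve urllib an unverified HTTPS context so the
# torchvision weight download does not fail certificate verification.
# Note: this disables SSL certificate checks process-wide.
ssl._create_default_https_context = ssl._create_unverified_context
```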
This project is open-source and available under the MIT License.