This project provides tools for generating image captions with the BLIP model. You can caption a single uploaded image, every image in a local folder, or every image found on a webpage. Each tool runs in a Gradio-based web interface in the browser.
- Features
- Screenshots
- Installation
- Usage
- Example Output
- Notes
- Contributing & License
- Acknowledgements
- Single Image Upload: Instantly get a caption for any image you upload via the web interface.
- Bulk Local Captioning: Automatically process all images in a folder and save their captions to a file.
- Webpage Scraping: Find and caption all images from any webpage you provide.
- Easy Web Interface: All tools use Gradio-based web UIs for convenience.
- Docker Support: Run the whole project easily in a container.
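As a rough illustration of the captioning step behind these features, here is a minimal sketch using the Hugging Face `transformers` BLIP classes. The checkpoint name `Salesforce/blip-image-captioning-base` and the helper names `load_blip` and `caption_image` are assumptions made for this example; the project's own scripts may be organized differently.

```python
from typing import Any, Tuple

# Assumption: an illustrative checkpoint; the project may use a different BLIP checkpoint.
CHECKPOINT = "Salesforce/blip-image-captioning-base"


def load_blip(checkpoint: str = CHECKPOINT) -> Tuple[Any, Any]:
    """Load a BLIP processor and model (weights are downloaded on first use)."""
    # Imported lazily so this module can be imported without transformers installed.
    from transformers import BlipForConditionalGeneration, BlipProcessor

    processor = BlipProcessor.from_pretrained(checkpoint)
    model = BlipForConditionalGeneration.from_pretrained(checkpoint)
    return processor, model


def caption_image(image: Any, processor: Any, model: Any, max_new_tokens: int = 30) -> str:
    """Generate a single caption for one image (hypothetical helper)."""
    # Convert the image into model inputs (PyTorch tensors).
    inputs = processor(images=image, return_tensors="pt")
    # Generate caption token ids, then decode them back into text.
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return processor.decode(output_ids[0], skip_special_tokens=True).strip()
```

With Pillow installed, a call might look like `processor, model = load_blip()` followed by `caption_image(Image.open("cat.jpg").convert("RGB"), processor, model)`. Loading the model once and reusing it across images avoids repeating the slow initialization.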
Screenshots (images not included here): Single Image Upload, Bulk Local Captioning, Webpage Scraping.
- Clone the repository:
    git clone https://github.com/eray-yuztyurk/python-ai-image-captioning.git
    cd python-ai-image-captioning
- Make sure Python 3.10+ is installed.
- Create and activate a virtual environment:
    python3.10 -m venv venv
    source venv/bin/activate
- Install dependencies:
    pip install --upgrade pip
    pip install -r requirements.txt
To launch all three interfaces:

    python3.10 main.py

Each interface will open on its own port (7860, 7861, 7862). If your browser does not open automatically, visit the address for the relevant port manually (for example, http://localhost:7860).
- Single image upload (port 7860):
    python3.10 uploaded_image_captioner.py
- Local folder images (port 7862):
    python3.10 local_img_captioner_automated.py
- Webpage scraping (port 7861):
    python3.10 url_img_captioner_automated.py
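To give a sense of what the webpage scraping tool has to do before captioning, the sketch below collects image URLs from a page's HTML using only the standard library. The names `ImgSrcParser` and `extract_image_urls` are hypothetical and not taken from the project; fetching the HTML and downloading the images are left out.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urljoin


class ImgSrcParser(HTMLParser):
    """Collect absolute URLs from the src attribute of <img> tags."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.urls: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src")
        if not src:
            return  # <img> without a usable src attribute
        # Resolve relative paths against the page URL.
        url = urljoin(self.base_url, src)
        if url not in self.urls:  # skip duplicates, keep page order
            self.urls.append(url)


def extract_image_urls(html: str, base_url: str) -> List[str]:
    """Return the de-duplicated image URLs found in an HTML document."""
    parser = ImgSrcParser(base_url)
    parser.feed(html)
    return parser.urls
```

Each returned URL could then be downloaded, opened as an image, and passed to the captioning model.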
To run with Docker:

    docker build -t ai-image-captioning .
    docker run -p 7860:7860 -p 7861:7861 -p 7862:7862 ai-image-captioning

Example output:

    cat.jpg: a small orange cat sitting on a windowsill
    https://example.com/image1.png: a group of people standing in front of a building
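Output lines in the `name: caption` form shown above could be produced by a loop like the following. This is a sketch under assumptions: `caption_folder`, the set of accepted file extensions, and the `caption_fn` callback (any function that maps an image path to a caption) are all illustrative and not taken from the project's scripts.

```python
from pathlib import Path
from typing import Callable, List

# Assumption: which file extensions count as images.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def caption_folder(
    folder: Path,
    output_file: Path,
    caption_fn: Callable[[Path], str],
) -> List[str]:
    """Caption every image in a folder and save 'name: caption' lines."""
    lines: List[str] = []
    for path in sorted(folder.iterdir()):
        # Skip subdirectories and files that are not images.
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            lines.append(f"{path.name}: {caption_fn(path)}")
    # Create the output directory if it does not exist yet.
    output_file.parent.mkdir(parents=True, exist_ok=True)
    text = "\n".join(lines) + ("\n" if lines else "")
    output_file.write_text(text, encoding="utf-8")
    return lines
```

Passing the captioning step in as a function keeps the folder handling independent of the model, so it can be tried with a stub before the BLIP weights are downloaded.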
- Make sure the outputs/ directory exists, or specify a valid output path.
- The BLIP model will be downloaded automatically on first run.
- For best results, use clear and sufficiently large images (at least 100x100 pixels).
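The first and last notes can be checked in code before captioning starts. The helpers below are a hypothetical sketch, not part of the project: `ensure_output_dir` creates the output directory if it is missing, and `is_large_enough` applies the 100x100 pixel guideline to a `(width, height)` pair, such as the `size` attribute of a Pillow image.

```python
from pathlib import Path
from typing import Tuple

MIN_SIDE = 100  # minimum width and height in pixels, from the note above


def ensure_output_dir(path: str = "outputs") -> Path:
    """Create the output directory (and any parents) if it is missing."""
    out = Path(path)
    out.mkdir(parents=True, exist_ok=True)
    return out


def is_large_enough(size: Tuple[int, int], min_side: int = MIN_SIDE) -> bool:
    """Return True if both width and height meet the minimum size."""
    width, height = size
    return width >= min_side and height >= min_side
```

Images that fail the size check could be skipped or logged rather than captioned, since very small images tend to give poor captions.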
This project is released under the MIT License. Contributions are welcome: feel free to open an issue or pull request.