This project builds a framework that uses CLIP to detect
different types of Personally Identifiable Information (PII) in images in a zero-shot manner,
so that sensitive data can be recognized and highlighted without any additional training. The project also compares the results of CLIP, SigLIP, and CLIPSeg on the same test cases.
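To make the CLIP/CLIPSeg difference concrete: where CLIP only scores whole images, CLIPSeg can localize the region a text prompt refers to. The sketch below shows this with the Hugging Face `transformers` CLIPSeg API; the checkpoint name, prompts, and blank placeholder image are illustrative assumptions, not the project's exact setup.

```python
# Sketch: localizing PII-like regions with CLIPSeg (Hugging Face transformers).
# Checkpoint, prompts, and the blank placeholder image are illustrative assumptions.
from PIL import Image
import torch
from transformers import CLIPSegProcessor, CLIPSegForImageSegmentation

processor = CLIPSegProcessor.from_pretrained("CIDAS/clipseg-rd64-refined")
model = CLIPSegForImageSegmentation.from_pretrained("CIDAS/clipseg-rd64-refined")

prompts = ["a handwritten signature", "a human face"]
image = Image.new("RGB", (352, 352), "white")  # stand-in for a real image

# One image copy per prompt; CLIPSeg produces a heatmap per (image, prompt) pair.
inputs = processor(text=prompts, images=[image] * len(prompts),
                   return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# Sigmoid turns logits into per-pixel scores in [0, 1]; high values mark
# pixels the model associates with the prompt (e.g., a signature region).
masks = torch.sigmoid(outputs.logits)  # shape: (num_prompts, H, W)
```

The resulting masks can be thresholded and overlaid on the input image to highlight or blur the detected PII regions.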
The code is designed to be run on Google Colab.
CLIP can be used to detect PII in scanned documents (IDs, invoices), medical records, and financial screenshots by identifying sensitive visual patterns such as names, numbers, or signatures. It can also flag personal data in social media images (e.g., boarding passes, IDs) and help prevent workplace data leaks from shared dashboards or internal tools. Additionally, it can support anonymization in street or surveillance images by detecting faces and license plates.
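As an illustration of the zero-shot approach, the following sketch scores an image against PII-related text prompts with the Hugging Face `transformers` CLIP API. The checkpoint, the prompt wording, and the blank stand-in image are assumptions for demonstration, not the project's exact configuration.

```python
# Minimal zero-shot PII scoring sketch with CLIP (Hugging Face transformers).
# Checkpoint, prompts, and the blank placeholder image are illustrative assumptions.
from PIL import Image
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Candidate descriptions of PII the image might contain.
labels = [
    "a document containing a person's name",
    "a document containing an ID number",
    "a document with a handwritten signature",
    "a document with no personal information",
]

# Stand-in image; replace with a real scanned document or screenshot.
image = Image.new("RGB", (224, 224), "white")

inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    logits = model(**inputs).logits_per_image  # shape: (1, num_labels)
probs = logits.softmax(dim=-1).squeeze(0)

for label, p in zip(labels, probs.tolist()):
    print(f"{p:.3f}  {label}")
```

Because no PII-specific training is involved, new PII categories can be added simply by extending the list of text prompts.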
To get started with this project, follow the instructions below.
- Open your web browser and go to Google Colab.
- Click on `File > Open notebook`.
- Select the `GitHub` tab and enter the repository URL: https://mygit.th-deg.de/ai-project-summer-24/clip2pii2.git
- Choose the notebook you want to run (e.g., `CLIP_&_CLIPSeg.ipynb`) and open it.
- Follow the instructions within the notebook to run the code.
- Preferably, change the runtime of the notebook to T4.
The required dependencies for this project are listed in the notebook. To ensure you have all the necessary packages, run the notebook cells in order.
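If you want to prepare an environment outside Colab, a typical dependency set for CLIP/SigLIP/CLIPSeg notebooks might look like the following; the exact package list and versions are assumptions, so the notebook's own install cells remain authoritative.

```shell
# Illustrative dependency set; the notebook's install cells are authoritative.
pip install torch torchvision transformers pillow matplotlib
```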
Reference for the CLIP/SigLIP comparison: https://github.com/merveenoyan/siglip/blob/main/clip_siglip.py