Grad-Eclip is a straightforward, easy-to-implement method for generating visual explanation heat maps for transformer-based CLIP. It can be applied to both the image and text branches. The framework and results are shown below:
- Framework
- Visual comparison of different XAI methods explaining the image encoder for given text prompts.
- Visual comparison of different XAI methods explaining both the image encoder and the text encoder for an image-text pair.
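The heat maps in these comparisons are image-sized, whereas a ViT-based CLIP image encoder operates on a grid of patches. As a rough illustration only (this is not the repository's code, and `normalize`, `upsample_nearest`, and the sample scores are hypothetical), the sketch below shows one common way to turn a per-patch relevance grid into a pixel-level map: min-max normalization followed by nearest-neighbor upsampling.

```python
from typing import List

Grid = List[List[float]]


def normalize(grid: Grid) -> Grid:
    """Min-max scale values to [0, 1]; a constant grid maps to all zeros."""
    flat = [v for row in grid for v in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    if span == 0:
        return [[0.0 for _ in row] for row in grid]
    return [[(v - lo) / span for v in row] for row in grid]


def upsample_nearest(grid: Grid, patch_size: int) -> Grid:
    """Repeat each patch value over a patch_size x patch_size pixel block."""
    out: Grid = []
    for row in grid:
        pixel_row = [v for v in row for _ in range(patch_size)]
        out.extend(list(pixel_row) for _ in range(patch_size))
    return out


# Hypothetical 2x2 patch relevance scores (made-up numbers, not real results).
patch_relevance = [[0.2, 1.0], [-0.6, 0.2]]
heat_map = upsample_nearest(normalize(patch_relevance), patch_size=2)
```

In practice, a visualization pipeline would more likely use bilinear interpolation and overlay a color map on the input image; nearest-neighbor is used here only to keep the sketch dependency-free.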
If you use the code in your research, please cite:
@inproceedings{chenyang_gradeclip,
  title={Gradient-based Visual Explanation for CLIP},
  author={Zhao, Chenyang and Wang, Kun and Zeng, Xingyu and Zhao, Rui and Chan, Antoni B.},
  booktitle={International Conference on Machine Learning (ICML)},
  month={July},
  year={2024}
}
If you have any questions, please do not hesitate to contact Chenyang ZHAO (zhaocy2333@gmail.com).


