CLIP-SegFusion: An Attention-Guided Feature Fusion Framework for Multi-Level Detection on AI-Generated Artworks
This repository provides the PyTorch implementation of the paper:
CLIP-SegFusion: An Attention-Guided Feature Fusion Framework for Multi-Level Detection on AI-Generated Artworks
(Implementation of a multi-level detection framework for distinguishing AI-generated artworks)
Ensure that you have the following environment:
| Package | Version |
|---|---|
| CUDA | 11.8 |
| Python | 3.8.20 |
| PyTorch | 2.4.1 |
| Torchvision | 0.19.1 |
| NumPy | 1.24.4 |
| pandas | 2.0.3 |
| tqdm | 4.67.1 |
| Pillow | 10.4.0 |
| scikit-learn | 1.3.2 |
| opencv-python | 4.11.0.86 |
| albumentations | 1.4.18 |
| imgaug | 0.4.0 |
| transformers | 4.46.3 |
| open-clip-torch | 2.30.0 |
| ftfy | 6.2.3 |
| regex | 2024.11.6 |
| packaging | 24.2 |
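
To check an existing environment against the table above, a small script along these lines may help. It is a minimal sketch, not part of this repository: `EXPECTED` and `find_mismatches` are hypothetical names, the distribution names are assumed to match the table (with PyTorch published as `torch`), and only a few rows are listed for brevity. CUDA and the Python interpreter are not pip packages, so they are not covered.

```python
from importlib import metadata  # available in the standard library since Python 3.8

# Subset of the version table above; extend with the remaining rows as needed.
# Keys are assumed pip distribution names, which is why PyTorch appears as "torch".
EXPECTED = {
    "torch": "2.4.1",
    "torchvision": "0.19.1",
    "numpy": "1.24.4",
    "pandas": "2.0.3",
    "scikit-learn": "1.3.2",
    "transformers": "4.46.3",
    "open-clip-torch": "2.30.0",
}


def find_mismatches(expected, lookup=metadata.version):
    """Return {name: (wanted, found)} for packages that are missing or differ."""
    mismatches = {}
    for name, wanted in expected.items():
        try:
            found = lookup(name)
        except metadata.PackageNotFoundError:
            found = None  # package is not installed
        # Ignore local build tags such as "+cu118" when comparing versions.
        if found is None or found.split("+")[0] != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in find_mismatches(EXPECTED).items():
        print(f"{name}: expected {wanted}, found {found}")
```

An empty result means every listed package matches the pinned version.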
Run the following command to train the model:
python train.py \
--dataset_path /path/to/dataset \
--csv_path /path/to/metadata.csv \
--output_path /path/to/save_dir

Run the following command to evaluate a trained model:
python evaluate.py \
--dataset_path /path/to/dataset \
--csv_path /path/to/metadata.csv \
--model_weight_path /path/to/model_weights.pth \
--results_file /path/to/save_results.csv

| Argument | Description |
|---|---|
| `--dataset_path` | Root directory of dataset images. |
| `--csv_path` | Path to the metadata CSV file containing image paths and labels. |
| `--output_path` | Directory to save checkpoints. |
| `--model_weight_path` | Path to a trained .pth model file (required only for evaluation). |
| `--results_file` | Path to save the evaluation metrics as a CSV file (only for evaluation). |
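
For orientation, the arguments described above could be declared with `argparse` roughly as follows. This is a hedged sketch only: `build_parser` is a hypothetical helper, and treating every flag as required is an assumption, since the actual `train.py` and `evaluate.py` may define defaults or additional options.

```python
import argparse


def build_parser(mode):
    """Build a parser for the documented flags; mode is "train" or "evaluate"."""
    if mode not in ("train", "evaluate"):
        raise ValueError(f"unknown mode: {mode!r}")

    parser = argparse.ArgumentParser(description=f"CLIP-SegFusion ({mode})")
    # Flags shared by both scripts.
    parser.add_argument("--dataset_path", required=True,
                        help="Root directory of dataset images.")
    parser.add_argument("--csv_path", required=True,
                        help="Metadata CSV with image paths and labels.")

    if mode == "train":
        parser.add_argument("--output_path", required=True,
                            help="Directory to save checkpoints.")
    else:
        # Evaluation-only flags.
        parser.add_argument("--model_weight_path", required=True,
                            help="Path to a trained .pth model file.")
        parser.add_argument("--results_file", required=True,
                            help="CSV file to write evaluation metrics to.")
    return parser


if __name__ == "__main__":
    # Parse an explicit example list so the sketch runs without real paths.
    args = build_parser("train").parse_args([
        "--dataset_path", "/path/to/dataset",
        "--csv_path", "/path/to/metadata.csv",
        "--output_path", "/path/to/save_dir",
    ])
    print(args)
```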
| Model | Accuracy (%) | Precision (%) | Recall (%) | F1 (%) | AUC (%) |
|---|---|---|---|---|---|
| AutoGAN | 83.15 | 83.15 | 83.15 | 83.15 | 90.63 |
| DIRE | 91.35 | 91.35 | 91.35 | 91.35 | 97.32 |
| De-Fake | 89.90 | 89.92 | 89.90 | 89.90 | 96.18 |
| ZeroFake | 50.30 | 50.43 | 50.30 | 46.17 | 51.49 |
| LaRE | 95.50 | 95.50 | 95.50 | 95.50 | 99.08 |
| CLIPping (PT) | 94.05 | 94.43 | 94.05 | 94.04 | 98.84 |
| CLIPping (LP) | 92.50 | 92.98 | 92.50 | 92.48 | 98.07 |
| LOTA | 67.35 | 67.90 | 67.35 | 67.10 | 73.05 |
| SAFE | 77.40 | 77.56 | 77.40 | 77.37 | 85.83 |
| CLIP-SegFusion (Ours) | 94.80 | 94.80 | 94.80 | 94.80 | 98.77 |
| Model | Accuracy (%) | Precision (%) | Recall (%) | F1 (%) | AUC (%) |
|---|---|---|---|---|---|
| AutoGAN | 73.98 | 74.58 | 73.98 | 73.82 | 82.78 |
| DIRE | 85.09 | 85.09 | 85.09 | 85.09 | 93.03 |
| De-Fake | 85.53 | 85.56 | 85.53 | 85.53 | 92.58 |
| ZeroFake | 65.83 | 67.33 | 65.83 | 65.08 | 74.70 |
| LaRE | 90.42 | 90.47 | 90.42 | 90.42 | 97.00 |
| CLIPping (PT) | 87.01 | 87.46 | 87.02 | 86.97 | 95.11 |
| CLIPping (LP) | 86.42 | 87.94 | 86.42 | 86.28 | 95.78 |
| LOTA | 52.89 | 53.93 | 52.88 | 49.49 | 55.88 |
| SAFE | 65.33 | 65.59 | 65.34 | 65.19 | 72.07 |
| CLIP-SegFusion (Ours) | 90.47 | 90.55 | 90.47 | 90.46 | 96.88 |
| Model | Latent Diffusion | Midjourney | DALLE | Imagen | Janus | StyleGAN2 |
|---|---|---|---|---|---|---|
| AutoGAN | 60.69 / 70.38 | 44.94 / 52.52 | 41.92 / 49.54 | 43.65 / 52.91 | 72.18 / 80.92 | 44.86 / 56.69 |
| DIRE | 51.03 / 57.51 | 66.02 / 84.53 | 61.08 / 81.12 | 37.84 / 60.11 | 68.08 / 85.65 | 33.78 / 44.64 |
| De-Fake | 53.74 / 54.69 | 61.95 / 80.40 | 53.52 / 70.39 | 39.81 / 56.50 | 63.81 / 82.10 | 72.10 / 84.78 |
| ZeroFake | 45.46 / 46.62 | 37.05 / 34.89 | 35.66 / 30.86 | 44.31 / 34.29 | 68.77 / 82.30 | 31.24 / 20.39 |
| LaRE | 68.43 / 78.37 | 82.02 / 90.17 | 71.29 / 79.68 | 92.40 / 97.76 | 77.20 / 84.98 | 55.00 / 70.45 |
| CLIPping (PT) | 69.12 / 77.62 | 52.53 / 79.01 | 43.45 / 71.85 | 33.33 / 39.18 | 74.95 / 94.67 | 64.01 / 91.57 |
| CLIPping (LP) | 76.42 / 84.37 | 82.45 / 94.38 | 71.21 / 89.53 | 61.29 / 87.45 | 84.02 / 96.15 | 91.74 / 98.11 |
| LOTA | 47.59 / 53.43 | 62.30 / 67.57 | 59.57 / 64.56 | 58.61 / 64.58 | 68.70 / 73.92 | 72.26 / 76.25 |
| SAFE | 63.66 / 68.66 | 68.17 / 78.20 | 64.20 / 73.50 | 83.44 / 90.57 | 81.20 / 88.56 | 58.02 / 67.67 |
| CLIP-SegFusion (Ours) | 77.22 / 89.80 | 87.77 / 95.03 | 82.96 / 92.48 | 68.61 / 85.99 | 91.50 / 97.27 | 93.93 / 98.51 |
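
The Accuracy, Precision, Recall, F1, and AUC columns in the tables above are standard classification metrics. For readers who want to compute comparable numbers from their own predictions, the following is a minimal pure-Python sketch. It assumes binary 0/1 labels, a 0.5 decision threshold, and macro-averaging over the two classes; none of these choices is confirmed by this README, and `binary_metrics` is a hypothetical helper rather than the code in `evaluate.py`.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_score: Sequence[float],
                   threshold: float = 0.5) -> Dict[str, float]:
    """Return accuracy, macro precision/recall/F1, and AUC as percentages."""
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)

    precisions, recalls, f1s = [], [], []
    for cls in (0, 1):
        tp = sum(t == cls and p == cls for t, p in pairs)
        fp = sum(t != cls and p == cls for t, p in pairs)
        fn = sum(t == cls and p != cls for t, p in pairs)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)

    # AUC as the probability that a random positive outscores a random
    # negative, counting ties as one half (the Mann-Whitney formulation).
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if pos and neg:
        wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
                   for p in pos for q in neg)
        auc = wins / (len(pos) * len(neg))
    else:
        auc = float("nan")  # undefined when only one class is present

    return {
        "accuracy": 100 * accuracy,
        "precision": 100 * sum(precisions) / 2,
        "recall": 100 * sum(recalls) / 2,
        "f1": 100 * sum(f1s) / 2,
        "auc": 100 * auc,
    }


if __name__ == "__main__":
    print(binary_metrics([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

The pairwise AUC loop is quadratic in the number of samples, which is fine for illustration; for large test sets a library routine such as scikit-learn's `roc_auc_score` is the more practical choice.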