CLIP-SegFusion: An Attention-Guided Feature Fusion Framework for Multi-Level Detection on AI-Generated Artworks

This repository provides the PyTorch implementation of the paper:

CLIP-SegFusion: An Attention-Guided Feature Fusion Framework for Multi-Level Detection on AI-Generated Artworks
(Implementation of a multi-level detection framework for distinguishing AI-generated artworks)

Architecture

Dependencies

Ensure that your environment matches the following versions:

Package	Version
CUDA	11.8
Python	3.8.20
PyTorch	2.4.1
Torchvision	0.19.1
NumPy	1.24.4
pandas	2.0.3
tqdm	4.67.1
Pillow	10.4.0
scikit-learn	1.3.2
opencv-python	4.11.0.86
albumentations	1.4.18
imgaug	0.4.0
transformers	4.46.3
open-clip-torch	2.30.0
ftfy	6.2.3
regex	2024.11.6
packaging	24.2
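
As a quick sanity check, the pinned versions above can be compared against what is installed. The snippet below is a minimal sketch and not part of this repository: the helper name `find_mismatches` is hypothetical, and the package names are the PyPI distribution names (for example `torch` for PyTorch), which is an assumption about how the environment was installed. CUDA and Python itself are not checked.

```python
from importlib import metadata

# Pinned versions copied from the table above, keyed by PyPI distribution name.
EXPECTED = {
    "torch": "2.4.1",
    "torchvision": "0.19.1",
    "numpy": "1.24.4",
    "pandas": "2.0.3",
    "tqdm": "4.67.1",
    "Pillow": "10.4.0",
    "scikit-learn": "1.3.2",
    "opencv-python": "4.11.0.86",
    "albumentations": "1.4.18",
    "imgaug": "0.4.0",
    "transformers": "4.46.3",
    "open-clip-torch": "2.30.0",
    "ftfy": "6.2.3",
    "regex": "2024.11.6",
    "packaging": "24.2",
}


def find_mismatches(expected):
    """Return {name: (expected_version, installed_version_or_None)} for each mismatch."""
    mismatches = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        # Drop a local build suffix such as "+cu118" before comparing.
        if installed is None or installed.split("+")[0] != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, installed) in find_mismatches(EXPECTED).items():
        print(f"{name}: expected {wanted}, found {installed or 'not installed'}")
```

An empty output means every listed package matches its pinned version.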

Usage

Training

Run the following command to train the model:

python train.py \
  --dataset_path /path/to/dataset \
  --csv_path /path/to/metadata.csv \
  --output_path /path/to/save_dir

Evaluation

Run the following command to evaluate a trained model:

python evaluate.py \
  --dataset_path /path/to/dataset \
  --csv_path /path/to/metadata.csv \
  --model_weight_path /path/to/model_weights.pth \
  --results_file /path/to/save_results.csv

Arguments

Argument	Description
`--dataset_path`	Root directory of dataset images.
`--csv_path`	Path to the metadata CSV file containing image paths and labels.
`--output_path`	Directory in which to save checkpoints (training only).
`--model_weight_path`	Path to a trained `.pth` model file (evaluation only).
`--results_file`	Path of the CSV file in which to save the evaluation metrics (evaluation only).
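
For readers who want to wrap or script these commands, the flags in the table above could be declared with `argparse` roughly as follows. This is an illustrative sketch, not the code in `train.py` or `evaluate.py`: the function `build_parser`, the `mode` parameter, and the choice to mark every flag as required are assumptions.

```python
import argparse


def build_parser(mode):
    """Build a parser for 'train' or 'evaluate' from the documented flags (hypothetical helper)."""
    parser = argparse.ArgumentParser(description=f"CLIP-SegFusion {mode} (illustrative)")
    # Flags shared by both commands.
    parser.add_argument("--dataset_path", required=True,
                        help="Root directory of dataset images.")
    parser.add_argument("--csv_path", required=True,
                        help="Metadata CSV file containing image paths and labels.")
    if mode == "train":
        parser.add_argument("--output_path", required=True,
                            help="Directory to save checkpoints.")
    elif mode == "evaluate":
        parser.add_argument("--model_weight_path", required=True,
                            help="Path to a trained .pth model file.")
        parser.add_argument("--results_file", required=True,
                            help="CSV file to save the evaluation metrics.")
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return parser


if __name__ == "__main__":
    # Parse an explicit argument list instead of sys.argv for demonstration.
    args = build_parser("evaluate").parse_args([
        "--dataset_path", "/path/to/dataset",
        "--csv_path", "/path/to/metadata.csv",
        "--model_weight_path", "/path/to/model_weights.pth",
        "--results_file", "/path/to/save_results.csv",
    ])
    print(args.model_weight_path)
```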

Evaluation Results (%) on In-Distribution (ID) Dataset (Stable Diffusion)

LA_Inpainting

Model	Accuracy (%)	Precision (%)	Recall (%)	F1 (%)	AUC (%)
AutoGAN	83.15	83.15	83.15	83.15	90.63
DIRE	91.35	91.35	91.35	91.35	97.32
De-Fake	89.90	89.92	89.90	89.90	96.18
ZeroFake	50.30	50.43	50.30	46.17	51.49
LaRE	95.50	95.50	95.50	95.50	99.08
CLIPping (PT)	94.05	94.43	94.05	94.04	98.84
CLIPping (LP)	92.50	92.98	92.50	92.48	98.07
LOTA	67.35	67.90	67.35	67.10	73.05
SAFE	77.40	77.56	77.40	77.37	85.83
CLIP-SegFusion (Ours)	94.80	94.80	94.80	94.80	98.77

Inpainting

Model	Accuracy (%)	Precision (%)	Recall (%)	F1 (%)	AUC (%)
AutoGAN	73.98	74.58	73.98	73.82	82.78
DIRE	85.09	85.09	85.09	85.09	93.03
De-Fake	85.53	85.56	85.53	85.53	92.58
ZeroFake	65.83	67.33	65.83	65.08	74.70
LaRE	90.42	90.47	90.42	90.42	97.00
CLIPping (PT)	87.01	87.46	87.02	86.97	95.11
CLIPping (LP)	86.42	87.94	86.42	86.28	95.78
LOTA	52.89	53.93	52.88	49.49	55.88
SAFE	65.33	65.59	65.34	65.19	72.07
CLIP-SegFusion (Ours)	90.47	90.55	90.47	90.46	96.88
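
This README does not state how precision, recall, and F1 are averaged across the two classes. One possibility is support-weighted averaging, under which recall is mathematically equal to accuracy; that would be consistent with the near-identical Accuracy and Recall columns above, but it is an assumption rather than a documented choice. The sketch below uses only the standard library, and the function name `weighted_metrics` is hypothetical. AUC is omitted because it requires prediction scores rather than hard labels.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall, and F1 over all classes."""
    total = len(y_true)
    support = Counter(y_true)  # number of true samples per class
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / total

    precision = recall = f1 = 0.0
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in pairs)
        predicted = sum(p == label for p in y_pred)
        actual = support[label]
        p_c = tp / predicted if predicted else 0.0
        r_c = tp / actual if actual else 0.0
        f_c = 2 * p_c * r_c / (p_c + r_c) if (p_c + r_c) else 0.0
        weight = actual / total  # class weight = share of true samples
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c

    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Toy labels: 0 = real, 1 = AI-generated (label meaning is illustrative).
    print(weighted_metrics([0, 0, 1, 1], [0, 1, 1, 1]))
```

Multiplying each value by 100 gives percentages in the form used by the tables.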

Evaluation Results on Out-of-Distribution (OOD) Datasets - F1 (%) / AUC (%)

Model	Latent Diffusion	Midjourney	DALLE	Imagen	Janus	StyleGAN2
AutoGAN	60.69 / 70.38	44.94 / 52.52	41.92 / 49.54	43.65 / 52.91	72.18 / 80.92	44.86 / 56.69
DIRE	51.03 / 57.51	66.02 / 84.53	61.08 / 81.12	37.84 / 60.11	68.08 / 85.65	33.78 / 44.64
De-Fake	53.74 / 54.69	61.95 / 80.40	53.52 / 70.39	39.81 / 56.50	63.81 / 82.10	72.10 / 84.78
ZeroFake	45.46 / 46.62	37.05 / 34.89	35.66 / 30.86	44.31 / 34.29	68.77 / 82.30	31.24 / 20.39
LaRE	68.43 / 78.37	82.02 / 90.17	71.29 / 79.68	92.40 / 97.76	77.20 / 84.98	55.00 / 70.45
CLIPping (PT)	69.12 / 77.62	52.53 / 79.01	43.45 / 71.85	33.33 / 39.18	74.95 / 94.67	64.01 / 91.57
CLIPping (LP)	76.42 / 84.37	82.45 / 94.38	71.21 / 89.53	61.29 / 87.45	84.02 / 96.15	91.74 / 98.11
LOTA	47.59 / 53.43	62.30 / 67.57	59.57 / 64.56	58.61 / 64.58	68.70 / 73.92	72.26 / 76.25
SAFE	63.66 / 68.66	68.17 / 78.20	64.20 / 73.50	83.44 / 90.57	81.20 / 88.56	58.02 / 67.67
CLIP-SegFusion (Ours)	77.22 / 89.80	87.77 / 95.03	82.96 / 92.48	68.61 / 85.99	91.50 / 97.27	93.93 / 98.51
