This repository contains code, experiments, and analysis for studying defenses (fortification) against the One‑Pixel Attack (OPA) — an adversarial attack that modifies only a single pixel in an image to cause misclassification. The goal of this project is to evaluate, develop, and compare practical mitigation strategies that reduce the vulnerability of image classifiers to highly sparse adversarial perturbations.
This README is written for contributors and researchers who want to reproduce experiments, extend defenses, or benchmark models against one‑pixel attacks.
Table of Contents
- Overview
- Key Contributions
- Implemented Defenses (Fortification Methods)
- Repository Structure
- Installation & Requirements
- Quick Start
- Running Attacks and Defenses
- Evaluation & Metrics
- Experiments & Reproducibility
- Visualization
- Citation
- License & Contact
Overview
The One-Pixel Attack (Su et al., 2019) highlights that even a single-pixel change, if chosen adversarially, can cause deep networks to fail. This project focuses not on new attacks, but on fortification strategies: detection, pre-processing, model training modifications, and certified or statistical defenses that increase robustness to sparse pixel perturbations.
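For intuition, the sketch below shows the core of a one-pixel attack as a black-box search over a single (x, y, r, g, b) tuple, using differential evolution as in Su et al. It is an illustration of the idea only, not the implementation in src/attacks/; it assumes SciPy is available (it is not listed in requirements.txt) and a PyTorch classifier that accepts normalized CHW inputs.

```python
# Illustrative only: black-box one-pixel attack via differential evolution.
# Assumes a PyTorch classifier taking normalized 1xCxHxW inputs; not the repo's code.
import numpy as np
import torch
from scipy.optimize import differential_evolution  # SciPy assumed available

def one_pixel_attack(model, image, true_label, iters=75, popsize=20):
    """Search for one pixel (x, y, r, g, b) that minimizes the true-class probability."""
    h, w, _ = image.shape  # image: HxWx3 uint8
    bounds = [(0, w - 1), (0, h - 1), (0, 255), (0, 255), (0, 255)]

    def apply(candidate):
        x, y, r, g, b = np.round(candidate).astype(int)
        perturbed = image.copy()
        perturbed[y, x] = (r, g, b)
        return perturbed

    def objective(candidate):
        batch = torch.from_numpy(apply(candidate)).permute(2, 0, 1).float().div(255).unsqueeze(0)
        with torch.no_grad():
            probs = torch.softmax(model(batch), dim=1)
        return probs[0, true_label].item()  # lower true-class confidence = stronger attack

    result = differential_evolution(objective, bounds, maxiter=iters, popsize=popsize, seed=0)
    return apply(result.x), result.fun  # perturbed image and residual true-class probability
```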
Key Contributions
- Implementations of multiple fortification techniques geared to one‑pixel / highly sparse attacks.
- Evaluation harness for automated attack+defense benchmarking (attack success rate, robust accuracy, queries, etc.).
- Visual analysis and heatmaps showing sensitive pixels and where defenses intervene.
- Reproducible experiment configs (datasets, models, seeds).
Implemented Defenses (Fortification Methods)
The repository includes implementations (or reference wrappers) for the following defenses. Each defense includes a train/eval script and unit tests where applicable.
- Adversarial Training (sparse): Train models with adversarial examples generated by one‑pixel perturbations or sparse variants.
- Input Transformation (a minimal preprocessing sketch follows this list):
  - Median filtering (small kernels) to remove single-pixel noise.
  - Local smoothing / bilateral filtering.
  - Color quantization and bit-depth reduction (feature squeezing).
  - JPEG compression / decompression as a denoising step.
- Pixel Masking & Repair:
  - Detection of outlier pixel values followed by local inpainting or model-based repair.
- Denoising Autoencoders / Small CNN Denoisers: Learn to remove localized perturbations while preserving semantics.
- Randomized Smoothing & Ensembles: Apply randomized input transformations and aggregate predictions for improved certified or empirical robustness.
- Detection + Reject: Lightweight detectors that flag suspicious inputs for rejection or further analysis.
- Hybrid Methods: Combination of detection + repair + robust training.
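As a concrete illustration of the input-transformation family above, here is a minimal preprocessing sketch that combines a small median filter with bit-depth reduction. It assumes OpenCV and NumPy from requirements.txt; the function names are illustrative and may differ from the actual code in src/defenses/.

```python
# Minimal input-transformation sketch: median filtering + bit-depth reduction.
# Function names are illustrative; the real defenses live in src/defenses/.
import cv2
import numpy as np

def median_filter(image: np.ndarray, kernel: int = 3) -> np.ndarray:
    """Suppress isolated single-pixel outliers with a small median filter."""
    return cv2.medianBlur(image, kernel)  # expects uint8 HxWxC and an odd kernel size

def reduce_bit_depth(image: np.ndarray, bits: int = 4) -> np.ndarray:
    """Feature squeezing: quantize each channel to 2**bits levels."""
    step = 256 // (2 ** bits)
    return (image // step) * step + step // 2

def fortify(image: np.ndarray) -> np.ndarray:
    """Apply both transformations before handing the image to the classifier."""
    return reduce_bit_depth(median_filter(image, kernel=3), bits=4)
```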
If you add new methods, please follow the coding conventions in src/defenses/ and add a configuration to experiments/.
Repository Structure
- README.md — this file
- src/ — implementations of attacks, defenses, model wrappers, datasets, and utilities
  - src/attacks/ — one-pixel attack implementations and utilities
  - src/defenses/ — implemented fortification methods
  - src/models/ — model wrappers (PyTorch / TF)
  - src/eval/ — evaluation metrics and logging
- experiments/ — experiment configurations (YAML/JSON)
- notebooks/ — interactive analysis and visualization notebooks
- data/ — dataset download scripts or pointers (CIFAR-10, Tiny-ImageNet, custom examples)
- models/ — model checkpoints (or scripts to download them)
- results/ — raw experiment outputs (JSON, images, logs)
- requirements.txt — Python dependencies
- environment.yml — (optional) Conda environment
- LICENSE — license text
Installation & Requirements
- Python 3.8+
- Recommended: GPU and CUDA for model training and faster evaluation
- Minimal packages (listed in requirements.txt): numpy, torch, torchvision (or tensorflow), opencv-python, pillow, tqdm, matplotlib, seaborn, scikit-learn, pandas
Install:
git clone https://github.com/Rbholika/OPA.git
cd OPA
pip install -r requirements.txt
Quick Start (example)
- Download dataset (CIFAR‑10 used in examples):
python src/data/download.py --dataset cifar10 --dest data/
- Evaluate a baseline model (no defense) with the one-pixel attack on a sample image:
python src/attacks/run_attack.py \
--model checkpoints/resnet_cifar10.pt \
--image examples/dog.png \
--attack one_pixel \
--targeted False \
--out results/dog_attack.json
- Run a defense pipeline (median filter + model prediction):
python src/defenses/run_defense_pipeline.py \
--model checkpoints/resnet_cifar10.pt \
--image examples/dog.png \
--defense median_filter \
--kernel 3 \
--out results/dog_defense.json
- Run an end-to-end benchmark (attacks vs. defenses on a dataset):
python src/benchmarks/benchmark.py \
--model checkpoints/resnet_cifar10.pt \
--dataset data/cifar10/test/ \
--defenses median_filter,bitdepth_quantize,adv_train_sparse \
--n_samples 1000 \
--out results/benchmark_cifar10.json
Running Attacks and Defenses
- Attacks are located in src/attacks/. The one-pixel attack is implemented as a GA optimizer that alters one pixel (x, y, color). CLI options let you set the population size, number of iterations, targeted/untargeted mode, and random seed.
- Defenses are in src/defenses/. Each defense exposes a common API: preprocess(image) -> image, detect(image) -> score/bool, repair(image) -> image (a sketch of this interface follows this list). Use the defense runner to compose multiple defenses.
- Benchmarks run attack + defense automatically and log:
  - the raw model prediction before the defense
  - the defended prediction after preprocessing/repair
  - whether the attack succeeded (i.e., fooled the defended model)
  - queries used and runtime
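The interface described above might look roughly like the base class below. This is a hedged sketch of the preprocess/detect/repair contract plus a simple composition helper, not the exact classes in src/defenses/.

```python
# Sketch of the common defense interface (preprocess / detect / repair).
# The actual base class and runner in src/defenses/ may differ in names and details.
import numpy as np

class BaseDefense:
    def preprocess(self, image: np.ndarray) -> np.ndarray:
        """Return the transformed image fed to the model (identity by default)."""
        return image

    def detect(self, image: np.ndarray) -> float:
        """Return a suspiciousness score (higher means more likely adversarial)."""
        return 0.0

    def repair(self, image: np.ndarray) -> np.ndarray:
        """Return a repaired image, e.g. with outlier pixels inpainted."""
        return image

class DefensePipeline:
    """Compose several defenses: repair, then preprocess, for each defense in order."""
    def __init__(self, defenses):
        self.defenses = list(defenses)

    def __call__(self, image: np.ndarray) -> np.ndarray:
        for defense in self.defenses:
            image = defense.preprocess(defense.repair(image))
        return image
```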
Evaluation & Metrics
- Attack Success Rate (ASR): fraction of attacked images for which the attack causes a misclassification.
- Robust Accuracy: accuracy on attacked images after defense.
- True Positive / False Positive for detectors.
- Average queries: number of model queries used per successful attack.
- Perturbation magnitude: L0 (number of altered pixels), L2, L∞ where applicable.
- Wall‑clock runtime per sample.
Results are saved as structured JSON (see results/) and can be visualized with notebooks in notebooks/.
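To make the definitions above concrete, here is a small sketch of how ASR, robust accuracy, and average queries could be computed from per-sample records; the field names are hypothetical and not necessarily the schema used in results/.

```python
# Hypothetical record schema; the actual JSON layout in results/ may differ.
# Each record: {"true_label": int, "defended_pred": int, "attack_succeeded": bool, "queries": int}
import json

def summarize(results_path: str) -> dict:
    with open(results_path) as f:
        records = json.load(f)
    n = len(records)
    asr = sum(r["attack_succeeded"] for r in records) / n                 # Attack Success Rate
    robust_acc = sum(r["defended_pred"] == r["true_label"] for r in records) / n
    succeeded = [r for r in records if r["attack_succeeded"]]
    avg_queries = sum(r["queries"] for r in succeeded) / len(succeeded) if succeeded else 0.0
    return {"asr": asr, "robust_accuracy": robust_acc, "avg_queries_per_success": avg_queries}
```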
Experiments & Reproducibility
- All experiments in experiments/ include a config (dataset, model, defense, attack params). Example: experiments/cifar10/median_filter.yml
- To reproduce an experiment:
  - Ensure the dataset and model checkpoints are downloaded.
  - Run:
python src/experiments/run_experiment.py --config experiments/cifar10/median_filter.yml
- Use --seed for deterministic runs and record seeds in logs.
- Logs include the model version and git commit hash when available.
Visualization
- Notebooks illustrate:
  - Original vs. adversarial vs. repaired images with the changed pixel highlighted.
  - Pixel sensitivity heatmaps (which pixels cause the highest misclassification rates); a toy heatmap sketch follows this list.
  - Defense comparisons: ASR and robust accuracy per defense and per model.
- Save example visualizations to results/figures/.
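For the pixel-sensitivity heatmaps mentioned above, a toy matplotlib sketch is shown below; it assumes you have already computed an HxW array of per-pixel attack success rates, however your experiment defines that grid.

```python
# Toy heatmap sketch; `sensitivity` is assumed to be an HxW array of per-pixel success rates.
import matplotlib.pyplot as plt
import numpy as np

def plot_sensitivity(sensitivity: np.ndarray,
                     out_path: str = "results/figures/sensitivity.png") -> None:
    fig, ax = plt.subplots(figsize=(4, 4))
    im = ax.imshow(sensitivity, cmap="hot", interpolation="nearest")
    ax.set_title("Per-pixel attack success rate")
    fig.colorbar(im, ax=ax, fraction=0.046)
    fig.savefig(out_path, dpi=150, bbox_inches="tight")
    plt.close(fig)
```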
Extending the Project
- Add a defense: implement a class in src/defenses/ following the base template and add a configuration to experiments/.
- Add a model: add a wrapper in src/models/ that exposes predict() and preprocess(); a minimal wrapper sketch follows this list.
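A minimal sketch of the predict()/preprocess() contract for a model wrapper follows; everything beyond those two method names is an assumption and may not match the base class in src/models/.

```python
# Sketch of a model wrapper exposing preprocess() and predict().
# Only the two method names come from this README; the rest is illustrative.
import numpy as np
import torch

class TorchModelWrapper:
    def __init__(self, model: torch.nn.Module, device: str = "cpu"):
        self.model = model.to(device).eval()
        self.device = device

    def preprocess(self, image: np.ndarray) -> torch.Tensor:
        """Convert an HxWxC uint8 image into a 1xCxHxW float tensor in [0, 1]."""
        x = torch.from_numpy(image).permute(2, 0, 1).float().div(255.0)
        return x.unsqueeze(0).to(self.device)

    def predict(self, image: np.ndarray) -> int:
        """Return the predicted class index for a single image."""
        with torch.no_grad():
            logits = self.model(self.preprocess(image))
        return int(logits.argmax(dim=1).item())
```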
Citation
Please cite the original One-Pixel Attack paper when using this work:
- Su, Jiawei; Vasconcellos Vargas, Danilo; Sakurai, Kouichi. "One Pixel Attack for Fooling Deep Neural Networks." IEEE Transactions on Evolutionary Computation (2019). arXiv:1710.08864
If you use the fortification code or experiments in publications, cite this repository and relevant papers for each defense (for instance: Feature‑Squeezing, Randomized Smoothing papers, etc.) — add the precise references you used in your experiments.
License
This project is released under the MIT License — see LICENSE for details.
Contact & Contributions
- Owner / Maintainer: @Rbholika
- Issues and PRs welcome — please include reproducible experiment configs and test results.