Fortification Against One‑Pixel Attack (OPA)

This repository contains code, experiments, and analysis for studying defenses (fortification) against the One‑Pixel Attack (OPA) — an adversarial attack that modifies only a single pixel in an image to cause misclassification. The goal of this project is to evaluate, develop, and compare practical mitigation strategies that reduce the vulnerability of image classifiers to highly sparse adversarial perturbations.

This README is written for contributors and researchers who want to reproduce experiments, extend defenses, or benchmark models against one‑pixel attacks.

Table of Contents

  • Overview
  • Key Contributions
  • Implemented Defenses (Fortification Methods)
  • Repository Structure
  • Installation & Requirements
  • Quick Start
  • Running Attacks and Defenses
  • Evaluation & Metrics
  • Experiments & Reproducibility
  • Visualization
  • Extending the Project
  • Citation
  • License & Contact

Overview

The One‑Pixel Attack (Su et al., 2019) highlights that even a single‑pixel change, if chosen adversarially, can cause deep networks to fail. This project focuses not on new attacks, but on fortification strategies: detection, pre‑processing, model training modifications, and certified or statistical defenses that increase robustness to sparse pixel perturbations.
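
For intuition, a one‑pixel perturbation is just a candidate (x, y, color) written into an image array. The helper below is an illustrative NumPy sketch only; it does not assume anything about how the attack in src/attacks/ encodes or searches candidates.

import numpy as np

def apply_one_pixel(image, x, y, rgb):
    # Return a copy of `image` (H x W x 3, uint8) with the pixel at (x, y) replaced.
    # Illustrative only: the attack searches over (x, y, rgb) candidates for the one
    # that most strongly changes the target model's prediction.
    perturbed = image.copy()
    perturbed[y, x] = np.clip(rgb, 0, 255)  # row index is y, column index is x
    return perturbed

# Example: set one pixel of a dummy 32x32 CIFAR-like image to pure red.
dummy = np.zeros((32, 32, 3), dtype=np.uint8)
adv = apply_one_pixel(dummy, x=5, y=12, rgb=(255, 0, 0))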

Key Contributions

  • Implementations of multiple fortification techniques geared toward one‑pixel and other highly sparse attacks.
  • Evaluation harness for automated attack+defense benchmarking (attack success rate, robust accuracy, queries, etc.).
  • Visual analysis and heatmaps showing sensitive pixels and where defenses intervene.
  • Reproducible experiment configs (datasets, models, seeds).

Implemented Defenses (Fortification Methods)

The repository includes implementations (or reference wrappers) for the following defenses; a minimal input‑transformation sketch follows the list. Each defense includes a train/eval script and unit tests where applicable.

  • Adversarial Training (sparse): Train models with adversarial examples generated by one‑pixel perturbations or sparse variants.
  • Input Transformation:
    • Median filtering (small kernels) to remove single‑pixel noise.
    • Local smoothing / bilateral filtering.
    • Color quantization and bit‑depth reduction (feature squeezing).
    • JPEG compression / decompression as a denoising step.
  • Pixel Masking & Repair:
    • Detection of outlier pixel values followed by local inpainting or model‑based repair.
  • Denoising Autoencoders / Small CNN Denoisers: Learn to remove localized perturbations while preserving semantics.
  • Randomized Smoothing & Ensembles: Apply randomized input transformations and aggregate predictions for improved certified or empirical robustness.
  • Detection + Reject: Lightweight detectors that flag suspicious inputs for rejection or further analysis.
  • Hybrid Methods: Combination of detection + repair + robust training.
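
To make the input‑transformation idea concrete, here is a minimal sketch of median filtering plus bit‑depth reduction (feature squeezing) using OpenCV and NumPy. It is illustrative only and is not the implementation in src/defenses/; function names and defaults are assumptions.

import cv2
import numpy as np

def median_filter(image, kernel=3):
    # A small (3x3) median filter usually removes isolated single-pixel outliers.
    return cv2.medianBlur(image, kernel)

def reduce_bit_depth(image, bits=4):
    # Feature squeezing via color quantization: keep `bits` bits per channel.
    levels = 2 ** bits - 1
    squeezed = np.round(image.astype(np.float32) / 255.0 * levels) / levels * 255.0
    return squeezed.astype(np.uint8)

def squeeze_pipeline(image):
    # Chain transformations the way a defense pipeline might compose them.
    return reduce_bit_depth(median_filter(image, kernel=3), bits=4)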

If you add new methods, please follow the coding conventions in src/defenses/ and add configuration to experiments/.

Repository Structure

  • README.md — this file
  • src/ — implementation for attacks, defenses, model wrappers, datasets, utils
    • src/attacks/ — one‑pixel attack implementations and utilities
    • src/defenses/ — implemented fortification methods
    • src/models/ — model wrappers (PyTorch / TF)
    • src/eval/ — evaluation metrics and logging
  • experiments/ — experiment configurations (YAML/JSON)
  • notebooks/ — interactive analysis and visualization notebooks
  • data/ — dataset download scripts or pointers (CIFAR‑10, Tiny‑ImageNet, custom examples)
  • models/ — model checkpoints (or scripts to download)
  • results/ — raw experiment outputs (JSON, images, logs)
  • requirements.txt — Python dependencies
  • environment.yml — (optional) Conda environment
  • LICENSE — license text

Installation & Requirements

  • Python 3.8+
  • Recommended: GPU and CUDA for model training and faster evaluation
  • Minimal packages (listed in requirements.txt): numpy, torch, torchvision (or tensorflow), opencv-python, pillow, tqdm, matplotlib, seaborn, scikit-learn, pandas

Install:

git clone https://github.com/Rbholika/OPA.git
cd OPA
pip install -r requirements.txt

Quick Start (example)

  1. Download the dataset (CIFAR‑10 is used in the examples):
python src/data/download.py --dataset cifar10 --dest data/
  2. Evaluate a baseline model (no defense) under a one‑pixel attack on a sample image:
python src/attacks/run_attack.py \
  --model checkpoints/resnet_cifar10.pt \
  --image examples/dog.png \
  --attack one_pixel \
  --targeted False \
  --out results/dog_attack.json
  3. Run a defense pipeline (median filter + model prediction):
python src/defenses/run_defense_pipeline.py \
  --model checkpoints/resnet_cifar10.pt \
  --image examples/dog.png \
  --defense median_filter \
  --kernel 3 \
  --out results/dog_defense.json
  4. Run an end‑to‑end benchmark (attacks vs. defenses on a dataset):
python src/benchmarks/benchmark.py \
  --model checkpoints/resnet_cifar10.pt \
  --dataset data/cifar10/test/ \
  --defenses median_filter,bitdepth_quantize,adv_train_sparse \
  --n_samples 1000 \
  --out results/benchmark_cifar10.json

Running Attacks and Defenses

  • Attacks are located in src/attacks/. The one‑pixel attack is implemented as a genetic‑algorithm (GA) optimizer that alters a single pixel (x, y, color). CLI options let you set the population size, number of iterations, targeted vs. untargeted mode, and random seed.
  • Defenses are in src/defenses/. Each defense exposes a common API: preprocess(image) -> image, detect(image) -> score/bool, repair(image) -> image (a minimal sketch of this interface appears after this list). Use the defense runner to compose multiple defenses.
  • Benchmarks call attack + defense automatically and log:
    • raw model prediction before defense
    • defended prediction after preprocessing/repair
    • whether attack succeeded (fooling the defended model)
    • queries used and runtime
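
The common API above could be captured by a small base class like the one below. The class name, defaults, and example subclass are assumptions for illustration and may differ from the actual base template in src/defenses/.

from abc import ABC, abstractmethod

class BaseDefense(ABC):
    # Hypothetical base class mirroring the preprocess / detect / repair API described above.

    @abstractmethod
    def preprocess(self, image):
        # Transform the input before it reaches the classifier (filtering, quantization, ...).
        ...

    def detect(self, image):
        # Return a suspicion score in [0, 1]; defenses without a detector stay neutral.
        return 0.0

    def repair(self, image):
        # Attempt to restore a flagged image (e.g. inpaint outlier pixels); default is a no-op.
        return image

class MedianFilterDefense(BaseDefense):
    def __init__(self, kernel=3):
        self.kernel = kernel

    def preprocess(self, image):
        import cv2
        return cv2.medianBlur(image, self.kernel)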

Evaluation & Metrics

  • Attack Success Rate (ASR): fraction of attacked images that lead to misclassification.
  • Robust Accuracy: accuracy on attacked images after defense.
  • True‑positive and false‑positive rates for detectors.
  • Average queries: number of model queries used per successful attack.
  • Perturbation magnitude: L0 (number of altered pixels), L2, L∞ where applicable.
  • Wall‑clock runtime per sample.

Results are saved as structured JSON (see results/) and can be visualized with notebooks in notebooks/.
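
As a reference for how the headline numbers relate to the logged fields, a minimal summary over per‑sample records might look like the snippet below; the record keys used here are assumptions, not the repo's exact JSON schema.

def summarize(records):
    # Each record is assumed to contain: true_label, defended_pred, attack_succeeded, queries.
    n = len(records)
    asr = sum(r["attack_succeeded"] for r in records) / n
    robust_acc = sum(r["defended_pred"] == r["true_label"] for r in records) / n
    succeeded = [r for r in records if r["attack_succeeded"]]
    avg_queries = sum(r["queries"] for r in succeeded) / len(succeeded) if succeeded else 0.0
    return {"attack_success_rate": asr, "robust_accuracy": robust_acc, "avg_queries": avg_queries}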

Experiments & Reproducibility

  • All experiments in experiments/ include a config (dataset, model, defense, attack params). Example:
    • experiments/cifar10/median_filter.yml
  • To reproduce an experiment:
    1. Ensure dataset and model checkpoints are downloaded.
    2. Run:
python src/experiments/run_experiment.py --config experiments/cifar10/median_filter.yml
  • Use --seed for deterministic runs and record seeds in logs.
  • Logs include model version and git commit hash when available.
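
For fully deterministic runs, --seed typically has to seed every source of randomness; the snippet below is the usual PyTorch recipe and a sketch of what run_experiment.py is expected to do, not a copy of it. The git_commit helper shows one way to record the commit hash mentioned above.

import random
import subprocess

import numpy as np
import torch

def set_seed(seed):
    # Seed Python, NumPy, and PyTorch RNGs and make cuDNN deterministic.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

def git_commit():
    # Best-effort git commit hash for the experiment log.
    try:
        return subprocess.check_output(["git", "rev-parse", "HEAD"], text=True).strip()
    except (OSError, subprocess.CalledProcessError):
        return "unknown"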

Visualization

  • Notebooks illustrate:
    • Original vs adversarial vs repaired images with the changed pixel highlighted.
    • Pixel sensitivity heatmaps (which pixels cause the highest misclassification rates).
    • Defense comparisons: ASR and robust accuracy per defense and per model.
  • Save example visualizations to results/figures/.
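
Outside the notebooks, the "changed pixel highlighted" figure can be produced with a few lines of matplotlib; the (x, y) coordinates are assumed to come from the attack's JSON output, and the function name is illustrative.

import matplotlib.pyplot as plt
import matplotlib.patches as patches

def show_attack(original, adversarial, x, y, out_path="results/figures/example.png"):
    # Plot original vs. adversarial image with a box around the single changed pixel.
    fig, axes = plt.subplots(1, 2, figsize=(6, 3))
    for ax, img, title in zip(axes, (original, adversarial), ("original", "adversarial")):
        ax.imshow(img)
        ax.set_title(title)
        ax.axis("off")
    # Rectangle takes the lower-left corner; offset by 0.5 so the box is centered on the pixel.
    axes[1].add_patch(patches.Rectangle((x - 0.5, y - 0.5), 1, 1, fill=False, edgecolor="red", linewidth=1.5))
    fig.savefig(out_path, bbox_inches="tight")
    plt.close(fig)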

Extending the Project

  • Add a defense: implement a class in src/defenses/ following the base template and add a configuration to experiments/.
  • Add a model: add a wrapper in src/models/ that exposes predict() and preprocess().
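
A PyTorch wrapper conforming to that interface can be quite small; the class name, normalization, and tensor layout below are assumptions for illustration rather than the repo's exact template.

import numpy as np
import torch

class TorchModelWrapper:
    # Hypothetical wrapper exposing the predict() / preprocess() interface described above.

    def __init__(self, model, device="cpu"):
        self.model = model.to(device).eval()
        self.device = device

    def preprocess(self, image):
        # uint8 H x W x 3 image -> float NCHW tensor in [0, 1].
        x = torch.from_numpy(np.asarray(image)).float().div(255.0)
        return x.permute(2, 0, 1).unsqueeze(0).to(self.device)

    @torch.no_grad()
    def predict(self, image):
        # Return class probabilities for a single image.
        logits = self.model(self.preprocess(image))
        return torch.softmax(logits, dim=1).squeeze(0).cpu().numpy()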

Citation

Please cite the original One‑Pixel Attack paper when using this work:

  • Su, Jiawei; Vasconcellos Vargas, Danilo; Sakurai, Kouichi. "One Pixel Attack for Fooling Deep Neural Networks." IEEE Transactions on Evolutionary Computation (2019). arXiv:1710.08864

If you use the fortification code or experiments in publications, please cite this repository and the relevant paper for each defense you evaluate (for instance, the feature‑squeezing and randomized‑smoothing papers); add the precise references you used in your experiments.

License

This project is released under the MIT License; see LICENSE for details.

Contact & Contributions

  • Owner / Maintainer: @Rbholika
  • Issues and PRs welcome — please include reproducible experiment configs and test results.

