Repository for the paper titled:
On the Robustness of State-of-the-Art Transformers for Sound Event Classification against Black Box Adversarial Attacks
- audio-adversarial-examples
This project represents an effort to evaluate the robustness of state-of-the-art transformer-based models for sound event classification against adversarial attacks. The attacks are performed using two evolutionary algorithms: Particle Swarm Optimization (PSO) and Differential Evolution (DE). We conduct experiments utilizing three deep learning models (BEATs, PaSST, AST) and two benchmark datasets (AudioSet, ESC-50).
To reproduce the experiments or to perform attacks using any of the algorithms, first create a conda environment with Python 3.9:

```
conda create -n adversarial_audio python=3.9
```

Then activate the environment:

```
conda activate adversarial_audio
```

and install the requirements:

```
pip install -r requirements.txt
```

Note: Due to dependency conflicts between some packages, this project uses two separate conda environments. The instructions above set up an environment for running experiments and adversarial attacks on BEATs or PaSST. If you want to run any AST-related scripts, install the requirements with:

```
pip install -r requirements_ast.txt
```

Now you're ready to go!
We use the pre-trained models to generate audio adversarial attacks by utilizing two optimization algorithms: Particle Swarm Optimization [4] and Differential Evolution [5]. We operate in a black-box setting where the architecture and weights of the model are unknown to the attacker.
To conduct experiments utilizing the BEATs model, please follow these steps:
- Download Model Weights: Acquire the model weights that have been fine-tuned on AudioSet. These weights can be downloaded from the following link. In our scenario, we conduct experiments using the Fine-tuned BEATs_iter3+ (AS2M) (cpt2) .pt file. To conduct experiments with different weights, configure the path to the appropriate file within the `get_model` function, and make sure that all corresponding configuration files are properly set for each specific case.
- Add Weights to the Pretrained Models Folder: After downloading, place the weights into the `pretrained_models` directory within your project.
To use the AST model for attacks, proceed with the following steps:

- Download Model Weights: Obtain the model weights for AudioSet from the link. In our demonstrations, we use the Full AudioSet, 10 tstride, 10 fstride, without Weight Averaging, Model 3 (0.448 mAP) checkpoint.
- Add Weights to the Pretrained Models Folder: After downloading, place the weights into the `pretrained_models` directory within your project.
To run experiments using the PaSST model, you do not need to download any weights; they are downloaded automatically.
**Perturbation Ratio**: To regulate the amount of perturbation added, adjust the `perturbation_ratio` parameter within the algorithm's parameters dictionary. The perturbation ratio serves as a weight in our noise initialization method.

**SNR Control**: To generate attacks with a fixed Signal-to-Noise Ratio (SNR), add the desired SNR values to the `SNR_norm` parameter as a list. The generated adversarial examples will then have the specified signal-to-noise ratios.
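The repository's own SNR-normalization code is not reproduced in this README, but the idea behind fixing the SNR can be sketched: rescale the perturbation so that the resulting signal-to-noise ratio equals the target value in dB. The helper name below is illustrative, not part of the codebase:

```python
import numpy as np

def scale_noise_to_snr(signal, noise, snr_db):
    """Rescale `noise` so that 20*log10(rms(signal)/rms(noise)) == snr_db."""
    rms_signal = np.sqrt(np.mean(signal ** 2))
    rms_noise = np.sqrt(np.mean(noise ** 2))
    target_rms = rms_signal / (10 ** (snr_db / 20))
    return noise * (target_rms / rms_noise)

# 1 second of a 440 Hz tone at 16 kHz, plus Gaussian noise.
t = np.linspace(0, 1, 16000, endpoint=False)
signal = np.sin(2 * np.pi * 440 * t)
noise = np.random.default_rng(0).normal(size=16000)

scaled = scale_noise_to_snr(signal, noise, snr_db=5)
achieved = 20 * np.log10(np.sqrt(np.mean(signal ** 2)) / np.sqrt(np.mean(scaled ** 2)))
print(round(achieved, 6))  # 5.0
```

Passing several target values in `SNR_norm` amounts to repeating this rescaling once per requested SNR.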
To generate an adversarial example using PSO, first initialize the class responsible for performing the attack. For example, to produce an adversarial example for a given example.wav file:
```python
from utils.init_utils import init_algorithm, get_model

# Define algorithm parameters.
algorithm_hyperparameters = {
    "initial_particles": 25,
    "max_iters": 15,
    "max_inertia_w": 0.9,
    "min_inertia_w": 0.1,
    "memory_w": 1.2,
    "information_w": 1.2,
    "perturbation_ratio": 0.5}

# Define hypercategory mapping path.
hypercategory_mapping = "ontologies/hypercategory_from_ontology.json"

# Load the pre-trained model.
model = get_model(model_str="beats",
                  model_pt_file="pretrained_models/BEATs_iter3_plus_AS2M_finetuned_on_AS2M_cpt2.pt",
                  hypercategory_mapping=hypercategory_mapping)

# Initialize PSO attacker.
PSO_ATTACKER = init_algorithm(algorithm="pso",
                              model=model,
                              verbosity=False,
                              SNR_norm=5,
                              hyperparameters=algorithm_hyperparameters,
                              objective_function="simple_minimization")

# Start the attack / generate the adversarial example.
attack_results = PSO_ATTACKER.generate_adversarial_example("example.wav")
```

The variable `attack_results` is a Python dictionary containing the keys:
- noise: The waveform of the perturbation.
- adversary: The waveform of the generated adversarial example.
- raw_audio: The original waveform.
- iterations: Total number of iterations performed during the attack.
- success: Whether the attack succeeded.
- queries: Number of queries to the model.
- inferred_class: The inferred class after the attack.
- Final Starting Class Confidence: The confidence of the starting class.
- Final Confidence: The confidence of the inferred class.
- starting_class: The predicted class before the attack.
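Assuming the keys listed above, the result dictionary can be inspected programmatically. The snippet below uses a hand-made stand-in dictionary (not real attack output) to show the idea, including recovering the achieved SNR from `raw_audio` and `noise`:

```python
import numpy as np

# Stand-in for attack_results with the documented keys (values are made up).
attack_results = {
    "noise": np.full(16000, 0.01),
    "adversary": np.full(16000, 0.11),
    "raw_audio": np.full(16000, 0.1),
    "iterations": 12,
    "success": True,
    "queries": 300,
    "inferred_class": "Siren",
    "starting_class": "Speech",
}

if attack_results["success"]:
    # Achieved SNR (dB) between the clean audio and the perturbation.
    snr_db = 20 * np.log10(np.linalg.norm(attack_results["raw_audio"])
                           / np.linalg.norm(attack_results["noise"]))
    print(f"{attack_results['starting_class']} -> {attack_results['inferred_class']}: "
          f"{attack_results['queries']} queries, SNR = {snr_db:.1f} dB")
```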
In a similar manner, you can use Differential Evolution as the optimization algorithm to generate an adversarial example:
```python
from utils.init_utils import init_algorithm, get_model

# Define algorithm parameters.
algorithm_hyperparameters = {
    "pop_size": 20,
    "iter": 10,
    "F": 1.2,
    "cr": 0.9,
    "perturbation_ratio": 0.5}

# Define hypercategory mapping path.
hypercategory_mapping = "ontologies/hypercategory_from_ontology.json"

# Load the pre-trained model.
model = get_model(model_str="beats",
                  model_pt_file="pretrained_models/BEATs_iter3_plus_AS2M_finetuned_on_AS2M_cpt2.pt",
                  hypercategory_mapping=hypercategory_mapping)

# Initialize DE attacker.
DE_ATTACKER = init_algorithm(algorithm="de",
                             model=model,
                             verbosity=False,
                             SNR_norm=5,
                             hyperparameters=algorithm_hyperparameters,
                             objective_function="simple_minimization")

# Start the attack / generate the adversarial example.
attack_results = DE_ATTACKER.generate_adversarial_example("example.wav")
```

Reminder: If you want to run the attack on the AST model, you need to install the other set of dependencies (requirements_ast.txt).
To listen to the generated example, you can use soundfile to store the wav file:

```python
import soundfile as sf

sf.write(file="adversary_example.wav", data=attack_results["adversary"], samplerate=16000, subtype="FLOAT")
```

To reproduce the experiments using the AudioSet dataset, first download the validation subset of AudioSet from the following link: https://www.kaggle.com/datasets/zfturbo/audioset-valid. Store all the wav files in a folder named `valid_wav` and place it inside the `data` folder.
Run the attack with:

```
python src/run_attack.py --config_file config/attack_config.yaml
```

Note: To run the experiments on AudioSet, you need to create an ontology that maps .wav filenames to hypercategories. This can be done by running the create_subset_audioset.py script, which produces a JSON file in the required format.
Parameters of the script:

- `-hc`: Hypercategory mapping, found in `ontologies/hypercategory_from_ontology.json`
- `-tl`: True-labels ontology, found in `ontologies/audioset_val_true_labels.json`
- `-n`: Number of desired samples.
- `-t`: Target path to store the new ontology.
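The exact schema of the generated ontology is defined by create_subset_audioset.py itself; based on the description above (a mapping from .wav filenames to hypercategories), a plausible shape is the following, where the filenames and categories are invented for illustration:

```json
{
  "example_clip_001.wav": "Animal",
  "example_clip_002.wav": "Music"
}
```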
To run the experiments on the ESC-50 dataset, first download the dataset from https://www.kaggle.com/datasets/mmoreaux/environmental-sound-classification-50. The models need to be fine-tuned on this dataset, so run:

```
python src/finetuned_attack.py --config_file config/finetune.yaml
```

[1] BEATs: Audio Pre-Training with Acoustic Tokenizers
[2] AST: Audio Spectrogram Transformer
[3] Efficient Training of Audio Transformers with Patchout
[4] Particle Swarm Optimization Algorithm and Its Applications: A Systematic Review