This code belongs to the papers:
- AAAI 22: Is RobustBench/AutoAttack a suitable Benchmark for Adversarial Robustness?
- ICML 21: Detecting AutoAttack Perturbations in the Frequency Domain
For this framework, please cite:
@inproceedings{
lorenz2022is,
title={Is RobustBench/AutoAttack a suitable Benchmark for Adversarial Robustness?},
author={Peter Lorenz and Dominik Strassel and Margret Keuper and Janis Keuper},
booktitle={The AAAI-22 Workshop on Adversarial Machine Learning and Beyond},
year={2022},
url={https://openreview.net/forum?id=aLB3FaqoMBs}
}
This repository is an expansion of SpectralAdversarialDefense, but has some new features:
- Automatic logging.
- Several runs can be saved for calculating the variance of the results.
- New attack method: AutoAttack.
- Datasets: imagenet32, imagenet64, imagenet128, imagenet, celebahq32, celebahq64, and celebahq128.
- New model: besides VGG-16, we trained a WideResNet28-10 model, except for imagenet, where we used the standard PyTorch model.
- Bash scripts: automatically start various combinations of input parameters.
- Automatic .csv creation from all results.
This image shows the pipeline from training a model, through generating adversarial examples, to defending against them.
- Training: Models are trained. Pre-trained models are provided (WideResNet28-10: cif10, cif100, imagenet32, imagenet64, imagenet128, celebaHQ32, celebaHQ64, celebaHQ128; WideResNet51-2: ImageNet; VGG16: cif10 and cif100)
- Generate Clean Data: Only correctly classified samples are stored via torch.save.
- Attacks: On this clean data several attacks can be executed: FGSM, BIM, AutoAttack (Std), PGD, DF and CW.
- Detect Feature: Detectors try to distinguish between attacked and not-attacked images.
- Evaluation Detect: The management script for handling several runs and extracting the results into one .csv file.
- GPUs: A100 (40GB), Titan V (12GB) or GTX 1080 (12GB)
- CUDA 11.1
- Python 3.9.5
- PyTorch 1.9.0
- cuDNN 8.0.5_0
Clone the repository
$ git clone --recurse-submodules https://github.com/adverML/SpectralDef_Framework
$ cd SpectralDef_Framework
and install the requirements
$ conda env create --name cuda--11-1-1--pytorch--1-9-0 -f requirements.yml
$ conda activate cuda--11-1-1--pytorch--1-9-0
There are two possibilities: either use our data set with existing adversarial examples (not provided yet), in which case follow the instructions under 'Download', or generate the examples yourself by going through 'Data generation'. In both cases, conclude with 'Build a detector'.
Download the adversarial examples (not provided yet) and their non-adversarial counterparts as well as the trained VGG-16 networks from: https://www.kaggle.com/j53t3r/weights. Extract the folders for the adversarial examples into /data and the models into the main directory. Afterwards continue with 'Build a detector'.
These datasets are supported:
- cifar10
- cifar100
- ImageNet32x32
- ImageNet64x64
- ImageNet128x128
- ImageNet 2012
- CelebaHQ 32x32, 64x64, 128x128, 256x256
Download and copy the weights into data/datasets/. In case of trouble, adapt the paths in conf/global_settings.py.
To get the weights for all networks for CIFAR-10 (WideResNet 28-10 is already in this repository) and CIFAR-100, ImageNet and CelebaHQ download:
- Kaggle Download Weights
- Copy the weights into checkpoint/.
In case of trouble, adapt the paths in conf/global_settings.py. You are welcome to create an issue on GitHub.
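Before adapting conf/global_settings.py, it can help to check that the two directories named above exist. The following is a minimal sketch and not part of the repository: the helper name missing_dirs is hypothetical, and it assumes only the directory names data/datasets and checkpoint, taken relative to the repository root.

```python
from pathlib import Path

# Directory names taken from the instructions above; everything else is illustrative.
EXPECTED_DIRS = ("data/datasets", "checkpoint")


def missing_dirs(repo_root="."):
    """Return the expected directories that do not exist under repo_root."""
    root = Path(repo_root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


if __name__ == "__main__":
    for d in missing_dirs():
        print(f"missing: {d}")
```

An empty result means both directories were found; it says nothing about whether the files inside them are complete.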
Train the VGG16 on CIFAR-10:
$ python train_cif10.py
or on CIFAR-100
$ python train_cif100.py
The following script downloads the CIFAR-10/100 dataset and extracts the CIFAR-10/100 (imagenet32, imagenet64, imagenet128, celebAHQ32, ...) images that are correctly classified by the network. Use --net cif10 for CIFAR-10 and --net cif100 for CIFAR-100:
$ # python generate_clean_data.py -h // for help
$ python generate_clean_data.py --net cif10
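The filtering step can be pictured as follows. This is a minimal sketch, not the code in generate_clean_data.py: the helper keep_correct, the toy linear model, the random inputs, and the output file name are all assumptions made for illustration.

```python
import os
import tempfile

import torch


def keep_correct(model, images, labels):
    """Return only the samples that the model classifies correctly."""
    model.eval()
    with torch.no_grad():
        predictions = model(images).argmax(dim=1)
    mask = predictions == labels
    return images[mask], labels[mask]


# Toy stand-ins (assumptions): a linear "network" and random 8-dimensional inputs.
torch.manual_seed(0)
model = torch.nn.Linear(8, 3)
images = torch.randn(16, 8)
labels = torch.randint(0, 3, (16,))

clean_images, clean_labels = keep_correct(model, images, labels)

# Store the clean subset with torch.save; the file name is hypothetical.
out_path = os.path.join(tempfile.mkdtemp(), "clean_data.pt")
torch.save({"images": clean_images, "labels": clean_labels}, out_path)
```

Filtering first means that every later misclassification can be attributed to the attack rather than to an error the network already made on the clean image.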
Then generate the adversarial examples. The --attack argument can be fgsm (Fast Gradient Sign Method), bim (Basic Iterative Method), pgd (Projected Gradient Descent), [new] std (AutoAttack Standard), df (DeepFool), or cw (Carlini and Wagner):
$ # python attack.py -h // for help
$ python attack.py --attack fgsm
First extract the necessary characteristics to train a detector. Choose a detector out of InputMFS (BlackBox - BB), InputPFS, LayerMFS (WhiteBox - WB), LayerPFS, LID, Mahalanobis, and an attack argument as before:
######## To Clarify from the Paper
# InputMFS == BlackBox_MFS
# InputPFS == BlackBox_PFS
# LayerMFS == WhiteBox_MFS
# LayerPFS == WhiteBox_PFS
Execute
$ # python extract_characteristics.py -h // for help
$ python extract_characteristics.py --attack fgsm --detector InputMFS
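For orientation, the sketch below computes a magnitude and a phase Fourier spectrum of a toy image with NumPy. It assumes that MFS and PFS stand for magnitude and phase Fourier spectrum, which the text above does not spell out, and it is not the code in extract_characteristics.py; the function names and the 4x4 input are illustrative only.

```python
import numpy as np


def magnitude_spectrum(image):
    """Magnitude of the 2D discrete Fourier transform of a 2D array."""
    return np.abs(np.fft.fft2(image))


def phase_spectrum(image):
    """Phase (in radians) of the 2D discrete Fourier transform of a 2D array."""
    return np.angle(np.fft.fft2(image))


# Toy grayscale "image" (assumption): a constant 4x4 patch.
image = np.ones((4, 4))
mfs = magnitude_spectrum(image)
pfs = phase_spectrum(image)

# A constant image has all of its energy in the zero-frequency bin,
# whose magnitude equals the sum of all pixels.
print(mfs[0, 0])
```

Under the same assumption, the Input variants would apply such a transform to the image itself and the Layer variants to feature maps of the network.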
Then, train a classifier on the characteristics for a specific attack and detector:
$ python detect_adversarials.py --attack fgsm --detector InputMFS
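The three commands above (attack, characteristic extraction, detector training) have to be repeated for every attack and detector pair. The sketch below only builds the command lines and does not run them. It is not one of the repository's bash scripts, the helper name build_commands is hypothetical, and it uses only the --attack and --detector flags shown above.

```python
# Attack and detector names as listed above.
ATTACKS = ["fgsm", "bim", "pgd", "std", "df", "cw"]
DETECTORS = ["InputMFS", "InputPFS", "LayerMFS", "LayerPFS", "LID", "Mahalanobis"]


def build_commands(attacks=ATTACKS, detectors=DETECTORS):
    """Return the command lines for every attack/detector combination."""
    commands = []
    for attack in attacks:
        # Adversarial examples are generated once per attack.
        commands.append(["python", "attack.py", "--attack", attack])
        for detector in detectors:
            # Per detector: first extract characteristics, then train the classifier.
            for script in ("extract_characteristics.py", "detect_adversarials.py"):
                commands.append(
                    ["python", script, "--attack", attack, "--detector", detector]
                )
    return commands


if __name__ == "__main__":
    for command in build_commands():
        print(" ".join(command))
```

Each entry is an argument list, so it could be handed to subprocess.run(command, check=True) to execute the steps in order.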
Different possibilities are shown at the end of the file evaluation_detection.py:
$ python evaluation_detection.py
Note: set layers=False to evaluate the detectors after the right layers have been selected.
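Since several runs can be saved to estimate the variance of the results, the aggregation can be sketched as below. The values, the column names run and auc, and the helper summarize are hypothetical and only illustrate the computation; the layout of the .csv files written by evaluation_detection.py may differ.

```python
import csv
import io
import statistics

# Hypothetical per-run results; real values come from the generated .csv files.
EXAMPLE_CSV = """run,auc
1,0.90
2,0.94
3,0.92
"""


def summarize(csv_text, column):
    """Return (mean, sample variance) of a numeric column across runs."""
    rows = csv.DictReader(io.StringIO(csv_text))
    values = [float(row[column]) for row in rows]
    return statistics.mean(values), statistics.variance(values)


mean_auc, var_auc = summarize(EXAMPLE_CSV, "auc")
print(f"mean={mean_auc:.3f} variance={var_auc:.5f}")
```

statistics.variance computes the sample variance and needs at least two runs; with a single run it raises statistics.StatisticsError.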
- For training the VGG-16 on CIFAR-10 we used: https://github.com/kuangliu/pytorch-cifar.
- For training on CIFAR-100: https://github.com/weiaicunzai/pytorch-cifar100.
- For training on imagenet32 (64 or 128) and celebaHQ32 (64 or 128): https://github.com/bearpaw/pytorch-classification.
- For generating the adversarial examples we used the toolbox foolbox: https://github.com/bethgelab/foolbox.
- For the LID detector we used: https://github.com/xingjunm/lid_adversarial_subspace_detection.
- For the Mahalanobis detector we used: https://github.com/pokaxpoka/deep_Mahalanobis_detector.
- For the AutoAttack detector we used: https://github.com/adverML/auto-attack/tree/forspectraldefense. This one is already added as:
git submodule add -b forspectraldefense git@github.com:adverML/auto-attack.git submodules/autoattack
- Other detectors: https://github.com/jayaram-r/adversarial-detection.