This repository contains the code to reproduce our experiments with the conformal CANN detector, our method for detecting adversarial examples based on conformal prediction [1] and correspondence analysis [2].
Our experiments were designed for the MNIST, Fashion-MNIST, CIFAR-10 and SVHN data sets. All of these are included in the TensorFlow Keras framework upon which our code is built, except for SVHN. You can download that data set from the official SVHN website (http://ufldl.stanford.edu/housenumbers/). Note that we used the 32x32 cropped digits (format 2).
To run the code, you will need to make sure all dependencies are properly installed. A `requirements.txt` file is provided to facilitate this, assuming you have Python 3.6.9 or higher:
```
pip install -r requirements.txt
```
There are two main scripts provided for running our experiments: `evaluate.py` and `attack.py`. The former trains baseline models and CANNs from scratch and evaluates them against the PGD attack [3]; the latter runs our adaptive attack against the resulting models. An example command to run the evaluation is shown below:
```
python evaluate.py mnist ResNet50 --eps .3
```
This will train a ResNet50 and a CANN model for the MNIST data set and evaluate them against the PGD attack up to a relative perturbation budget of 30%. When this script completes, it will generate a `results.json` file in the `results` folder that looks like this:
```json
{
  "baseline_acc": "0.99292517",
  "adversarial_acc": "0.010884354",
  "center_score": "0.07354409396427587",
  "threshold": "0.10857142857142857",
  "auroc": "0.8353615261869876",
  "trr": "0.7867385960120186",
  "frr": "0.10517755489292491",
  "detection_acc": "0.8409863945578231"
}
```
This JSON dump has a number of fields:

- `baseline_acc`. The clean accuracy achieved by the baseline model on the test set.
- `adversarial_acc`. The robust accuracy achieved by the baseline model on adversarial examples generated with the $L_\infty$ PGD attack at the specified perturbation budget (30% of the total pixel range in this example).
- `center_score`. The mean deviation value as defined in the paper.
- `threshold`. The non-conformity threshold of the detector.
- `auroc`. The area under the ROC curve achieved by the detector. Note that this value is independent of any tuned threshold.
- `trr`. The true rejection rate as defined in the paper. Depends on the tuned threshold.
- `frr`. The false rejection rate as defined in the paper. Depends on the tuned threshold.
- `detection_acc`. The accuracy of the detector. Depends on the tuned threshold.
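To make the threshold-dependent metrics concrete, here is a minimal sketch of how such quantities could be computed from detector scores. The variable names and the exact definitions of `trr`, `frr` and `detection_acc` below are assumptions based on the usual meaning of these terms, not taken verbatim from our code:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Stand-in non-conformity scores for clean and adversarial test inputs.
rng = np.random.default_rng(0)
clean_scores = rng.random(1000)
adv_scores = rng.random(1000) + 0.2
threshold = 0.1086  # e.g. the tuned threshold from results.json

# AUROC is threshold-independent: adversarial inputs are the positives.
labels = np.concatenate([np.zeros_like(clean_scores), np.ones_like(adv_scores)])
scores = np.concatenate([clean_scores, adv_scores])
auroc = roc_auc_score(labels, scores)

# Threshold-dependent metrics (assuming the usual definitions).
trr = np.mean(adv_scores > threshold)    # adversarial inputs correctly rejected
frr = np.mean(clean_scores > threshold)  # clean inputs wrongly rejected
detection_acc = np.mean(np.concatenate(
    [clean_scores <= threshold, adv_scores > threshold]))
print(f"auroc={auroc:.3f} trr={trr:.3f} frr={frr:.3f} acc={detection_acc:.3f}")
```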
This JSON file contains information the adaptive attack needs, so the evaluation script must be run before the adaptive attack.
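For instance, these values can be read back from the dump before mounting the attack. A minimal sketch, using the file layout from the example above; note that the dump stores all values as strings, so they must be cast:

```python
import json

# Load the statistics written by evaluate.py.
with open("results/results.json") as f:
    stats = json.load(f)

# All values are serialized as strings, so cast before use.
center_score = float(stats["center_score"])
threshold = float(stats["threshold"])
```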
When `evaluate.py` has finished, you can run the adaptive adversarial attack script `attack.py` as follows:

```
python attack.py mnist ResNet50 --eps .3
```
This will run the adaptive attack against the pre-trained ResNet50 and CANN models for the MNIST data set, up to a relative perturbation budget of 30%, using the `center_score` and `threshold` values produced by `evaluate.py`. It will then produce a JSON file `results_adaptive.json` reporting the same statistics as the evaluation script.
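For intuition, an adaptive PGD step typically maximizes the classification loss while also driving the detector's non-conformity score below the threshold. The sketch below illustrates that general idea only; `model`, `detector_score` and the weight `lam` are hypothetical stand-ins, not the actual objective used in `attack.py`:

```python
import tensorflow as tf

def pgd_step(x, y, x_orig, eps, alpha, model, detector_score, lam=1.0):
    """One L_inf PGD ascent step with a detector-evasion term (illustrative)."""
    with tf.GradientTape() as tape:
        tape.watch(x)
        cls_loss = tf.keras.losses.sparse_categorical_crossentropy(y, model(x))
        # Encourage misclassification while keeping the non-conformity low.
        loss = tf.reduce_mean(cls_loss - lam * detector_score(x))
    grad = tape.gradient(loss, x)
    x = x + alpha * tf.sign(grad)                        # signed gradient step
    x = tf.clip_by_value(x, x_orig - eps, x_orig + eps)  # project to eps-ball
    return tf.clip_by_value(x, 0.0, 1.0)                 # keep valid pixel range
```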
We have also implemented the Deep KNN [4] and the Mahalanobis distance-based detector [5] for comparison. These can be run using the `deepknn.py` and `mahalanobis.py` scripts, respectively, with the same interface as the evaluation and adaptive attack scripts. Running them produces JSON files `results_deepknn.json` and `results_mahalanobis.json` with the same metrics as before.
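For example, since the interface is the same as before, the Deep KNN baseline can be evaluated with:

```
python deepknn.py mnist ResNet50 --eps .3
```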
To reproduce our results exactly, you can also run the enclosed bash shell scripts `mnist.sh`, `fashion.sh`, `cifar10.sh` and `svhn.sh`, respectively, for the MNIST, Fashion-MNIST, CIFAR-10 and SVHN data sets. The script `all.sh` will run all of these in sequence.
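For example, to reproduce everything in one go:

```
bash all.sh
```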
You can run our experiments against your own models and data sets by supplying a model specification, a CANN architecture and a data provider. To do this, follow these steps (see the sketch after this list):

- Create a Python script under `./datasets/dataset_name.py` which implements the `load_data`, `create_cann` and `train_cann` functions. You can probably copy the `train_cann` function as-is from existing code unless you need special procedures to fit your CANN.
- Create a Python script under `./models/model_name.py` which implements the `create_model` and `train_baseline` functions.
- You can now run `evaluate.py` and `attack.py` on your own data set and models.
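As a starting point, here is a hypothetical skeleton of the two scripts. The exact signatures and the placeholder architectures are assumptions for illustration; mirror the existing scripts under `./datasets/` and `./models/` for the real interfaces:

```python
# ./datasets/my_dataset.py -- hypothetical skeleton, not the actual interface.
import numpy as np
import tensorflow as tf

def load_data():
    """Return ((x_train, y_train), (x_test, y_test)) with inputs in [0, 1]."""
    (x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
    x_train = x_train[..., np.newaxis].astype("float32") / 255.0
    x_test = x_test[..., np.newaxis].astype("float32") / 255.0
    return (x_train, y_train), (x_test, y_test)

def create_cann(input_shape=(28, 28, 1), num_classes=10):
    """Build the CANN for this data set (placeholder architecture)."""
    return tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=input_shape),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(num_classes),
    ])

def train_cann(cann, x, y):
    """Fit the CANN; often reusable as-is from an existing data set script."""
    cann.compile(optimizer="adam",
                 loss=tf.keras.losses.SparseCategoricalCrossentropy(
                     from_logits=True))
    cann.fit(x, y, epochs=10, batch_size=128)
    return cann

# ./models/my_model.py -- analogous hypothetical skeleton.
def create_model(input_shape=(28, 28, 1), num_classes=10):
    """Build the baseline classifier (placeholder architecture)."""
    return tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=input_shape),
        tf.keras.layers.Dense(256, activation="relu"),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])

def train_baseline(model, x, y):
    """Fit the baseline model."""
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit(x, y, epochs=10, batch_size=128)
    return model
```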
- Shafer, G., & Vovk, V. (2008). A tutorial on conformal prediction. Journal of Machine Learning Research, 9(Mar), 371-421. PDF
- Hsu, H., Salamatian, S., & Calmon, F. P. (2019). Correspondence analysis using neural networks. arXiv preprint arXiv:1902.07828. PDF
- Madry, A., Makelov, A., Schmidt, L., Tsipras, D., & Vladu, A. (2017). Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083. PDF
- Papernot, N., & McDaniel, P. (2018). Deep k-nearest neighbors: Towards confident, interpretable and robust deep learning. arXiv preprint arXiv:1803.04765. PDF
- Lee, K., Lee, K., Lee, H., & Shin, J. (2018). A simple unified framework for detecting out-of-distribution samples and adversarial attacks. In Advances in Neural Information Processing Systems (pp. 7167-7177). PDF