Revisiting Adversarial Training for ImageNet: Architectures, Training and Generalization across Threat Models
While adversarial training has been extensively studied for ResNet architectures and low resolution datasets like CIFAR, much less is known for ImageNet. Given the recent debate about whether transformers are more robust than convnets, we revisit adversarial training on ImageNet comparing ViTs and ConvNeXts. Extensive experiments show that minor changes in architecture, most notably replacing PatchStem with ConvStem, and training scheme have a significant impact on the achieved robustness. These changes not only increase robustness in the seen
Requirements (specific versions tested on):
fastargs-1.2.0
autoattack-0.1
pytorch-1.13.1
torchvision-0.14.1
robustbench-1.1
timm-0.8.0.dev0
GPUtil
The bash script in `run_train.sh` trains the model given by `model.arch`. For clean training set `adv.attack none`; for adversarial training set `adv.attack apgd`.
For the standard setting as in the paper (heavy augmentations), set `data.augmentations 1`, `model.model_ema 1` and `training.label_smoothing 1`.
To train models with Convolution-Stem (CvSt), set `model.not_original 1`.
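The options above can be collected in one place. The following is a minimal sketch, assuming the options are passed as fastargs-style `--section.name value` overrides. The helper `to_overrides` and the `<arch>` placeholder are hypothetical and not part of the repository, and how `run_train.sh` actually forwards its options may differ.

```python
# Hypothetical helper: turns the dotted option names from this README into
# fastargs-style command-line overrides. The "--key value" form is an assumption.
PAPER_SETTING = {
    "data.augmentations": 1,        # heavy augmentations
    "model.model_ema": 1,
    "training.label_smoothing": 1,
}


def to_overrides(arch, adversarial=True, conv_stem=False):
    """Return a flat list of command-line arguments for one training run."""
    options = {
        "model.arch": arch,
        "adv.attack": "apgd" if adversarial else "none",
    }
    options.update(PAPER_SETTING)
    if conv_stem:
        options["model.not_original"] = 1   # Convolution-Stem (CvSt) variant
    args = []
    for key, value in options.items():
        args += [f"--{key}", str(value)]
    return args


if __name__ == "__main__":
    # "<arch>" is a placeholder; use an architecture name the repository accepts.
    print(" ".join(to_overrides("<arch>", adversarial=True, conv_stem=True)))
```

Keeping the paper setting in a single dictionary makes it easy to see which switches differ between a clean run and an adversarial run.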
The code does standard APGD adversarial training.
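To illustrate the general idea of adversarial training, here is a toy sketch in pure Python. It is not the repository's training loop: it uses plain PGD with a fixed step size rather than APGD's adaptive schedule, a logistic model instead of a ViT or ConvNeXt, and it assumes an l_inf perturbation ball. All function names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def loss_and_grads(w, b, x, y):
    """Binary cross-entropy of a logistic model plus gradients w.r.t. x, w and b."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    p = min(max(p, 1e-12), 1 - 1e-12)          # avoid log(0)
    loss = -(y * math.log(p) + (1 - y) * math.log(1 - p))
    grad_x = [(p - y) * wi for wi in w]
    grad_w = [(p - y) * xi for xi in x]
    return loss, grad_x, grad_w, p - y


def pgd_linf(w, b, x, y, eps, step, n_steps):
    """Plain PGD inside an l_inf ball of radius eps (fixed step, no APGD schedule)."""
    x_adv = list(x)
    for _ in range(n_steps):
        _, grad_x, _, _ = loss_and_grads(w, b, x_adv, y)
        for i, g in enumerate(grad_x):
            moved = x_adv[i] + step * ((g > 0) - (g < 0))    # signed ascent step
            moved = min(max(moved, x[i] - eps), x[i] + eps)  # project onto the ball
            x_adv[i] = min(max(moved, 0.0), 1.0)             # stay in [0, 1]
    return x_adv


def adversarial_training_step(w, b, x, y, eps, lr):
    """One SGD step on the loss evaluated at the adversarial point."""
    x_adv = pgd_linf(w, b, x, y, eps, step=eps / 4, n_steps=10)
    _, _, grad_w, grad_b = loss_and_grads(w, b, x_adv, y)
    new_w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
    return new_w, b - lr * grad_b, x_adv
```

The inner loop maximizes the loss over the perturbation and the outer step minimizes it over the weights, which is the min-max structure that APGD-based training shares with this simplified version.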
The file `utils_architecture.py` has model definitions for the new `CvSt` models. All models are built on top of timm imports.
The file `runner_aa_eval` runs AutoAttack (AA). Passing `fullaa 1` runs the complete AA, whereas `fullaa 0` runs only the first two attacks in AA (APGD-CE and APGD-T).
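The two evaluation modes can be sketched with the `autoattack` package directly. This is a minimal illustration and not the code in `runner_aa_eval`: the helper names are hypothetical, `norm` and `eps` must be supplied by the caller, and restricting a standard-version `AutoAttack` object through its `attacks_to_run` attribute is an assumption about how the reduced mode could be implemented.

```python
def attacks_for(fullaa):
    """Attack names, in the autoattack package's naming, for each mode."""
    if fullaa:
        return ["apgd-ce", "apgd-t", "fab-t", "square"]   # standard AutoAttack
    return ["apgd-ce", "apgd-t"]                          # first two attacks only


def evaluate(model, x, y, norm, eps, fullaa, batch_size=64):
    """Run AutoAttack on tensors x, y and return the adversarial examples."""
    # Imported lazily so attacks_for() works without torch/autoattack installed.
    from autoattack import AutoAttack

    adversary = AutoAttack(model, norm=norm, eps=eps, version="standard")
    if not fullaa:
        # Keep the standard hyperparameters but run only the APGD attacks.
        adversary.attacks_to_run = attacks_for(False)
    return adversary.run_standard_evaluation(x, y, bs=batch_size)
```

Running only APGD-CE and APGD-T is considerably cheaper than the complete ensemble, which is useful for quick checks before a full evaluation.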
The link location includes weights for the clean model (the one used as initialization for Adversarial Training (AT)), the robust model, and the full-AA log for
Note: the higher-resolution numbers (rows marked with *) use the same checkpoint as the standard resolution of 224; only the evaluation is done at the higher resolution listed.
Model-Name | epochs | res. | Clean acc. | AA | Checkpoint (clean-init and robust) |
---|---|---|---|---|---|
ConvNext-iso-CvSt | 300 | 224 | 70.2 | 45.9 | Link |
ViT-S | 300 | 224 | 69.2 | 44.0 | Link |
ViT-S-CvSt | 300 | 224 | 72.5 | 48.1 | Link |
ConvNext-T | 300 | 224 | 72.4 | 48.6 | Link |
ConvNext-T-CvSt | 300 | 224 | 72.7 | 49.5 | Link |
ViT-M-CvSt | 50 | 224 | 72.4 | 48.8 | Link |
ConvNext-S-CvSt | 50 | 224 | 74.1 | 52.4 | Link |
ViT-B | 50 | 224 | 73.3 | 50.0 | Link |
ConvNext-B | 50 | 224 | 75.6 | 54.3 | Link |
ViT-B-CvSt | 250 | 224 | 76.3 | 54.7 | Link |
ConvNext-B-CvSt | 250 | 224 | 75.9 | 56.1 | Link |
ConvNext-B-CvSt* | --- | 256 | 76.9 | 57.3 | Link |
ConvNext-L-CvSt | 100 | 224 | 77.0 | 57.7 | Link |
ConvNext-L-CvSt* | --- | 320 | 78.2 | 59.4 | Link |
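
The effect of the Convolution-Stem can be read off the table for the two model pairs trained with the same number of epochs (300) at resolution 224. The short sketch below computes the differences; the numbers are copied from the table and the helper name `conv_stem_gain` is hypothetical.

```python
# (clean acc., AA) at resolution 224 after 300 epochs, copied from the table.
RESULTS = {
    "ViT-S": (69.2, 44.0),
    "ViT-S-CvSt": (72.5, 48.1),
    "ConvNext-T": (72.4, 48.6),
    "ConvNext-T-CvSt": (72.7, 49.5),
}


def conv_stem_gain(base):
    """Difference (CvSt minus original) in clean and AA accuracy, in points."""
    clean, aa = RESULTS[base]
    clean_cs, aa_cs = RESULTS[base + "-CvSt"]
    return round(clean_cs - clean, 1), round(aa_cs - aa, 1)


for name in ("ViT-S", "ConvNext-T"):
    print(name, conv_stem_gain(name))
# ViT-S (3.3, 4.1)
# ConvNext-T (0.3, 0.9)
```

The ViT-B and ConvNext-B pairs are left out because their CvSt variants were trained for 250 epochs against 50 for the originals, so the rows are not directly comparable.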
Checkpoints along with accuracy and robustness logs for ImageNet models finetuned to be robust at
If you use our code or models, please cite our work using the following BibTeX entry:
@inproceedings{singh2023revisiting,
title={Revisiting Adversarial Training for ImageNet: Architectures, Training and Generalization across Threat Models},
author={Singh, Naman D and Croce, Francesco and Hein, Matthias},
booktitle={NeurIPS},
year={2023}}