Max Suppression (MaxSup) retains the desired regularization effect of Label Smoothing (LS) while preserving the intra-class variation in the feature space. This boosts performance on classification and downstream tasks such as linear transfer and image segmentation.
- Improved Feature Representation for Better Transferability
  - Qualitative Evaluation
  - Quantitative Evaluation
- Train Vision Transformer with MaxSup
  - Cache Feature for Faster Data Loading (Optional)
  - Prepare the Data and Annotation for the Cache Feature
- Pretrained Weights
- Training ConvNets with MaxSup
- Visualization of Logit Characteristics
Figure 1: MaxSup mitigates the reduction in intra-class variation caused by Label Smoothing while preserving inter-class separability. Additionally, in Grad-CAM analysis, MaxSup highlights class-discriminative regions more effectively than Label Smoothing.

Figure 2: Class activation maps computed with Grad-CAM (Selvaraju et al., 2019) for DeiT-Small models trained with MaxSup (2nd row), Label Smoothing (3rd row), and the Baseline (4th row); the first row shows the original images. Training with MaxSup reduces distraction by non-target classes, whereas Label Smoothing increases the model's vulnerability to interference, causing it to focus partially or completely on incorrect objects due to the loss of sample-specific information.
| Methods | Intra-Class Variation (Train) | Intra-Class Variation (Val.) | Inter-Class Separability (Train) | Inter-Class Separability (Val.) |
| --- | --- | --- | --- | --- |
| Baseline | 0.3114 | 0.3313 | 0.4025 | 0.4451 |
| Label Smoothing | 0.2632 | 0.2543 | 0.4690 | 0.4611 |
| Online Label Smoothing | 0.2707 | 0.2820 | 0.5943 | 0.5708 |
| Zipf's Label Smoothing | 0.2611 | 0.2932 | 0.5522 | 0.4790 |
| MaxSup | 0.2926 | 0.2998 | 0.5188 | 0.4972 |
Table 1: Quantitative measures of feature representations for inter-class separability (indicating classification performance) and intra-class variation (indicating transferability), computed using ResNet-50 trained on ImageNet-1K. Although all methods reduce intra-class variation compared to the baseline, MaxSup exhibits the least reduction.
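For readers who want to reproduce such measurements, below is a minimal sketch of one common way to compute the two quantities from extracted features. This is an illustration under our own assumptions (cosine distances to class centroids); the paper's exact definitions may differ, and the function name is made up:

```python
import torch
import torch.nn.functional as F

def class_stats(feats: torch.Tensor, labels: torch.Tensor):
    """Illustrative metrics, not the paper's exact code.

    feats:  (N, D) penultimate-layer features
    labels: (N,) integer class ids
    """
    feats = F.normalize(feats, dim=1)
    classes = labels.unique()  # sorted unique class ids
    centroids = torch.stack([feats[labels == c].mean(0) for c in classes])
    centroids = F.normalize(centroids, dim=1)
    # Intra-class variation: mean cosine distance of samples to their class centroid.
    idx = torch.searchsorted(classes, labels)
    intra = (1.0 - (feats * centroids[idx]).sum(1)).mean()
    # Inter-class separability: mean pairwise cosine distance between centroids.
    sim = centroids @ centroids.T
    off_diag = ~torch.eye(len(classes), dtype=torch.bool)
    inter = (1.0 - sim[off_diag]).mean()
    return intra.item(), inter.item()
```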
| Methods | Linear Transfer Val. Acc. |
| --- | --- |
| Baseline | 0.8143 |
| Label Smoothing | 0.7458 |
| MaxSup | 0.8102 |
Table 2: Linear transfer performance of different methods, evaluated using multinomial logistic regression with L2 regularization on CIFAR-10. Despite improving ImageNet accuracy, Label Smoothing notably degrades transfer performance.
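For reference, a minimal sketch of such a linear probe with scikit-learn; the variable names (`train_feats`, `train_labels`, `val_feats`, `val_labels`) are placeholders for features extracted on CIFAR-10 from the frozen ImageNet-trained backbone, and the regularization strength is an assumption, not the paper's setting:

```python
from sklearn.linear_model import LogisticRegression

# Multinomial logistic regression with L2 regularization on frozen features.
# The default lbfgs solver fits a multinomial model; C is the inverse
# regularization strength.
clf = LogisticRegression(penalty="l2", C=1.0, max_iter=1000)
clf.fit(train_feats, train_labels)          # (N, D) features, (N,) labels
val_acc = clf.score(val_feats, val_labels)  # validation accuracy
print(f"Linear transfer validation accuracy: {val_acc:.4f}")
```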
We adopt DeiT as the baseline model; MaxSup is implemented in the `train_one_epoch` function of `engine.py` (a sketch of the objective follows the commands below).
```bash
cd Deit
bash train_with_MaxSup.sh
```
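For orientation, here is a minimal sketch of the MaxSup objective as we understand it from the paper: Label Smoothing's implicit penalty on the ground-truth logit is applied to the maximum logit instead. The helper name and the exact reduction are our assumptions; see `train_one_epoch` in `engine.py` for the actual implementation:

```python
import torch
import torch.nn.functional as F

def maxsup_loss(logits: torch.Tensor, target: torch.Tensor,
                alpha: float = 0.1) -> torch.Tensor:
    """Sketch of MaxSup: cross-entropy plus alpha * (z_max - z_mean).

    Label Smoothing decomposes into CE plus a penalty on the gap between
    the ground-truth logit and the mean logit; MaxSup suppresses the
    maximum logit instead, keeping the penalty well-behaved on both
    correctly and incorrectly classified samples.
    """
    ce = F.cross_entropy(logits, target)
    z_max = logits.max(dim=1).values   # largest logit per sample
    z_mean = logits.mean(dim=1)        # mean logit per sample
    return ce + alpha * (z_max - z_mean).mean()
```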
To accelerate data loading, we additionally implemented a feature that caches the compressed ImageNet dataset as a ZIP file in RAM (adapted from Swin-Transformer). This significantly reduces data-loading time on systems with slow I/O and sufficient RAM, e.g., a cluster in our case. It is activated by passing `--cache` as an additional argument, as shown in the bash script.
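Conceptually, the mechanism looks like the following sketch (a hypothetical illustration, not the repository code; the class and method names are made up):

```python
import io
import zipfile
from PIL import Image

class ZipImageCache:
    """Hypothetical sketch of the --cache mechanism: read the whole ZIP
    archive into RAM once, then decode images from memory, avoiding
    per-sample disk I/O during training."""

    def __init__(self, zip_path: str):
        with open(zip_path, "rb") as f:
            self._buffer = io.BytesIO(f.read())   # entire archive held in RAM
        self._zip = zipfile.ZipFile(self._buffer)

    def load(self, member_name: str) -> Image.Image:
        with self._zip.open(member_name) as member:
            return Image.open(io.BytesIO(member.read())).convert("RGB")
```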
- ZIP Archives

  Please run the following commands in the terminal to create the compressed files for the train and validation sets, respectively:

  ```bash
  cd data/ImageNet
  zip -r train.zip train
  zip -r val.zip val
  ```
- Mapping Files

  Please download `train_map.txt` and `val_map.txt` from the releases and put them under the same directory:

  ```
  data/ImageNet/
  ├── train_map.txt   # Training image paths and labels
  ├── val_map.txt     # Validation image paths and labels
  ├── train.zip       # Training images (compressed)
  └── val.zip         # Validation images (compressed)
  ```
- Training Map File (`train_map.txt`)

  - Format: `<class_folder>/<image_filename>\t<class_label>`
  - Example entries:

    ```
    ImageNet/train/n03146219/n03146219_8050.JPEG 0
    ImageNet/train/n03146219/n03146219_12728.JPEG 0
    ImageNet/train/n03146219/n03146219_9736.JPEG 0
    ImageNet/train/n03146219/n03146219_22069.JPEG 0
    ImageNet/train/n03146219/n03146219_5467.JPEG 0
    ```
- Validation Map File (`val_map.txt`)

  - Format: `<image_filename>\t<class_label>`
  - Example entries:

    ```
    ILSVRC2012_val_00000001.JPEG 65
    ILSVRC2012_val_00000002.JPEG 970
    ILSVRC2012_val_00000003.JPEG 230
    ```
You should make sure:
- Paths include the class folder structure.
- Labels are zero-based integers.
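To verify these constraints, a small hypothetical checker could look like the following sketch (the function name, the tab-separated parsing, and the class count are assumptions based on the format described above):

```python
from pathlib import Path

def check_map_file(path: str, require_folder: bool, num_classes: int = 1000) -> None:
    """Hypothetical sanity check: one '<path>\\t<label>' pair per line."""
    for lineno, line in enumerate(Path(path).read_text().splitlines(), start=1):
        img_path, label_str = line.rsplit("\t", 1)
        label = int(label_str)
        assert 0 <= label < num_classes, f"line {lineno}: label {label} out of range"
        if require_folder:
            assert "/" in img_path, f"line {lineno}: missing class folder in path"

check_map_file("data/ImageNet/train_map.txt", require_folder=True)
check_map_file("data/ImageNet/val_map.txt", require_folder=False)
```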
Please find the pretrained weights, as well as the training log, in the release `checkpoint_deit`.
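A minimal loading sketch, assuming a timm DeiT-Small backbone and a checkpoint that stores the weights under a `"model"` key (both the file name and the key are assumptions; check the release assets for the actual names):

```python
import torch
from timm import create_model

model = create_model("deit_small_patch16_224", num_classes=1000)
state = torch.load("checkpoint_deit.pth", map_location="cpu")  # assumed file name
model.load_state_dict(state.get("model", state))               # unwrap if nested
model.eval()
```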
- The image classification results in the main paper refer to the `Conv/ffcv` folder. See the `README.md` there.
- The additional image classification results in the appendix refer to `Conv/common_resnet`. See the `README.md` there.