Max Suppression (MaxSup) retains the desired regularization effect of Label Smoothing (LS) while preserving the intra-class variation in the feature space. This boosts performance on classification and downstream tasks such as linear transfer and image segmentation.
- Improved Feature Representation for Better Transferability
  - Qualitative Evaluation
  - Quantitative Evaluation
- Train Vision Transformer with MaxSup
  - Cache Feature for Faster Data Loading (Optional)
  - Prepare the Data and Annotation for the Cache Feature
- Pretrained Weights
- Training ConvNets with MaxSup
- Visualization of Logit Characteristics
Figure 1: MaxSup mitigates the reduction in intra-class variation caused by Label Smoothing while preserving inter-class separability. Additionally, in Grad-CAM analysis, MaxSup highlights class-discriminative regions more effectively than Label Smoothing.

Figure 2: Class activation maps computed with Grad-CAM (Selvaraju et al., 2019) for DeiT-Small models trained with MaxSup (2nd row), Label Smoothing (3rd row), and the Baseline (4th row); the first row shows the original images. Training with MaxSup reduces distraction by non-target classes, whereas Label Smoothing increases the model's vulnerability to interference, causing it to focus partially or completely on incorrect objects due to the loss of sample-specific information.
| Methods | Intra-Class Variation (Train) | Intra-Class Variation (Val.) | Inter-Class Separability (Train) | Inter-Class Separability (Val.) |
| --- | --- | --- | --- | --- |
| Baseline | 0.3114 | 0.3313 | 0.4025 | 0.4451 |
| Label Smoothing | 0.2632 | 0.2543 | 0.4690 | 0.4611 |
| Online Label Smoothing | 0.2707 | 0.2820 | 0.5943 | 0.5708 |
| Zipf's Label Smoothing | 0.2611 | 0.2932 | 0.5522 | 0.4790 |
| MaxSup | 0.2926 | 0.2998 | 0.5188 | 0.4972 |
Table 1: Quantitative measures of feature representations for inter-class separability (indicating classification performance) and intra-class variation (indicating transferability), computed using ResNet-50 trained on ImageNet-1K. Although all methods reduce intra-class variation compared to the baseline, MaxSup exhibits the least reduction.
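For readers who want to reproduce such measurements, below is a minimal sketch of one common way to compute the two quantities from extracted features. This is an illustration under our own assumptions (cosine distances to class centroids); the paper's exact definitions may differ, and the function name is made up:

```python
import torch
import torch.nn.functional as F

def class_stats(feats: torch.Tensor, labels: torch.Tensor):
    """Illustrative metrics, not the paper's exact code.

    feats:  (N, D) penultimate-layer features
    labels: (N,) integer class ids
    """
    feats = F.normalize(feats, dim=1)
    classes = labels.unique()  # sorted unique class ids
    centroids = torch.stack([feats[labels == c].mean(0) for c in classes])
    centroids = F.normalize(centroids, dim=1)
    # Intra-class variation: mean cosine distance of samples to their class centroid.
    idx = torch.searchsorted(classes, labels)
    intra = (1.0 - (feats * centroids[idx]).sum(1)).mean()
    # Inter-class separability: mean pairwise cosine distance between centroids.
    sim = centroids @ centroids.T
    off_diag = ~torch.eye(len(classes), dtype=torch.bool)
    inter = (1.0 - sim[off_diag]).mean()
    return intra.item(), inter.item()
```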
| Methods | Linear Transfer Val. Acc. |
| --- | --- |
| Baseline | 0.8143 |
| Label Smoothing | 0.7458 |
| MaxSup | 0.8102 |
Table 2: Linear transfer performance of different methods, evaluated using multinomial logistic regression with L2 regularization on CIFAR-10. Despite improving ImageNet accuracy, Label Smoothing notably degrades transfer performance.
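For reference, a minimal sketch of such a linear probe with scikit-learn; the variable names (`train_feats`, `train_labels`, `val_feats`, `val_labels`) are placeholders for features extracted on CIFAR-10 from the frozen ImageNet-trained backbone, and the regularization strength is an assumption, not the paper's setting:

```python
from sklearn.linear_model import LogisticRegression

# Multinomial logistic regression with L2 regularization on frozen features.
# The default lbfgs solver fits a multinomial model; C is the inverse
# regularization strength.
clf = LogisticRegression(penalty="l2", C=1.0, max_iter=1000)
clf.fit(train_feats, train_labels)          # (N, D) features, (N,) labels
val_acc = clf.score(val_feats, val_labels)  # validation accuracy
print(f"Linear transfer validation accuracy: {val_acc:.4f}")
```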
We adopt DeiT as the baseline model; MaxSup is implemented in the `train_one_epoch` function of `engine.py` (a sketch of the objective follows the commands below).
```bash
cd Deit
bash train_with_MaxSup.sh
```
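For orientation, here is a minimal sketch of the MaxSup objective as we understand it from the paper: Label Smoothing's implicit penalty on the ground-truth logit is applied to the maximum logit instead. The helper name and the exact reduction are our assumptions; see `train_one_epoch` in `engine.py` for the actual implementation:

```python
import torch
import torch.nn.functional as F

def maxsup_loss(logits: torch.Tensor, target: torch.Tensor,
                alpha: float = 0.1) -> torch.Tensor:
    """Sketch of MaxSup: cross-entropy plus alpha * (z_max - z_mean).

    Label Smoothing decomposes into CE plus a penalty on the gap between
    the ground-truth logit and the mean logit; MaxSup suppresses the
    maximum logit instead, keeping the penalty well-behaved on both
    correctly and incorrectly classified samples.
    """
    ce = F.cross_entropy(logits, target)
    z_max = logits.max(dim=1).values   # largest logit per sample
    z_mean = logits.mean(dim=1)        # mean logit per sample
    return ce + alpha * (z_max - z_mean).mean()
```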
To accelerate data loading, we additionally implemented a feature that caches the compressed ImageNet dataset as a ZIP file in RAM (adapted from Swin-Transformer). This significantly reduces data-loading time on systems with slow I/O and sufficient RAM, e.g., a cluster in our case. It is activated by passing `--cache` as an additional argument, as shown in the bash script.
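Conceptually, the mechanism looks like the following sketch (a hypothetical illustration, not the repository code; the class and method names are made up):

```python
import io
import zipfile
from PIL import Image

class ZipImageCache:
    """Hypothetical sketch of the --cache mechanism: read the whole ZIP
    archive into RAM once, then decode images from memory, avoiding
    per-sample disk I/O during training."""

    def __init__(self, zip_path: str):
        with open(zip_path, "rb") as f:
            self._buffer = io.BytesIO(f.read())   # entire archive held in RAM
        self._zip = zipfile.ZipFile(self._buffer)

    def load(self, member_name: str) -> Image.Image:
        with self._zip.open(member_name) as member:
            return Image.open(io.BytesIO(member.read())).convert("RGB")
```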
- ZIP Archives

  Please run the following commands in the terminal to create the compressed files for the train and validation sets, respectively:

  ```bash
  cd data/ImageNet
  zip -r train.zip train
  zip -r val.zip val
  ```
- Mapping Files

  Please download `train_map.txt` and `val_map.txt` from the releases and put them under the same directory:

  ```
  data/ImageNet/
  ├── train_map.txt   # Training image paths and labels
  ├── val_map.txt     # Validation image paths and labels
  ├── train.zip       # Training images (compressed)
  └── val.zip         # Validation images (compressed)
  ```
- Training Map File (`train_map.txt`)

  - Format: `<class_folder>/<image_filename>\t<class_label>`
  - Example entries:

    ```
    ImageNet/train/n03146219/n03146219_8050.JPEG 0
    ImageNet/train/n03146219/n03146219_12728.JPEG 0
    ImageNet/train/n03146219/n03146219_9736.JPEG 0
    ImageNet/train/n03146219/n03146219_22069.JPEG 0
    ImageNet/train/n03146219/n03146219_5467.JPEG 0
    ```
- Validation Map File (`val_map.txt`)

  - Format: `<image_filename>\t<class_label>`
  - Example entries:

    ```
    ILSVRC2012_val_00000001.JPEG 65
    ILSVRC2012_val_00000002.JPEG 970
    ILSVRC2012_val_00000003.JPEG 230
    ```
You should make sure:
- Paths include the class folder structure.
- Labels are zero-based integers.
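To verify these constraints, a small hypothetical checker could look like the following sketch (the function name, the tab-separated parsing, and the class count are assumptions based on the format described above):

```python
from pathlib import Path

def check_map_file(path: str, require_folder: bool, num_classes: int = 1000) -> None:
    """Hypothetical sanity check: one '<path>\\t<label>' pair per line."""
    for lineno, line in enumerate(Path(path).read_text().splitlines(), start=1):
        img_path, label_str = line.rsplit("\t", 1)
        label = int(label_str)
        assert 0 <= label < num_classes, f"line {lineno}: label {label} out of range"
        if require_folder:
            assert "/" in img_path, f"line {lineno}: missing class folder in path"

check_map_file("data/ImageNet/train_map.txt", require_folder=True)
check_map_file("data/ImageNet/val_map.txt", require_folder=False)
```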
Please find the pretrained weights, as well as the training log, in the release `checkpoint_deit`.
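A minimal loading sketch, assuming a timm DeiT-Small backbone and a checkpoint that stores the weights under a `"model"` key (both the file name and the key are assumptions; check the release assets for the actual names):

```python
import torch
from timm import create_model

model = create_model("deit_small_patch16_224", num_classes=1000)
state = torch.load("checkpoint_deit.pth", map_location="cpu")  # assumed file name
model.load_state_dict(state.get("model", state))               # unwrap if nested
model.eval()
```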
- The image classification results in the main paper refer to the `Conv/ffcv` folder. See the `README.md` there.
- The additional image classification results in the appendix refer to `Conv/common_resnet`. See the `README.md` there.