Robust Data Pruning under Label Noise via Maximizing Re-labeling Accuracy (NeurIPS 2023)
by Dongmin Park¹, Seola Choi¹, Doyoung Kim¹, Hwanjun Song¹,², Jae-Gil Lee¹
¹ KAIST, ² Amazon AWS AI
Sep 22, 2023: Our work is accepted at NeurIPS 2023.
- Prune4ReL is a new data pruning method for Re-labeling models (e.g., DivideMix & SOP+) showing state-of-the-art performance under label noise.
- Inspired by a re-labeling theory, Prune4ReL finds the desired data subset by maximizing the total reduced neighborhood confidence, thereby maximizing re-labeling & generalization performance.
- With a greedy approximation, Prune4ReL is efficient and scalable to large datasets including Clothing-1M & ImageNet-1K.
- On four real noisy datasets (CIFAR-10/100N, WebVision, and Clothing-1M), Prune4ReL outperforms data pruning baselines combined with Re-labeling models by 9.1%, and those combined with a standard model by 21.6%.
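For intuition only, the greedy selection idea can be sketched as follows: each kept example contributes confidence to its neighbors within a cosine-similarity threshold tau, and we greedily pick the example with the largest marginal gain in total neighborhood confidence. This is a simplified illustration under assumed inputs (normalized embeddings and per-example confidences), not the paper's exact objective or the official implementation.

```python
import numpy as np

def greedy_selection_sketch(features, confidences, budget, tau=0.975):
    """Toy sketch of greedy subset selection that maximizes total
    neighborhood confidence (illustrative, not the official code).

    features:    (n, d) L2-normalized embeddings
    confidences: (n,) model prediction confidence per example
    budget:      number of examples to keep
    """
    n = features.shape[0]
    # cosine similarity between all pairs (features assumed normalized)
    sim = features @ features.T
    # example i "covers" neighbor j when sim >= tau; the contribution
    # is weighted by the similarity and i's own confidence
    weight = np.where(sim >= tau, sim, 0.0) * confidences[:, None]

    covered = np.zeros(n)   # current neighborhood confidence of each example
    selected = []
    for _ in range(budget):
        # marginal gain: how much each candidate would raise coverage
        gain = np.maximum(weight - covered[None, :], 0.0).sum(axis=1)
        gain[selected] = -np.inf          # do not pick twice
        best = int(np.argmax(gain))
        selected.append(best)
        covered = np.maximum(covered, weight[best])
    return selected
```

With orthogonal embeddings each example only covers itself, so the sketch degenerates to picking the most confident examples; the interesting behavior appears when neighborhoods overlap and a confident example can "stand in" for its pruned neighbors during re-labeling.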
Please refer to Table 7 for hyperparameters. For the CIFAR-10N dataset with SOP+ as the Re-labeling model:

```shell
python3 main_label_noise.py --gpu 0 --model 'PreActResNet18' --robust-learner 'SOP' -rc 0.9 -rb 0.1 \
--dataset CIFAR10 --noise-type $noise_type --n-class 10 --lr-u 10 -se 10 --epochs 300 \
--fraction $fraction --selection Prune4Rel --save-log True \
--metric cossim --uncertainty LeastConfidence --tau 0.975 --eta 1 --balance True
```
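The `$noise_type` and `$fraction` placeholders must be set before running. A minimal dry-run might look like the following (the variable values are examples, not prescribed settings; `aggre` is one of the CIFAR-10N noise settings, and `fraction` is the portion of training data to keep):

```shell
# set the placeholders used in the command above (example values)
noise_type="aggre"   # a CIFAR-10N noise setting
fraction=0.2         # keep 20% of the training data

# dry-run: assemble and print the command before executing it
cmd="python3 main_label_noise.py --gpu 0 --model PreActResNet18 \
--robust-learner SOP --dataset CIFAR10 --noise-type $noise_type \
--n-class 10 --fraction $fraction --selection Prune4Rel"
echo "$cmd"
```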
More detailed scripts for other datasets can be found in the scripts/ folder.
For baseline pruning algorithms, the script is similar to that of Prune4ReL. For example,

```shell
python3 main_label_noise.py --gpu 0 --model 'PreActResNet18' --robust-learner 'SOP' -rc 0.9 -rb 0.1 \
--dataset CIFAR10 --noise-type $noise_type --n-class 10 --lr-u 10 -se 10 --epochs 300 \
--fraction $fraction --selection $pruning_algorithm --save-log True
```

where $pruning_algorithm must be one of [Uniform, SmallLoss, Uncertainty, Forgetting, GraNd, ...], each of which is a class name in deep_core/methods/*.py.
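As a generic illustration of what such a selection class scores (this is a standard LeastConfidence formulation, not the repo's actual class), an uncertainty-based baseline typically ranks examples by one minus the maximum softmax probability and keeps the most uncertain ones:

```python
import numpy as np

def least_confidence_scores(logits):
    """Generic LeastConfidence scoring: higher = more uncertain.
    (Illustrative sketch; see the repo's selection classes for the real code.)"""
    z = logits - logits.max(axis=1, keepdims=True)   # numerically stable softmax
    probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    return 1.0 - probs.max(axis=1)

def select_top_k(logits, k):
    """Indices of the k most uncertain examples."""
    return np.argsort(-least_confidence_scores(logits))[:k]
```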
```bibtex
@article{park2023robust,
  title={Robust Data Pruning under Label Noise via Maximizing Re-labeling Accuracy},
  author={Park, Dongmin and Choi, Seola and Kim, Doyoung and Song, Hwanjun and Lee, Jae-Gil},
  journal={NeurIPS 2023},
  year={2023}
}
```
We thank the authors of the DeepCore library, on which most of our repo is built. We hope our project helps extend the open-source ecosystem for data pruning.
- DeepCore library [code]: DeepCore: A Comprehensive Library for Coreset Selection in Deep Learning, Guo et al., 2022.