Official implementation of our NeurIPS 2025 (Datasets and Benchmarks Track) paper “The Impact of Coreset Selection on Spurious Correlations and Group Robustness”.
This repository explores how different coreset selection strategies affect bias levels and group robustness across diverse datasets and architectures.
## 🧩 What’s inside
- 📊 Reproducible experiments and benchmarks
- 🧮 Evaluation and analysis scripts
📄 Paper: The Impact of Coreset Selection on Spurious Correlations and Group Robustness
## Installation

- Create a conda environment:
```bash
conda create -n bias-select python=3.8
conda activate bias-select
```
- Install PyTorch with CUDA support:
```bash
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
```
- Install the `deepcore` package and its dependencies:
```bash
# Install the package in development mode (make sure you're in the repository root directory)
pip install -e .
# Or install dependencies first, then the package
pip install -r requirements.txt
pip install -e .
```
The `-e` flag installs the package in "editable" mode, so changes you make to the code take effect without reinstalling.
## Datasets

The codebase supports the following datasets:
- Cmnist - Colored MNIST with spurious correlations
- waterbirds - Bird species classification with background bias
- Urbancars_cooccur - Car classification with co-occurrence bias
- Urbancars_bg - Car classification with background bias
- Urbancars_both - Car classification with both background and co-occurrence biases
- Nico_95_spurious - NICO++ (Non-I.I.D. Image dataset with Contexts), image classification with context-based spurious correlations
- MultiNLI - Multi-Genre Natural Language Inference dataset
- Metashift - Dataset for studying distribution shifts
- Civilcomments - Toxic comment classification dataset
- CelebAhair - CelebA with hair color as the target attribute
- Create a data directory:
```bash
mkdir -p data
```
- Download and prepare each dataset as described below.
### CMNIST

The CMNIST dataset can be downloaded from Google Drive:
- Download the dataset from the CMNIST Google Drive link.
- Extract the downloaded `cmnist.zip` file:
```bash
unzip cmnist.zip -d data/
```
- The dataset will be extracted to `data/cmnist/` with the following structure:
```
data/cmnist/
├── test/
│   ├── 0/
│   ├── 1/
│   └── ... (classes 2-9)
└── 5pct/
    ├── align/
    │   ├── 0/
    │   └── 1/
    ├── valid/
    │   ├── 0/
    │   └── 1/
    └── conflict/
        ├── 0/
        └── 1/
```
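After unzipping, one quick way to confirm the layout is to count the files per class in each split. The helper below is a minimal sketch and is not part of the repository; `count_files_per_class` and `SPLITS` are hypothetical names, and the code assumes the directory tree shown above (one subdirectory per class inside each split).

```python
from pathlib import Path
from typing import Dict

# Hypothetical list of split directories, taken from the tree above;
# each split holds one subdirectory per class label.
SPLITS = ["test", "5pct/align", "5pct/conflict", "5pct/valid"]


def count_files_per_class(root: str) -> Dict[str, Dict[str, int]]:
    """Return {split: {class_dir_name: number_of_files}} for each CMNIST split."""
    counts: Dict[str, Dict[str, int]] = {}
    for split in SPLITS:
        split_dir = Path(root) / split
        if not split_dir.is_dir():
            raise FileNotFoundError(f"missing split directory: {split_dir}")
        counts[split] = {
            class_dir.name: sum(1 for f in class_dir.iterdir() if f.is_file())
            for class_dir in sorted(split_dir.iterdir())
            if class_dir.is_dir()
        }
    return counts


# Usage (after unzipping): print(count_files_per_class("data/cmnist"))
```

A missing split raises `FileNotFoundError` immediately, which usually means the archive was extracted to a different directory than `data/`.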
### Waterbirds

Follow https://github.com/kohpangwei/group_DRO/tree/master to generate the Waterbirds dataset.

### UrbanCars

Follow https://github.com/facebookresearch/Whac-A-Mole/tree/main to create the Urbancars_cooccur, Urbancars_bg, and Urbancars_both datasets.

### MetaShift

Follow https://github.com/YyzHarry/SubpopBench/tree/main to download the MetaShift dataset and generate its metadata.
### CivilComments

Follow https://github.com/izmailovpavel/spurious_feature_learning/tree/main to set up the dataset. You will need to install the WILDS package for this.
### NICO++

We follow https://github.com/yvsriram/FACTS to set up the dataset. Download the NICO++ dataset as specified there into `./Data/NICO`.
### MultiNLI

Follow https://github.com/izmailovpavel/spurious_feature_learning/tree/main to set up the dataset.
### CelebA (hair color)

Follow https://github.com/kohpangwei/group_DRO#celeba to download the dataset. To use hair color as the target attribute, use the metadata file we provide at `./deepcore/datasets/metadata.csv`.
To prepare the labels for any dataset, run:
```bash
python scripts/save_dataset_labels.py <dataset_name>
```
For example:
```bash
# For CMNIST
python scripts/save_dataset_labels.py Cmnist
```
## Running Experiments
### Sample Characterization Scores
To compute sample characterization scores for a dataset, run the corresponding script in the `scripts` directory:
```bash
# For CMNIST dataset
python scripts/run_cmnist.py
# For Waterbirds dataset
python scripts/run_waterbirds.py
# For CelebA dataset
python scripts/run_celeba.py
```

These scripts will:
- Train models on the full dataset
- Compute various sample characterization scores (EL2N, Forgetting, Uncertainty, etc.)
- Save the scores in the `results` directory
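For intuition about what these scores measure, here is a minimal NumPy sketch of two of them under their standard definitions: EL2N (the L2 norm of the difference between the softmax output and the one-hot label, Paul et al., 2021) and forgetting events (correct-to-incorrect transitions across training epochs, Toneva et al., 2019). The function names are hypothetical, and this is not the repository's implementation, which may differ in details such as averaging over checkpoints or how never-learned samples are scored.

```python
import numpy as np


def el2n_scores(probs: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """EL2N score per sample: ||softmax(x) - one_hot(y)||_2.

    probs:  (n_samples, n_classes) softmax outputs from one checkpoint.
    labels: (n_samples,) integer class labels.
    EL2N is usually averaged over several early-training checkpoints or seeds;
    call this once per checkpoint and average the results.
    """
    one_hot = np.eye(probs.shape[1])[labels]
    return np.linalg.norm(probs - one_hot, axis=1)


def forgetting_counts(correct_history: np.ndarray) -> np.ndarray:
    """Number of forgetting events per sample.

    correct_history: (n_epochs, n_samples) boolean array, True where the sample
    was classified correctly at that epoch. A forgetting event is a transition
    from correct at epoch t to incorrect at epoch t + 1. This sketch does not
    apply the convention of giving never-learned samples a maximal score.
    """
    h = correct_history.astype(bool)
    forgot = h[:-1] & ~h[1:]
    return forgot.sum(axis=0)
```

Higher values of either score are typically read as "harder" or less typical samples, which is what makes them natural candidates for ranking a training set before selection.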
### Training on Selected Coresets

After computing the sample characterization scores, you can train downstream models on the selected coresets using the corresponding training scripts:
```bash
# For CMNIST dataset
python scripts/run_cmnist_train.py
# For Waterbirds dataset
python scripts/run_waterbirds_train.py
# For CelebA dataset
python scripts/run_celeba_train.py
```

These training scripts will:
- Load the pre-computed sample characterization scores
- Select coresets based on the specified selection method
- Train models on the selected coresets
- Save the trained models and results
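Score-based coreset selection commonly keeps a fixed fraction of the training set ranked by score. The sketch below assumes that simple rule; `select_coreset` and `bias_conflicting_ratio` are hypothetical helpers, and the repository's training scripts may expose different options (for example, class-balanced selection). The second function illustrates one simple way to quantify the bias level of a selected subset: the fraction of bias-conflicting samples (those whose spurious attribute disagrees with the label) it contains.

```python
import numpy as np


def select_coreset(scores: np.ndarray, fraction: float, hardest: bool = True) -> np.ndarray:
    """Return sorted indices of a `fraction` of samples ranked by score.

    hardest=True keeps the highest-scoring samples (e.g., largest EL2N);
    hardest=False keeps the lowest-scoring ones.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    k = max(1, int(round(fraction * len(scores))))
    order = np.argsort(scores, kind="stable")  # ascending by score
    chosen = order[-k:] if hardest else order[:k]
    return np.sort(chosen)


def bias_conflicting_ratio(selected: np.ndarray, is_conflicting: np.ndarray) -> float:
    """Fraction of selected samples flagged as bias-conflicting."""
    return float(is_conflicting[selected].mean())


# Usage with made-up scores and bias flags:
# idx = select_coreset(np.array([0.1, 0.9, 0.5, 0.7]), fraction=0.5)  # indices 1 and 3
# bias_conflicting_ratio(idx, np.array([False, True, False, False]))
```

Comparing this ratio between the full training set and a coreset is one way to see whether a selection method amplifies or dilutes the spurious correlation.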
### Complete Example

Here's a complete example for the CMNIST dataset:
```bash
# Step 1: Compute sample characterization scores
python scripts/run_cmnist.py
# Step 2: Train models on selected coresets
python scripts/run_cmnist_train.py
```

The results are saved in the `results` directory, with file names that identify the dataset and selection method.
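Group robustness is commonly reported as worst-group accuracy alongside average accuracy. As a minimal sketch, assuming you have predictions, labels, and integer group ids (e.g., one id per label/spurious-attribute pair) for an evaluation split, these metrics can be computed as follows; `group_accuracies` is a hypothetical helper, not the repository's evaluation code.

```python
from typing import Dict, Tuple

import numpy as np


def group_accuracies(preds, labels, groups) -> Tuple[Dict[int, float], float, float]:
    """Return (per-group accuracy, average accuracy, worst-group accuracy).

    groups: integer group id per sample, e.g. label * n_attributes + attribute.
    Average accuracy is computed over all samples, not averaged over groups.
    """
    preds, labels, groups = map(np.asarray, (preds, labels, groups))
    correct = preds == labels
    per_group = {int(g): float(correct[groups == g].mean()) for g in np.unique(groups)}
    return per_group, float(correct.mean()), min(per_group.values())


# Usage with toy values:
# per_group, avg_acc, worst_acc = group_accuracies(
#     preds=[0, 0, 1, 1, 1, 0], labels=[0, 1, 1, 1, 0, 0], groups=[0, 0, 1, 1, 2, 2])
```

A large gap between average and worst-group accuracy indicates that the model relies on the spurious attribute for the minority groups.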
This codebase is based on DeepCore, a comprehensive library for coreset selection in deep learning. We extend their work to study the robustness impacts of coreset selection methods on various datasets with spurious correlations and distribution shifts.