What is Dataset Distillation Learning?

After cloning the repo, download the distilled data and pretrained model from Google Drive. Then create the conda environment:

conda create -n learning python=3.10.12
conda activate learning
pip install -r requirements.txt
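As an optional sanity check after installation, the sketch below reports which packages listed in a requirements file are not installed in the active environment. It is not part of the repo: the helper name `missing_requirements` is hypothetical, and the parsing assumes simple specifiers such as `name==1.2.3`, not every form pip accepts.

```python
import re
from importlib import metadata


def missing_requirements(lines):
    """Return the names of listed packages that are not installed.

    Hypothetical helper; handles only simple requirement lines.
    """
    missing = []
    for line in lines:
        # Drop inline comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        # Skip blank lines and pip options such as "-r other.txt".
        if not line or line.startswith("-"):
            continue
        # Keep only the package name that precedes any version specifier.
        name = re.split(r"[\s<>=!~;\[@]", line, maxsplit=1)[0]
        if not name:
            continue
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing
```

Calling `missing_requirements(open("requirements.txt"))` from the repo root inside the activated environment should return an empty list if the install succeeded.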

Distilled vs. Real Data

To run the analyses from Section 3 of the paper, refer to the two Jupyter notebooks in experiment_code/replacement_analysis.

Information Captured by Distilled Data

Predictions of models trained on distilled data are similar to those of models trained with early stopping

Generate a pool of subset-trained models and early-stopped models by running:

python experiment_code/agreement_analysis/generate_models.py

Finally, compare and visualize the prediction differences using the Jupyter notebook.
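To make "prediction differences" concrete, here is a minimal sketch of one simple way to compare models: the fraction of test examples on which two models predict the same class, averaged over all pairs in a pool. The function names are hypothetical, and the notebook may use a different measure; this only illustrates the idea.

```python
from itertools import combinations
from typing import Sequence


def prediction_agreement(preds_a: Sequence[int], preds_b: Sequence[int]) -> float:
    """Fraction of examples on which two models predict the same label."""
    if len(preds_a) != len(preds_b):
        raise ValueError("prediction lists must have the same length")
    if len(preds_a) == 0:
        raise ValueError("prediction lists must not be empty")
    matches = sum(a == b for a, b in zip(preds_a, preds_b))
    return matches / len(preds_a)


def mean_pairwise_agreement(pool: Sequence[Sequence[int]]) -> float:
    """Average agreement over every unordered pair of models in the pool."""
    pairs = list(combinations(pool, 2))
    if not pairs:
        raise ValueError("need at least two models to compare")
    return sum(prediction_agreement(a, b) for a, b in pairs) / len(pairs)
```

Each element of `pool` would be the predicted labels of one model on the same ordered set of test examples, for instance one list per subset-trained or early-stopped model.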

Recognition on the distilled data is learned early in the training process

Use the Jupyter notebook.

Distilled data stores little information beyond what would be learned early in training

Refer to the Jupyter notebook.

Note: computing the Hessian approximation for the whole training data takes a long time (10+ hours on an L40 GPU). If compute resources are limited, comment out this Hessian computation; Hessian calculations on the distilled data are notably less resource-intensive.

Semantics of Captured Information

Use the Jupyter notebook to generate the qualitative analysis (Figure 10 of the paper). For the quantitative analysis, refer to the LLaVA repo.

princetonvisualai/What-is-Dataset-Distillation-Learning