After cloning the repo, download the distilled data and pretrained model from Google Drive. Then create the conda environment:
conda create -n learning python=3.10.12
conda activate learning
pip install -r requirements.txt
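As an optional sanity check after activating the environment, the interpreter version can be compared against the one requested in the `conda create` command above. This is only a minimal sketch: the helper name `python_matches` is hypothetical and not part of the repo, and it checks the major.minor version only.

```python
import sys

# Major.minor taken from `conda create -n learning python=3.10.12` above.
EXPECTED = (3, 10)


def python_matches(version_info=sys.version_info, expected=EXPECTED):
    """Return True when the interpreter's major.minor equals `expected`.

    Hypothetical helper for illustration; it is not provided by the repo.
    """
    return tuple(version_info[:2]) == tuple(expected)


if __name__ == "__main__":
    status = "OK" if python_matches() else "MISMATCH"
    print(f"Python {sys.version.split()[0]}: {status}")
```

If the script reports a mismatch, the `learning` environment is probably not the active one.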
To run the analyses from Section 3 of the paper, refer to the two Jupyter notebooks in experiment_code/replacement_analysis.
Generate a pool of subset-trained and early-stopped models by running:
python experiment_code/agreement_analysis/generate_models.py
Finally, compare and visualize the prediction differences using the Jupyter notebook.
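To give a sense of what comparing prediction differences across a pool of models can look like, here is a minimal sketch that computes the fraction of inputs on which each pair of models predicts the same label. It is an illustration under assumptions, not the notebook's code: the function names, the model names, and the toy predictions are all hypothetical, and the notebook may use a different metric.

```python
from itertools import combinations


def agreement_rate(preds_a, preds_b):
    """Fraction of examples on which two models predict the same label."""
    if len(preds_a) != len(preds_b):
        raise ValueError("prediction lists must have the same length")
    if not preds_a:
        raise ValueError("prediction lists must not be empty")
    matches = sum(a == b for a, b in zip(preds_a, preds_b))
    return matches / len(preds_a)


def pairwise_agreement(pool):
    """Map each unordered pair of model names to its agreement rate.

    `pool` is a dict of model name -> list of predicted labels
    (a hypothetical layout chosen for this sketch).
    """
    return {
        (m, n): agreement_rate(pool[m], pool[n])
        for m, n in combinations(sorted(pool), 2)
    }


# Toy predictions on four inputs; values are made up for illustration.
pool = {
    "subset_model": [0, 1, 2, 1],
    "early_stopped_model": [0, 1, 1, 1],
    "full_model": [0, 2, 2, 1],
}

for pair, rate in pairwise_agreement(pool).items():
    print(pair, rate)
```

Sorting the model names before pairing keeps the dictionary keys deterministic, which makes the result easier to tabulate or plot.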
Use the Jupyter notebook.
Refer to the Jupyter notebook.
Note: computing the Hessian approximation for the whole training data takes a long time (10+ hours on an L40 GPU). If compute resources are limited, comment out this Hessian computation (Hessian calculations on the distilled data are notably less resource-intensive).
Use the Jupyter notebook to generate the qualitative analysis (Figure 10) of the paper. For the quantitative analysis, refer to the LLaVA repo.