This codebase is dedicated to exploring different self-supervised pretraining methods. We integrate multimodal data from supernovae light curves with images of their host galaxies. Our goal is to leverage diverse data types to improve the prediction and understanding of astronomical phenomena.
An overview of the CLIP method and loss: link
All data used in this work is available here: link
Paper associated with this code: link
Our transformer-based model Maven is pretrained on simulated data and finetuned on observations. We compare it with Maven-lite, which is trained directly on observations, and with transformer-based supervised classification and regression models.
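For readers unfamiliar with the CLIP objective, the following is a minimal, framework-free sketch of a symmetric contrastive loss of the kind CLIP uses. It is not the implementation used in this repository: the function names (`clip_loss`, `_normalize`, `_cross_entropy_diag`), the temperature value, and the toy embeddings are assumptions chosen for illustration. Matching light-curve and host-image embeddings are assumed to sit at the same index in both lists.

```python
import math


def _normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def _cross_entropy_diag(logits):
    """Mean cross-entropy where the correct class of row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def clip_loss(lc_emb, img_emb, temperature=0.07):
    """Symmetric contrastive loss over paired embeddings (hypothetical helper)."""
    lc = [_normalize(v) for v in lc_emb]
    img = [_normalize(v) for v in img_emb]
    # Cosine similarity between every light curve and every image, scaled.
    logits = [[sum(a * b for a, b in zip(u, w)) / temperature for w in img]
              for u in lc]
    logits_t = [list(col) for col in zip(*logits)]
    # Average the light-curve-to-image and image-to-light-curve directions.
    return 0.5 * (_cross_entropy_diag(logits) + _cross_entropy_diag(logits_t))


# Toy example: correctly paired embeddings versus deliberately swapped ones.
aligned = clip_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = clip_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

The loss is small when each light curve is most similar to its own host image and large when the pairs are mismatched, which is the signal the pretraining relies on.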
Before installing, ensure you have the following prerequisites:
- Python 3.8 or higher
- pip package manager
-
Clone the repository to your local machine and navigate into the directory:
git clone git@github.com:ThomasHelfer/Multimodal-hackathon-2024.git
cd Multimodal-hackathon-2024
-
Unpack the dataset containing supernovae spectra, light curves and host galaxy images:
git clone https://huggingface.co/datasets/thelfer/multimodal_supernovae
mv multimodal_supernovae/ZTFBTS* .
mkdir sim_data && cd sim_data
wget https://huggingface.co/datasets/thelfer/multimodal_supernovae/resolve/main/sim_data/ZTF_Pretrain_5Class.hdf5
-
We recommend setting up a virtual environment:
virtualenv dev
source dev/bin/activate
Install all dependencies listed in the requirements.txt file:
pip install -r requirements.txt
-
Run the pretraining script:
python pretraining_clip_wandb.py pretrain_config/maven_pretrain_config.yaml
-
CLIP finetuning of the pretrained model:
python finetune_clip.py configs/maven_finetune.yaml
The config file points to the path of our pretrained model. To finetune your own model, change this path.
-
Run the script
python script_wandb.py configs/maven-lite.yaml
- Sign up for an account at Weights & Biases if you haven't already.
- Edit the configuration file to specify your project name. Ensure the name matches the project you create on wandb.ai. You can also define sweep parameters within the config file.
-
In the config file you can choose
extra_args:
  regression: True
If true, script_wandb.py performs a regression for redshift. Similarly, for
extra_args:
  classification: True
script_wandb.py performs a classification. If neither is true, it performs a normal CLIP pretraining. Lastly,
extra_args:
  pretrain_lc_path: 'path_to_checkpoint/checkpoint.ckpt'
  freeze_backbone_lc: True
preloads a pretrained model in script_wandb.py, or allows restarting a run from a checkpoint for retraining_wandb.py.
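The way the `regression` and `classification` flags interact can be summarized in a short sketch. This illustrates the behaviour described above and is not the actual logic in script_wandb.py; the helper name `select_task` is hypothetical, and raising an error when both flags are set is an assumption, since the text does not say what happens in that case.

```python
def select_task(extra_args):
    """Return the training mode implied by the extra_args section of a config.

    Hypothetical helper: mirrors the README description, not script_wandb.py.
    """
    regression = bool(extra_args.get("regression", False))
    classification = bool(extra_args.get("classification", False))
    if regression and classification:
        # Assumption: the README does not cover this case, so we refuse it.
        raise ValueError("set at most one of 'regression' and 'classification'")
    if regression:
        return "regression"  # redshift regression
    if classification:
        return "classification"
    return "clip"  # neither flag set: normal CLIP pretraining


modes = [
    select_task({"regression": True}),
    select_task({"classification": True}),
    select_task({}),
]
```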
-
Start the hyperparameter sweep with the following command:
python script_wandb.py configs/config_grid.yaml
Resume a sweep with the following command:
python script_wandb.py [sweep_id]
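As a rough picture of what a grid-style sweep does, the sketch below expands a dictionary of parameter value lists into every combination. The parameter names (`lr`, `batch_size`) and the `expand_grid` helper are hypothetical and are not taken from configs/config_grid.yaml. Weights & Biases performs this enumeration itself when the sweep method is a grid search, so this is only meant to show how quickly the number of runs grows.

```python
from itertools import product


def expand_grid(parameters):
    """Yield one dict per combination of the listed parameter values."""
    names = list(parameters)
    value_lists = (parameters[name]["values"] for name in names)
    for combo in product(*value_lists):
        yield dict(zip(names, combo))


# Hypothetical sweep parameters, each given as a `values` list.
sweep_parameters = {
    "lr": {"values": [1e-4, 1e-3]},
    "batch_size": {"values": [32, 64, 128]},
}

runs = list(expand_grid(sweep_parameters))
for run in runs:
    print(run)
```

With two learning rates and three batch sizes this produces six runs, one per combination.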
-
The first execution will prompt you for your Weights & Biases API key, which can be found here.
Alternatively, you can set your API key as an environment variable, especially if running on a compute node:
export WANDB_API_KEY=...
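On a compute node it can help to fail early with a clear message if the key is missing, rather than reach the login prompt in a non-interactive job. A minimal sketch, assuming a hypothetical helper named `require_wandb_key`; the only name taken from the text above is the `WANDB_API_KEY` environment variable.

```python
import os


def require_wandb_key(environ=os.environ):
    """Return the W&B API key from the environment or raise a clear error."""
    key = environ.get("WANDB_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "WANDB_API_KEY is not set; export it before submitting the job"
        )
    return key


# Demonstration with an explicit mapping so the real environment is untouched.
demo_key = require_wandb_key({"WANDB_API_KEY": "example-key"})
```

Calling `require_wandb_key()` with no argument at the top of a job script checks the real environment.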
- Monitor and analyze your experiment results on your Weights & Biases project page. wandb.ai
We can run a k-fold cross-validation by defining the variable
extra_args:
  kfolds: 5 # for stratified cross-validation
Because running all folds serially can take a long time, you can split the work across several submissions by selecting only certain folds for each submission:
foldnumber:
  values: [1,2,3]
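To illustrate how stratified folds and a fold selection fit together, here is a small sketch in plain Python. It is not the splitting code used by this repository: the helper `stratified_folds`, the toy labels, and the zero-based fold numbering are all assumptions made for the example.

```python
from collections import defaultdict


def stratified_folds(labels, kfolds):
    """Assign each sample index to a fold so class proportions stay similar."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(kfolds)]
    for indices in by_class.values():
        # Deal the samples of each class round-robin across the folds.
        for position, index in enumerate(indices):
            folds[position % kfolds].append(index)
    return folds


labels = ["Ia"] * 10 + ["II"] * 5  # toy class labels, not real data
folds = stratified_folds(labels, kfolds=5)

# One submission could handle only some folds (numbering here is zero-based,
# which is an assumption and may differ from the repository's convention).
selected = [0, 1, 2]
for fold_id in selected:
    val_idx = folds[fold_id]
    train_idx = [i for f, fold in enumerate(folds) if f != fold_id for i in fold]
    print(fold_id, len(train_idx), len(val_idx))
```

Each fold keeps the 2:1 class ratio of the toy labels, and a second submission could cover the remaining folds by setting `selected = [3, 4]`.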
To evaluate the performance of saved model checkpoints, change the folder path and the corresponding name in evaluate_models.py. Then compute the metrics by running
python evaluate_models.py