A codebase dedicated to exploring multimodal learning approaches by integrating images of host galaxies of supernovae and their corresponding light-curves and spectra.

ThomasHelfer/multimodal-supernovae

Multimodality with supernovae

License: MIT Unittest arXiv: 2408.16829 Hugging Face Dataset: multimodal_supernovae

Overview

This codebase is dedicated to exploring different self-supervised pretraining methods. We integrate multimodal data from supernovae light curves with images of their host galaxies. Our goal is to leverage diverse data types to improve the prediction and understanding of astronomical phenomena.

An overview of the CLIP method and loss: link
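
For orientation, a CLIP-style objective can be sketched in a few lines of plain Python. This is a minimal illustration of a symmetric contrastive loss between two sets of embeddings, not the implementation used in this repository. The function names and the temperature value of 0.07 are assumptions made for the example.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        row_max = max(row)  # subtract the max for numerical stability
        log_sum_exp = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def clip_loss(image_emb, lc_emb, temperature=0.07):
    """Symmetric contrastive loss; temperature is an assumed example value."""
    img = [l2_normalize(v) for v in image_emb]
    lc = [l2_normalize(v) for v in lc_emb]
    # Pairwise cosine similarities, scaled by the temperature.
    logits = [[sum(a * b for a, b in zip(u, w)) / temperature for w in lc] for u in img]
    logits_t = [list(col) for col in zip(*logits)]
    # Average the image-to-light-curve and light-curve-to-image directions.
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(logits_t))
```

Matching pairs (host-galaxy image and light curve of the same supernova) sit on the diagonal of the similarity matrix, so the loss is small when each embedding is closest to its own partner and large when partners are mismatched.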

All data used in this work is available here: link

Paper associated with code link

Our transformer-based model Maven is pretrained on simulated data and finetuned on observations. We compare it with Maven-lite, which is trained directly on observations, and with transformer-based supervised classification and regression models.

Installation

Prerequisites

Before installing, ensure you have the following prerequisites:

  • Python 3.8 or higher
  • pip package manager

Steps

  1. Clone the Repository

    Clone the repository to your local machine and navigate into the directory:

    git clone git@github.com:ThomasHelfer/Multimodal-hackathon-2024.git
    cd Multimodal-hackathon-2024
  2. Get data

    Download the dataset containing supernova spectra, light curves and host galaxy images:

    git clone https://huggingface.co/datasets/thelfer/multimodal_supernovae
    mv multimodal_supernovae/ZTFBTS* .
    mkdir sim_data && cd sim_data
    wget https://huggingface.co/datasets/thelfer/multimodal_supernovae/resolve/main/sim_data/ZTF_Pretrain_5Class.hdf5
    cd ..
  3. Install Required Python Packages

    We recommend setting up a virtual environment:

    virtualenv dev
    source dev/bin/activate

    Install all dependencies listed in the requirements.txt file:

    pip install -r requirements.txt 
  4. Pretrain on simulated data

    Run the pretraining script:

    python pretraining_clip_wandb.py pretrain_config/maven_pretrain_config.yaml 
  5. Finetune Maven on real data

    Finetune the pretrained model with the CLIP objective:

    python finetune_clip.py configs/maven_finetune.yaml

    The config file points to our pretrained model. To finetune your own model, change this path to your checkpoint.

  6. Train Maven-lite

    Run the training script:

    python script_wandb.py configs/maven-lite.yaml
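
Before starting a training run, it can help to confirm that the data from step 2 ended up where the steps above put it. The following is a minimal sketch that only checks for the paths named in step 2 (entries matching ZTFBTS* in the repository root and sim_data/ZTF_Pretrain_5Class.hdf5). The function name missing_data is hypothetical and not part of this repository.

```python
from pathlib import Path


def missing_data(repo_root="."):
    """Return the expected data paths from step 2 that are not present."""
    root = Path(repo_root)
    missing = []
    # Step 2 moves everything matching ZTFBTS* into the repository root.
    if not any(root.glob("ZTFBTS*")):
        missing.append("ZTFBTS*")
    # Step 2 downloads the simulated pretraining set into sim_data/.
    sim_file = root / "sim_data" / "ZTF_Pretrain_5Class.hdf5"
    if not sim_file.is_file():
        missing.append(str(sim_file))
    return missing


if __name__ == "__main__":
    problems = missing_data()
    print("All expected data found." if not problems else f"Missing: {problems}")
```

Run it from the repository root. An empty result means both the observed and the simulated data are in place.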

Setting Up a Hyperparameter Scan with Weights & Biases

  1. Create a Weights & Biases Account

    Sign up for an account at Weights & Biases if you haven't already.
  2. Configure Your Project

    Edit the configuration file to specify your project name. Ensure the name matches the project you create on wandb.ai. You can also define the sweep parameters in this config file.
  3. Choose important parameters

    In the config file you can set
    extra_args:
      regression: True
    If true, script_wandb.py performs a regression for redshift. Similarly, with
    extra_args:
      classification: True
    script_wandb.py performs a classification. If neither is true, it performs standard CLIP pretraining. Lastly,
    extra_args:
      pretrain_lc_path: 'path_to_checkpoint/checkpoint.ckpt'
      freeze_backbone_lc: True
    preloads a pretrained model in script_wandb.py, or restarts a run from a checkpoint in retraining_wandb.py.
  4. Run the Sweep Script

    Start the hyperparameter sweep with the following command:
    python script_wandb.py configs/config_grid.yaml 
    Resume a sweep with the following command:
    python script_wandb.py [sweep_id]
  5. API Key Configuration

    The first execution will prompt you for your Weights & Biases API key, which can be found here. Alternatively, you can set your API key as an environment variable, especially if running on a compute node:
    export WANDB_API_KEY=...
  6. View Results

    Monitor and analyze your experiment results on your Weights & Biases project page at wandb.ai.
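
The way the extra_args flags from step 3 select a task can be summarized in a short sketch. This is an illustration of the behavior described above, not the code in script_wandb.py. The function name select_task and the returned labels are hypothetical, and the precedence of regression over classification when both are set is an arbitrary choice of this sketch.

```python
def select_task(extra_args):
    """Map the extra_args flags described in step 3 to a task label.

    Hypothetical helper: the labels and the precedence when both flags
    are set are assumptions of this sketch, not repository behavior.
    """
    if extra_args.get("regression", False):
        return "regression"  # redshift regression
    if extra_args.get("classification", False):
        return "classification"
    # Neither flag set: standard CLIP pretraining.
    return "clip_pretraining"


if __name__ == "__main__":
    print(select_task({"regression": True}))
    print(select_task({"classification": True}))
    print(select_task({}))
```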

Running a k-fold cross-validation

To run a k-fold cross-validation, define the variable

 extra_args:
   kfolds: 5 # for stratified cross-validation

Because running all folds serially can take a long time, you can split the work across several submissions by selecting specific folds for each one:

   foldnumber:
     values: [1,2,3]
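
To make the idea concrete, here is a minimal pure-Python sketch of stratified k-fold splitting with fold selection. It is not the repository's implementation. The function names are hypothetical, and treating foldnumber values as 0-based indices into the list of folds is an assumption of this sketch.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, kfolds, seed=0):
    """Split sample indices into kfolds folds with similar class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(kfolds)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal each class round-robin so every fold gets its share.
        for position, idx in enumerate(indices):
            folds[position % kfolds].append(idx)
    return [sorted(fold) for fold in folds]


def splits_for(labels, kfolds, fold_numbers, seed=0):
    """Yield (fold, train_indices, val_indices) only for the requested folds."""
    folds = stratified_kfold(labels, kfolds, seed)
    for n in fold_numbers:  # assumed 0-based in this sketch
        val = folds[n]
        train = sorted(i for j, fold in enumerate(folds) if j != n for i in fold)
        yield n, train, val


if __name__ == "__main__":
    labels = ["Ia"] * 10 + ["II"] * 5
    for n, train, val in splits_for(labels, kfolds=5, fold_numbers=[1, 2, 3]):
        print(n, len(train), len(val))
```

Because the seed fixes the shuffle, separate submissions that each handle a subset of folds still operate on the same partition of the data.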

Calculate performance metrics from models

To evaluate model checkpoints, set the folder path and the corresponding name in evaluate_models.py. Then compute the metrics by running

python evaluate_models.py
