Be GLAD! Graph-based Learning of Aerosol Dynamics - a Graph Neural Network Simulator Framework

Inspired by PyTorch GNS [1]: https://www.geoelements.org/gns
Multi-dimensional time-changing features
Multi-dimensional node properties
Alternative activation functions
Prediction pipeline
Data transformation pipeline
Output analysis pipeline

[1] Kumar, K. and Vantassel, J., 2022. GNS: A generalizable Graph Neural Network-based simulator for particulate and fluid modeling. arXiv preprint arXiv:2211.10228.

Requirements

For macOS and Windows, use conda and the provided environment.yml:

conda env create -f environment.yml

For Linux, install CUDA in your environment. For CUDA Toolkit 12.1:

https://developer.nvidia.com/cuda-12-1-0-download-archive?target_os=Linux

Choose the appropriate architecture, distribution, version and installer, then follow the instructions. Make sure that your environment is activated.

Or, if you are using Google Colab, check which PyTorch and CUDA versions are installed:

[In] import torch
[In] print(f"PyTorch has version {torch.__version__} with cuda {torch.version.cuda}")
[Out] PyTorch has version 2.1.0+cu121 with cuda 12.1

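Then, in a terminal, install the wheels that match those versions (replace 2.1.0+cu121 in the URLs with your own PyTorch and CUDA versions):

# Install PyTorch Geometric and its compiled dependencies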
pip install torch-cluster -f https://data.pyg.org/whl/torch-2.1.0+cu121.html
pip install torch-scatter -f https://data.pyg.org/whl/torch-2.1.0+cu121.html
pip install torch-sparse -f https://data.pyg.org/whl/torch-2.1.0+cu121.html
pip install torch-geometric
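
The wheel index URL in the commands above embeds the PyTorch version string. As a minimal sketch, assuming the URL pattern shown in this README holds for your version, the commands can be generated from the value of torch.__version__. The helper names pyg_wheel_index and pip_commands are hypothetical and not part of this repository.

```python
def pyg_wheel_index(torch_version: str) -> str:
    """Return the wheel index URL for a version string such as '2.1.0+cu121'.

    Assumption: the URL follows the pattern used in the pip commands above.
    """
    return f"https://data.pyg.org/whl/torch-{torch_version}.html"


def pip_commands(torch_version: str) -> list:
    """Build the four pip commands listed in this README for a given version."""
    index = pyg_wheel_index(torch_version)
    compiled = ["torch-cluster", "torch-scatter", "torch-sparse"]
    commands = [f"pip install {pkg} -f {index}" for pkg in compiled]
    # torch-geometric itself is installed without the extra index.
    commands.append("pip install torch-geometric")
    return commands


for command in pip_commands("2.1.0+cu121"):
    print(command)
```

In a notebook you would pass torch.__version__ instead of the literal string.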

Make the necessary directories

data, model, output within the gns folder
raw_data, proc_data, shared_data within the chem_data folder
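
The directories listed above can also be created from Python. This is a small sketch, assuming it is run from the repository root (the folder containing gns and chem_data); the function name make_dirs is hypothetical.

```python
from pathlib import Path

# Directory names are taken from the list above.
REQUIRED_DIRS = {
    "gns": ["data", "model", "output"],
    "chem_data": ["raw_data", "proc_data", "shared_data"],
}


def make_dirs(root="."):
    """Create the required sub-directories under root and return their paths."""
    created = []
    for parent, children in REQUIRED_DIRS.items():
        for child in children:
            path = Path(root) / parent / child
            # exist_ok=True makes the call safe to repeat.
            path.mkdir(parents=True, exist_ok=True)
            created.append(path)
    return created
```

Calling make_dirs() with no arguments creates the six directories relative to the current working directory.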

run.py: prepare, train, test and postprocess script

For convenience you may use the run.py script to get some output data to analyze right away.

python run.py

In the script, change the following settings as appropriate for your input data and experiment:

#### Set these as appropriate:
# PartMC-MOSAIC Data Examples:
raw_data_path = "./chem_data/processed_output_some/"
rollout_dicts = "./chem_data/proc_data/"
npz_path = "./gns/data/"
model_path = "./gns/model/"
rollouts_path = "./gns/output/"
material_properties = ['aero_number', 'BC', 'OC']
particle_chem = ['H2O', 'SO4']
gases = ['H2SO4']
train_steps = 300
scenarios = [0, 1, 3, 8]
total_reps = 0 # repeat one scenario n times

However, you may wish to run the commands one at a time from the terminal. If so, read on.

Prepare the raw dataset for training

Open your terminal and go to the directory that contains the chem_data and gns folders. Make sure the environment containing the necessary packages is activated.

# Prepare the raw data - assumed to be in specific txt format
python -m chem_data.chemgns --action='prepare' \
       --raw_data_path='<raw-data-path>' --preped_data_path='<output path for prepared data>' \
       --universe=<integer> --material_properties='material property list' \
       --gases='gas chemistry list' --particle_chem='particle chemistry list' \
       --share_path='<path for sharing files between processes>'

Train the prepared dataset

# Train for the first time
python -m gns.train --data_path='<prepared data path>' --model_path='<model storage path>' \
       --output_path='<rollout storage path>' --ntraining_steps=<integer total steps>

# Train some more
python -m gns.train --data_path='<prepared data path>' --model_path='<model storage path>' \
       --output_path='<rollout storage path>' --model_file='model-<last timestep>.pt' \
       --train_state_file='train_state-<last timestep>.pt' --ntraining_steps=<integer total steps>
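
To resume training you need the names of the most recent model and train state files. The following is a sketch only: it assumes the files in the model directory are named model-<step>.pt and train_state-<step>.pt with an integer step, as the placeholders above suggest. The function latest_checkpoint is hypothetical and not part of gns.

```python
import re
from pathlib import Path


def latest_checkpoint(model_path):
    """Return (step, model_file, train_state_file) for the highest step, or None.

    Only steps that have both a model file and a train state file are considered.
    """
    pattern = re.compile(r"^model-(\d+)\.pt$")
    candidates = []
    for entry in Path(model_path).iterdir():
        match = pattern.match(entry.name)
        if not match:
            continue
        digits = match.group(1)
        state_name = f"train_state-{digits}.pt"
        if (entry.parent / state_name).exists():
            candidates.append((int(digits), entry.name, state_name))
    if not candidates:
        return None
    # Tuples compare by their first element, the integer step.
    return max(candidates)
```

The two returned file names can then be passed to --model_file and --train_state_file.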

Test your model on test data

# Create a rollout using the test dataset
python -m gns.train --mode='rollout' \
       --data_path='<prepared data path>' \
       --model_path='<model storage path>' \
       --output_path='<rollout storage path>' \
       --model_file='model-<last timestep>.pt' \
       --train_state_file='train_state-<last timestep>.pt'

Process the rollout for analysis

The rollout dataset needs to be processed for analysis. This is where we bring the data back to a state that makes sense to scientists.

# Process the rollout data for analysis
python -m chem_data.chemgns --action='analyze' \
       --rollout_data_path='<rollout storage path>' \
       --material_properties='material property list' \
       --gases='gas chemistry list' --particle_chem='particle chemistry list' \
       --proc_data_path='<path for rollout dictionaries>' \
       --share_path='<path for sharing files between processes>'

Predict

Prepare a folder containing the initial values for each chemistry in txt format.

# Prepare the raw data for prediction
python -m chem_data.chemgns --action='predict' \
       --raw_data_path='<raw-data-path>' --preped_data_path='<output path for prepared data>' \
       --universe=<integer> --material_properties='material property list' --gases='gas chemistry list' \
       --particle_chem='particle chemistry list' --share_path='<path for sharing files between processes>'

# Predict!
python -m gns.train --mode='predict' --data_path='<prepared data path>' --model_path='<model storage path>' \
       --output_path='<rollout storage path>' --model_file='model-<last timestep>.pt' \
       --train_state_file='train_state-<last timestep>.pt'
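
The rollout and predict invocations of gns.train differ only in the --mode value, so they can be assembled by one helper when scripting experiments. This is a hedged sketch: the function gns_train_command is hypothetical, it uses only the flags shown in this README, and the example paths and step count are the sample values from the run.py settings.

```python
import shlex


def gns_train_command(mode, data_path, model_path, output_path, step):
    """Build the argument list for `python -m gns.train` in rollout or predict mode."""
    if mode not in ("rollout", "predict"):
        raise ValueError(f"unsupported mode: {mode!r}")
    return [
        "python", "-m", "gns.train",
        f"--mode={mode}",
        f"--data_path={data_path}",
        f"--model_path={model_path}",
        f"--output_path={output_path}",
        # Assumption: checkpoint files are named after the integer training step.
        f"--model_file=model-{step}.pt",
        f"--train_state_file=train_state-{step}.pt",
    ]


cmd = gns_train_command("predict", "./gns/data/", "./gns/model/", "./gns/output/", 300)
# Print a shell-safe version of the command for inspection.
print(shlex.join(cmd))
```

The list can be handed to subprocess.run(cmd, check=True) from the repository root; passing a list avoids shell quoting problems with paths.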

Analyze your results

In a Python script or notebook, import chem_data.analyze_results:

import chem_data.analyze_results as ar
help(ar)

Help on module chem_data.analyze_results in chem_data:

NAME
    chem_data.analyze_results

FUNCTIONS
    gd_from_vol(vol)
    
    load_rollout_data(path)
        Load pickle rollout files output by GNS.
        Args:
        path: path to the pickle files (default gns/output/), where each file corresponds to a rollout.
        
        Returns:
        dictionary: keys are string names of the rollout files.
    
    mass_concentration(mass_of_particles, aero_number, chem='all')
    
    mean_std_diameter(mass_of_particles)
    
    nmae(truth, pred)
    
    volume(chem, mass)

Example:

N.B.: The documentation is a work in progress. Take a look at the notebooks in this repository for examples.
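
The help output lists nmae(truth, pred) without describing it. To give a feel for what such a metric measures, here is a standalone sketch of one common definition of a normalized mean absolute error: the summed absolute error divided by the summed absolute truth. This is an assumption for illustration only; the function nmae_sketch is hypothetical and the repository's nmae may normalize differently.

```python
def nmae_sketch(truth, pred):
    """Sum of |truth - pred| divided by sum of |truth| (one possible NMAE definition)."""
    if len(truth) != len(pred):
        raise ValueError("truth and pred must have the same length")
    scale = sum(abs(t) for t in truth)
    if scale == 0:
        raise ValueError("truth must contain at least one non-zero value")
    abs_error = sum(abs(t - p) for t, p in zip(truth, pred))
    return abs_error / scale


# A perfect prediction gives 0; larger values mean a larger relative error.
print(nmae_sketch([2.0, 2.0], [1.0, 3.0]))
```

For real rollouts, use the functions in chem_data.analyze_results and check their behavior against the notebooks.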

CUDA Troubleshooting

For better runtimes, you will need CUDA.

Sometimes you may get an error such as:

RuntimeError: ... something something CUDA ...

Or you get a warning:

UserWarning: CUDA initialization: ...
rank = None, cuda = False
Training step: 0/1000. Loss: 6.990038871765137.
Training step: 1/1000. Loss: 6.946468830108643.
# slower

If so, reload the nvidia_uvm kernel module:

sudo rmmod nvidia_uvm
sudo modprobe nvidia_uvm
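
After reloading the module, it can help to confirm that PyTorch sees the GPU before starting a long training run. A minimal check, assuming PyTorch is installed in the active environment (the function name cuda_status is hypothetical):

```python
def cuda_status():
    """Report whether PyTorch is importable and whether it can use CUDA."""
    try:
        import torch
    except ImportError:
        return "torch-missing"
    # torch.cuda.is_available() returns False when no usable GPU or driver is found.
    return "cuda-available" if torch.cuda.is_available() else "cuda-unavailable"


print(cuda_status())
```

If this still reports that CUDA is unavailable, training will fall back to the slower CPU path shown in the warning above.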
