
Add Rank-Zero Printing and Improve Wandb Initialization #16

Open
sadamov wants to merge 14 commits into main

Conversation

@sadamov (Collaborator) commented May 1, 2024

Description:
This PR introduces rank_zero_print and init_wandb utility functions to enhance printing behavior in multi-process environments and streamline Weights and Biases (wandb) initialization for logging and experiment tracking.

Changes:

  • Added rank_zero_print function (a minimal sketch follows this list):

    • Ensures printing only occurs from the rank-0 process in multi-process setups.
    • Wraps the built-in print function for selective printing based on process rank.
  • Introduced init_wandb function (a hedged sketch follows the Benefits section):

    • Initializes wandb based on provided arguments.
    • Handles new runs and resuming existing runs.
    • Generates meaningful run names with relevant information.
    • Creates WandbLogger for PyTorch Lightning integration.
    • Saves constants.py file to wandb for reference.
  • Added necessary imports for time, PyTorch Lightning, wandb, and rank-zero utilities.
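
A minimal sketch of the rank-zero wrapper, assuming it builds on PyTorch Lightning's rank_zero_only decorator (the imports listed above suggest as much); this is an illustration, not necessarily the PR's exact code:

    from pytorch_lightning.utilities import rank_zero_only

    @rank_zero_only
    def rank_zero_print(*args, **kwargs):
        """Print only from the rank-0 process; no-op on all other ranks."""
        print(*args, **kwargs)

The decorator checks the process's global rank and skips the call entirely on non-zero ranks, so call sites need no rank checks of their own.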

Benefits:

  • Prevents duplicate or conflicting output in multi-process environments.
  • Simplifies wandb setup and configuration for experiment tracking and logging.
  • Improves organization and identification of experiments with informative run names.
  • Enhances reproducibility by saving constants.py file to wandb.
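
For illustration, a hedged sketch of what init_wandb might look like; the argument names (args.resume_run, args.model, args.project) and the run-name format are assumptions, not the PR's actual code:

    import time

    import wandb
    from pytorch_lightning.loggers import WandbLogger

    def init_wandb(args):
        """Initialize wandb and return a WandbLogger for PyTorch Lightning.

        NOTE: the argument names used here are hypothetical placeholders.
        """
        if args.resume_run is None:
            # New run: build an informative name from the model and a timestamp
            run_name = f"{args.model}-{time.strftime('%m_%d_%H_%M')}"
            wandb.init(project=args.project, name=run_name, config=vars(args))
        else:
            # Resume an existing run by its wandb run id
            wandb.init(project=args.project, id=args.resume_run, resume="must")
        logger = WandbLogger(project=args.project)
        # Upload constants.py so the exact experiment setup is reproducible
        wandb.save("neural_lam/constants.py")
        return logger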

Simon Adamov added 3 commits May 1, 2024 20:15
@sadamov changed the title from "Feature rank one utils" to "Add Rank-Zero Printing and Improve Wandb Initialization" May 1, 2024
@sadamov requested a review from joeloskarsson May 1, 2024 20:21
@sadamov self-assigned this May 1, 2024
@sadamov added the enhancement (New feature or request) label May 1, 2024
@sadamov requested a review from leifdenby May 14, 2024 05:33
@joeloskarsson (Collaborator) left a comment

Overall a nice improvement. I had some small comments related to my use cases.

Five inline review threads on neural_lam/utils.py (three marked outdated)
@sadamov (Collaborator, Author) commented May 25, 2024

Okay, I requested another review; from my side this PR should be ready to merge.

@joeloskarsson (Collaborator) left a comment

I tested this on multi-GPU and it works great. I do, however, get a warning when the logger is created that I think we can maybe avoid.

Two inline review threads on neural_lam/utils.py
@sadamov (Collaborator, Author) commented Jun 6, 2024

@joeloskarsson the issue was that wandb is only initialized once trainer.fit() has started. So a nice way to solve this is to remove the wandb.init() call as you suggested and then save the configs in the ar_model:

    def on_run_end(self):
        # Save only from the rank-0 process to avoid duplicate uploads;
        # by this point wandb has already been initialized by trainer.fit()
        if self.trainer.is_global_zero:
            wandb.save("neural_lam/data_config.yaml")

@sadamov (Collaborator, Author) commented Jun 6, 2024

@leifdenby can I add this PR to the roadmap for v0.2.0, and then add it as a feature to the changelog as well?

@joeloskarsson (Collaborator) left a comment

Right, I now remember that the wandb initialization timing is a bit annoying. But I think this is a great solution! I just don't understand the on_run_end hook, so I want to double-check that before we merge.

Also: you can go ahead and add this to the changelog now.

One inline review thread on neural_lam/models/ar_model.py (outdated)
@joeloskarsson (Collaborator) left a comment

I realized that wandb.save retains the whole directory structure relative to the file it is saving, which I don't think we want for the data config file (it has a quite odd API: https://docs.wandb.ai/ref/python/save). I took the liberty of pushing a change so that the data config is saved directly under files in wandb. Now that there is a bit more logic, I also put it in a separate method, and switched the save policy to policy="now" while at it.

If that looks ok to you, I think this is good to go, after input from @leifdenby on how to place this w.r.t. the roadmap. If you think my change is stupid, please just overwrite it 😄
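
For reference, a sketch of what the flattened save might look like using wandb.save's base_path argument; the method name _save_data_config is assumed for illustration:

    import wandb

    def _save_data_config(self):
        """Upload data_config.yaml directly under the run's files/ dir.

        base_path strips the leading "neural_lam/" prefix so wandb does
        not recreate the repo directory structure, and policy="now"
        uploads immediately rather than waiting for the run to end.
        """
        if self.trainer.is_global_zero:
            wandb.save(
                "neural_lam/data_config.yaml",
                base_path="neural_lam",
                policy="now",
            )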

@leifdenby (Member) commented

LGTM!

@sadamov (Collaborator, Author) commented Sep 6, 2024

Apologies, I have been quite slow reacting to this PR; I have now merged the latest main into the PR branch. Okay with @leifdenby if I merge this into v0.2.0 now?
