Add Rank-Zero Printing and Improve Wandb Initialization #16
base: main
useful after crash or when splitting training into multiple jobs
Overall a nice improvement. I had some small comments related to my use cases.
Okay, I requested another review, and from my side this PR should be ready for merge.
I tested this on multi-GPU and it works great. I do, however, get a warning when the logger is created, which I think we can maybe avoid.
@joeloskarsson the issue was that wandb is only initialized once trainer.fit() has started. So a nice way to solve this issue is: removing the
@leifdenby can I add this PR to the roadmap for v.0.2.0? and then add it as a feature to the changelog as well?
Right, I now remember that the wandb initialization timing is a bit annoying. But I think this is a great solution! I just don't understand the `on_run_end` hook, so I want to double check that before we merge.
Also: You can add this to the changelog already now.
I realized that `wandb.save` retains the whole directory structure relative to the file it is saving, which I don't think we want for the data config file (it has quite an odd API: https://docs.wandb.ai/ref/python/save). I took the liberty of pushing a change so that the data config is saved directly under `files` in wandb. Now that there is a bit more logic, I also put it in a separate method, and switched the save policy to `policy="now"` while at it.
If that looks ok to you I think this is good to go, after input from @leifdenby on how to place this w.r.t. the roadmap. If you think my change is stupid please just overwrite it 😄
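For readers following the `wandb.save` point above, here is a minimal sketch of one way to get a file stored directly under `files`, by passing the file's own directory as `base_path`. This is an illustration only and not necessarily what the pushed change does. The helper name `save_flat` and the injectable `save_fn` parameter are hypothetical, and the real call requires an active wandb run.

```python
import os


def save_flat(file_path, save_fn=None):
    """Upload file_path so that it lands at the top level of the run's files.

    Hypothetical helper. save_fn defaults to wandb.save and is injectable
    only so the sketch can be exercised without a wandb run.
    """
    if save_fn is None:
        import wandb  # imported lazily; requires wandb.init() to have run

        save_fn = wandb.save

    file_path = os.path.abspath(file_path)
    # Paths are stored relative to base_path, so using the file's own
    # directory drops every parent directory from the stored name.
    save_fn(file_path, base_path=os.path.dirname(file_path), policy="now")
```

With `policy="now"` the file is uploaded once at the time of the call instead of being watched for later changes.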
LGTM!
Apologies, I have been quite slow reacting to this PR. I merged latest
Description:
This PR introduces `rank_zero_print` and `init_wandb` utility functions to enhance printing behavior in multi-process environments and streamline Weights and Biases (wandb) initialization for logging and experiment tracking.

Changes:
- Added `rank_zero_print` function:
  - [...] `print` function for selective printing based on process rank.
- Introduced `init_wandb` function:
  - [...] `WandbLogger` for PyTorch Lightning integration.
  - [...] `constants.py` file to wandb for reference.
- Added necessary imports for time, PyTorch Lightning, wandb, and rank-zero utilities.

Benefits:
- [...] `constants.py` file to wandb.
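
As an illustration of the rank-zero printing idea described above, here is a minimal standard-library sketch. It is not the PR's implementation: the helper `get_rank` is hypothetical, and reading the rank from a `RANK` environment variable is an assumption about the launcher. PyTorch Lightning ships its own `rank_zero_only` decorator, which a real implementation would more likely use.

```python
import os


def get_rank():
    """Return this process's global rank.

    Assumption: the launcher exports RANK. A single-process run,
    where the variable is unset, is treated as rank 0.
    """
    return int(os.environ.get("RANK", "0"))


def rank_zero_print(*args, **kwargs):
    """Forward to print() only on rank 0, so messages appear once."""
    if get_rank() == 0:
        print(*args, **kwargs)


rank_zero_print("Starting training")
```

The function accepts the same arguments as `print`, so existing calls can be swapped over without other changes.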