Call wandb.finish when the tracker is destructed if wandb is in use #191

Merged (1 commit) on Feb 4, 2025

TonyLianLong
Contributor

Runs always show as "crashed" in wandb, even though they finish successfully. "Crashed" means that wandb never finished sending the "success" signal, so the server believes the client was terminated unexpectedly. In addition, the wandb log is incomplete (the last lines are missing).

This PR adds a call to `wandb.finish` when the Tracker is destructed (usually when `trainer.fit` finishes), so that the completion signal is sent to the server and a final data sync is performed.

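Without this change:
<img width="526" alt="image" src="https://github.com/user-attachments/assets/869da24e-c5b8-415c-b15a-bb79c49f96ce" />

With this change:
<img width="548" alt="image" src="https://github.com/user-attachments/assets/16f0a40d-ea3b-48ed-93a4-f40ee01cb7c6" />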
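
For readers without the diff at hand, the idea described above can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: the `Tracker` class body, its constructor arguments, and the `logger` dictionary are assumptions made for the example. Only `wandb.init`, `wandb.log`, and `wandb.finish` are real wandb calls.

```python
import importlib


class Tracker:
    """Hypothetical minimal tracker; not the actual verl implementation."""

    def __init__(self, project_name, backends):
        self.logger = {}
        if "wandb" in backends:
            # Imported lazily so the class still works when wandb is not in use.
            wandb = importlib.import_module("wandb")
            wandb.init(project=project_name)
            self.logger["wandb"] = wandb

    def log(self, data, step):
        # Forward metrics to every configured backend.
        for backend in self.logger.values():
            backend.log(data=data, step=step)

    def __del__(self):
        # Runs when the tracker is destructed, usually after training returns.
        # getattr guards against __init__ having failed before logger was set.
        logger = getattr(self, "logger", {})
        if "wandb" in logger:
            # Flush pending data and tell the server the run ended normally.
            logger["wandb"].finish(exit_code=0)
```

One caveat with this design: Python does not guarantee that `__del__` runs for objects still alive when the interpreter exits, so calling `wandb.finish()` explicitly at the end of a training script may be a useful fallback.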

@vermouth1992 vermouth1992 merged commit 483fa8a into volcengine:main Feb 4, 2025
11 checks passed
Chendong98 pushed a commit to Chendong98/verl that referenced this pull request Feb 4, 2025
…olcengine#191)
