Scalability report update - Communication Plot [cleaned up] #231

Merged
merged 40 commits into main from scalability-report-update-4
Oct 17, 2024

Collaborator

@matbun matbun commented Oct 16, 2024

Summary

This PR improves the scalability report, focusing on the communication plot discussed in #221. It provides two main features:

  1. A decorator that can be applied to a TorchTrainer to profile it. For this to work as intended, any subclass must call self.profiler.step() in its set_epoch() method.

  2. A CLI command that generates a communication plot in the current working directory.
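To illustrate the contract in item 1, here is a minimal sketch of how a profiling decorator and the required self.profiler.step() call could fit together. It is not the code from this PR: `profile_trainer`, `StepCountingProfiler`, `BaseTrainer` and `MyTrainer` are hypothetical names, and the stand-in profiler only counts steps so that the example runs without PyTorch. A real implementation would presumably wrap something like torch.profiler.profile instead.

```python
import functools


class StepCountingProfiler:
    """Hypothetical stand-in for a real profiler (e.g. torch.profiler.profile)."""

    def __init__(self):
        self.steps = 0
        self.active = False

    def __enter__(self):
        self.active = True
        return self

    def __exit__(self, exc_type, exc, tb):
        self.active = False
        return False  # do not swallow exceptions raised during training

    def step(self):
        # A real profiler would advance its schedule here; we only count calls.
        self.steps += 1


def profile_trainer(method):
    """Hypothetical decorator: run `method` with a profiler exposed as self.profiler."""

    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        with StepCountingProfiler() as profiler:
            self.profiler = profiler
            return method(self, *args, **kwargs)

    return wrapper


class BaseTrainer:
    """Hypothetical trainer; stands in for TorchTrainer in this sketch."""

    def __init__(self, epochs):
        self.epochs = epochs
        self.profiler = None

    def set_epoch(self, epoch):
        self.current_epoch = epoch

    @profile_trainer
    def train(self):
        for epoch in range(self.epochs):
            self.set_epoch(epoch)
        return self.profiler.steps


class MyTrainer(BaseTrainer):
    def set_epoch(self, epoch):
        super().set_epoch(epoch)
        # The call the PR description asks subclasses to add.
        self.profiler.step()


steps_with_call = MyTrainer(epochs=3).train()
steps_without_call = BaseTrainer(epochs=3).train()
```

In this sketch, a subclass that forgets the step() call still trains, but the profiler never advances, which is why the description calls the requirement out explicitly.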


Related issue: #221


@matbun matbun marked this pull request as ready for review October 16, 2024 17:29
@matbun matbun changed the title from "make comm profiler into decorator" to "Scalability report update - Communication Plot [cleaned up]" Oct 16, 2024
@matbun matbun added the enhancement (New feature or request) label Oct 16, 2024
Collaborator

@jarlsondre jarlsondre left a comment

lgtm

@jarlsondre jarlsondre merged commit 30765e0 into main Oct 17, 2024
8 checks passed
@jarlsondre jarlsondre deleted the scalability-report-update-4 branch October 17, 2024 08:07
2 participants