Remove calls to deprecated Tensor.storage() when using newer PyTorch versions #1230
Conversation
Result of the first try: all tests run through except for
* test_stride_and_strides
* MinMaxScaler

Moreover, I have introduced a check whether there are at least two nodes available for the DASO test (that was always failing for 8 processes on a single GPU...).

The first idea for a workaround mentioned above resulted in errors only in … TO BE DISCUSSED: Are we going to stay compatible with these earlier PyTorch versions (which potentially means additional work in this PR), or do we plan to jump to torch==2.0.0 with the next Heat release anyway? By the way, this is also a problem because our AMD CI runs with a quite old PyTorch version that could then not be used anymore...
Still a problem:
Decision in PR meeting: introduce a version check to ensure backward compatibility.
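For illustration, such a version check could look like the following minimal sketch. This is not the actual Heat implementation: the helper name `get_storage` and the use of `packaging` for version parsing are assumptions made here. The idea is that on PyTorch 2.0 and newer the deprecated `Tensor.storage()` call is replaced by `Tensor.untyped_storage()`, while older versions keep the old call.

```python
# Minimal sketch, assuming packaging is installed and that untyped_storage()
# is available from PyTorch 2.0 on; get_storage is a hypothetical helper name.
import torch
from packaging.version import Version

# True on PyTorch 2.0 and newer (comparing the release tuple ignores pre-release tags)
TORCH_GE_20 = Version(torch.__version__).release >= (2, 0)

def get_storage(tensor: torch.Tensor):
    """Return the underlying storage of `tensor`, independent of the PyTorch version."""
    if TORCH_GE_20:
        # Tensor.storage() is deprecated on newer PyTorch in favour of untyped_storage()
        return tensor.untyped_storage()
    return tensor.storage()
```

Centralising the branch in one helper keeps the rest of the code free of per-call version checks.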
Moreover: introduced a decorator for DASO tests (skip if fewer than 2 nodes or no GPUs are available).
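A hedged sketch of what such a skip decorator could look like (the decorator name and the reliance on a SLURM environment variable for the node count are illustrative assumptions, not Heat's actual test code):

```python
# Sketch only: skip DASO tests unless at least 2 nodes and at least one GPU are available.
import os
import unittest
import torch

def requires_daso_setup(test_item):
    """Skip the decorated test if fewer than 2 nodes or no GPUs are available."""
    n_nodes = int(os.environ.get("SLURM_NNODES", "1"))  # assumes a SLURM job environment
    has_gpu = torch.cuda.is_available()
    condition = n_nodes >= 2 and has_gpu
    reason = "DASO tests require at least 2 nodes and at least one GPU"
    return unittest.skipUnless(condition, reason)(test_item)
```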
Codecov Report
@@            Coverage Diff             @@
##             main    #1230      +/-   ##
==========================================
- Coverage   92.32%   91.75%   -0.58%
==========================================
  Files          77       77
  Lines       11056    11080      +24
==========================================
- Hits        10207    10166      -41
- Misses        849      914      +65
Current workaround works on HDFML with torch==1.12 and torch==2.0.0 except for the following problems:
… and similar for lines 24, 33, and 34 of the same file. Problem: this is related to the float16/half data types used in DASO, in particular to the creation of a custom MPI-op.
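For context, a custom MPI reduction op for half precision could be set up roughly as follows (a minimal sketch, assuming mpi4py; the names `_sum_f16` and `mpi_sum_f16` are illustrative and this is not the actual DASO code). MPI has no built-in float16 reduction, so the raw byte buffers are reinterpreted as float16 tensors and summed in place.

```python
# Sketch, not Heat's actual DASO implementation: custom MPI sum for float16 buffers.
import torch
from mpi4py import MPI

def _sum_f16(in_buf, out_buf, datatype):
    # Reinterpret the raw MPI byte buffers as float16 tensors and accumulate in place.
    a = torch.frombuffer(in_buf, dtype=torch.float16)
    b = torch.frombuffer(out_buf, dtype=torch.float16)
    b += a

# commute=True tells MPI the reduction is commutative
mpi_sum_f16 = MPI.Op.Create(_sum_f16, commute=True)
```

Such an op could then be passed to a reduction call, e.g. `comm.Allreduce(sendbuf, recvbuf, op=mpi_sum_f16)`.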
Comment on decreased codecov:
Workaround for DASO has successfully been tested on HDFML (2 nodes, 4 GPUs each) with PyTorch==1.12.0 and PyTorch==2.0.0.
Summary:
Great @mrfh92, thanks a lot for fixing this. I only have one potential change in the version.py file since we are merging into main.
Thanks a lot!
Co-authored-by: Claudia Comito <[email protected]>
@ClaudiaComito yes, you're right... since it is not directly a bug fix, but rather a hurry-ahead bug fix, I would merge into main.
Awesome!
Intended to resolve issue #1229