Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove calls to deprecated Tensor.storage() when using newer PyTorch versions #1230

Merged
merged 14 commits into from
Oct 13, 2023

Conversation

mrfh92
Copy link
Collaborator

@mrfh92 mrfh92 commented Oct 6, 2023

Intended to resolve issue #1229

@mrfh92
Copy link
Collaborator Author

mrfh92 commented Oct 6, 2023

Result of the first try: all test run through expect for

======================================================================
FAIL: test_stride_and_strides (heat.core.tests.test_dndarray.TestDNDarray)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/hopp_fa/heat/heat/core/tests/test_dndarray.py", line 1554, in test_stride_and_strides
    self.assertEqual(heat_int16.strides, numpy_int16.strides)
AssertionError: Tuples differ: (2100, 420, 140, 35, 7, 1) != (4200, 840, 280, 70, 14, 2)

First differing element 0:
2100
4200

- (2100, 420, 140, 35, 7, 1)
+ (4200, 840, 280, 70, 14, 2)

----------------------------------------------------------------------
Ran 428 tests in 47.663s

FAILED (failures=1, errors=2, skipped=5)
ok

@ghost
Copy link

ghost commented Oct 6, 2023

👇 Click on the image for a new way to code review

Review these changes using an interactive CodeSee Map

Legend

CodeSee Map legend

@mrfh92
Copy link
Collaborator Author

mrfh92 commented Oct 6, 2023

merged main (although this is labeled as "bug"-PR) because it is actually not a bug-fix but rather a reaction to some future development in PyTorch and therefore this PR can wait til next release to become "proper" part of Heat

@mrfh92
Copy link
Collaborator Author

mrfh92 commented Oct 6, 2023

The first idea for a workaround mentioned above that resulted in errors only in test_stride_and_strides does not seem to work for certain earlier PyTorch versions. I have tested with torch==2.0.0 on my worksation (result above), but torch==1.8.1 (also on my workstation) and torch==1.12.0 (on HDFML) which resulted in multiple errors over many unittests. When I use torch==2.0.0 on HDFML, I only get errors in test_stride_and_strides again...

TO BE DISCUSSED: Are we going to stay compatible with these earlier PyTorch versions (which potentially means additional work in this PR) or do we plan to jump to torch==2.0.0 with the next Heat-release anyway?

btw: this also is a problem because our AMD-CI runs with a quite old PyTorch version that could not be used anymore...

@mrfh92 mrfh92 added interoperability To be discussed Requires discussion in project meeting first bug Something isn't working types labels Oct 6, 2023
* test_stride_and_strides
* MinMaxScaler

Moreover, I have introduced a check whether there are at least two nodes available for the DASO test (that was always failing for 8 processes on a single GPU...)
@github-actions
Copy link
Contributor

github-actions bot commented Oct 6, 2023

Thank you for the PR!

@mrfh92
Copy link
Collaborator Author

mrfh92 commented Oct 6, 2023

Still a problem:

/p/project/haf/users/hoppe6/heat/heat/optim/dp_optimizer.py:23: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  tens_a = torch.HalfTensor().set_(torch.HalfStorage.from_buffer(buffer_a, "native"))

@mrfh92 mrfh92 self-assigned this Oct 9, 2023
@mrfh92
Copy link
Collaborator Author

mrfh92 commented Oct 9, 2023

Decision in PR meeting: introduce version check to ensure backward compatibility

@github-actions
Copy link
Contributor

github-actions bot commented Oct 9, 2023

Thank you for the PR!

moreover: introduced decorator for DASO tests (skip if nodes < 2 or no GPUs)
@github-actions
Copy link
Contributor

github-actions bot commented Oct 9, 2023

Thank you for the PR!

@github-actions
Copy link
Contributor

github-actions bot commented Oct 9, 2023

Thank you for the PR!

@github-actions
Copy link
Contributor

github-actions bot commented Oct 9, 2023

Thank you for the PR!

@codecov
Copy link

codecov bot commented Oct 9, 2023

Codecov Report

Merging #1230 (94e2140) into main (a32efdb) will decrease coverage by 0.58%.
The diff coverage is 32.35%.

@@            Coverage Diff             @@
##             main    #1230      +/-   ##
==========================================
- Coverage   92.32%   91.75%   -0.58%     
==========================================
  Files          77       77              
  Lines       11056    11080      +24     
==========================================
- Hits        10207    10166      -41     
- Misses        849      914      +65     
Flag Coverage Δ
unit 91.75% <32.35%> (-0.58%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
heat/core/factories.py 97.10% <ø> (ø)
heat/core/dndarray.py 96.97% <50.00%> (-0.24%) ⬇️
heat/core/memory.py 91.17% <50.00%> (-5.60%) ⬇️
heat/core/manipulations.py 98.61% <50.00%> (-0.22%) ⬇️
heat/core/communication.py 95.28% <50.00%> (-0.66%) ⬇️
heat/optim/dp_optimizer.py 13.43% <0.00%> (-10.93%) ⬇️

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@mrfh92
Copy link
Collaborator Author

mrfh92 commented Oct 9, 2023

Current workaround works on HDFML with torch==1.12 and torch==2.0.0 execpt for the following problems:

/p/project/haf/users/hoppe6/heat/heat/optim/dp_optimizer.py:23: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  tens_a = torch.HalfTensor().set_(torch.HalfStorage.from_buffer(buffer_a, "native"))

and similar for lines 24, 33, and 34 of the same file.

Problem: This is related to the float16/half data types used in DASO, in particular to the creation of a custom MPI-op

@mrfh92
Copy link
Collaborator Author

mrfh92 commented Oct 9, 2023

Comment on decreased codecov:
This is mainly due to the fact that I now skipp all DASO-related tests as long as the hardware is inappropriate (i.e., no GPUs or less than 2 nodes), which is the case for our CI-runners. Hence, DASO is not covered anymore.

@github-actions
Copy link
Contributor

github-actions bot commented Oct 9, 2023

Thank you for the PR!

@github-actions
Copy link
Contributor

github-actions bot commented Oct 9, 2023

Thank you for the PR!

@mrfh92
Copy link
Collaborator Author

mrfh92 commented Oct 9, 2023

Workaround for DASO has successfully been tested on HDFML (2 nodes, 4 GPUs each) with PyTorch==1.12.0 and PyTorch==2.0.0.

@github-actions
Copy link
Contributor

github-actions bot commented Oct 9, 2023

Thank you for the PR!

@mrfh92
Copy link
Collaborator Author

mrfh92 commented Oct 9, 2023

Summary:

  • I have replaced typed storage by untyped storage wherever the deprecation warning appeared. To ensure backward compatibility (untyped storage seems to have been introduced in PyTorch==2.0.0), I use try/except (in the code) and version-checks in the unittests.
  • DASO-tests are now completely skipped whenever the hardware is inappropriate (no GPUs, less than 2 nodes). Thus, codecov is reduced compared to main, where e.g. wrong input data types for DASO have been tested even on a single CPU-process.
  • a tolerance in the preprocessing tests has been adapted to avoid occasional random failure due to too tight tolerances

@mrfh92 mrfh92 marked this pull request as ready for review October 9, 2023 15:17
@mrfh92 mrfh92 requested review from ClaudiaComito and mtar October 9, 2023 16:52
heat/core/version.py Outdated Show resolved Hide resolved
Copy link
Contributor

@ClaudiaComito ClaudiaComito left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great @mrfh92 thanks a lot for fixing this. I only have one potential change in the version.py file since we are merging into main.

Thanks a lot!

heat/core/version.py Outdated Show resolved Hide resolved
@github-actions
Copy link
Contributor

Thank you for the PR!

@ClaudiaComito ClaudiaComito changed the title Bugs/1229 bug user warning typed storage is deprecated Remove calls to deprecated Tensor.storage() when using newer PyTorch versions Oct 13, 2023
Co-authored-by: Claudia Comito <[email protected]>
@mrfh92
Copy link
Collaborator Author

mrfh92 commented Oct 13, 2023

@ClaudiaComito yes, youre right... since it is not directly a bug fix, but rather a hurry-ahead-bug-fix, I would merge into main

@github-actions
Copy link
Contributor

Thank you for the PR!

Copy link
Contributor

@ClaudiaComito ClaudiaComito left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome!

@mrfh92 mrfh92 merged commit 5beb254 into main Oct 13, 2023
@mrfh92 mrfh92 deleted the bugs/1229-_Bug_UserWarning_TypedStorage_is_deprecated branch October 13, 2023 10:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working interoperability merge queue To be discussed Requires discussion in project meeting first types
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants