Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support Pytorch 2.1 #1271

Closed
wants to merge 1 commit into from
Closed

Conversation

ClaudiaComito
Copy link
Contributor

@ClaudiaComito ClaudiaComito commented Nov 23, 2023

Due Diligence

  • General:
    • base branch must be main for new features, latest release branch (e.g. release/1.3.x) for bug fixes
    • title of the PR is suitable to appear in the Release Notes
  • Implementation:
    • unit tests: all split configurations tested
    • unit tests: multiple dtypes tested
    • documentation updated where needed

Description

  • Pipeline tests not relevant because they still run on the old (Pytorch 2.0) Docker image.
  • Tested on HDFML cluster, Stage 2024, PyTorch 2.1., 2 nodes, 4 processes per node:
    • CPU tests: OK
    • GPU tests on V100: failure in heat.sparse.tests.test_arithmetics.TestArithmetics.test_mul
  ======================================================================
FAIL: test_mul (heat.sparse.tests.test_arithmetics.TestArithmetics.test_mul)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/p/scratch/haf/comito1/devel/heat/heat/sparse/tests/test_arithmetics.py", line 750, in test_mul
    self.assertTrue(
AssertionError: tensor(False, device='cuda:0') is not true

where the index pointer of the multiplication isn't what it should. This fails on 1 process (non-distributed) as well. Documented in #1273. Tagging @Mystic-Slice though I actually think this is a PyTorch bug.

Issue/s resolved: #

Changes proposed:

Type of change

Memory requirements

NA

Performance

NA

Does this change modify the behaviour of other functions? If so, which?

no

@ghost
Copy link

ghost commented Nov 23, 2023

👇 Click on the image for a new way to code review

Review these changes using an interactive CodeSee Map

Legend

CodeSee Map legend

Copy link
Contributor

Thank you for the PR!

Copy link

codecov bot commented Nov 23, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (94cd067) 92.17% compared to head (18ffbc6) 92.17%.

Additional details and impacted files
@@              Coverage Diff               @@
##           release/1.3.x    #1271   +/-   ##
==============================================
  Coverage          92.17%   92.17%           
==============================================
  Files                 75       75           
  Lines              10705    10705           
==============================================
  Hits                9867     9867           
  Misses               838      838           
Flag Coverage Δ
unit 92.17% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@ClaudiaComito ClaudiaComito requested review from mrfh92 and mtar November 23, 2023 06:18
@ClaudiaComito ClaudiaComito changed the title Manual setup.py update to Pytorch 2.1.1 Support Pytorch 2.1 Nov 23, 2023
@mrfh92
Copy link
Collaborator

mrfh92 commented Nov 23, 2023

I need to ask a question before reviewing: Is the purpose of this PR...

  • ...to test what actions need to be done to support PyTorch 2.1,
  • ...or to introduce also the corresponding bug-fixes,
  • ...or to do the latter and also update the CI to Torch 2.1,
  • ...or just to enable a quick-and-easy pip install of Heat on top of a existing PyTorch 2.1-software stack?

If purpose is in points 1-3: regarding the base branch I am not sure what the right workflow is: actually, since v1.3 is already released I dont know whether it is usual to include support for a newer software stack retroactively; my first idea would have been to consider support of PyTorch 2.1 as a new feature and to merge into main.

If purpose is point 4: if we just want to enable quick PyTorch 2.1 support for v1.3.1, I would just add a NotImplementedError for the corresponding routine in the sparse-module and to skip the corresponding test.

Copy link
Collaborator

@mrfh92 mrfh92 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see my detailed comment in the conversation

@ClaudiaComito
Copy link
Contributor Author

I need to ask a question before reviewing: Is the purpose of this PR...

* ...to test what actions need to be done to support PyTorch 2.1,

* ...or to introduce also the corresponding bug-fixes,

* ...or to do the latter and also update the CI to Torch 2.1,

* ...or just to enable a quick-and-easy pip install of Heat on top of a existing PyTorch 2.1-software stack?

If purpose is in points 1-3: regarding the base branch I am not sure what the right workflow is: actually, since v1.3 is already released I dont know whether it is usual to include support for a newer software stack retroactively; my first idea would have been to consider support of PyTorch 2.1 as a new feature and to merge into main.

If purpose is point 4: if we just want to enable quick PyTorch 2.1 support for v1.3.1, I would just add a NotImplementedError for the corresponding routine in the sparse-module and to skip the corresponding test.

Thanks @mrfh92 for taking the time to analyze the options, and apologies for being quite cryptic this morning.

My purpose was 4 - to explore the possibility of a quick merge before releasing 1.3.1. Sorry I didn't see your message until after releasing. @JuanPedroGHM and I decided it would be for the best not to merge in a hurry, as we'd be supporting torch 2.1 with the CUDA CI still hanging onto the previous version of the container.

Anyway:

I'm going to close this PR as the actual torch 2.1 support + CI adaptation will take place in the automation-generated PRs.

Thanks everybody!

@mtar mtar deleted the workflows/pytorch-update branch April 17, 2024 09:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants