Add missing patches for PyTorch 1.12.0 w/ foss/2022a + CUDA 11.7.0 #18491

Merged


Flamefire
Contributor

@Flamefire Flamefire commented Aug 8, 2023

(created using eb --new-pr)

Adds the patches from the PyTorch 1.12.1 easyconfigs, similar to #18430 but for the CUDA variant, plus new patches for POWER, as was done for the CPU version(s) in #18490.

@Flamefire Flamefire changed the title Add missing patch for PyTorch 1.12.0-CUDA Add missing patches for PyTorch 1.12.0-CUDA Aug 8, 2023
@Flamefire
Contributor Author

Test report by @Flamefire
FAILED
Build succeeded for 0 out of 1 (1 easyconfigs in total)
taurusml20 - Linux RHEL 7.6, POWER, 8335-GTX (power9le), 6 x NVIDIA Tesla V100-SXM2-32GB, 440.64.00, Python 2.7.5
See https://gist.github.com/Flamefire/06dc4a764fb9782cdd12a125f4e73356 for a full test report.

@boegel boegel added the bug fix label Aug 8, 2023
@boegel boegel added this to the next release (4.8.1?) milestone Aug 8, 2023
@boegel
Member

boegel commented Aug 8, 2023

@Flamefire Looks like you're running with some custom modifications?

Missing checksum for PyTorch-1.12.0.tar.gz

@Flamefire
Contributor Author

> @Flamefire Looks like you're running with some custom modifications?
>
> Missing checksum for PyTorch-1.12.0.tar.gz

I missed that this easyconfig, for some reason, still used a git clone instead of the release archive.

But you are right that this message is a custom modification: easybuilders/easybuild-framework#4150

Otherwise the error would have been

Invalid checksum spec 'None', should be a string (MD5) or 2-tuple (type, value).

I think this is a good example of why a missing dict key in the checksums should remain an error: otherwise the unused checksum would have gone unnoticed here.
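To illustrate the point, here is a minimal sketch of a strict lookup for a dict-style checksum entry. It is not the EasyBuild framework's actual code: the function name `lookup_checksum` and the placeholder checksum value are hypothetical, and only the error wording is borrowed from the message quoted above.

```python
def lookup_checksum(spec, filename):
    """Return the checksum that applies to `filename`.

    `spec` is either a plain checksum string or a dict that maps
    file names to checksums (hypothetical helper, for illustration only).
    """
    if isinstance(spec, dict):
        if filename not in spec:
            # Strict behaviour: a missing key is an error, not a silent skip.
            raise ValueError("Missing checksum for %s" % filename)
        return spec[filename]
    # A plain string applies to whatever file is being checked.
    return spec


# Placeholder value, not a real checksum.
checksums = {'PyTorch-1.12.0.tar.gz': 'placeholder-sha256'}

try:
    # The source is actually fetched under a different name (hypothetical),
    # so the entry above would never be used.
    lookup_checksum(checksums, 'PyTorch-1.12.0-git.tar.gz')
except ValueError as err:
    print(err)
```

With a lenient lookup that returns `None` for a missing key, the stale entry for the release archive would simply be ignored and the mismatch between the source and its checksum would not be reported.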

@Flamefire
Contributor Author

Test report by @Flamefire
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
taurusi8031 - Linux CentOS Linux 7.9.2009, x86_64, AMD EPYC 7352 24-Core Processor (zen2), 8 x NVIDIA NVIDIA A100-SXM4-40GB, 470.57.02, Python 2.7.5
See https://gist.github.com/Flamefire/7ef08f1fd33677a511590f897b3167e6 for a full test report.

@Flamefire
Contributor Author

Test report by @Flamefire
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
taurusml20 - Linux RHEL 7.6, POWER, 8335-GTX (power9le), 6 x NVIDIA Tesla V100-SXM2-32GB, 440.64.00, Python 2.7.5
See https://gist.github.com/Flamefire/c9bdec43b769f937b092a6ece7678c3a for a full test report.

@casparvl
Contributor

Test report by @casparvl
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
gcn3.local.snellius.surf.nl - Linux RHEL 8.6, x86_64, Intel(R) Xeon(R) Platinum 8360Y CPU @ 2.40GHz, 4 x NVIDIA NVIDIA A100-SXM4-40GB, 520.61.05, Python 3.6.8
See https://gist.github.com/casparvl/37229105f101268f4675941d835e0e3c for a full test report.

@branfosj
Member

Test report by @branfosj
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
bear-pg0203u29a.bear.cluster - Linux RHEL 8.6, x86_64, Intel(R) Xeon(R) Platinum 8360Y CPU @ 2.40GHz (icelake), 1 x NVIDIA NVIDIA A100-SXM4-80GB, 520.61.05, Python 3.6.8
See https://gist.github.com/branfosj/f13c47d7131610cb025eab9c859e8067 for a full test report.

@branfosj
Member

Going in, thanks @Flamefire!

@branfosj branfosj merged commit 8c81379 into easybuilders:develop Aug 12, 2023
@Flamefire Flamefire deleted the 20230808142639_new_pr_PyTorch1120 branch August 13, 2023 18:28
@boegel boegel changed the title Add missing patches for PyTorch 1.12.0-CUDA Add missing patches for PyTorch 1.12.0 w/ foss/2022a + CUDA 11.7.0 Aug 15, 2023