{2023.06}[2023a,sapphire_rapids] PyTorch 2.1.2 #882

Merged

Conversation

bedroge (Collaborator) commented Jan 23, 2025

Initially I was going to try to use an updated easyconfig for Z3, but looking at it again I don't think it will help. Our PyTorch is still using the Z3 with a Python suffix, so rebuilding Z3 based on easybuilders/easybuild-easyconfigs#20050 will probably not do much. Instead, I've just increased the maximum number of failed tests to 4 to work around the issue described at #875 (comment).
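The workaround described above amounts to tolerating a bounded number of known test failures instead of failing the whole build. A minimal sketch of such a threshold check (hypothetical function and variable names; the actual EESSI build scripts may implement this differently):

```python
# Hypothetical sketch of a "maximum failed tests" workaround: tolerate a
# bounded number of known-bad test failures instead of aborting the build.
# Not the actual EESSI/EasyBuild implementation.

def check_test_results(failed_tests, max_failed_tests=4):
    """Raise if more than max_failed_tests tests failed; otherwise pass."""
    if len(failed_tests) > max_failed_tests:
        raise RuntimeError(
            f"{len(failed_tests)} tests failed, more than the allowed "
            f"{max_failed_tests}: {', '.join(failed_tests)}"
        )
    return True

# Example: a few known Z3-related failures stay under the raised limit of 4
print(check_test_results(["test_z3_a", "test_z3_b", "test_z3_c"]))
```

With the limit raised to 4, a build with the handful of known Z3-related failures passes, while any larger regression still aborts.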

eessi-bot commented Jan 23, 2025

Instance eessi-bot-mc-aws is configured to build for:

  • architectures: x86_64/generic, x86_64/intel/haswell, x86_64/intel/sapphire_rapids, x86_64/intel/skylake_avx512, x86_64/amd/zen2, x86_64/amd/zen3, aarch64/generic, aarch64/neoverse_n1, aarch64/neoverse_v1
  • repositories: eessi.io-2023.06-software, eessi.io-2023.06-compat

eessi-bot commented Jan 23, 2025

Instance eessi-bot-mc-azure is configured to build for:

  • architectures: x86_64/amd/zen4
  • repositories: eessi.io-2023.06-software, eessi.io-2023.06-compat

bedroge (Collaborator, Author) commented Jan 23, 2025

bot: build repo:eessi.io-2023.06-software arch:x86_64/intel/sapphire_rapids

eessi-bot commented Jan 23, 2025

Updates by the bot instance eessi-bot-mc-aws:
  • received bot command build repo:eessi.io-2023.06-software arch:x86_64/intel/sapphire_rapids from bedroge

    • expanded format: build repository:eessi.io-2023.06-software architecture:x86_64/intel/sapphire_rapids
  • handling command build repository:eessi.io-2023.06-software architecture:x86_64/intel/sapphire_rapids resulted in:

eessi-bot commented Jan 23, 2025

Updates by the bot instance eessi-bot-mc-azure:
  • received bot command build repo:eessi.io-2023.06-software arch:x86_64/intel/sapphire_rapids from bedroge

    • expanded format: build repository:eessi.io-2023.06-software architecture:x86_64/intel/sapphire_rapids
  • handling command build repository:eessi.io-2023.06-software architecture:x86_64/intel/sapphire_rapids resulted in:

    • no jobs were submitted
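The "expanded format" lines above suggest the bot normalizes shorthand argument keys (`repo` → `repository`, `arch` → `architecture`) before handling a command. A minimal sketch of such an expansion (hypothetical; not the actual eessi-bot code):

```python
# Hypothetical sketch of bot command shorthand expansion, e.g.
#   build repo:eessi.io-2023.06-software arch:x86_64/intel/sapphire_rapids
# becoming its long form. Not the actual eessi-bot implementation.

ALIASES = {"repo": "repository", "arch": "architecture"}

def expand_command(command):
    """Expand shorthand key:value arguments to their long-form keys."""
    expanded = []
    for part in command.split():
        if ":" in part:
            key, value = part.split(":", 1)
            expanded.append(f"{ALIASES.get(key, key)}:{value}")
        else:
            expanded.append(part)
    return " ".join(expanded)

print(expand_command(
    "build repo:eessi.io-2023.06-software arch:x86_64/intel/sapphire_rapids"
))
# → build repository:eessi.io-2023.06-software architecture:x86_64/intel/sapphire_rapids
```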

eessi-bot commented Jan 23, 2025

New job on instance eessi-bot-mc-aws for CPU micro-architecture x86_64-intel-sapphire_rapids for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2025.01/pr_882/42206

| date | job status | comment |
| --- | --- | --- |
| Jan 23 15:33:42 UTC 2025 | submitted | job id 42206 awaits release by job manager |
| Jan 23 15:34:40 UTC 2025 | released | job awaits launch by Slurm scheduler |
| Jan 23 15:40:50 UTC 2025 | running | job 42206 is running |
| Jan 24 01:57:36 UTC 2025 | finished | 😁 SUCCESS |

Details
✅ job output file slurm-42206.out
✅ no message matching FATAL:
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!

Artefacts
eessi-2023.06-software-linux-x86_64-intel-sapphire_rapids-1737683307.tar.gz (size: 141 MiB, 148698861 bytes; entries: 12727)
  • modules under 2023.06/software/linux/x86_64/intel/sapphire_rapids/modules/all:
    • PyTorch/2.1.2-foss-2023a.lua
    • Z3/4.12.2-GCCcore-12.3.0.lua
  • software under 2023.06/software/linux/x86_64/intel/sapphire_rapids/software:
    • PyTorch/2.1.2-foss-2023a
    • Z3/4.12.2-GCCcore-12.3.0
  • other under 2023.06/software/linux/x86_64/intel/sapphire_rapids:
    • 2023.06/init/easybuild/eb_hooks.py

Jan 24 01:57:36 UTC 2025 test result: 😁 SUCCESS
ReFrame Summary
[ OK ] (1/8) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_allreduce %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %scale=1_node %device_type=cpu /775175bf @BotBuildTests:x86-64-intel-srapids-node+default
P: latency: 1.73 us (r:0, l:None, u:None)
[ OK ] (2/8) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_allreduce %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %scale=1_node %device_type=cpu /52707c40 @BotBuildTests:x86-64-intel-srapids-node+default
P: latency: 1.74 us (r:0, l:None, u:None)
[ OK ] (3/8) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_alltoall %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %scale=1_node %device_type=cpu /b1aacda9 @BotBuildTests:x86-64-intel-srapids-node+default
P: latency: 3.88 us (r:0, l:None, u:None)
[ OK ] (4/8) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_alltoall %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %scale=1_node %device_type=cpu /c6bad193 @BotBuildTests:x86-64-intel-srapids-node+default
P: latency: 4.25 us (r:0, l:None, u:None)
[ OK ] (5/8) EESSI_OSU_pt2pt_CPU %benchmark_info=mpi.pt2pt.osu_latency %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %scale=1_node /15cad6c4 @BotBuildTests:x86-64-intel-srapids-node+default
P: latency: 0.42 us (r:0, l:None, u:None)
[ OK ] (6/8) EESSI_OSU_pt2pt_CPU %benchmark_info=mpi.pt2pt.osu_latency %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %scale=1_node /6672deda @BotBuildTests:x86-64-intel-srapids-node+default
P: latency: 0.36 us (r:0, l:None, u:None)
[ OK ] (7/8) EESSI_OSU_pt2pt_CPU %benchmark_info=mpi.pt2pt.osu_bw %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %scale=1_node /2a9a47b1 @BotBuildTests:x86-64-intel-srapids-node+default
P: bandwidth: 13515.75 MB/s (r:0, l:None, u:None)
[ OK ] (8/8) EESSI_OSU_pt2pt_CPU %benchmark_info=mpi.pt2pt.osu_bw %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %scale=1_node /1b24ab8e @BotBuildTests:x86-64-intel-srapids-node+default
P: bandwidth: 13552.21 MB/s (r:0, l:None, u:None)
[ PASSED ] Ran 8/8 test case(s) from 8 check(s) (0 failure(s), 0 skipped, 0 aborted)
Details
✅ job output file slurm-42206.out
✅ no message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case
Jan 24 10:51:16 UTC 2025 uploaded: transfer of eessi-2023.06-software-linux-x86_64-intel-sapphire_rapids-1737683307.tar.gz to S3 bucket succeeded
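The ✅ checks reported above look like pattern matches against the Slurm job output. Such checks could be expressed with `grep` along these lines (a sketch under that assumption; not the bot's actual check logic, and the tiny example log is fabricated for illustration):

```shell
#!/bin/sh
# Hypothetical sketch of grep-based job log checks like those reported above.
# The actual eessi-bot check implementation may differ.
out=slurm-42206.out

# create a tiny example log for demonstration purposes only
cat > "$out" <<'EOF'
No missing installations
eessi-2023.06-software.tar.gz created!
EOF

status=0
# the job fails if any error pattern is present ...
for pattern in 'FATAL:' 'ERROR:' 'FAILED:' 'required modules missing:'; do
    if grep -q "$pattern" "$out"; then
        echo "found message matching $pattern"
        status=1
    fi
done
# ... or if any required success pattern is absent
if ! grep -q 'No missing installations' "$out"; then
    echo "missing message matching No missing installations"
    status=1
fi
if ! grep -q '\.tar\.gz created!' "$out"; then
    echo "missing message matching .tar.gz created!"
    status=1
fi
echo "overall status: $status"
```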

@bedroge bedroge added the ready-to-deploy, 2023.06-software.eessi.io and sapphire_rapids labels on Jan 24, 2025
boegel (Contributor) commented Jan 24, 2025

@bedroge Can you also update #461 + eessi-2023.06-known-issues.yml accordingly?

Maybe mention which tests are failing in the issue, so we have some reference info

bedroge (Collaborator, Author) commented Jan 24, 2025

> @bedroge Can you also update #461 + eessi-2023.06-known-issues.yml accordingly?
>
> Maybe mention which tests are failing in the issue, so we have some reference info

Done, see a2fc9e7 and #461 (comment).

@boegel boegel added the bot:deploy label and removed the ready-to-deploy label on Jan 24, 2025
boegel (Contributor) left a review comment:

lgtm

@boegel boegel merged commit 152414b into EESSI:2023.06-software.eessi.io Jan 24, 2025
49 checks passed
eessi-bot commented Jan 24, 2025

PR merged! Moved ['/project/def-users/SHARED/jobs/2025.01/pr_882/42206'] to /project/def-users/SHARED/trash_bin/EESSI/software-layer/2025.01.24

eessi-bot commented Jan 24, 2025

PR merged! Moved [] to /project/def-users/SHARED/trash_bin/EESSI/software-layer/2025.01.24

@bedroge bedroge deleted the sapphire_rapids_pytorch_212 branch January 24, 2025 12:04