
FPE in vxxxxx during runTest.exe (testxxx) for HIP on LUMI #1011

Closed
valassi opened this issue Oct 3, 2024 · 2 comments · Fixed by #1012

valassi commented Oct 3, 2024

I have rerun one final batch of large-scale tests for the v1.00.00 release, including LUMI.

I now systematically get FPEs in vxxxxx in runTest.exe on LUMI.

This is most likely related to #806. In that issue I bypassed a segfault by using -O2 instead of -O3. The segfault was difficult to pinpoint precisely, but there were indications that it was in vxxxxx. Initially my tests all seemed to succeed; now, after a few updates (I am not sure which ones, as I thought the code was almost identical), I systematically get many more tests failing, all in vxxxxx.

I am assigning this to myself as I have some ideas of what to look for, but anyone should feel free to investigate as well (please let me know in that case). I will release v1.00.00 with the issue anyway and mark it as pending.

NB: this should be nicely encapsulated for debugging, because the error is probably in the testxxx tests: these are executed for all physics processes, but they are completely independent of the physics process.
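In particular, since runTest is a googletest binary (see the stack trace further down), the failing test can presumably be run in isolation with the standard googletest filter, along the lines of

./runTest_hip.exe --gtest_filter='*.testxxx'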

Details

egrep '(^Floating Point Exception|{ })' tput/logs*/log*
tput/logs_eemumu_mad/log_eemumu_mad_f_inl0_hrd0_bridge.txt:Floating Point Exception (GPU): 'vxxxxx' ievt=17
tput/logs_eemumu_mad/log_eemumu_mad_f_inl0_hrd0_bridge.txt:DEBUG: MEK 0xc32660 processed 0 events across 2 channels { }
tput/logs_eemumu_mad/log_eemumu_mad_f_inl0_hrd0_bridge.txt:DEBUG: MEK 0x7809a0 processed 0 events across 2 channels { }
tput/logs_eemumu_mad/log_eemumu_mad_f_inl0_hrd0_common.txt:Floating Point Exception (GPU): 'vxxxxx' ievt=17
tput/logs_eemumu_mad/log_eemumu_mad_f_inl0_hrd0_common.txt:DEBUG: MEK 0x7618d0 processed 0 events across 2 channels { }
tput/logs_eemumu_mad/log_eemumu_mad_f_inl0_hrd0_common.txt:DEBUG: MEK 0x74b3d0 processed 0 events across 2 channels { }
tput/logs_eemumu_mad/log_eemumu_mad_f_inl0_hrd0_rmbhst.txt:Floating Point Exception (GPU): 'vxxxxx' ievt=17
tput/logs_eemumu_mad/log_eemumu_mad_f_inl0_hrd0_rmbhst.txt:DEBUG: MEK 0x119a3d0 processed 0 events across 2 channels { }
tput/logs_eemumu_mad/log_eemumu_mad_f_inl0_hrd0_rmbhst.txt:DEBUG: MEK 0xc33230 processed 0 events across 2 channels { }
tput/logs_eemumu_mad/log_eemumu_mad_f_inl0_hrd0.txt:Floating Point Exception (GPU): 'vxxxxx' ievt=17
tput/logs_eemumu_mad/log_eemumu_mad_f_inl0_hrd0.txt:DEBUG: MEK 0x74b3d0 processed 0 events across 2 channels { }
tput/logs_eemumu_mad/log_eemumu_mad_f_inl0_hrd0.txt:DEBUG: MEK 0x728930 processed 0 events across 2 channels { }
tput/logs_eemumu_mad/log_eemumu_mad_f_inl0_hrd1.txt:Floating Point Exception (GPU): 'vxxxxx' ievt=17
tput/logs_eemumu_mad/log_eemumu_mad_f_inl0_hrd1.txt:DEBUG: MEK 0x117f910 processed 0 events across 2 channels { }
tput/logs_eemumu_mad/log_eemumu_mad_f_inl0_hrd1.txt:DEBUG: MEK 0x77c170 processed 0 events across 2 channels { }
tput/logs_ggttgg_mad/log_ggttgg_mad_f_inl0_hrd0_bridge.txt:Floating Point Exception (GPU): 'vxxxxx' ievt=17
tput/logs_ggttgg_mad/log_ggttgg_mad_f_inl0_hrd0_bridge.txt:DEBUG: MEK 0x8ec7f0 processed 0 events across 123 channels { }
tput/logs_ggttgg_mad/log_ggttgg_mad_f_inl0_hrd0_bridge.txt:DEBUG: MEK 0x8978e0 processed 0 events across 123 channels { }
tput/logs_ggttgg_mad/log_ggttgg_mad_f_inl0_hrd0_common.txt:Floating Point Exception (GPU): 'vxxxxx' ievt=17
tput/logs_ggttgg_mad/log_ggttgg_mad_f_inl0_hrd0_common.txt:DEBUG: MEK 0x8d9670 processed 0 events across 123 channels { }
tput/logs_ggttgg_mad/log_ggttgg_mad_f_inl0_hrd0_common.txt:DEBUG: MEK 0x8c5930 processed 0 events across 123 channels { }
tput/logs_ggttgg_mad/log_ggttgg_mad_f_inl0_hrd0_rmbhst.txt:Floating Point Exception (GPU): 'vxxxxx' ievt=17
tput/logs_ggttgg_mad/log_ggttgg_mad_f_inl0_hrd0_rmbhst.txt:DEBUG: MEK 0x8d9670 processed 0 events across 123 channels { }
tput/logs_ggttgg_mad/log_ggttgg_mad_f_inl0_hrd0_rmbhst.txt:DEBUG: MEK 0x8c5930 processed 0 events across 123 channels { }
tput/logs_ggttgg_mad/log_ggttgg_mad_f_inl0_hrd0.txt:Floating Point Exception (GPU): 'vxxxxx' ievt=17
tput/logs_ggttgg_mad/log_ggttgg_mad_f_inl0_hrd0.txt:DEBUG: MEK 0x8d9670 processed 0 events across 123 channels { }
tput/logs_ggttgg_mad/log_ggttgg_mad_f_inl0_hrd0.txt:DEBUG: MEK 0x8c5930 processed 0 events across 123 channels { }
tput/logs_ggttgg_mad/log_ggttgg_mad_f_inl0_hrd1.txt:Floating Point Exception (GPU): 'vxxxxx' ievt=17
tput/logs_ggttgg_mad/log_ggttgg_mad_f_inl0_hrd1.txt:DEBUG: MEK 0x1262600 processed 0 events across 123 channels { }
tput/logs_ggttgg_mad/log_ggttgg_mad_f_inl0_hrd1.txt:DEBUG: MEK 0x94e8a0 processed 0 events across 123 channels { }
tput/logs_ggttg_mad/log_ggttg_mad_f_inl0_hrd0_bridge.txt:Floating Point Exception (GPU): 'vxxxxx' ievt=17
tput/logs_ggttg_mad/log_ggttg_mad_f_inl0_hrd0_bridge.txt:DEBUG: MEK 0x75eb20 processed 0 events across 16 channels { }
tput/logs_ggttg_mad/log_ggttg_mad_f_inl0_hrd0_bridge.txt:DEBUG: MEK 0x11bd0d0 processed 0 events across 16 channels { }
tput/logs_ggttg_mad/log_ggttg_mad_f_inl0_hrd0.txt:Floating Point Exception (GPU): 'vxxxxx' ievt=17
tput/logs_ggttg_mad/log_ggttg_mad_f_inl0_hrd0.txt:DEBUG: MEK 0xb9ace0 processed 0 events across 16 channels { }
tput/logs_ggttg_mad/log_ggttg_mad_f_inl0_hrd0.txt:DEBUG: MEK 0xc4ab30 processed 0 events across 16 channels { }
tput/logs_ggttg_mad/log_ggttg_mad_f_inl0_hrd1.txt:Floating Point Exception (GPU): 'vxxxxx' ievt=17
tput/logs_ggttg_mad/log_ggttg_mad_f_inl0_hrd1.txt:DEBUG: MEK 0xd82780 processed 0 events across 16 channels { }
tput/logs_ggttg_mad/log_ggttg_mad_f_inl0_hrd1.txt:DEBUG: MEK 0x73e480 processed 0 events across 16 channels { }
tput/logs_ggtt_mad/log_ggtt_mad_f_inl0_hrd0_bridge.txt:Floating Point Exception (GPU): 'vxxxxx' ievt=17
tput/logs_ggtt_mad/log_ggtt_mad_f_inl0_hrd0_bridge.txt:DEBUG: MEK 0xb882a0 processed 0 events across 3 channels { }
tput/logs_ggtt_mad/log_ggtt_mad_f_inl0_hrd0_bridge.txt:DEBUG: MEK 0x783ec0 processed 0 events across 3 channels { }
tput/logs_ggtt_mad/log_ggtt_mad_f_inl0_hrd0_common.txt:Floating Point Exception (GPU): 'vxxxxx' ievt=17
tput/logs_ggtt_mad/log_ggtt_mad_f_inl0_hrd0_common.txt:DEBUG: MEK 0x6df940 processed 0 events across 3 channels { }
tput/logs_ggtt_mad/log_ggtt_mad_f_inl0_hrd0_common.txt:DEBUG: MEK 0x67fb00 processed 0 events across 3 channels { }
tput/logs_ggtt_mad/log_ggtt_mad_f_inl0_hrd0_rmbhst.txt:Floating Point Exception (GPU): 'vxxxxx' ievt=17
tput/logs_ggtt_mad/log_ggtt_mad_f_inl0_hrd0_rmbhst.txt:DEBUG: MEK 0x6a5340 processed 0 events across 3 channels { }
tput/logs_ggtt_mad/log_ggtt_mad_f_inl0_hrd0_rmbhst.txt:DEBUG: MEK 0x11ac900 processed 0 events across 3 channels { }
tput/logs_ggtt_mad/log_ggtt_mad_f_inl0_hrd0.txt:Floating Point Exception (GPU): 'vxxxxx' ievt=17
tput/logs_ggtt_mad/log_ggtt_mad_f_inl0_hrd0.txt:DEBUG: MEK 0x6df940 processed 0 events across 3 channels { }
tput/logs_ggtt_mad/log_ggtt_mad_f_inl0_hrd0.txt:DEBUG: MEK 0x67fb00 processed 0 events across 3 channels { }
tput/logs_ggtt_mad/log_ggtt_mad_f_inl0_hrd1.txt:Floating Point Exception (GPU): 'vxxxxx' ievt=17
tput/logs_ggtt_mad/log_ggtt_mad_f_inl0_hrd1.txt:DEBUG: MEK 0xd1c010 processed 0 events across 3 channels { }
tput/logs_ggtt_mad/log_ggtt_mad_f_inl0_hrd1.txt:DEBUG: MEK 0x6fc940 processed 0 events across 3 channels { }
tput/logs_gqttq_mad/log_gqttq_mad_f_inl0_hrd0_bridge.txt:Floating Point Exception (GPU): 'vxxxxx' ievt=17
tput/logs_gqttq_mad/log_gqttq_mad_f_inl0_hrd0_bridge.txt:DEBUG: MEK 0xd1fcc0 processed 0 events across 5 channels { }
tput/logs_gqttq_mad/log_gqttq_mad_f_inl0_hrd0_bridge.txt:DEBUG: MEK 0xd1b3b0 processed 0 events across 5 channels { }
tput/logs_gqttq_mad/log_gqttq_mad_f_inl0_hrd0.txt:Floating Point Exception (GPU): 'vxxxxx' ievt=17
tput/logs_gqttq_mad/log_gqttq_mad_f_inl0_hrd0.txt:DEBUG: MEK 0xb83cf0 processed 0 events across 5 channels { }
tput/logs_gqttq_mad/log_gqttq_mad_f_inl0_hrd0.txt:DEBUG: MEK 0x7896a0 processed 0 events across 5 channels { }
tput/logs_gqttq_mad/log_gqttq_mad_f_inl0_hrd1.txt:Floating Point Exception (GPU): 'vxxxxx' ievt=17
tput/logs_gqttq_mad/log_gqttq_mad_f_inl0_hrd1.txt:DEBUG: MEK 0x6e4740 processed 0 events across 5 channels { }
tput/logs_gqttq_mad/log_gqttq_mad_f_inl0_hrd1.txt:DEBUG: MEK 0x7298f0 processed 0 events across 5 channels { }
tput/logs_heftggbb_mad/log_heftggbb_mad_f_inl0_hrd0.txt:Floating Point Exception (GPU): 'vxxxxx' ievt=17
tput/logs_heftggbb_mad/log_heftggbb_mad_f_inl0_hrd0.txt:DEBUG: MEK 0x11a9de0 processed 0 events across 4 channels { }
tput/logs_heftggbb_mad/log_heftggbb_mad_f_inl0_hrd0.txt:DEBUG: MEK 0x11975c0 processed 0 events across 4 channels { }
tput/logs_heftggbb_mad/log_heftggbb_mad_f_inl0_hrd1.txt:Floating Point Exception (GPU): 'vxxxxx' ievt=17
tput/logs_heftggbb_mad/log_heftggbb_mad_f_inl0_hrd1.txt:DEBUG: MEK 0x74d7b0 processed 0 events across 4 channels { }
tput/logs_heftggbb_mad/log_heftggbb_mad_f_inl0_hrd1.txt:DEBUG: MEK 0x729a10 processed 0 events across 4 channels { }
tput/logs_smeftggtttt_mad/log_smeftggtttt_mad_f_inl0_hrd0.txt:Floating Point Exception (GPU): 'vxxxxx' ievt=17
tput/logs_smeftggtttt_mad/log_smeftggtttt_mad_f_inl0_hrd0.txt:DEBUG: MEK 0x72f1d0 processed 0 events across 72 channels { }
tput/logs_smeftggtttt_mad/log_smeftggtttt_mad_f_inl0_hrd0.txt:DEBUG: MEK 0x871370 processed 0 events across 72 channels { }
tput/logs_smeftggtttt_mad/log_smeftggtttt_mad_f_inl0_hrd1.txt:Floating Point Exception (GPU): 'vxxxxx' ievt=17
tput/logs_smeftggtttt_mad/log_smeftggtttt_mad_f_inl0_hrd1.txt:DEBUG: MEK 0x7ea630 processed 0 events across 72 channels { }
tput/logs_smeftggtttt_mad/log_smeftggtttt_mad_f_inl0_hrd1.txt:DEBUG: MEK 0x6dbd10 processed 0 events across 72 channels { }
tput/logs_susyggt1t1_mad/log_susyggt1t1_mad_f_inl0_hrd0.txt:Floating Point Exception (GPU): 'vxxxxx' ievt=17
tput/logs_susyggt1t1_mad/log_susyggt1t1_mad_f_inl0_hrd0.txt:DEBUG: MEK 0x6f2f60 processed 0 events across 6 channels { }
tput/logs_susyggt1t1_mad/log_susyggt1t1_mad_f_inl0_hrd0.txt:DEBUG: MEK 0x6ee280 processed 0 events across 6 channels { }
tput/logs_susyggt1t1_mad/log_susyggt1t1_mad_f_inl0_hrd1.txt:Floating Point Exception (GPU): 'vxxxxx' ievt=17
tput/logs_susyggt1t1_mad/log_susyggt1t1_mad_f_inl0_hrd1.txt:DEBUG: MEK 0xc36d80 processed 0 events across 6 channels { }
tput/logs_susyggt1t1_mad/log_susyggt1t1_mad_f_inl0_hrd1.txt:DEBUG: MEK 0x788210 processed 0 events across 6 channels { }
tput/logs_susyggtt_mad/log_susyggtt_mad_f_inl0_hrd0.txt:Floating Point Exception (GPU): 'vxxxxx' ievt=17
tput/logs_susyggtt_mad/log_susyggtt_mad_f_inl0_hrd0.txt:DEBUG: MEK 0xd71c40 processed 0 events across 3 channels { }
tput/logs_susyggtt_mad/log_susyggtt_mad_f_inl0_hrd0.txt:DEBUG: MEK 0xd6e8e0 processed 0 events across 3 channels { }
tput/logs_susyggtt_mad/log_susyggtt_mad_f_inl0_hrd1.txt:Floating Point Exception (GPU): 'vxxxxx' ievt=17
tput/logs_susyggtt_mad/log_susyggtt_mad_f_inl0_hrd1.txt:DEBUG: MEK 0x6f6ff0 processed 0 events across 3 channels { }
tput/logs_susyggtt_mad/log_susyggtt_mad_f_inl0_hrd1.txt:DEBUG: MEK 0x117d970 processed 0 events across 3 channels { }
valassi self-assigned this Oct 3, 2024
valassi added a commit to valassi/madgraph4gpu that referenced this issue Oct 3, 2024
… 72h) for release v1.00.00 - one new issue madgraph5#1011 (FPEs in vxxxxx for LUMI)

(NB: this was run in parallel - a posteriori I reverted itscrd90 tput logs, except for 6 curhst logs, then squashed)
(To revert the curhst logs: "git checkout 4865525 tput/logs_*curhst*")

(1) Note: I had initially done a build and test without the -hip option, with some failures

STARTED  AT Wed 02 Oct 2024 09:48:45 PM EEST
./tput/teeThroughputX.sh -mix -hrd -makej -eemumu -ggtt -ggttg -ggttgg -gqttq -ggttggg -makeclean
ENDED(1) AT Wed 02 Oct 2024 10:14:30 PM EEST [Status=1]
./tput/teeThroughputX.sh -flt -hrd -makej -eemumu -ggtt -ggttgg -inlonly -makeclean
ENDED(2) AT Wed 02 Oct 2024 10:45:14 PM EEST [Status=0]
./tput/teeThroughputX.sh -makej -eemumu -ggtt -ggttg -gqttq -ggttgg -ggttggg -flt -bridge -makeclean
ENDED(3) AT Wed 02 Oct 2024 10:48:26 PM EEST [Status=1]
./tput/teeThroughputX.sh -eemumu -ggtt -ggttgg -flt -rmbhst
ENDED(4) AT Wed 02 Oct 2024 10:50:27 PM EEST [Status=0]
./tput/teeThroughputX.sh -eemumu -ggtt -ggttgg -flt -curhst
ENDED(5) AT Wed 02 Oct 2024 10:50:58 PM EEST [Status=0]
./tput/teeThroughputX.sh -eemumu -ggtt -ggttgg -flt -common
ENDED(6) AT Wed 02 Oct 2024 10:52:58 PM EEST [Status=0]
./tput/teeThroughputX.sh -mix -hrd -makej -susyggtt -susyggt1t1 -smeftggtttt -heftggbb -makeclean
ENDED(7) AT Wed 02 Oct 2024 11:13:57 PM EEST [Status=0]

(2) This commit is the result of the second test, where I repeated using the -hip option (./tput/allTees.sh -hip)

STARTED  AT Thu 03 Oct 2024 12:57:14 AM EEST
./tput/teeThroughputX.sh -mix -hrd -makej -eemumu -ggtt -ggttg -ggttgg -gqttq -ggttggg -makeclean  -nocuda
ENDED(1) AT Thu 03 Oct 2024 01:29:36 AM EEST [Status=0]
./tput/teeThroughputX.sh -flt -hrd -makej -eemumu -ggtt -ggttgg -inlonly -makeclean  -nocuda
ENDED(2) AT Thu 03 Oct 2024 01:38:03 AM EEST [Status=0]
./tput/teeThroughputX.sh -makej -eemumu -ggtt -ggttg -gqttq -ggttgg -ggttggg -flt -bridge -makeclean  -nocuda
ENDED(3) AT Thu 03 Oct 2024 01:47:01 AM EEST [Status=0]
./tput/teeThroughputX.sh -eemumu -ggtt -ggttgg -flt -rmbhst  -nocuda
ENDED(4) AT Thu 03 Oct 2024 01:49:00 AM EEST [Status=0]
SKIP './tput/teeThroughputX.sh -eemumu -ggtt -ggttgg -flt -common  -nocuda'
ENDED(5) AT Thu 03 Oct 2024 01:49:00 AM EEST [Status=0]
./tput/teeThroughputX.sh -eemumu -ggtt -ggttgg -flt -common  -nocuda
ENDED(6) AT Thu 03 Oct 2024 01:50:58 AM EEST [Status=0]
./tput/teeThroughputX.sh -mix -hrd -makej -susyggtt -susyggt1t1 -smeftggtttt -heftggbb -makeclean  -nocuda
ENDED(7) AT Thu 03 Oct 2024 02:00:26 AM EEST [Status=0]

NB: the results below come from an improved version of checklogs in tput/allTees.sh, from a later commit

No errors found in logs

tput/logs_eemumu_mad/log_eemumu_mad_f_inl0_hrd0.txt:Floating Point Exception (GPU): 'vxxxxx' ievt=17
tput/logs_eemumu_mad/log_eemumu_mad_f_inl0_hrd0.txt:DEBUG: MEK 0x74b3d0 processed 0 events across 2 channels { }
tput/logs_eemumu_mad/log_eemumu_mad_f_inl0_hrd0.txt:DEBUG: MEK 0x728930 processed 0 events across 2 channels { }
tput/logs_eemumu_mad/log_eemumu_mad_f_inl0_hrd0_common.txt:Floating Point Exception (GPU): 'vxxxxx' ievt=17
tput/logs_eemumu_mad/log_eemumu_mad_f_inl0_hrd0_common.txt:DEBUG: MEK 0x7618d0 processed 0 events across 2 channels { }
tput/logs_eemumu_mad/log_eemumu_mad_f_inl0_hrd0_common.txt:DEBUG: MEK 0x74b3d0 processed 0 events across 2 channels { }
tput/logs_eemumu_mad/log_eemumu_mad_f_inl0_hrd1.txt:Floating Point Exception (GPU): 'vxxxxx' ievt=17
tput/logs_eemumu_mad/log_eemumu_mad_f_inl0_hrd1.txt:DEBUG: MEK 0x117f910 processed 0 events across 2 channels { }
tput/logs_eemumu_mad/log_eemumu_mad_f_inl0_hrd1.txt:DEBUG: MEK 0x77c170 processed 0 events across 2 channels { }
tput/logs_eemumu_mad/log_eemumu_mad_f_inl0_hrd0_rmbhst.txt:Floating Point Exception (GPU): 'vxxxxx' ievt=17
tput/logs_eemumu_mad/log_eemumu_mad_f_inl0_hrd0_rmbhst.txt:DEBUG: MEK 0x119a3d0 processed 0 events across 2 channels { }
tput/logs_eemumu_mad/log_eemumu_mad_f_inl0_hrd0_rmbhst.txt:DEBUG: MEK 0xc33230 processed 0 events across 2 channels { }
tput/logs_eemumu_mad/log_eemumu_mad_f_inl0_hrd0_bridge.txt:Floating Point Exception (GPU): 'vxxxxx' ievt=17
tput/logs_eemumu_mad/log_eemumu_mad_f_inl0_hrd0_bridge.txt:DEBUG: MEK 0xc32660 processed 0 events across 2 channels { }
tput/logs_eemumu_mad/log_eemumu_mad_f_inl0_hrd0_bridge.txt:DEBUG: MEK 0x7809a0 processed 0 events across 2 channels { }
tput/logs_ggttgg_mad/log_ggttgg_mad_f_inl0_hrd0_common.txt:Floating Point Exception (GPU): 'vxxxxx' ievt=17
tput/logs_ggttgg_mad/log_ggttgg_mad_f_inl0_hrd0_common.txt:DEBUG: MEK 0x8d9670 processed 0 events across 123 channels { }
tput/logs_ggttgg_mad/log_ggttgg_mad_f_inl0_hrd0_common.txt:DEBUG: MEK 0x8c5930 processed 0 events across 123 channels { }
tput/logs_ggttgg_mad/log_ggttgg_mad_f_inl0_hrd0_rmbhst.txt:Floating Point Exception (GPU): 'vxxxxx' ievt=17
tput/logs_ggttgg_mad/log_ggttgg_mad_f_inl0_hrd0_rmbhst.txt:DEBUG: MEK 0x8d9670 processed 0 events across 123 channels { }
tput/logs_ggttgg_mad/log_ggttgg_mad_f_inl0_hrd0_rmbhst.txt:DEBUG: MEK 0x8c5930 processed 0 events across 123 channels { }
tput/logs_ggttgg_mad/log_ggttgg_mad_f_inl0_hrd0_bridge.txt:Floating Point Exception (GPU): 'vxxxxx' ievt=17
tput/logs_ggttgg_mad/log_ggttgg_mad_f_inl0_hrd0_bridge.txt:DEBUG: MEK 0x8ec7f0 processed 0 events across 123 channels { }
tput/logs_ggttgg_mad/log_ggttgg_mad_f_inl0_hrd0_bridge.txt:DEBUG: MEK 0x8978e0 processed 0 events across 123 channels { }
tput/logs_ggttgg_mad/log_ggttgg_mad_f_inl0_hrd0.txt:Floating Point Exception (GPU): 'vxxxxx' ievt=17
tput/logs_ggttgg_mad/log_ggttgg_mad_f_inl0_hrd0.txt:DEBUG: MEK 0x8d9670 processed 0 events across 123 channels { }
tput/logs_ggttgg_mad/log_ggttgg_mad_f_inl0_hrd0.txt:DEBUG: MEK 0x8c5930 processed 0 events across 123 channels { }
tput/logs_ggttgg_mad/log_ggttgg_mad_f_inl0_hrd1.txt:Floating Point Exception (GPU): 'vxxxxx' ievt=17
tput/logs_ggttgg_mad/log_ggttgg_mad_f_inl0_hrd1.txt:DEBUG: MEK 0x1262600 processed 0 events across 123 channels { }
tput/logs_ggttgg_mad/log_ggttgg_mad_f_inl0_hrd1.txt:DEBUG: MEK 0x94e8a0 processed 0 events across 123 channels { }
tput/logs_ggttg_mad/log_ggttg_mad_f_inl0_hrd0_bridge.txt:Floating Point Exception (GPU): 'vxxxxx' ievt=17
tput/logs_ggttg_mad/log_ggttg_mad_f_inl0_hrd0_bridge.txt:DEBUG: MEK 0x75eb20 processed 0 events across 16 channels { }
tput/logs_ggttg_mad/log_ggttg_mad_f_inl0_hrd0_bridge.txt:DEBUG: MEK 0x11bd0d0 processed 0 events across 16 channels { }
tput/logs_ggttg_mad/log_ggttg_mad_f_inl0_hrd1.txt:Floating Point Exception (GPU): 'vxxxxx' ievt=17
tput/logs_ggttg_mad/log_ggttg_mad_f_inl0_hrd1.txt:DEBUG: MEK 0xd82780 processed 0 events across 16 channels { }
tput/logs_ggttg_mad/log_ggttg_mad_f_inl0_hrd1.txt:DEBUG: MEK 0x73e480 processed 0 events across 16 channels { }
tput/logs_ggttg_mad/log_ggttg_mad_f_inl0_hrd0.txt:Floating Point Exception (GPU): 'vxxxxx' ievt=17
tput/logs_ggttg_mad/log_ggttg_mad_f_inl0_hrd0.txt:DEBUG: MEK 0xb9ace0 processed 0 events across 16 channels { }
tput/logs_ggttg_mad/log_ggttg_mad_f_inl0_hrd0.txt:DEBUG: MEK 0xc4ab30 processed 0 events across 16 channels { }
tput/logs_ggtt_mad/log_ggtt_mad_f_inl0_hrd0_rmbhst.txt:Floating Point Exception (GPU): 'vxxxxx' ievt=17
tput/logs_ggtt_mad/log_ggtt_mad_f_inl0_hrd0_rmbhst.txt:DEBUG: MEK 0x6a5340 processed 0 events across 3 channels { }
tput/logs_ggtt_mad/log_ggtt_mad_f_inl0_hrd0_rmbhst.txt:DEBUG: MEK 0x11ac900 processed 0 events across 3 channels { }
tput/logs_ggtt_mad/log_ggtt_mad_f_inl0_hrd1.txt:Floating Point Exception (GPU): 'vxxxxx' ievt=17
tput/logs_ggtt_mad/log_ggtt_mad_f_inl0_hrd1.txt:DEBUG: MEK 0xd1c010 processed 0 events across 3 channels { }
tput/logs_ggtt_mad/log_ggtt_mad_f_inl0_hrd1.txt:DEBUG: MEK 0x6fc940 processed 0 events across 3 channels { }
tput/logs_ggtt_mad/log_ggtt_mad_f_inl0_hrd0_common.txt:Floating Point Exception (GPU): 'vxxxxx' ievt=17
tput/logs_ggtt_mad/log_ggtt_mad_f_inl0_hrd0_common.txt:DEBUG: MEK 0x6df940 processed 0 events across 3 channels { }
tput/logs_ggtt_mad/log_ggtt_mad_f_inl0_hrd0_common.txt:DEBUG: MEK 0x67fb00 processed 0 events across 3 channels { }
tput/logs_ggtt_mad/log_ggtt_mad_f_inl0_hrd0_bridge.txt:Floating Point Exception (GPU): 'vxxxxx' ievt=17
tput/logs_ggtt_mad/log_ggtt_mad_f_inl0_hrd0_bridge.txt:DEBUG: MEK 0xb882a0 processed 0 events across 3 channels { }
tput/logs_ggtt_mad/log_ggtt_mad_f_inl0_hrd0_bridge.txt:DEBUG: MEK 0x783ec0 processed 0 events across 3 channels { }
tput/logs_ggtt_mad/log_ggtt_mad_f_inl0_hrd0.txt:Floating Point Exception (GPU): 'vxxxxx' ievt=17
tput/logs_ggtt_mad/log_ggtt_mad_f_inl0_hrd0.txt:DEBUG: MEK 0x6df940 processed 0 events across 3 channels { }
tput/logs_ggtt_mad/log_ggtt_mad_f_inl0_hrd0.txt:DEBUG: MEK 0x67fb00 processed 0 events across 3 channels { }
tput/logs_ggtt_mad/#log_ggtt_mad_f_inl0_hrd0.txt#:Floating Point Exception (GPU): 'vxxxxx' ievt=17
tput/logs_ggtt_mad/#log_ggtt_mad_f_inl0_hrd0.txt#:DEBUG: MEK 0x6df940 processed 0 events across 3 channels { }
tput/logs_ggtt_mad/#log_ggtt_mad_f_inl0_hrd0.txt#:DEBUG: MEK 0x67fb00 processed 0 events across 3 channels { }
tput/logs_gqttq_mad/log_gqttq_mad_f_inl0_hrd0.txt:Floating Point Exception (GPU): 'vxxxxx' ievt=17
tput/logs_gqttq_mad/log_gqttq_mad_f_inl0_hrd0.txt:DEBUG: MEK 0xb83cf0 processed 0 events across 5 channels { }
tput/logs_gqttq_mad/log_gqttq_mad_f_inl0_hrd0.txt:DEBUG: MEK 0x7896a0 processed 0 events across 5 channels { }
tput/logs_gqttq_mad/log_gqttq_mad_f_inl0_hrd0_bridge.txt:Floating Point Exception (GPU): 'vxxxxx' ievt=17
tput/logs_gqttq_mad/log_gqttq_mad_f_inl0_hrd0_bridge.txt:DEBUG: MEK 0xd1fcc0 processed 0 events across 5 channels { }
tput/logs_gqttq_mad/log_gqttq_mad_f_inl0_hrd0_bridge.txt:DEBUG: MEK 0xd1b3b0 processed 0 events across 5 channels { }
tput/logs_gqttq_mad/log_gqttq_mad_f_inl0_hrd1.txt:Floating Point Exception (GPU): 'vxxxxx' ievt=17
tput/logs_gqttq_mad/log_gqttq_mad_f_inl0_hrd1.txt:DEBUG: MEK 0x6e4740 processed 0 events across 5 channels { }
tput/logs_gqttq_mad/log_gqttq_mad_f_inl0_hrd1.txt:DEBUG: MEK 0x7298f0 processed 0 events across 5 channels { }
tput/logs_heftggbb_mad/log_heftggbb_mad_f_inl0_hrd0.txt:Floating Point Exception (GPU): 'vxxxxx' ievt=17
tput/logs_heftggbb_mad/log_heftggbb_mad_f_inl0_hrd0.txt:DEBUG: MEK 0x11a9de0 processed 0 events across 4 channels { }
tput/logs_heftggbb_mad/log_heftggbb_mad_f_inl0_hrd0.txt:DEBUG: MEK 0x11975c0 processed 0 events across 4 channels { }
tput/logs_heftggbb_mad/log_heftggbb_mad_f_inl0_hrd1.txt:Floating Point Exception (GPU): 'vxxxxx' ievt=17
tput/logs_heftggbb_mad/log_heftggbb_mad_f_inl0_hrd1.txt:DEBUG: MEK 0x74d7b0 processed 0 events across 4 channels { }
tput/logs_heftggbb_mad/log_heftggbb_mad_f_inl0_hrd1.txt:DEBUG: MEK 0x729a10 processed 0 events across 4 channels { }
tput/logs_smeftggtttt_mad/log_smeftggtttt_mad_f_inl0_hrd0.txt:Floating Point Exception (GPU): 'vxxxxx' ievt=17
tput/logs_smeftggtttt_mad/log_smeftggtttt_mad_f_inl0_hrd0.txt:DEBUG: MEK 0x72f1d0 processed 0 events across 72 channels { }
tput/logs_smeftggtttt_mad/log_smeftggtttt_mad_f_inl0_hrd0.txt:DEBUG: MEK 0x871370 processed 0 events across 72 channels { }
tput/logs_smeftggtttt_mad/log_smeftggtttt_mad_f_inl0_hrd1.txt:Floating Point Exception (GPU): 'vxxxxx' ievt=17
tput/logs_smeftggtttt_mad/log_smeftggtttt_mad_f_inl0_hrd1.txt:DEBUG: MEK 0x7ea630 processed 0 events across 72 channels { }
tput/logs_smeftggtttt_mad/log_smeftggtttt_mad_f_inl0_hrd1.txt:DEBUG: MEK 0x6dbd10 processed 0 events across 72 channels { }
tput/logs_susyggt1t1_mad/log_susyggt1t1_mad_f_inl0_hrd0.txt:Floating Point Exception (GPU): 'vxxxxx' ievt=17
tput/logs_susyggt1t1_mad/log_susyggt1t1_mad_f_inl0_hrd0.txt:DEBUG: MEK 0x6f2f60 processed 0 events across 6 channels { }
tput/logs_susyggt1t1_mad/log_susyggt1t1_mad_f_inl0_hrd0.txt:DEBUG: MEK 0x6ee280 processed 0 events across 6 channels { }
tput/logs_susyggt1t1_mad/log_susyggt1t1_mad_f_inl0_hrd1.txt:Floating Point Exception (GPU): 'vxxxxx' ievt=17
tput/logs_susyggt1t1_mad/log_susyggt1t1_mad_f_inl0_hrd1.txt:DEBUG: MEK 0xc36d80 processed 0 events across 6 channels { }
tput/logs_susyggt1t1_mad/log_susyggt1t1_mad_f_inl0_hrd1.txt:DEBUG: MEK 0x788210 processed 0 events across 6 channels { }
tput/logs_susyggtt_mad/log_susyggtt_mad_f_inl0_hrd0.txt:Floating Point Exception (GPU): 'vxxxxx' ievt=17
tput/logs_susyggtt_mad/log_susyggtt_mad_f_inl0_hrd0.txt:DEBUG: MEK 0xd71c40 processed 0 events across 3 channels { }
tput/logs_susyggtt_mad/log_susyggtt_mad_f_inl0_hrd0.txt:DEBUG: MEK 0xd6e8e0 processed 0 events across 3 channels { }
tput/logs_susyggtt_mad/log_susyggtt_mad_f_inl0_hrd1.txt:Floating Point Exception (GPU): 'vxxxxx' ievt=17
tput/logs_susyggtt_mad/log_susyggtt_mad_f_inl0_hrd1.txt:DEBUG: MEK 0x6f6ff0 processed 0 events across 3 channels { }
tput/logs_susyggtt_mad/log_susyggtt_mad_f_inl0_hrd1.txt:DEBUG: MEK 0x117d970 processed 0 events across 3 channels { }

eemumu MEK (channelid array) processed 512 events across 2 channels { 1 : 256, 2 : 256 }
eemumu MEK (no multichannel) processed 512 events across 2 channels { no-multichannel : 512 }
ggttggg MEK (channelid array) processed 512 events across 1240 channels { 1 : 32, 2 : 32, 4 : 32, 5 : 32, 7 : 32, 8 : 32, 14 : 32, 15 : 32, 16 : 32, 18 : 32, 19 : 32, 20 : 32, 22 : 32, 23 : 32, 24 : 32, 26 : 32 }
ggttggg MEK (no multichannel) processed 512 events across 1240 channels { no-multichannel : 512 }
ggttgg MEK (channelid array) processed 512 events across 123 channels { 2 : 32, 3 : 32, 4 : 32, 5 : 32, 6 : 32, 7 : 32, 8 : 32, 9 : 32, 10 : 32, 11 : 32, 12 : 32, 13 : 32, 14 : 32, 15 : 32, 16 : 32, 17 : 32 }
ggttgg MEK (no multichannel) processed 512 events across 123 channels { no-multichannel : 512 }
ggttg MEK (channelid array) processed 512 events across 16 channels { 1 : 64, 2 : 32, 3 : 32, 4 : 32, 5 : 32, 6 : 32, 7 : 32, 8 : 32, 9 : 32, 10 : 32, 11 : 32, 12 : 32, 13 : 32, 14 : 32, 15 : 32 }
ggttg MEK (no multichannel) processed 512 events across 16 channels { no-multichannel : 512 }
ggtt MEK (channelid array) processed 512 events across 3 channels { 1 : 192, 2 : 160, 3 : 160 }
ggtt MEK (no multichannel) processed 512 events across 3 channels { no-multichannel : 512 }
gqttq MEK (channelid array) processed 512 events across 5 channels { 1 : 128, 2 : 96, 3 : 96, 4 : 96, 5 : 96 }
gqttq MEK (no multichannel) processed 512 events across 5 channels { no-multichannel : 512 }
heftggbb MEK (channelid array) processed 512 events across 4 channels { 1 : 128, 2 : 128, 3 : 128, 4 : 128 }
heftggbb MEK (no multichannel) processed 512 events across 4 channels { no-multichannel : 512 }
smeftggtttt MEK (channelid array) processed 512 events across 72 channels { 1 : 32, 2 : 32, 3 : 32, 4 : 32, 5 : 32, 6 : 32, 7 : 32, 8 : 32, 9 : 32, 10 : 32, 11 : 32, 12 : 32, 13 : 32, 14 : 32, 15 : 32, 16 : 32 }
smeftggtttt MEK (no multichannel) processed 512 events across 72 channels { no-multichannel : 512 }
susyggt1t1 MEK (channelid array) processed 512 events across 6 channels { 2 : 128, 3 : 96, 4 : 96, 5 : 96, 6 : 96 }
susyggt1t1 MEK (no multichannel) processed 512 events across 6 channels { no-multichannel : 512 }
susyggtt MEK (channelid array) processed 512 events across 3 channels { 1 : 192, 2 : 160, 3 : 160 }
susyggtt MEK (no multichannel) processed 512 events across 3 channels { no-multichannel : 512 }
valassi added a commit to valassi/madgraph4gpu that referenced this issue Oct 3, 2024
valassi added a commit to valassi/madgraph4gpu that referenced this issue Oct 3, 2024
…rd90

Revert "[install] rerun 30 tmad tests on LUMI worker node (small-g 72h) for release v1.00.00 - all as expected (heft fails madgraph5#833, skip ggttggg madgraph5#933)"
This reverts commit a6c94d0.

Revert "[install] rerun 96 tput builds and tests on LUMI worker node (small-g 72h) for release v1.00.00 - one new issue madgraph5#1011 (FPEs in vxxxxx for LUMI)"
This reverts commit 217368c.

valassi commented Oct 3, 2024

While I have a LUMI environment up and running, here are a few observations.

I added -g to CXXFLAGS and then ran:

make -j -f cudacpp.mk BACKEND=hip FPTYPE=f
gdb ./runTest_hip.exe 
...
(gdb) run
...
[New Thread 0x15554a3ff700 (LWP 86699)]
[New Thread 0x155449fff700 (LWP 86700)]
[Thread 0x155449fff700 (LWP 86700) exited]
INFO: The following Floating Point Exceptions will cause SIGFPE program aborts: FE_DIVBYZERO, FE_INVALID, FE_OVERFLOW
[==========] Running 4 tests from 4 test suites.
[----------] Global test environment set-up.
[----------] 1 test from SIGMA_SM_GG_TTX_GPU_XXX
[ RUN      ] SIGMA_SM_GG_TTX_GPU_XXX.testxxx

Thread 1 "runTest_hip.exe" received signal SIGFPE, Arithmetic exception.
0x000000000044235b in void mg5amcGpu::vxxxxx<mg5amcGpu::KernelAccessMomenta<false>, mg5amcGpu::KernelAccessWavefunctions<false> >(float const*, float, int, int, float*, int) ()
Missing separate debuginfos, use: zypper install comgr-debuginfo-2.6.0.60003-sles154.131.x86_64 hip-runtime-amd-debuginfo-6.0.32831.60003-sles154.131.x86_64 hsa-rocr-debuginfo-1.12.0.60003-sles154.131.x86_64 libdrm2-debuginfo-2.4.114-150500.3.2.x86_64 libdrm_amdgpu1-debuginfo-2.4.114-150500.3.2.x86_64 libelf1-debuginfo-0.185-150400.5.3.1.x86_64 libgcc_s1-debuginfo-13.2.1+git7813-150000.1.6.1.x86_64 libgfortran5-debuginfo-13.2.1+git7813-150000.1.6.1.x86_64 libncurses6-debuginfo-6.1-150000.5.20.1.x86_64 libnuma1-debuginfo-2.0.14.20.g4ee5e0c-150400.1.24.x86_64 libstdc++6-debuginfo-13.2.1+git7813-150000.1.6.1.x86_64 libz1-debuginfo-1.2.13-150500.4.3.1.x86_64 libzstd1-debuginfo-1.5.0-150400.3.3.1.x86_64
(gdb) where
#0  0x000000000044235b in void mg5amcGpu::vxxxxx<mg5amcGpu::KernelAccessMomenta<false>, mg5amcGpu::KernelAccessWavefunctions<false> >(float const*, float, int, int, float*, int) ()
#1  0x000000000043d74a in SIGMA_SM_GG_TTX_GPU_XXX_testxxx_Test::TestBody() ()
#2  0x000000000047e42d in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) ()
#3  0x000000000046e9e6 in testing::Test::Run() ()
#4  0x000000000046eb75 in testing::TestInfo::Run() ()
#5  0x000000000046ec72 in testing::TestSuite::Run() ()
#6  0x0000000000473479 in testing::internal::UnitTestImpl::RunAllTests() ()
#7  0x000000000046ee1d in testing::UnitTest::Run() ()
#8  0x00000000004418f2 in main ()
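
For reference, this kind of SIGFPE trapping is typically enabled via the glibc feenableexcept extension; a minimal standalone sketch (an illustration, not necessarily the exact code used by the cudacpp test harness):

#ifndef _GNU_SOURCE
#define _GNU_SOURCE // feenableexcept is a glibc extension
#endif
#include <fenv.h>
#include <cstdio>

int main()
{
  // Turn the three exceptions listed in the INFO line above into SIGFPE aborts
  feenableexcept( FE_DIVBYZERO | FE_INVALID | FE_OVERFLOW );
  volatile float zero = 0.f;
  volatile float r = 1.f / zero; // raises FE_DIVBYZERO -> SIGFPE here
  std::printf( "%f\n", (double)r ); // never reached
  return 0;
}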


valassi commented Oct 3, 2024

Very strange, but this workaround seems to solve it:

-        const fptype emp = pvec0 / ( vmass * pp );
+        //printf( "DEBUG1011 (before emp): pvec0=%f vmass=%f pp=%f vmass*pp=%f\n", pvec0, vmass, pp, vmass * pp );
+        //const fptype emp = pvec0 / ( vmass * pp ); // this may give a FPE #1011 (why?! maybe when vmass=+-epsilon?)
+        const fptype emp = pvec0 / vmass / pp; // workaround for FPE #1011
+        //printf( "DEBUG1011 (after emp): emp=%f\n", emp );

The fix will target a later release, v1.00.01.

valassi added a commit to valassi/madgraph4gpu that referenced this issue Oct 3, 2024
…P: replace "pvec0 / ( vmass * pp )" by "pvec0 / vmass / pp"
valassi added a commit to valassi/madgraph4gpu that referenced this issue Oct 3, 2024
…P: replace "pvec0 / ( vmass * pp )" by "pvec0 / vmass / pp"
valassi added a commit to valassi/madgraph4gpu that referenced this issue Oct 3, 2024
valassi added a commit to valassi/madgraph4gpu that referenced this issue Oct 4, 2024
…vxxxxx on HIP: replace "pvec0 / ( vmass * pp )" by "pvec0 / vmass / pp"
valassi added a commit to valassi/madgraph4gpu that referenced this issue Oct 4, 2024
valassi added a commit to valassi/madgraph4gpu that referenced this issue Oct 4, 2024
valassi added a commit to valassi/madgraph4gpu that referenced this issue Oct 4, 2024
…) with the workaround for HIP FPEs madgraph5#1011 - now all tests succeed

./tput/allTees.sh -hip

STARTED  AT Fri 04 Oct 2024 09:31:32 AM EEST
./tput/teeThroughputX.sh -mix -hrd -makej -eemumu -ggtt -ggttg -ggttgg -gqttq -ggttggg -makeclean  -nocuda
ENDED(1) AT Fri 04 Oct 2024 10:33:14 AM EEST [Status=0]
./tput/teeThroughputX.sh -flt -hrd -makej -eemumu -ggtt -ggttgg -inlonly -makeclean  -nocuda
ENDED(2) AT Fri 04 Oct 2024 11:09:17 AM EEST [Status=0]
./tput/teeThroughputX.sh -makej -eemumu -ggtt -ggttg -gqttq -ggttgg -ggttggg -flt -bridge -makeclean  -nocuda
ENDED(3) AT Fri 04 Oct 2024 11:17:27 AM EEST [Status=0]
./tput/teeThroughputX.sh -eemumu -ggtt -ggttgg -flt -rmbhst  -nocuda
ENDED(4) AT Fri 04 Oct 2024 11:19:15 AM EEST [Status=0]
SKIP './tput/teeThroughputX.sh -eemumu -ggtt -ggttgg -flt -common  -nocuda'
ENDED(5) AT Fri 04 Oct 2024 11:19:15 AM EEST [Status=0]
./tput/teeThroughputX.sh -eemumu -ggtt -ggttgg -flt -common  -nocuda
ENDED(6) AT Fri 04 Oct 2024 11:21:02 AM EEST [Status=0]
./tput/teeThroughputX.sh -mix -hrd -makej -susyggtt -susyggt1t1 -smeftggtttt -heftggbb -makeclean  -nocuda
ENDED(7) AT Fri 04 Oct 2024 11:53:25 AM EEST [Status=0]

No errors found in logs

No FPEs or '{ }' found in logs

eemumu MEK (channelid array) processed 512 events across 2 channels { 1 : 256, 2 : 256 }
eemumu MEK (no multichannel) processed 512 events across 2 channels { no-multichannel : 512 }
ggttggg MEK (channelid array) processed 512 events across 1240 channels { 1 : 32, 2 : 32, 4 : 32, 5 : 32, 7 : 32, 8 : 32, 14 : 32, 15 : 32, 16 : 32, 18 : 32, 19 : 32, 20 : 32, 22 : 32, 23 : 32, 24 : 32, 26 : 32 }
ggttggg MEK (no multichannel) processed 512 events across 1240 channels { no-multichannel : 512 }
ggttgg MEK (channelid array) processed 512 events across 123 channels { 2 : 32, 3 : 32, 4 : 32, 5 : 32, 6 : 32, 7 : 32, 8 : 32, 9 : 32, 10 : 32, 11 : 32, 12 : 32, 13 : 32, 14 : 32, 15 : 32, 16 : 32, 17 : 32 }
ggttgg MEK (no multichannel) processed 512 events across 123 channels { no-multichannel : 512 }
ggttg MEK (channelid array) processed 512 events across 16 channels { 1 : 64, 2 : 32, 3 : 32, 4 : 32, 5 : 32, 6 : 32, 7 : 32, 8 : 32, 9 : 32, 10 : 32, 11 : 32, 12 : 32, 13 : 32, 14 : 32, 15 : 32 }
ggttg MEK (no multichannel) processed 512 events across 16 channels { no-multichannel : 512 }
ggtt MEK (channelid array) processed 512 events across 3 channels { 1 : 192, 2 : 160, 3 : 160 }
ggtt MEK (no multichannel) processed 512 events across 3 channels { no-multichannel : 512 }
gqttq MEK (channelid array) processed 512 events across 5 channels { 1 : 128, 2 : 96, 3 : 96, 4 : 96, 5 : 96 }
gqttq MEK (no multichannel) processed 512 events across 5 channels { no-multichannel : 512 }
heftggbb MEK (channelid array) processed 512 events across 4 channels { 1 : 128, 2 : 128, 3 : 128, 4 : 128 }
heftggbb MEK (no multichannel) processed 512 events across 4 channels { no-multichannel : 512 }
smeftggtttt MEK (channelid array) processed 512 events across 72 channels { 1 : 32, 2 : 32, 3 : 32, 4 : 32, 5 : 32, 6 : 32, 7 : 32, 8 : 32, 9 : 32, 10 : 32, 11 : 32, 12 : 32, 13 : 32, 14 : 32, 15 : 32, 16 : 32 }
smeftggtttt MEK (no multichannel) processed 512 events across 72 channels { no-multichannel : 512 }
susyggt1t1 MEK (channelid array) processed 512 events across 6 channels { 2 : 128, 3 : 96, 4 : 96, 5 : 96, 6 : 96 }
susyggt1t1 MEK (no multichannel) processed 512 events across 6 channels { no-multichannel : 512 }
susyggtt MEK (channelid array) processed 512 events across 3 channels { 1 : 192, 2 : 160, 3 : 160 }
susyggtt MEK (no multichannel) processed 512 events across 3 channels { no-multichannel : 512 }
valassi added a commit to valassi/madgraph4gpu that referenced this issue Oct 4, 2024
Revert "[amd] rerun 30 tmad tests on LUMI worker node (small-g 72h) - no change (heft fails madgraph5#833, skip ggttggg madgraph5#933)"
This reverts commit 07c2a53.

Revert "[amd] rerun 96 tput builds and tests on LUMI worker node (small-g 72h) with the workaround for HIP FPEs madgraph5#1011 - now all tests succeed"
This reverts commit 0ec8c1c.