Fixes in xxxxx for IEEE_DIVIDE_BY_ZERO FPE; separate cpu/gpu namespaces and fix runtest segfault #723

Merged
merged 155 commits into from
Jul 21, 2023

@valassi valassi commented Jul 17, 2023

This is a WIP MR with comprehensive fixes in the xxxxxx functions for floating point exceptions (FPEs).

It is motivated by and is meant to fix bug #701

valassi added 22 commits July 17, 2023 08:45
…_ZERO (see firemodels/fds/issues/5638 on gh) with -ffpe flags

However, the build gives this warning
  ccache /cvmfs/sft.cern.ch/lcg/releases/gcc/11.2.0-ad950/x86_64-centos8/bin/g++  -O3  -std=c++17 -I. -I../../src -I../../../../../test/googletest/install/include -I../../../../../test/googletest/install/include -Wall -Wshadow -Wextra -ffast-math  -fopenmp -march=skylake-avx512 -mprefer-vector-width=256  -DMGONGPU_FPTYPE_DOUBLE -DMGONGPU_FPTYPE2_DOUBLE -ffpe-trap=invalid,zero,overflow -ffpe-summary=none  -fPIC -c testxxx.cc -o testxxx.o
  cc1plus: warning: command-line option ‘-ffpe-trap=invalid,zero,overflow’ is valid for Fortran but not for C++
  cc1plus: warning: command-line option ‘-ffpe-summary=none’ is valid for Fortran but not for C++
I will revert
Revert "[fpe] in ggttsa cudacpp.mk, try to debug madgraph5#701 IEEE_DIVIDE_BY_ZERO (see firemodels/fds/issues/5638 on gh) with -ffpe flags"
This reverts commit d75e426.
…als to debug madgraph5#701 (see https://stackoverflow.com/a/17473528)

This works as expected:
  [avalassi@itscrd80 gcc11.2/cvmfs] /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_tt.sa/SubProcesses/P1_Sigma_sm_gg_ttx> ./runTest.exe --gtest_filter=*xxx
  Running main() from /data/avalassi/GPU2023/madgraph4gpuX/test/googletest/googletest/src/gtest_main.cc
  Note: Google Test filter = *xxx
  [==========] Running 2 tests from 2 test suites.
  [----------] Global test environment set-up.
  [----------] 1 test from SIGMA_SM_GG_TTX_CPU_XXX
  [ RUN      ] SIGMA_SM_GG_TTX_CPU_XXX.testxxx
  Floating point exception (core dumped)
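
Since the Fortran-only `-ffpe-trap` flags are rejected by g++, the traps have to be enabled at runtime instead. A minimal sketch of how this can be done, assuming glibc (where `feenableexcept` is available as a GNU extension); the helper name `enableFpeTraps` is hypothetical and not taken from this MR:

```cpp
// Sketch only: enable hardware traps for selected floating point exceptions.
// feenableexcept is a glibc extension (not standard C++), so the call is
// guarded and the helper becomes a no-op on other platforms.
#include <cfenv>

// Hypothetical helper name, used here for illustration only.
inline bool enableFpeTraps()
{
#if defined( __GLIBC__ )
  std::feclearexcept( FE_ALL_EXCEPT ); // clear stale flags before unmasking
  const int mask = FE_INVALID | FE_DIVBYZERO | FE_OVERFLOW;
  return feenableexcept( mask ) != -1; // returns the previous mask, or -1 on failure
#else
  return false; // no portable equivalent: traps stay disabled
#endif
}
```

Once the traps are enabled, the first offending operation delivers SIGFPE, which without a handler terminates the process as in the output above.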
…signal handler for madgraph5#701

  [avalassi@itscrd80 gcc11.2/cvmfs] /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_tt.sa/SubProcesses/P1_Sigma_sm_gg_ttx> make -j AVX=512y
  ...
  [avalassi@itscrd80 gcc11.2/cvmfs] /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_tt.sa/SubProcesses/P1_Sigma_sm_gg_ttx> ./runTest.exe --gtest_filter=*xxx
  Running main() from /data/avalassi/GPU2023/madgraph4gpuX/test/googletest/googletest/src/gtest_main.cc
  Note: Google Test filter = *xxx
  [==========] Running 2 tests from 2 test suites.
  [----------] Global test environment set-up.
  [----------] 1 test from SIGMA_SM_GG_TTX_CPU_XXX
  [ RUN      ] SIGMA_SM_GG_TTX_CPU_XXX.testxxx
  Floating Point Exception (CPU neppV=4): 'ipzxxx'
…CPP_RUNTIME_DISABLEFPE is set

Note: as observed last week, a debug build triggers an FPE exception already in ixxxxx

[avalassi@itscrd80 gcc11.2/cvmfs] /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_tt.sa/SubProcesses/P1_Sigma_sm_gg_ttx> ./runTest.exe
Running main() from /data/avalassi/GPU2023/madgraph4gpuX/test/googletest/googletest/src/gtest_main.cc
[==========] Running 3 tests from 3 test suites.
[----------] Global test environment set-up.
[----------] 1 test from SIGMA_SM_GG_TTX_CPU_XXX
[ RUN      ] SIGMA_SM_GG_TTX_CPU_XXX.testxxx
Floating Point Exception (CPU neppV=4): 'ixxxxx'

Conversely, in the same debug build, disabling FPEs with the env variable gives a successful test

[avalassi@itscrd80 gcc11.2/cvmfs] /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_tt.sa/SubProcesses/P1_Sigma_sm_gg_ttx> CUDACPP_RUNTIME_DISABLEFPE=1 ./runTest.exe
Running main() from /data/avalassi/GPU2023/madgraph4gpuX/test/googletest/googletest/src/gtest_main.cc
[==========] Running 3 tests from 3 test suites.
[----------] Global test environment set-up.
[----------] 1 test from SIGMA_SM_GG_TTX_CPU_XXX
[ RUN      ] SIGMA_SM_GG_TTX_CPU_XXX.testxxx
[       OK ] SIGMA_SM_GG_TTX_CPU_XXX.testxxx (0 ms)
[----------] 1 test from SIGMA_SM_GG_TTX_CPU_XXX (0 ms total)

[----------] 1 test from SIGMA_SM_GG_TTX_CPU_MISC
[ RUN      ] SIGMA_SM_GG_TTX_CPU_MISC.testmisc
[       OK ] SIGMA_SM_GG_TTX_CPU_MISC.testmisc (0 ms)
[----------] 1 test from SIGMA_SM_GG_TTX_CPU_MISC (0 ms total)

[----------] 1 test from SIGMA_SM_GG_TTX_CPU/MadgraphTest
[ RUN      ] SIGMA_SM_GG_TTX_CPU/MadgraphTest.CompareMomentaAndME/0
INFO: Opening reference file ../../test/ref/dump_CPUTest.Sigma_sm_gg_ttx.txt
INFO: The application is built for skylake-avx512 (AVX512VL) and the host supports it
INFO: The application is built for skylake-avx512 (AVX512VL) and the host supports it
[       OK ] SIGMA_SM_GG_TTX_CPU/MadgraphTest.CompareMomentaAndME/0 (34 ms)
[----------] 1 test from SIGMA_SM_GG_TTX_CPU/MadgraphTest (34 ms total)

[----------] Global test environment tear-down
[==========] 3 tests from 3 test suites ran. (35 ms total)
[  PASSED  ] 3 tests.
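
For illustration, the combination of a SIGFPE handler and an environment-variable opt-out could look roughly like the sketch below. It assumes that the mere presence of `CUDACPP_RUNTIME_DISABLEFPE` disables the traps; the function names are hypothetical, and the real handler also reports neppV and the function under test, which this sketch does not.

```cpp
// Sketch only: install a SIGFPE handler unless the user opted out.
#include <cfenv>
#include <csignal>
#include <cstdlib>
#include <unistd.h>

// True unless CUDACPP_RUNTIME_DISABLEFPE is present in the environment
// (assumption: any value, even an empty one, counts as "set").
inline bool fpeTrapsWanted()
{
  return std::getenv( "CUDACPP_RUNTIME_DISABLEFPE" ) == nullptr;
}

// Handler restricted to async-signal-safe calls: write a fixed message, then exit.
extern "C" void onSigFpe( int )
{
  static const char msg[] = "Floating Point Exception\n";
  const ssize_t n = write( STDERR_FILENO, msg, sizeof( msg ) - 1 );
  (void)n; // nothing useful can be done if the write fails
  _exit( 1 );
}

// Returns true if the handler was installed, false if the user opted out.
inline bool installFpeHandler()
{
  if( !fpeTrapsWanted() ) return false;
  std::signal( SIGFPE, onSigFpe );
#if defined( __GLIBC__ )
  feenableexcept( FE_INVALID | FE_DIVBYZERO | FE_OVERFLOW ); // glibc extension
#endif
  return true;
}
```

Calling `installFpeHandler()` once at the start of the test executable would be enough; with the variable set, the process keeps the default behaviour where FPEs only raise status flags.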
No change in runTest behaviour, FPEs by default, succeeds if FPEs disabled
…et cast)

No change in runTest behaviour, FPEs by default, succeeds if FPEs disabled
…handler).

This also includes a resetHstMomentaToPar0, which is commented out for the moment.
The idea was to modify the momenta before each xxx call, to ensure that they are all consistent.
But I will instead implement a more solid fix.

No change in runTest behaviour, FPEs by default, succeeds if FPEs disabled
In debug mode this fails like this

[==========] Running 3 tests from 3 test suites.
[----------] Global test environment set-up.
[----------] 1 test from SIGMA_SM_GG_TTX_CPU_XXX
[ RUN      ] SIGMA_SM_GG_TTX_CPU_XXX.testxxx
nsp=-1 ievt=0: 500, 0, 0, 500,
IXXXXX: sqp0p3={ -0, -0, -0, -0 }
Floating Point Exception (CPU neppV=4): 'ixxxxx' ievt=0

Note: last week the sqp0p3 were not all 0. I am not sure what I was doing (I was using hstReset?).
Anyway: I will revert this commit and the previous one. We need a much more solid fix in all xxx functions.
…l start from scratch

Revert "[fpe] in ggtt.sa HelAmps_sm.h, add some debugging printouts for ixxxxx"
This reverts commit fdacc5e

Revert "[fpe] in ggtt.sa HelAmps_sm.h, first (OLD!) attempt of BUG FIX FOR madgraph5#701 in function ixxxxx"
This reverts commit 7674824.
The build fails because maskand is also defined in testmisc.cc
This now shows (in debug builds) that the first test executed is ixxxxx and that it immediately fails with an FPE

[==========] Running 3 tests from 3 test suites.
[----------] Global test environment set-up.
[----------] 1 test from SIGMA_SM_GG_TTX_CPU_XXX
[ RUN      ] SIGMA_SM_GG_TTX_CPU_XXX.testxxx

nsp=-1 ievt=0: 500, 0, 0, 500,
Prepare test ixxxxx ievt=0
Floating Point Exception (CPU neppV=4): 'ixxxxx' ievt=0
…ptype& r )" to create cx vectors from fp scalars
…ion ixxxxx

This builds and runs ok. The FPE (always in debug mode) is now moved from ixxxxx to the next ipzxxx

[==========] Running 3 tests from 3 test suites.
[----------] Global test environment set-up.
[----------] 1 test from SIGMA_SM_GG_TTX_CPU_XXX
[ RUN      ] SIGMA_SM_GG_TTX_CPU_XXX.testxxx
nsp=-1 ievt=0: 500, 0, 0, 500,
Prepare test ixxxxx ievt=0
Prepare test ipzxxx ievt=0
Floating Point Exception (CPU neppV=4): 'ipzxxx' ievt=0
…ginning of each test (prepare to modify momenta for ipzxxx)

No change in runTest behaviour, FPEs by default in ipzxxx, succeeds if FPEs disabled
…respecting the relevant assumptions

Assumption example for ipzxxx: (FMASS == 0) and (PX == PY == 0 and E == +PZ > 0)

This is done by testing one ievt and copying all momenta to that ievt
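
A rough sketch of this idea, assuming momenta stored as (E, PX, PY, PZ) per event; the type and function names are hypothetical and do not correspond to the actual testxxx.cc code:

```cpp
// Sketch only: check the ipzxxx preconditions and replicate one event everywhere.
#include <array>
#include <cstddef>
#include <vector>

using Momentum = std::array<double, 4>; // (E, PX, PY, PZ), hypothetical layout

// Preconditions quoted above: FMASS == 0, PX == PY == 0 and E == +PZ > 0.
inline bool satisfiesIpzAssumptions( const Momentum& p, double fmass )
{
  return fmass == 0. && p[1] == 0. && p[2] == 0. && p[0] == p[3] && p[3] > 0.;
}

// Copy the momentum of event ievt into every event, so that all lanes of a
// SIMD vector see the same (valid) input while that event is being tested.
inline void replicateEvent( std::vector<Momentum>& events, std::size_t ievt )
{
  const Momentum ref = events.at( ievt ); // copy first: events is overwritten below
  for( Momentum& p : events ) p = ref;
}
```

With the momentum from the log above (500, 0, 0, 500) and a zero mass, the check passes; replicating it avoids other events in the same SIMD vector violating the assumptions and triggering unrelated FPEs.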

NB: after adding the workaround for ipzxxx, now the test fails in vxxxxx, which is the real issue in madgraph5#701
[==========] Running 3 tests from 3 test suites.
[----------] Global test environment set-up.
[----------] 1 test from SIGMA_SM_GG_TTX_CPU_XXX
[ RUN      ] SIGMA_SM_GG_TTX_CPU_XXX.testxxx
nsp=-1 ievt=0: 500, 0, 0, 500,
Prepare test ixxxxx ievt=0
Prepare test ipzxxx ievt=0
Prepare test vxxxxx ievt=0
Floating Point Exception (CPU neppV=4): 'vxxxxx' ievt=0
…ion vxxxxx

This builds and runs ok. The FPE (always in debug mode) is now moved from vxxxxx to the next oxxxxx

Running main() from /data/avalassi/GPU2023/madgraph4gpuX/test/googletest/googletest/src/gtest_main.cc
[==========] Running 3 tests from 3 test suites.
[----------] Global test environment set-up.
[----------] 1 test from SIGMA_SM_GG_TTX_CPU_XXX
[ RUN      ] SIGMA_SM_GG_TTX_CPU_XXX.testxxx
nsp=-1 ievt=0: 500, 0, 0, 500,
Prepare test ixxxxx ievt=0
Prepare test ipzxxx ievt=0
Prepare test vxxxxx ievt=0
Prepare test sxxxxx ievt=0
Prepare test oxxxxx ievt=0
Floating Point Exception (CPU neppV=4): 'oxxxxx' ievt=0
valassi added 4 commits July 17, 2023 16:41
NB1: This also adds LIBFLAGS to link command for shared libraries
This is needed to avoid "hidden symbol `__gcov_init' in ...libgcov.a(_gcov.o) is referenced by DSO" errors

NB2: I will not add a gcov target to .mad makefiles (they have no debug target either yet)
…make clean'

Revert "[fpe] in ggt.sa .gitignore, add gcov suffixes to gitignore"
This reverts commit eb5594d.
valassi added 3 commits July 20, 2023 17:33
…rocesses

for f in `gitls */SubProcesses/MemoryAccessDenominators.h`; do \cp gg_tt.mad/SubProcesses/MemoryAccessDenominators.h $f; done
for f in `gitls */SubProcesses/MemoryAccessNumerators.h`; do \cp gg_tt.mad/SubProcesses/MemoryAccessNumerators.h $f; done
valassi added 2 commits July 21, 2023 11:23
Note: the performance is very similar to that of upstream/master.
Maybe only the simplest 2->2 processes are a bit slower, but that's acceptable.
The number of SIMD instructions has changed, but not in all builds, which is a bit surprising.
All in all, things look ok.

STARTED  AT Thu Jul 20 18:19:00 CEST 2023
./tput/teeThroughputX.sh -mix -hrd -makej -eemumu -ggtt -ggttg -ggttgg -gqttq -ggttggg -makeclean
ENDED(1) AT Thu Jul 20 21:22:53 CEST 2023 [Status=0]
./tput/teeThroughputX.sh -flt -hrd -makej -eemumu -ggtt -ggttgg -inlonly -makeclean
ENDED(2) AT Thu Jul 20 21:48:39 CEST 2023 [Status=0]
./tput/teeThroughputX.sh -makej -eemumu -ggtt -ggttg -gqttq -ggttgg -ggttggg -flt -bridge -makeclean
ENDED(3) AT Thu Jul 20 21:58:10 CEST 2023 [Status=0]
./tput/teeThroughputX.sh -eemumu -ggtt -ggttgg -flt -rmbhst
ENDED(4) AT Thu Jul 20 22:01:12 CEST 2023 [Status=0]
./tput/teeThroughputX.sh -eemumu -ggtt -ggttgg -flt -curhst
ENDED(5) AT Thu Jul 20 22:04:11 CEST 2023 [Status=0]
Note: performance remains very similar to upstream/master

STARTED AT Thu Jul 20 22:07:15 CEST 2023
ENDED   AT Fri Jul 21 02:16:03 CEST 2023

Status=0

24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_eemumu_mad/log_eemumu_mad_d_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_eemumu_mad/log_eemumu_mad_f_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_eemumu_mad/log_eemumu_mad_m_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttggg_mad/log_ggttggg_mad_d_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttggg_mad/log_ggttggg_mad_f_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttggg_mad/log_ggttggg_mad_m_inl0_hrd0.txt
1 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttgg_mad/log_ggttgg_mad_d_inl0_hrd0.txt
1 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttgg_mad/log_ggttgg_mad_f_inl0_hrd0.txt
1 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttgg_mad/log_ggttgg_mad_m_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttg_mad/log_ggttg_mad_d_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttg_mad/log_ggttg_mad_f_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttg_mad/log_ggttg_mad_m_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggtt_mad/log_ggtt_mad_d_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggtt_mad/log_ggtt_mad_f_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggtt_mad/log_ggtt_mad_m_inl0_hrd0.txt
0 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_gqttq_mad/log_gqttq_mad_d_inl0_hrd0.txt
valassi added 3 commits July 21, 2023 15:05
… easier merging

This ~completes the fpe and namespace patches, addressing madgraph5#701 and madgraph5#725, respectively.
(HOWEVER, the CI on MacOS failed for this with madgraph5#730 - still a few things to change before merging).

Unfortunately, I tested that this patch only fixes the IEEE_DIVIDE_BY_ZERO part of madgraph5#701,
but there are still other issues remaining (being debugged in branch nobm).

Revert "[fpe] rerun 15 tmad - ggttgg tests fail again madgraph5#655 as expected"
This reverts commit 9212960.

Revert "[fpe] rerun 78 tput alltees, all ok"
This reverts commit 9a68868.
valassi added 5 commits July 21, 2023 15:40
…esses towards src - this fixes HRDCOD=1 builds on non-SM processes madgraph5#731
…da of non-SM) to CODEGEN from heft_gg_h.sa
…madgraph5#730 and madgraph5#731

This completes the fpe and namespace patches, addressing madgraph5#701 and madgraph5#725, respectively.

Unfortunately, I tested that this patch only fixes the IEEE_DIVIDE_BY_ZERO part of madgraph5#701,
but there are still other issues remaining (being debugged in branch nobm and in madgraph5#733):
  IEEE_INVALID_FLAG IEEE_UNDERFLOW_FLAG IEEE_DENORMAL

valassi commented Jul 21, 2023

This is finally complete - as good as it gets - and passes the CI tests. I am self merging.

I will then document it a posteriori.

@valassi valassi changed the title from "WIP: fixes in xxxxx for FPEs; separate cpu/gpu namespaces and fix runtest segfault" to "WIP: fixes in xxxxx for IEEE_DIVIDE_BY_ZERO FPE; separate cpu/gpu namespaces and fix runtest segfault" on Jul 21, 2023
@valassi valassi changed the title from "WIP: fixes in xxxxx for IEEE_DIVIDE_BY_ZERO FPE; separate cpu/gpu namespaces and fix runtest segfault" to "Fixes in xxxxx for IEEE_DIVIDE_BY_ZERO FPE; separate cpu/gpu namespaces and fix runtest segfault" on Jul 21, 2023
@valassi valassi merged commit 39de7b3 into madgraph5:master Jul 21, 2023
valassi added a commit to valassi/madgraph4gpu that referenced this pull request Jul 21, 2023
(this is the merge of fpe as of commit 49f9d3f, which will be merged to master in madgraph5#723)

valassi commented Jul 21, 2023

This is some documentation for this MR #723.

This is addressing two rather large/complex issues. It is fixing 8 github issues in total.

  1. Part 1 is about floating point exceptions, FPEs (branch fpe)
  2. Part 2 is about separating cpu/gpu namespaces and fixing debug builds (branch namespace, initially in MR "WIP: separate cpu/gpu namespaces and fix runtest segmentation fault" #728, which I then coalesced with this one)
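
To illustrate what separating the namespaces means in practice, here is a minimal sketch in which the same source file is compiled once per backend and places its symbols in a backend-specific namespace. Presumably the motivation is that identically named symbols from the two builds would otherwise clash when linked together, but that is my assumption here; the namespace and function names below are hypothetical.

```cpp
// Sketch only: one source file, two namespaces, selected at compile time.
#include <string>

#ifdef __CUDACC__
namespace demoGpu // hypothetical name for the GPU build
#else
namespace demoCpu // hypothetical name for the CPU build
#endif
{
  // Same function name in both builds, but a different mangled symbol.
  inline std::string backendName()
  {
#ifdef __CUDACC__
    return "gpu";
#else
    return "cpu";
#endif
  }
}
```

A plain C++ compiler sees `demoCpu::backendName()`, while nvcc sees `demoGpu::backendName()`, so objects from both builds can coexist in one executable without ambiguity.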

I think this should be more or less all for this MR.

Next steps on this line of work would be

cc @roiser @oliviermattelaer @hageboeck @zeniheisser @Jooorgen
