Fixes in xxxxx for IEEE_DIVIDE_BY_ZERO FPE; separate cpu/gpu namespaces and fix runtest segfault #723

Merged

valassi merged 155 commits into madgraph5:master from valassi:fpe

Jul 21, 2023

Member

valassi commented Jul 17, 2023

This is a WIP MR with comprehensive fixes in the xxxxxx functions for floating point exceptions (FPEs).

It is motivated by, and is meant to fix, bug #701.

valassi added 22 commits

July 17, 2023 08:45


          [fpe] in ggtt.sa tests, add comments about how to run a single test in runTest.exe

8775b2d


          [fpe] in ggtt.sa, add copyright header when dumping new reference files for testxxx

ea07a20


          [fpe] in ggttsa cudacpp.mk, try to debug madgraph5#701 IEEE_DIVIDE_BY_ZERO (see firemodels/fds/issues/5638 on gh) with -ffpe flags

d75e426

However, the build gives this warning
  ccache /cvmfs/sft.cern.ch/lcg/releases/gcc/11.2.0-ad950/x86_64-centos8/bin/g++  -O3  -std=c++17 -I. -I../../src -I../../../../../test/googletest/install/include -I../../../../../test/googletest/install/include -Wall -Wshadow -Wextra -ffast-math  -fopenmp -march=skylake-avx512 -mprefer-vector-width=256  -DMGONGPU_FPTYPE_DOUBLE -DMGONGPU_FPTYPE2_DOUBLE -ffpe-trap=invalid,zero,overflow -ffpe-summary=none  -fPIC -c testxxx.cc -o testxxx.o
  cc1plus: warning: command-line option ‘-ffpe-trap=invalid,zero,overflow’ is valid for Fortran but not for C++
  cc1plus: warning: command-line option ‘-ffpe-summary=none’ is valid for Fortran but not for C++
I will revert


          [fpe] revert addition of -ffpe flags in ggttsa cudacpp.mk

6fc09d8

Revert "[fpe] in ggttsa cudacpp.mk, try to debug madgraph5#701 IEEE_DIVIDE_BY_ZERO (see firemodels/fds/issues/5638 on gh) with -ffpe flags"
This reverts commit d75e426.


          [fpe] in ggtt.sa testxxx.cc, enable FPE floating point exception signals to debug madgraph5#701 (see https://stackoverflow.com/a/17473528)

8efb726

This works as expected:
  [avalassi@itscrd80 gcc11.2/cvmfs] /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_tt.sa/SubProcesses/P1_Sigma_sm_gg_ttx> ./runTest.exe --gtest_filter=*xxx
  Running main() from /data/avalassi/GPU2023/madgraph4gpuX/test/googletest/googletest/src/gtest_main.cc
  Note: Google Test filter = *xxx
  [==========] Running 2 tests from 2 test suites.
  [----------] Global test environment set-up.
  [----------] 1 test from SIGMA_SM_GG_TTX_CPU_XXX
  [ RUN      ] SIGMA_SM_GG_TTX_CPU_XXX.testxxx
  Floating point exception (core dumped)
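
A minimal sketch of how FPE traps can be enabled in a C++ test, assuming a Linux/glibc host. The helper name `enableFpeTraps` and the choice of trapped exceptions are illustrative assumptions, not the code in testxxx.cc. `feenableexcept` is a glibc extension, so the sketch falls back to doing nothing elsewhere.

```cpp
#include <cfenv>

// Hypothetical helper (not from the original code): enable hardware traps
// for division by zero, invalid operations and overflow.
// Returns true if the traps were enabled, false if this platform is unsupported.
inline bool enableFpeTraps()
{
#if defined( __linux__ ) && defined( __GLIBC__ )
  const int flags = FE_DIVBYZERO | FE_INVALID | FE_OVERFLOW;
  // feenableexcept returns the previously enabled mask, or -1 on failure
  return feenableexcept( flags ) != -1;
#else
  return false; // feenableexcept is not available here
#endif
}
```

Once traps are enabled, an operation such as a division by zero raises SIGFPE instead of silently producing inf or NaN, which is what makes the test abort as in the log above.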


          [fpe] in ggtt.sa testxxx.cc, add a very simple signal handler for FPEs to debug madgraph5#701

22bd10d


          [fpe] in ggtt.sa testxxx.cc, add some context information to the FPE signal handler for madgraph5#701

f64590a

  [avalassi@itscrd80 gcc11.2/cvmfs] /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_tt.sa/SubProcesses/P1_Sigma_sm_gg_ttx> make -j AVX=512y
  ...
  [avalassi@itscrd80 gcc11.2/cvmfs] /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_tt.sa/SubProcesses/P1_Sigma_sm_gg_ttx> ./runTest.exe --gtest_filter=*xxx
  Running main() from /data/avalassi/GPU2023/madgraph4gpuX/test/googletest/googletest/src/gtest_main.cc
  Note: Google Test filter = *xxx
  [==========] Running 2 tests from 2 test suites.
  [----------] Global test environment set-up.
  [----------] 1 test from SIGMA_SM_GG_TTX_CPU_XXX
  [ RUN      ] SIGMA_SM_GG_TTX_CPU_XXX.testxxx
  Floating Point Exception (CPU neppV=4): 'ipzxxx'
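
The idea of a SIGFPE handler that reports which function was being tested could look roughly like the sketch below. All names here (`g_fpeContext`, `setFpeContext`, `fpeSignalHandler`, `installFpeSignalHandler`) are hypothetical and the message format is only loosely modelled on the log above. This is not the handler in testxxx.cc.

```cpp
#include <csignal>
#include <cstdlib>
#include <cstring>
#include <unistd.h>

// Hypothetical global: what the test is doing right now (e.g. "ipzxxx")
static const char* volatile g_fpeContext = "(unknown)";

// Record the context before calling the function under test
static void setFpeContext( const char* context ) { g_fpeContext = context; }

// Write a string to stderr using only async-signal-safe calls
static void writeToStderr( const char* text )
{
  const ssize_t n = write( STDERR_FILENO, text, std::strlen( text ) );
  (void)n; // nothing sensible can be done about a failed write here
}

// SIGFPE handler: print the context, then terminate immediately
extern "C" void fpeSignalHandler( int /*sig*/ )
{
  writeToStderr( "Floating Point Exception: '" );
  writeToStderr( g_fpeContext );
  writeToStderr( "'\n" );
  _exit( EXIT_FAILURE ); // returning would re-execute the faulting instruction
}

// Install the handler; returns false if std::signal fails
static bool installFpeSignalHandler()
{
  return std::signal( SIGFPE, fpeSignalHandler ) != SIG_ERR;
}
```

A test would call `installFpeSignalHandler()` once and then `setFpeContext( "ixxxxx" )` (or similar) before each call, so that a trap identifies the offending function.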


          [fpe] in ggtt.sa testxxx.cc, disable FPE if environment variable CUDACPP_RUNTIME_DISABLEFPE is set

b35b772

Note: as observed last week, a debug build already triggers an FPE in ixxxxx

[avalassi@itscrd80 gcc11.2/cvmfs] /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_tt.sa/SubProcesses/P1_Sigma_sm_gg_ttx> ./runTest.exe
Running main() from /data/avalassi/GPU2023/madgraph4gpuX/test/googletest/googletest/src/gtest_main.cc
[==========] Running 3 tests from 3 test suites.
[----------] Global test environment set-up.
[----------] 1 test from SIGMA_SM_GG_TTX_CPU_XXX
[ RUN      ] SIGMA_SM_GG_TTX_CPU_XXX.testxxx
Floating Point Exception (CPU neppV=4): 'ixxxxx'

Conversely, in the same debug build, disabling FPEs with the env variable gives a successful test

[avalassi@itscrd80 gcc11.2/cvmfs] /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_tt.sa/SubProcesses/P1_Sigma_sm_gg_ttx> CUDACPP_RUNTIME_DISABLEFPE=1 ./runTest.exe
Running main() from /data/avalassi/GPU2023/madgraph4gpuX/test/googletest/googletest/src/gtest_main.cc
[==========] Running 3 tests from 3 test suites.
[----------] Global test environment set-up.
[----------] 1 test from SIGMA_SM_GG_TTX_CPU_XXX
[ RUN      ] SIGMA_SM_GG_TTX_CPU_XXX.testxxx
[       OK ] SIGMA_SM_GG_TTX_CPU_XXX.testxxx (0 ms)
[----------] 1 test from SIGMA_SM_GG_TTX_CPU_XXX (0 ms total)

[----------] 1 test from SIGMA_SM_GG_TTX_CPU_MISC
[ RUN      ] SIGMA_SM_GG_TTX_CPU_MISC.testmisc
[       OK ] SIGMA_SM_GG_TTX_CPU_MISC.testmisc (0 ms)
[----------] 1 test from SIGMA_SM_GG_TTX_CPU_MISC (0 ms total)

[----------] 1 test from SIGMA_SM_GG_TTX_CPU/MadgraphTest
[ RUN      ] SIGMA_SM_GG_TTX_CPU/MadgraphTest.CompareMomentaAndME/0
INFO: Opening reference file ../../test/ref/dump_CPUTest.Sigma_sm_gg_ttx.txt
INFO: The application is built for skylake-avx512 (AVX512VL) and the host supports it
INFO: The application is built for skylake-avx512 (AVX512VL) and the host supports it
[       OK ] SIGMA_SM_GG_TTX_CPU/MadgraphTest.CompareMomentaAndME/0 (34 ms)
[----------] 1 test from SIGMA_SM_GG_TTX_CPU/MadgraphTest (34 ms total)

[----------] Global test environment tear-down
[==========] 3 tests from 3 test suites ran. (35 ms total)
[  PASSED  ] 3 tests.
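
The runtime switch can be sketched as a simple environment check. The function name `fpeTrapsRequested` is hypothetical, and this sketch assumes that any value of CUDACPP_RUNTIME_DISABLEFPE (even an empty one) disables the traps. The actual test may interpret the variable differently.

```cpp
#include <cstdlib>

// Hypothetical helper: FPE traps are requested unless the environment
// variable CUDACPP_RUNTIME_DISABLEFPE is set (to any value, by assumption).
inline bool fpeTrapsRequested()
{
  return std::getenv( "CUDACPP_RUNTIME_DISABLEFPE" ) == nullptr;
}
```

A test would then enable traps only if `fpeTrapsRequested()` returns true, which matches the behaviour shown in the two runs above.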


          [fpe] in ggtt.sa testxxx.cc, turn on event by event debugging

f2598e0

No change in runTest behaviour, FPEs by default, succeeds if FPEs disabled


          [fpe] in ggtt.sa testxxx.cc, cleanup (remove an unnecessary reinterpret cast)

46b6a0f

No change in runTest behaviour, FPEs by default, succeeds if FPEs disabled


          [fpe] in ggtt.sa testxxx.cc, add prepareTest (and improve FPE signal handler).

5c1470a

This also includes a resetHstMomentaToPar0, which is commented out for the moment.
The idea was to modify the momenta before each xxx call, to ensure that they are all consistent.
But I will instead implement a more solid fix.

No change in runTest behaviour, FPEs by default, succeeds if FPEs disabled


          [fpe] in ggtt.sa HelAmps_sm.h, first (OLD!) attempt of BUG FIX FOR madgraph5#701 in function ixxxxx

This builds ok


          [fpe] in ggtt.sa HelAmps_sm.h, add some debugging printouts for ixxxxx

fdacc5e

In debug mode this fails like this

[==========] Running 3 tests from 3 test suites.
[----------] Global test environment set-up.
[----------] 1 test from SIGMA_SM_GG_TTX_CPU_XXX
[ RUN      ] SIGMA_SM_GG_TTX_CPU_XXX.testxxx
nsp=-1 ievt=0: 500, 0, 0, 500,
IXXXXX: sqp0p3={ -0, -0, -0, -0 }
Floating Point Exception (CPU neppV=4): 'ixxxxx' ievt=0

Note: last week the sqp0p3 were not all 0. I am not sure what I was doing (I was using hstReset?).
Anyway: I will revert this commit and the previous one. We need a much more solid fix in all xxx functions.


          [fpe] revert the last two changes in ggtt.sa HelAmps_sm.h ixxxxx, will start from scratch

0372e2e

Revert "[fpe] in ggtt.sa HelAmps_sm.h, add some debugging printouts for ixxxxx"
This reverts commit fdacc5e

Revert "[fpe] in ggtt.sa HelAmps_sm.h, first (OLD!) attempt of BUG FIX FOR madgraph5#701 in function ixxxxx"
This reverts commit 7674824.


          [fpe] in ggtt.sa mgOnGpuVectors.h, add maskand function

f607906

The build fails because maskand is also defined in testmisc.cc


          [fpe] in ggtt.sa testmisc.ss, remove maskand function as it exists in mgOnGpuVectors.h now

7d1336b


          [fpe] in ggtt.sa testxxx.cc, add more debugging printouts

b0fca94

This now shows (in debug builds) that the first test executed is ixxxxx and that it immediately fails with an FPE

[==========] Running 3 tests from 3 test suites.
[----------] Global test environment set-up.
[----------] 1 test from SIGMA_SM_GG_TTX_CPU_XXX
[ RUN      ] SIGMA_SM_GG_TTX_CPU_XXX.testxxx

nsp=-1 ievt=0: 500, 0, 0, 500,
Prepare test ixxxxx ievt=0
Floating Point Exception (CPU neppV=4): 'ixxxxx' ievt=0


          [fpe] in ggtt.sa mgOnGpuVectors.h, add constructor "cxtype_v( const fptype& r )" to create cx vectors from fp scalars

8745a98


          [fpe] in ggtt.sa HelAmps_sm.h, new BUG FIX FOR madgraph5#701 in function ixxxxx

68d787f

This builds and runs ok. The FPE (always in debug mode) is now moved from ixxxxx to the next ipzxxx

[==========] Running 3 tests from 3 test suites.
[----------] Global test environment set-up.
[----------] 1 test from SIGMA_SM_GG_TTX_CPU_XXX
[ RUN      ] SIGMA_SM_GG_TTX_CPU_XXX.testxxx
nsp=-1 ievt=0: 500, 0, 0, 500,
Prepare test ixxxxx ievt=0
Prepare test ipzxxx ievt=0
Floating Point Exception (CPU neppV=4): 'ipzxxx' ievt=0


          [fpe] in ggtt.sa testxxx.cc, reenable resetHstMomentaToPar0 at the beginning of each test (prepare to modify momenta for ipzxxx)

7e91a0e

No change in runTest behaviour, FPEs by default in ipzxxx, succeeds if FPEs disabled


          [fpe] in ggtt.sa testxxx.cc, ensure that ipzxxx handles SIMD vectors respecting the relevant assumptions

de6492c

Assumption example for ipzxxx: (FMASS == 0) and (PX == PY == 0 and E == +PZ > 0)

This is done by testing one ievt and copying all momenta to that ievt

NB: after adding the workaround for ipzxxx, now the test fails in vxxxxx, which is the real issue in madgraph5#701
[==========] Running 3 tests from 3 test suites.
[----------] Global test environment set-up.
[----------] 1 test from SIGMA_SM_GG_TTX_CPU_XXX
[ RUN      ] SIGMA_SM_GG_TTX_CPU_XXX.testxxx
nsp=-1 ievt=0: 500, 0, 0, 500,
Prepare test ixxxxx ievt=0
Prepare test ipzxxx ievt=0
Prepare test vxxxxx ievt=0
Floating Point Exception (CPU neppV=4): 'vxxxxx' ievt=0
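
The ipzxxx workaround described in this commit can be illustrated with the sketch below. It assumes that "copying all momenta" means filling every SIMD lane with the momentum of the single event under test, so that all lanes satisfy the same assumptions. The `Momentum` struct and both function names are hypothetical.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t neppV = 4; // SIMD lanes, as in "neppV=4" in the logs

// Hypothetical momentum type (energy and three-momentum of one particle)
struct Momentum
{
  double e, px, py, pz;
};

// Assumptions quoted for ipzxxx: FMASS == 0, PX == PY == 0, E == +PZ > 0
inline bool satisfiesIpzxxxAssumptions( const Momentum& p, double fmass )
{
  return fmass == 0 && p.px == 0 && p.py == 0 && p.e == p.pz && p.pz > 0;
}

// Fill all lanes of one SIMD vector of events with the same momentum,
// so that no lane violates the assumptions of the function under test
inline std::array<Momentum, neppV> broadcastMomentum( const Momentum& p )
{
  std::array<Momentum, neppV> lanes;
  lanes.fill( p );
  return lanes;
}
```

With the momentum printed in the log (500, 0, 0, 500) and zero mass, every lane passes the check, whereas a momentum along -z would not.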


          [fpe] in ggtt.sa HelAmps_sm.h, new BUG FIX FOR madgraph5#701 in function vxxxxx

18dd262

This builds and runs ok. The FPE (always in debug mode) is now moved from vxxxxx to the next oxxxxx

Running main() from /data/avalassi/GPU2023/madgraph4gpuX/test/googletest/googletest/src/gtest_main.cc
[==========] Running 3 tests from 3 test suites.
[----------] Global test environment set-up.
[----------] 1 test from SIGMA_SM_GG_TTX_CPU_XXX
[ RUN      ] SIGMA_SM_GG_TTX_CPU_XXX.testxxx
nsp=-1 ievt=0: 500, 0, 0, 500,
Prepare test ixxxxx ievt=0
Prepare test ipzxxx ievt=0
Prepare test vxxxxx ievt=0
Prepare test sxxxxx ievt=0
Prepare test oxxxxx ievt=0
Floating Point Exception (CPU neppV=4): 'oxxxxx' ievt=0

valassi marked this pull request as draft

July 17, 2023 10:51

valassi self-assigned this

valassi mentioned this pull request

Four floating point exceptions in CPP launch of pp_ttW (IEEE_DIVIDE_BY_ZERO FPE in vxxxxx function in SIMD mode) #701

Closed

valassi linked an issue

that may be closed by this pull request

Four floating point exceptions in CPP launch of pp_ttW (IEEE_DIVIDE_BY_ZERO FPE in vxxxxx function in SIMD mode) #701

Closed

valassi added 4 commits

July 17, 2023 16:41


          [fpe] in ggtt.sa cudacpp makefiles, add gcov target

709ec5d

NB1: This also adds LIBFLAGS to link command for shared libraries
This is needed to avoid "hidden symbol `__gcov_init' in ...libgcov.a(_gcov.o) is referenced by DSO" errors

NB2: I will not add a gcov target to .mad makefiles (they have no debug target either yet)


          [fpe] in ggt.sa .gitignore, add gcov suffixes to gitignore

eb5594d


          [fpe] revert the previous change: will instead remove gcov files in 'make clean'

e4957d5

Revert "[fpe] in ggt.sa .gitignore, add gcov suffixes to gitignore"
This reverts commit eb5594d.


          [fpe] in ggtt.sa cudacpp makefiles, remove files with gcov suffixes in 'make clean'

fc120fa

valassi added 3 commits

July 20, 2023 17:33


          [fpe] manually copy the two fixed files to the other 6 mad and 7 sa processes

d01ba9d

for f in `gitls */SubProcesses/MemoryAccessDenominators.h`; do \cp gg_tt.mad/SubProcesses/MemoryAccessDenominators.h $f; done
for f in `gitls */SubProcesses/MemoryAccessNumerators.h`; do \cp gg_tt.mad/SubProcesses/MemoryAccessNumerators.h $f; done


          [fpe] rerun tput test for eemumu and ggtt - all looks ok

4bf1160


          [fpe] rerun tput test also for ggttg* - all looks ok

41c6a6d

valassi force-pushed the fpe branch from a6eb55f to 41c6a6d Compare

July 20, 2023 16:10

valassi added 2 commits

July 21, 2023 11:23


          [fpe] rerun 78 tput alltees, all ok

9a68868

Note: the performance is very similar to that of upstream/master.
Maybe only the simplest 2->2 processes are a bit slower, but that's acceptable.
The number of SIMD instructions has changed, but not in all builds, which is a bit surprising.
All in all, things look ok.

STARTED  AT Thu Jul 20 18:19:00 CEST 2023
./tput/teeThroughputX.sh -mix -hrd -makej -eemumu -ggtt -ggttg -ggttgg -gqttq -ggttggg -makeclean
ENDED(1) AT Thu Jul 20 21:22:53 CEST 2023 [Status=0]
./tput/teeThroughputX.sh -flt -hrd -makej -eemumu -ggtt -ggttgg -inlonly -makeclean
ENDED(2) AT Thu Jul 20 21:48:39 CEST 2023 [Status=0]
./tput/teeThroughputX.sh -makej -eemumu -ggtt -ggttg -gqttq -ggttgg -ggttggg -flt -bridge -makeclean
ENDED(3) AT Thu Jul 20 21:58:10 CEST 2023 [Status=0]
./tput/teeThroughputX.sh -eemumu -ggtt -ggttgg -flt -rmbhst
ENDED(4) AT Thu Jul 20 22:01:12 CEST 2023 [Status=0]
./tput/teeThroughputX.sh -eemumu -ggtt -ggttgg -flt -curhst
ENDED(5) AT Thu Jul 20 22:04:11 CEST 2023 [Status=0]


          [fpe] rerun 15 tmad - ggttgg tests fail again madgraph5#655 as expected

Note: performance remains very similar to upstream/master

STARTED AT Thu Jul 20 22:07:15 CEST 2023
ENDED   AT Fri Jul 21 02:16:03 CEST 2023

Status=0

24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_eemumu_mad/log_eemumu_mad_d_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_eemumu_mad/log_eemumu_mad_f_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_eemumu_mad/log_eemumu_mad_m_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttggg_mad/log_ggttggg_mad_d_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttggg_mad/log_ggttggg_mad_f_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttggg_mad/log_ggttggg_mad_m_inl0_hrd0.txt
1 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttgg_mad/log_ggttgg_mad_d_inl0_hrd0.txt
1 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttgg_mad/log_ggttgg_mad_f_inl0_hrd0.txt
1 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttgg_mad/log_ggttgg_mad_m_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttg_mad/log_ggttg_mad_d_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttg_mad/log_ggttg_mad_f_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttg_mad/log_ggttg_mad_m_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggtt_mad/log_ggtt_mad_d_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggtt_mad/log_ggtt_mad_f_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggtt_mad/log_ggtt_mad_m_inl0_hrd0.txt
0 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_gqttq_mad/log_gqttq_mad_d_inl0_hrd0.txt

valassi mentioned this pull request

feeenableexcept is missing on MacOS #730

Closed

valassi added 3 commits

July 21, 2023 15:05


          [fpe] Revert to upstream/master performance logs in tput and tmad for easier merging

3658f3f

This ~completes the fpe and namespace patches, addressing madgraph5#701 and madgraph5#725, respectively.
(HOWEVER, the CI on MacOS failed for this with madgraph5#730 - still a few things to change before merging).

Unfortunately, I tested that this patch only fixes the IEEE_DIVIDE_BY_ZERO part of madgraph5#701,
but there are still other issues remaining (being debugged in branch nobm).

Revert "[fpe] rerun 15 tmad - ggttgg tests fail again madgraph5#655 as expected"
This reverts commit 9212960.

Revert "[fpe] rerun 78 tput alltees, all ok"
This reverts commit 9a68868.


          [fpe] in ggtt.sa, remove feenableexcept on MacOS where it is not defined (madgraph5#730)

bf5727b


          [fpe] backport workaround for madgraph5#730 on MacOS to CODEGEN from ggtt.sa

e93ba8a

valassi mentioned this pull request

CUDA builds of Parameters.cc get the wrong build flags (eg they fail the HRDCOD=1 build) #731

Closed

valassi added 5 commits

July 21, 2023 15:40


          [namespace/fpe] in ggtt.sa makefiles, add 'export CUFLAGS' in SubProcesses towards src - this fixes HRDCOD=1 builds on non-SM processes madgraph5#731

324581d


          [namespace/fpe] backport fix for madgraph5#731 (HRDCOD=1 builds in cuda of non-SM) to CODEGEN from heft_gg_h.sa

a1d5983


          [fpe] regenerate gg_tt and heft_gg_h sa - all ok, differences as expected from madgraph5#730 and madgraph5#731

66b8cfe


          [fpe] regenerate the other 5 processes sa with fixes for madgraph5#730 and madgraph5#731

838e59a


          [fpe] ** COMPLETE FPE ** regenerate all 7 processes mad with fixes for madgraph5#730 and madgraph5#731

49f9d3f

This completes the fpe and namespace patches, addressing madgraph5#701 and madgraph5#725, respectively.

Unfortunately, I tested that this patch only fixes the IEEE_DIVIDE_BY_ZERO part of madgraph5#701,
but there are still other issues remaining (being debugged in branch nobm and in madgraph5#733):
  IEEE_INVALID_FLAG IEEE_UNDERFLOW_FLAG IEEE_DENORMAL

valassi mentioned this pull request

Three floating point exceptions in CPP launch of nobm_pp_ttW (FPE in COUP values in VVV1P0_1) #733

Closed

valassi force-pushed the fpe branch from 2a3eae6 to 49f9d3f Compare

July 21, 2023 13:58

This was linked to issues Jul 21, 2023

feeenableexcept is missing on MacOS #730

Closed

CUDA builds of Parameters.cc get the wrong build flags (eg they fail the HRDCOD=1 build) #731

Closed

valassi marked this pull request as ready for review

July 21, 2023 14:12

Member Author

valassi commented Jul 21, 2023

This is finally complete - as good as it gets - and passes the CI tests. I am self merging.

I will then document it a posteriori.

valassi changed the title ~~WIP: fixes in xxxxx for FPEs; separate cpu/gpu namespaces and fix runtest segfault~~ WIP: fixes in xxxxx for IEEE_DIVIDE_BY_ZERO FPE; separate cpu/gpu namespaces and fix runtest segfault

valassi changed the title ~~WIP: fixes in xxxxx for IEEE_DIVIDE_BY_ZERO FPE; separate cpu/gpu namespaces and fix runtest segfault~~ Fixes in xxxxx for IEEE_DIVIDE_BY_ZERO FPE; separate cpu/gpu namespaces and fix runtest segfault

valassi merged commit 39de7b3 into madgraph5:master

valassi added a commit to valassi/madgraph4gpu that referenced this pull request


          Merge branch 'fpe' into nobm

458d4c2

(this is the merge of fpe as of commit 49f9d3f, which will be merged to master in madgraph5#723)

valassi mentioned this pull request

fix FPEs and debug nobm_pp_ttW for ATLAS #706

Merged

Member Author

valassi commented Jul 21, 2023

This is some documentation for this MR #723.

This MR addresses two rather large and complex issues, and fixes 8 GitHub issues in total.

Part 1 is floating point exceptions FPEs (branch fpe)

It was meant to address all four FPEs reported for nobm_pp_ttW in Four floating point exceptions in CPP launch of pp_ttW (IEEE_DIVIDE_BY_ZERO FPE in vxxxxx function in SIMD mode) #701. In the end, however, it only fixes one of the four, the IEEE_DIVIDE_BY_ZERO in SIMD code (essentially by using a fake denominator instead of 0 in SIMD vectors, and by using the volatile keyword to outsmart the optimizer). The three other FPEs are still pending, to be followed up in Three floating point exceptions in CPP launch of nobm_pp_ttW (FPE in COUP values in VVV1P0_1) #733 and in WIP MR fix FPEs and debug nobm_pp_ttW for ATLAS #706. I have closed Four floating point exceptions in CPP launch of pp_ttW (IEEE_DIVIDE_BY_ZERO FPE in vxxxxx function in SIMD mode) #701 as IEEE_DIVIDE_BY_ZERO is fixed.
The FPEs appear when C++ is linked to Fortran. To reproduce FPEs in C++ code only, I have added feenableexcept in testxxx, and I have made other enhancements in the tests.
One early attempt to fix IEEE_DIVIDE_BY_ZERO in SIMD code was disabling auto-vectorization for the ixx/oxx/vxx functions. However this resulted in poor performance for these functions. I have fixed this using the volatile keyword instead (after another attempt which was failing on clang, see below). This closes Disabling auto vectorization in ixx/oxx causes a loss of performance #727: now the performance is ok.
Most of the debugging for IEEE_DIVIDE_BY_ZERO in SIMD code was done using gcc. My early attempt to disable auto vectorization was compiler dependent, so I made a note to test clang too. It turns out that my second attempt to recover performance, using fake denominators>0, was ok on gcc but failed on clang. I fixed this using volatile. This closes Ensure clang and icc builds are also ok with the new ixx/oxx/vxx #724: all tests pass on gcc, clang and icx, with good performance.
En passant, while working on clang and icx, I fixed a minor build warning that had been pending for a long time. This closes icpx: warning: overriding '-ffp-contract=fast' option with '-ffp-contract=on' [-Woverriding-t-option] #516.
The feenableexcept that I used to debug FPEs on Linux is missing on MacOS, causing build errors. I have therefore removed feenableexcept from MacOS (FPEs are not tested in C++ code on MacOS). This closes feeenableexcept is missing on MacOS #730.
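
The "fake denominator plus volatile" idea described in Part 1 can be sketched as follows. This is a simplified, lane-by-lane illustration with hypothetical names (`safeDivide`, `neppV`), not the code in HelAmps_sm.h, which operates on compiler vector types.

```cpp
#include <cstddef>

constexpr std::size_t neppV = 4; // hypothetical SIMD width (events per vector)

// Computes out[i] = num[i] / den[i], or 0 in the lanes where den[i] == 0,
// without ever executing a division by zero in any lane.
inline void safeDivide( const double* num, const double* den, double* out )
{
  for( std::size_t i = 0; i < neppV; i++ )
  {
    // Fake denominator: replace 0 by 1 so that the division is always safe.
    // 'volatile' forces the value to be materialised, which discourages the
    // optimizer from dividing by the original den[i] in all lanes and only
    // selecting the valid results afterwards (which would still trap).
    volatile double denom = ( den[i] != 0 ? den[i] : 1. );
    const double ratio = num[i] / denom;
    out[i] = ( den[i] != 0 ? ratio : 0. );
  }
}
```

The design point is that a scalar `if` is not enough once the loop is vectorized: all lanes are computed together, so the lanes that would be discarded must also hold a harmless denominator.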

Part 2 is about separating cpu/gpu namespaces and fixing debug builds (branch namespace, initially in MR WIP: separate cpu/gpu namespaces and fix runtest segmentation fault #728 which I then coalesced with this one)

The issue is that, to debug FPE issues, I started using debug builds. (Actually these were not always ideal, as some subtle issues only came from -O3 optimized builds anyway...).
At some point I realised that the debug build of runTest was crashing. I understood that this was because runTest was mixing functions that were defined in different ways for CPU and GPU, or with different global settings. The solution (in line with Clarify build strategy for heterogeneous applications (and clean all build options) #318 about heterogeneous applications) was to improve the separation of the two namespaces for CPU and GPU entities. This fixes the runTest crash and closes Segfault in testxxx runTest.exe for debug builds (need separate cpu/gpu namespaces) #725.
En passant, I identified and fixed a minor issue in the way compiler flags are passed to CUDA for debug builds (-Xcompiler was missing). This closes wrong compiler flags to nvcc for icx in debug mode #729.
Separating CPU and GPU namespaces eventually implied that Parameters.cc must be built twice, once for the CPU and once for the GPU. This means that I had to reintroduce CUDA builds in src (these had disappeared, as we only had SubProcesses builds for CUDA). I implemented this using export NVCC (which is related to makefile cleanup in Cleanup of Makefiles #362 and Clean up interaction of Subprocess and src Makefile (export, override etc) #414... it is probably a good idea to use export much more). In doing this, however, I introduced a bug as I forgot to also export CUFLAGS. This was then fixed, closing CUDA builds of Parameters.cc get the wrong build flags (eg they fail the HRDCOD=1 build) #731.
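
The namespace separation in Part 2 can be illustrated with a sketch like the one below, where the same source file places its entities in a different namespace depending on whether it is compiled for the GPU or for the CPU. The macro `MG_NAMESPACE`, the namespace names and the function `isGpuBuild` are assumptions made for this illustration only.

```cpp
// Choose the namespace from the compiler in use: nvcc defines __CUDACC__.
// Names are illustrative, not necessarily those used in the repository.
#ifdef __CUDACC__
#define MG_NAMESPACE mg5amcGpu
#else
#define MG_NAMESPACE mg5amcCpu
#endif

namespace MG_NAMESPACE
{
  // The same function name can now have a different definition in the CPU
  // and GPU builds without the two colliding when both objects are linked
  // into one executable such as a test runner.
  inline bool isGpuBuild()
  {
#ifdef __CUDACC__
    return true;
#else
    return false;
#endif
  }
}
```

Without the separation, two differing inline definitions of one symbol would violate the one-definition rule and the linker could silently keep either one, which is consistent with the kind of crash described above. A file such as Parameters.cc then has to be compiled once per namespace.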

I think this should be more or less all for this MR.

Next steps on this line of work would be

merge the new master into Jorgen's HIP branch, as it will be affected quite significantly
continue the investigation of FPEs in Three floating point exceptions in CPP launch of nobm_pp_ttW (FPE in COUP values in VVV1P0_1) #733 (the ATLAS process nobm_pp_ttW is unusable)

cc @roiser @oliviermattelaer @hageboeck @zeniheisser @Jooorgen

This was referenced Jul 21, 2023

Clean up interaction of Subprocess and src Makefile (export, override etc) #414

Open

Several fixes for icx2023.2 (including fixes for sqrt FPEs in ixx/oxx/vxx) #737

Merged
