CUDA builds of Parameters.cc get the wrong build flags (eg they fail the HRDCOD=1 build) #731

Closed
valassi opened this issue Jul 21, 2023 · 3 comments · Fixed by #723

valassi commented Jul 21, 2023

CUDA builds of Parameters.cc get the wrong build flags (eg they fail the HRDCOD=1 build)

My MR #723 has another issue that went undetected by most tests (I only found it while manually building pp_tt_W with HRDCOD=1 to test #701): non-SM processes also fail HRDCOD=1 builds. This is probably a minor issue, but since it went undetected, the tests should be made stronger and wider.

We should add heft_gg_h (or another non-SM process with HRDCOD=1) to tput tests and to CI tests, see now #732.

valassi commented Jul 21, 2023

See 840a81a for one of the last commits of #723

Launching now fails with a new build error (in the CUDA build):
HRDCOD=1 tlau/lauX.sh -CPP nobm_pp_ttW

            ccache /usr/local/cuda-12.0/bin/nvcc   -Xcompiler -fPIC -c -x cu Parameters_sm_no_b_mass.cc -o Parameters_sm_no_b_mass_cu.o
            In file included from Parameters_sm_no_b_mass.cc:15:
            Parameters_sm_no_b_mass.h:26:2: error: #error This non-SM physics process only supports MGONGPU_HARDCODE_PARAM builds (https://github.com/madgraph5/madgraph4gpu/issues/439): please run "make HRDCOD=1"
               26 | #error This non-SM physics process only supports MGONGPU_HARDCODE_PARAM builds (https://github.com/madgraph5/madgraph4gpu/issues/439): please run "make HRDCOD=1"
                  |  ^~~~~

Since I only want to use CPP, I retry with CUDA disabled as well:

valassi commented Jul 21, 2023

The same error can be reproduced more simply in heft:

ccache /usr/local/cuda-12.0/bin/nvcc   -Xcompiler -fPIC -c -x cu Parameters_heft.cc -o Parameters_heft_cu.o
In file included from Parameters_heft.cc:15:
Parameters_heft.h:26:2: error: #error This non-SM physics process only supports MGONGPU_HARDCODE_PARAM builds (#439): please run "make HRDCOD=1"
   26 | #error This non-SM physics process only supports MGONGPU_HARDCODE_PARAM builds (#439): please run "make HRDCOD=1"
      |  ^~~~~

valassi commented Jul 21, 2023

In upstream/master this was

ccache /cvmfs/sft.cern.ch/lcg/releases/gcc/11.2.0-ad950/x86_64-centos8/bin/g++  -O3  -std=c++17 -I.  -fPIC -Wall -Wshadow -Wextra -ffast-math  -fopenmp -march=skylake-avx512 -mprefer-vector-width=256  -DMGONGPU_FPTYPE_DOUBLE -DMGONGPU_FPTYPE2_DOUBLE -DMGONGPU_HARDCODE_PARAM -c Parameters_heft.cc -o Parameters_heft.o

In MR #723 this was

ccache /usr/local/cuda-12.0/bin/nvcc   -Xcompiler -fPIC -c -x cu Parameters_heft.cc -o Parameters_heft_cu.o
In file included from Parameters_heft.cc:15:
Parameters_heft.h:26:2: error: #error This non-SM physics process only supports MGONGPU_HARDCODE_PARAM builds (#439): please run "make HRDCOD=1"
   26 | #error This non-SM physics process only supports MGONGPU_HARDCODE_PARAM builds (#439): please run "make HRDCOD=1"
      |  ^~~~~

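The difference is that the namespace MR introduced a separate CUDA build of this file, which did not exist before. As the command above shows, that nvcc build is invoked without any of the -DMGONGPU_* flags, in particular without -DMGONGPU_HARDCODE_PARAM, so the #error guard in the header fires.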
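
One way to catch this kind of regression early is to scan the build output for compile commands of Parameters_*.cc files that do not pass -DMGONGPU_HARDCODE_PARAM. The sketch below is only an illustration: the function name missing_hardcode_flag is hypothetical (it is not part of madgraph4gpu), and it assumes that the build log prints one compile command per line, as in the output quoted above.

```shell
#!/bin/sh
# Hypothetical helper, not part of madgraph4gpu: read a build log on stdin and
# print every compile command for a Parameters_*.cc file that does NOT pass
# -DMGONGPU_HARDCODE_PARAM. Assumes one compile command per log line.
missing_hardcode_flag() {
  # First grep: keep only compile commands (" -c ") that name a Parameters_*.cc file.
  # Second grep: drop the commands that already carry the define.
  grep -E -- ' -c .*Parameters_[A-Za-z0-9_]+\.cc' \
    | grep -v -- '-DMGONGPU_HARDCODE_PARAM' \
    || true  # grep exits 1 when nothing matches; treat that as "no offenders"
}
```

Fed the two Parameters_heft.cc commands quoted in this comment, the function would print only the nvcc line from MR #723, since the g++ line from upstream/master already carries the define. Compiler diagnostics such as "In file included from Parameters_heft.cc:15:" are ignored because they contain no " -c " option.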

valassi changed the title from "Add heft_gg_h (or another non-SM process with HRDCOD=1) to tput tests and to CI tests" to "CUDA builds of Parameters.cc get the wrong build flags (eg they fail the HRDCOD=1 build)" on Jul 21, 2023
valassi self-assigned this on Jul 21, 2023
valassi added a commit to valassi/madgraph4gpu that referenced this issue Jul 21, 2023
…esses towards src - this fixes HRDCOD=1 builds on non-SM processes madgraph5#731
valassi added a commit to valassi/madgraph4gpu that referenced this issue Jul 21, 2023
…da of non-SM) to CODEGEN from heft_gg_h.sa
valassi added a commit to valassi/madgraph4gpu that referenced this issue Jul 21, 2023
…madgraph5#730 and madgraph5#731

This completes the fpe and namespace patches, addressing madgraph5#701 and madgraph5#725, respectively.

Unfortunately, I tested that this patch only fixes the IEEE_DIVIDE_BY_ZERO part of madgraph5#701,
but there are still other issues remaining (being debugged in branch nobm and in madgraph5#733):
  IEEE_INVALID_FLAG IEEE_UNDERFLOW_FLAG IEEE_DENORMAL
valassi added a commit to valassi/madgraph4gpu that referenced this issue Jul 21, 2023
(this is the merge of fpe as of commit 3658f3f, before fixing madgraph5#730 and madgraph5#731)
valassi added a commit to valassi/madgraph4gpu that referenced this issue Jul 21, 2023
… fpe with the fixes for madgraph5#730 and madgraph5#731

Now the CUDA build of nobm_pp_ttW works - but the SIMD execution still fails with three FPEs madgraph5#733
HRDCOD=1 tlau/lauX.sh -CPP nobm_pp_ttW.mad

INFO: Running Survey
Creating Jobs
Working on SubProcesses
INFO:     P1_gu_ttxwpd
INFO: Building madevent in madevent_interface.py with 'CPP' matrix elements
INFO:     P1_gd_ttxwmu
Note: The following floating-point exceptions are signalling: IEEE_INVALID_FLAG IEEE_UNDERFLOW_FLAG IEEE_DENORMAL
Note: The following floating-point exceptions are signalling: IEEE_INVALID_FLAG IEEE_UNDERFLOW_FLAG IEEE_DENORMAL
Note: The following floating-point exceptions are signalling: IEEE_INVALID_FLAG IEEE_UNDERFLOW_FLAG IEEE_DENORMAL
valassi added a commit to mg5amcnlo/mg5amcnlo_cudacpp that referenced this issue Aug 16, 2023
… builds in cuda of non-SM) to CODEGEN from heft_gg_h.sa