feenableexcept is missing on MacOS #730

Closed
valassi opened this issue Jul 21, 2023 · 1 comment · Fixed by #723
@valassi
Member

valassi commented Jul 21, 2023

feenableexcept is missing on MacOS, and as a result my MR #723 fails the CI:

https://github.com/madgraph5/madgraph4gpu/actions/runs/5621540599/job/15232522196

c++  -O3  -std=c++17 -I. -I../../src -I../../../../../test/googletest/install/include -I../../../../../test/googletest/install/include -Wall -Wshadow -Wextra -ffast-math   -march=x86-64  -DMGONGPU_FPTYPE_DOUBLE -DMGONGPU_FPTYPE2_DOUBLE -fPIC -c testxxx.cc -o testxxx.o
testxxx.cc:64:5: error: use of undeclared identifier 'feenableexcept'; did you mean 'feraiseexcept'?
    feenableexcept( FE_INVALID | FE_DIVBYZERO | FE_OVERFLOW | FE_UNDERFLOW ); // debug #701
    ^~~~~~~~~~~~~~
    feraiseexcept
/Applications/Xcode_14.2.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/fenv.h:299:12: note: 'feraiseexcept' declared here
extern int feraiseexcept(int /* excepts */);
           ^
testxxx.cc:426:5: error: use of undeclared identifier 'fedisableexcept'; did you mean 'feraiseexcept'?
    fedisableexcept( FE_INVALID | FE_DIVBYZERO | FE_OVERFLOW | FE_UNDERFLOW ); // debug #701
    ^~~~~~~~~~~~~~~
    feraiseexcept
/Applications/Xcode_14.2.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/fenv.h:299:12: note: 'feraiseexcept' declared here
extern int feraiseexcept(int /* excepts */);
           ^
2 errors generated.
make: *** [testxxx.o] Error 1
Error: Process completed with exit code 2.

I will take a quick look to see whether there is a workaround. Otherwise, I will simply disable these calls on MacOS. And if the Fortran tests ever fail on MacOS, I will disable MacOS as a CI platform altogether: we do not really need it anyway.

@valassi
Member Author

valassi commented Jul 21, 2023

There are a couple of useful pointers here:
https://stackoverflow.com/questions/37819235/how-do-you-enable-floating-point-exceptions-for-clang-in-os-x
https://stackoverflow.com/questions/71821666/trapping-floating-point-exceptions-and-signal-handling-on-apple-silicon

However, neither approach looks straightforward. Both would require interactive tests on a MacOS machine, which I do not have and do not plan to set up now.
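For reference, the kind of workaround discussed in those threads can be sketched roughly as follows. This is an untested adaptation, not code from this repository: the function name `enableTrapsAppleX86` is hypothetical, and the sketch assumes that the Apple SDK exposes the `__control` and `__mxcsr` members of `fenv_t` on x86_64. It does not cover Apple Silicon, where the mechanism differs.

```cpp
#include <fenv.h>

// Hypothetical shim (name invented for illustration).
// ASSUMPTION: on MacOS x86_64, fenv_t has the members __control (x87 control word)
// and __mxcsr (SSE control/status register). On any other platform this returns -1.
inline int enableTrapsAppleX86( int excepts )
{
#if defined( __APPLE__ ) && defined( __x86_64__ )
  fenv_t env;
  if( fegetenv( &env ) != 0 ) return -1;
  const unsigned int mask = excepts & FE_ALL_EXCEPT;
  env.__control &= ~mask;       // clear the x87 exception mask bits (unmasked = trapping)
  env.__mxcsr &= ~( mask << 7 ); // clear the corresponding SSE mask bits (bits 7 to 12)
  return fesetenv( &env ) != 0 ? -1 : 0;
#else
  (void)excepts; // not supported here: nothing is changed
  return -1;
#endif
}
```

A caller would have to treat a return value of -1 as "traps not enabled" and carry on without them.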

@hageboeck @roiser @oliviermattelaer would you like to test this instead?

I will remove the feenableexcept call on MacOS. This means that

  • runTest is more likely to succeed on MacOS
  • any real issue triggered by Fortran linking is more likely to go undetected in the CI and to show up later in production: if that happens, and we see a real issue only on Mac, I will disable the CI tests on Mac

In any case, there are still real issues on Linux, as #701 is only partially solved...
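A minimal sketch of what "removing feenableexcept on MacOS" could look like is a compile-time guard around the GNU-only calls. The names `HAVE_FEENABLEEXCEPT`, `fpeTrapsSupported`, `enableFpeTraps` and `disableFpeTraps` are hypothetical and are not taken from testxxx.cc. The sketch assumes that a glibc platform is the only one where the extension is available.

```cpp
#include <fenv.h>

// feenableexcept/fedisableexcept are GNU extensions: glibc declares them,
// the MacOS SDK fenv.h does not (see the build log above).
// ASSUMPTION: detecting glibc is a good enough proxy for "extension available".
#if defined( __linux__ ) && defined( __GLIBC__ )
#define HAVE_FEENABLEEXCEPT 1
#endif

// Same exception mask as in the failing calls
constexpr int fpeMask = FE_INVALID | FE_DIVBYZERO | FE_OVERFLOW | FE_UNDERFLOW;

// True if this build can turn FP exceptions into traps (SIGFPE)
constexpr bool fpeTrapsSupported()
{
#ifdef HAVE_FEENABLEEXCEPT
  return true;
#else
  return false;
#endif
}

// Enable traps where possible; returns true only if they were enabled
inline bool enableFpeTraps()
{
#ifdef HAVE_FEENABLEEXCEPT
  return feenableexcept( fpeMask ) != -1;
#else
  return false; // no-op, e.g. on MacOS
#endif
}

// Disable traps where possible; returns true only if the call succeeded
inline bool disableFpeTraps()
{
#ifdef HAVE_FEENABLEEXCEPT
  return fedisableexcept( fpeMask ) != -1;
#else
  return false; // no-op, e.g. on MacOS
#endif
}
```

With a guard of this kind the test still compiles on MacOS, at the price described in the second bullet: on that platform floating-point errors no longer abort the test.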

valassi added a commit to valassi/madgraph4gpu that referenced this issue Jul 21, 2023
… easier merging

This ~completes the fpe and namespace patches, addressing madgraph5#701 and madgraph5#725, respectively.
(HOWEVER, the CI on MacOS failed for this with madgraph5#730 - still a few things to change before merging).

Unfortunately, I tested that this patch only fixes the IEEE_DIVIDE_BY_ZERO part of madgraph5#701,
but there are still other issues remaining (being debugged in branch nobm).

Revert "[fpe] rerun 15 tmad - ggttgg tests fail again madgraph5#655 as expected"
This reverts commit 9212960.

Revert "[fpe] rerun 78 tput alltees, all ok"
This reverts commit 9a68868.
valassi added a commit to valassi/madgraph4gpu that referenced this issue Jul 21, 2023
…madgraph5#730 and madgraph5#731

This completes the fpe and namespace patches, addressing madgraph5#701 and madgraph5#725, respectively.

Unfortunately, I tested that this patch only fixes the IEEE_DIVIDE_BY_ZERO part of madgraph5#701,
but there are still other issues remaining (being debugged in branch nobm and in madgraph5#733):
  IEEE_INVALID_FLAG IEEE_UNDERFLOW_FLAG IEEE_DENORMAL
valassi added a commit to valassi/madgraph4gpu that referenced this issue Jul 21, 2023
(this is the merge of fpe as of commit 3658f3f, before fixing madgraph5#730 and madgraph5#731)
valassi added a commit to valassi/madgraph4gpu that referenced this issue Jul 21, 2023
…he fixes for madgraph5#701

Now launching fails with a new build error (in cuda)
(this was later filed as madgraph5#730 and fixed in a later commit of branch fpe)
HRDCOD=1 tlau/lauX.sh -CPP nobm_pp_ttW

            ccache /usr/local/cuda-12.0/bin/nvcc   -Xcompiler -fPIC -c -x cu Parameters_sm_no_b_mass.cc -o Parameters_sm_no_b_mass_cu.o
            In file included from Parameters_sm_no_b_mass.cc:15:
            Parameters_sm_no_b_mass.h:26:2: error: #error This non-SM physics process only supports MGONGPU_HARDCODE_PARAM builds (madgraph5#439): please run "make HRDCOD=1"
               26 | #error This non-SM physics process only supports MGONGPU_HARDCODE_PARAM builds (madgraph5#439): please run "make HRDCOD=1"
                  |  ^~~~~

Since I want to use CPP only, I retry disabling also CUDA:

CUDA_HOME=none HRDCOD=1 tlau/lauX.sh -CPP nobm_pp_ttW

And... this fixes the IEEE division by zero, but unfortunately it still finds other IEEE exceptions!

INFO: Running Survey
Creating Jobs
Working on SubProcesses
INFO:     P1_gu_ttxwpd
INFO: Building madevent in madevent_interface.py with 'CPP' matrix elements
INFO:     P1_gd_ttxwmu
Note: The following floating-point exceptions are signalling: IEEE_INVALID_FLAG IEEE_UNDERFLOW_FLAG IEEE_DENORMAL
Note: The following floating-point exceptions are signalling: IEEE_INVALID_FLAG IEEE_UNDERFLOW_FLAG IEEE_DENORMAL
Note: The following floating-point exceptions are signalling: IEEE_INVALID_FLAG IEEE_UNDERFLOW_FLAG IEEE_DENORMAL

In summary: the IEEE_DIVIDE_BY_ZERO part of madgraph5#701 has been fixed, but not the other FPEs...

There are THREE IEEE FPEs still pending in pp_ttW.mad
 IEEE_INVALID_FLAG IEEE_UNDERFLOW_FLAG IEEE_DENORMAL
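The Fortran runtime notes above report sticky exception flags rather than traps. As an illustration only, the same kind of check can be sketched in C++ with the standard `std::fetestexcept`, which does not depend on the GNU extensions. The helper name `raisedFpeFlags` is hypothetical. The denormal flag is not part of the C++ standard and is left out of this sketch.

```cpp
#include <cfenv>
#include <string>

// Hypothetical helper: returns the names of the standard FP exception flags
// that are currently raised (sticky flags, no trapping involved).
inline std::string raisedFpeFlags()
{
  std::string out;
  const int raised = std::fetestexcept( FE_ALL_EXCEPT );
  if( raised & FE_INVALID ) out += "FE_INVALID ";
  if( raised & FE_DIVBYZERO ) out += "FE_DIVBYZERO ";
  if( raised & FE_OVERFLOW ) out += "FE_OVERFLOW ";
  if( raised & FE_UNDERFLOW ) out += "FE_UNDERFLOW ";
  return out;
}
```

A typical use would be to call `std::feclearexcept( FE_ALL_EXCEPT )` before a computation and inspect `raisedFpeFlags()` afterwards. This only reports that a flag was raised, not where, so it is a weaker diagnostic than a trap.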
valassi added a commit to valassi/madgraph4gpu that referenced this issue Jul 21, 2023
… fpe with the fixes for madgraph5#730 and madgraph5#731

Now the CUDA build of nobm_pp_ttW works - but the SIMD execution still fails with three FPEs madgraph5#733
HRDCOD=1 tlau/lauX.sh -CPP nobm_pp_ttW.mad

INFO: Running Survey
Creating Jobs
Working on SubProcesses
INFO:     P1_gu_ttxwpd
INFO: Building madevent in madevent_interface.py with 'CPP' matrix elements
INFO:     P1_gd_ttxwmu
Note: The following floating-point exceptions are signalling: IEEE_INVALID_FLAG IEEE_UNDERFLOW_FLAG IEEE_DENORMAL
Note: The following floating-point exceptions are signalling: IEEE_INVALID_FLAG IEEE_UNDERFLOW_FLAG IEEE_DENORMAL
Note: The following floating-point exceptions are signalling: IEEE_INVALID_FLAG IEEE_UNDERFLOW_FLAG IEEE_DENORMAL
@valassi valassi self-assigned this Jul 21, 2023
valassi added a commit to mg5amcnlo/mg5amcnlo_cudacpp that referenced this issue Aug 16, 2023