Val june24 fixed ci #981

Closed
wants to merge 425 commits into from

oliviermattelaer
Member

This PR is meant to show the full diff when combining Andrea's version of june24 with the proper CI.

valassi added 30 commits July 12, 2024 19:49
…6 builds #904 (disabling OMP only for clang16; add -no-pie for fcheck_cpp.exe)
…move link-time -no-pie, add compiler-time -fPIC to fortran
…nd.h (BSD license) to detect when running on valgrind #906

This is needed as part of the fixes for runTest.exe #903, preliminary to #896

Note: the header as-is is copied from /cvmfs/sft.cern.ch/lcg/releases/valgrind/3.23.0-24262/x86_64-el9-gcc11-opt/
(except for the inclusion of "clang-format off" directives)

See https://valgrind.org/docs/manual/manual-core-adv.html#manual-core-adv.clientreq
…rehensive fixes and debug printouts for bug #903 (recursive iteration, stack overflow, segfault etc)
…target (address sanitizer #207), but keep it commented out
…lerances when running on valgrind #906

Also allow tan(x)=-inf if ctan(x)=+inf and vice versa when running on valgrind #906
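
A comparison that tolerates opposite-sign infinities only under valgrind could look roughly like the sketch below. This is not the code from the repository: the helper name `tanValuesAgree`, the `tolerance` parameter and the `onValgrind` flag are assumptions for illustration. A caller could presumably set the flag from the `RUNNING_ON_VALGRIND` macro in valgrind.h.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper (not from the repository): decide whether two tan(x)
// results agree. 'onValgrind' is an assumed flag that a caller could derive
// from RUNNING_ON_VALGRIND (valgrind.h).
inline bool tanValuesAgree( double reference, double candidate, double tolerance, bool onValgrind )
{
  assert( tolerance >= 0 );
  if( std::isinf( reference ) || std::isinf( candidate ) )
  {
    // An infinite value can only agree with another infinite value
    if( !( std::isinf( reference ) && std::isinf( candidate ) ) ) return false;
    // Same-sign infinities always agree; opposite signs are accepted only on valgrind
    const bool sameSign = ( std::signbit( reference ) == std::signbit( candidate ) );
    return sameSign || onValgrind;
  }
  // Non-infinite values: relative comparison (a NaN on either side yields false)
  const double scale = std::fmax( std::fabs( reference ), std::fabs( candidate ) );
  return std::fabs( reference - candidate ) <= tolerance * std::fmax( scale, 1.0 );
}
```

With this shape, the valgrind-specific behaviour stays in one place and the tolerance itself remains a plain parameter that the test can choose per platform.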
…st.cc, testxxx.cc: simplify gtest templates, remove cudaDeviceReset to fix #907, complete preparation of two-test infrastructure #896

More in detail:
- move to the simplest "TEST(" use case of Google tests in MadgraphTest.h and runTest.cc (remove unnecessary levels of templating)
- move gpuDeviceReset() to an atexit function of main in testxxx and comment it out anyway, to fix the segfaults #907
  (eventually it may be necessary to remove all CUDA API calls from destructors, if ever we need to put this back in)
- in runTest.cc, complete a proof of concept for adding two separate tests (without/with multichannel #896)
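
As an illustration of the second point, registering a cleanup with `std::atexit` instead of calling it from a destructor could be sketched as follows. The names `resetDeviceStandIn` and `registerCleanupAtExit` are hypothetical, and the stand-in only prints a message: the real `gpuDeviceReset()` call is not reproduced here.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>

// Stand-in for a device reset such as gpuDeviceReset() (assumption: the real
// function is a GPU API call; here we only print a message).
inline void resetDeviceStandIn()
{
  std::puts( "device reset (stand-in) at process exit" );
}

// Register 'handler' so that it runs at normal process exit, rather than
// from inside a destructor. Returns true if the registration succeeded.
inline bool registerCleanupAtExit( void ( *handler )() )
{
  assert( handler != nullptr );
  return std::atexit( handler ) == 0; // std::atexit returns 0 on success
}
```

A test `main` would call `registerCleanupAtExit( resetDeviceStandIn )` once near its start. Handlers registered this way run in reverse order of registration when `main` returns or `std::exit` is called.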

Fix some clang formatting issues with respect to the last gg_tt.mad
…ng PR #905, constexpr_math.h PR #908 and runTest/cudaDeviceReset PR #909

Add valgrind.h and its symlink in the repo for gg_tt.mad

The new runTest.cc template now has a (commented out) proof of concept for including two tests (with/without multichannel) #896, I will resume from there

After building bldall, the following succeeds
for bck in none sse4 avx2 512y 512z cuda; do echo $bck; ./build.${bck}_d_inl0_hrd0/runTest_*.exe; done

This instead is crashing (again?) for some AVX values
for bck in none sse4 avx2 512y 512z cuda; do echo $bck; valgrind ./build.${bck}_d_inl0_hrd0/runTest_*.exe; done
On closer inspection, this is because valgrind does not support AVX512, so this is ok
…th/without multichannel #896 into the latest regenerated with fixes

Revert "[june24] in gg_tt.mad, temporarely go back to the last code regeneration, removing the attempts to add two tests #896"
This reverts commit 7ef597f.
Fix conflicts: epochX/cudacpp/gg_tt.mad/SubProcesses/runTest.cc

OK! Now the test runs, but nomultichannel succeeds, while multichannel fails as the reference ME is wrong!
This is now back on track, must create a second reference file, then add the actual channelid filling of warps...
…channel and <file.txt2> as ref with multichannel
…unTest (use cuda/double as the reference platform)

CUDACPP_RUNTEST_DUMPEVENTS=1 ./runTest_cuda.exe

Rerunning all tests then succeeds (but the channelid array is constant in all values for the moment...)

for bck in none sse4 avx2 512y 512z cuda; do echo $bck; ./build.${bck}_d_inl0_hrd0/runTest_*.exe; done
…th/without multichannel #896; use <file.txt> as ref without multichannel and <file.txt2> as ref with multichannel
…ate txt ref for runTest (use cuda/double as the reference platform)

CUDACPP_RUNTEST_DUMPEVENTS=1 ./runTest_cuda.exe
\cp ../../test/ref/dump* ../../../CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/test/ref/

Rerunning all tests then succeeds (but the channelid array is constant in all values for the moment...)
…onstexpr

FIXME? #910 this is a third different expression for the number of diagrams; there should be sanity checks for internal consistency...
….cc, move the dumpSignallingFPEs() call to the base class dtor, add debug printouts commented out
…tations from .h to .cc, move the dumpSignallingFPEs() call to the base class dtor, add debug printouts commented out
…ebug printouts if the code is compiled with 'make MG5AMC_CHANNELID_DEBUG=1'

FIXME? Note that MEKDevice takes a device channelid array, it would be easier if this was always a host array and MEKD managed the copy?
…add channelid debug printouts if the code is compiled with 'make MG5AMC_CHANNELID_DEBUG=1'

FIXME? Note that MEKDevice takes a device channelid array, it would be easier if this was always a host array and MEKD managed the copy?
…els 1,2,3,1,2,3... for different events (previously it was 1 for all events)

NB1: the cuda test now fails, the reference file must be recreated
NB2: I expect the SIMD tests to fail using the CUDA reference, due to the different bugs in the current channelId implementation
NB3: eventually #898 the implementation should enforce that all events in a warp use the same channelid
…channel test #896 to use channels 1,2,3,1,2,3... for different events (previously it was 1 for all events)

NB1: the cuda test now fails, the reference file must be recreated
NB2: I expect the SIMD tests to fail using the CUDA reference, due to the different bugs in the current channelId implementation
NB3: eventually #898 the implementation should enforce that all events in a warp use the same channelid
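
The constraint in NB3 can be expressed as a small check. The sketch below is hypothetical and not taken from the repository: it assumes that a "warp" is a group of `warpSize` consecutive events and that channel ids are stored one per event in a plain vector.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical check (not from the repository): return true if every group of
// 'warpSize' consecutive events carries a single channel id.
inline bool channelIdsUniformPerWarp( const std::vector<unsigned int>& channelIds, std::size_t warpSize )
{
  assert( warpSize > 0 );
  for( std::size_t ievt = 0; ievt < channelIds.size(); ++ievt )
  {
    // Compare each event against the first event of its own warp
    const std::size_t firstInWarp = ( ievt / warpSize ) * warpSize;
    if( channelIds[ievt] != channelIds[firstInWarp] ) return false;
  }
  return true;
}
```

Under these assumptions, a per-event pattern 1,2,3,1,2,3... fails the check for any warp size larger than one, while a pattern that changes channel only at warp boundaries passes it.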
…ate txt ref for runTest (use cuda/double as the reference platform)

CUDACPP_RUNTEST_DUMPEVENTS=1 ./runTest_cuda.exe
\cp ../../test/ref/dump* ../../../CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/test/ref/

NB: the CUDA test succeeds with the new reference files, but the C++ multichannel test #896 fails due to bugs #894 and #899
valassi and others added 21 commits August 21, 2024 14:52
…ds (introduced in 55b3e74): I prefer that users get and report an error if there is something wrong here...
…asier merging

git checkout upstream/master $(git ls-tree --name-only HEAD tmad/logs* tput/logs*)
…ier merging

git checkout upstream/master $(git ls-tree --name-only upstream/master */CODEGEN*txt)
… gg_tt.mad, to ease merging and conflict resolution

(From the cudacpp directory)
git checkout upstream/master $(git ls-tree --name-only upstream/master *.mad *.sa | grep -v ^gg_tt.mad)
…, nvcc #966) into june24

Fix conflicts:
	epochX/cudacpp/CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/MG5aMC_patches/PROD/patch.P1
	epochX/cudacpp/CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/madgraph/iolibs/template_files/gpu/counters.cc
	epochX/cudacpp/CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/madgraph/iolibs/template_files/gpu/fbridge.cc
	epochX/cudacpp/gg_tt.mad/SubProcesses/P1_gg_ttx/auto_dsig1.f
	epochX/cudacpp/gg_tt.mad/SubProcesses/counters.cc
	epochX/cudacpp/gg_tt.mad/SubProcesses/fbridge.cc

NB: here I essentially fixed gg_tt.mad, not CODEGEN, which will need to be adjusted a posteriori with a backport

In particular:
- Note1: patch.P1 is now taken from june24, but will need to be recomputed
git checkout HEAD CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/MG5aMC_patches/PROD/patch.P1
- Note2: I need to manually port some upstream/master changes in auto_dsig1.f to smatrix_multi.f, which did not yet exist
…sig1.f changes in the latest upstream/master merge
…'call counters_' to uppercase 'CALL COUNTERS_'...
… double space before '!' comments in fortran to please the MG formatter...
… upstream/master

Only patch.P1 changes: in practice, the only three changes are the removal of counters_smatrix1_start/stop calls.

Note that auto_dsig1.f can still be kept out of patching

The only files that still need to be patched are
- 3 in patch.common: Source/makefile, Source/genps.inc, SubProcesses/makefile
- 2 in patch.P1: driver.f, matrix1.f

./CODEGEN/generateAndCompare.sh gg_tt --mad --nopatch
git diff --no-ext-diff -R gg_tt.mad/Source/makefile gg_tt.mad/Source/genps.inc gg_tt.mad/SubProcesses/makefile > CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/MG5aMC_patches/PROD/patch.common
git diff --no-ext-diff -R gg_tt.mad/SubProcesses/P1_gg_ttx/driver.f gg_tt.mad/SubProcesses/P1_gg_ttx/matrix1.f > CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/MG5aMC_patches/PROD/patch.P1
git checkout gg_tt.mad
STARTED  AT Wed Aug 21 08:07:41 PM CEST 2024
./tput/teeThroughputX.sh -mix -hrd -makej -eemumu -ggtt -ggttg -ggttgg -gqttq -ggttggg -makeclean
ENDED(1) AT Wed Aug 21 08:45:12 PM CEST 2024 [Status=0]
./tput/teeThroughputX.sh -flt -hrd -makej -eemumu -ggtt -ggttgg -inlonly -makeclean
ENDED(2) AT Wed Aug 21 08:55:06 PM CEST 2024 [Status=0]
./tput/teeThroughputX.sh -makej -eemumu -ggtt -ggttg -gqttq -ggttgg -ggttggg -flt -bridge -makeclean
ENDED(3) AT Wed Aug 21 09:04:04 PM CEST 2024 [Status=0]
./tput/teeThroughputX.sh -eemumu -ggtt -ggttgg -flt -rmbhst
ENDED(4) AT Wed Aug 21 09:06:49 PM CEST 2024 [Status=0]
./tput/teeThroughputX.sh -eemumu -ggtt -ggttgg -flt -curhst
ENDED(5) AT Wed Aug 21 09:09:32 PM CEST 2024 [Status=0]
./tput/teeThroughputX.sh -eemumu -ggtt -ggttgg -flt -common
ENDED(6) AT Wed Aug 21 09:12:19 PM CEST 2024 [Status=0]
./tput/teeThroughputX.sh -mix -hrd -makej -susyggtt -susyggt1t1 -smeftggtttt -heftggbb -makeclean
ENDED(7) AT Wed Aug 21 09:32:51 PM CEST 2024 [Status=0]

No errors found in logs

eemumu MEK (channelid array) processed 512 events across 2 channels { 1 : 256, 2 : 256 }
eemumu MEK (no multichannel) processed 512 events across 2 channels { no-multichannel : 512 }
ggttggg MEK (channelid array) processed 512 events across 1240 channels { 1 : 32, 2 : 32, 4 : 32, 5 : 32, 7 : 32, 8 : 32, 14 : 32, 15 : 32, 16 : 32, 18 : 32, 19 : 32, 20 : 32, 22 : 32, 23 : 32, 24 : 32, 26 : 32 }
ggttggg MEK (no multichannel) processed 512 events across 1240 channels { no-multichannel : 512 }
ggttgg MEK (channelid array) processed 512 events across 123 channels { 2 : 32, 3 : 32, 4 : 32, 5 : 32, 6 : 32, 7 : 32, 8 : 32, 9 : 32, 10 : 32, 11 : 32, 12 : 32, 13 : 32, 14 : 32, 15 : 32, 16 : 32, 17 : 32 }
ggttgg MEK (no multichannel) processed 512 events across 123 channels { no-multichannel : 512 }
ggttg MEK (channelid array) processed 512 events across 16 channels { 1 : 64, 2 : 32, 3 : 32, 4 : 32, 5 : 32, 6 : 32, 7 : 32, 8 : 32, 9 : 32, 10 : 32, 11 : 32, 12 : 32, 13 : 32, 14 : 32, 15 : 32 }
ggttg MEK (no multichannel) processed 512 events across 16 channels { no-multichannel : 512 }
ggtt MEK (channelid array) processed 512 events across 3 channels { 1 : 192, 2 : 160, 3 : 160 }
ggtt MEK (no multichannel) processed 512 events across 3 channels { no-multichannel : 512 }
gqttq MEK (channelid array) processed 512 events across 5 channels { 1 : 128, 2 : 96, 3 : 96, 4 : 96, 5 : 96 }
gqttq MEK (no multichannel) processed 512 events across 5 channels { no-multichannel : 512 }
heftggbb MEK (channelid array) processed 512 events across 4 channels { 1 : 128, 2 : 128, 3 : 128, 4 : 128 }
heftggbb MEK (no multichannel) processed 512 events across 4 channels { no-multichannel : 512 }
smeftggtttt MEK (channelid array) processed 512 events across 72 channels { 1 : 32, 2 : 32, 3 : 32, 4 : 32, 5 : 32, 6 : 32, 7 : 32, 8 : 32, 9 : 32, 10 : 32, 11 : 32, 12 : 32, 13 : 32, 14 : 32, 15 : 32, 16 : 32 }
smeftggtttt MEK (no multichannel) processed 512 events across 72 channels { no-multichannel : 512 }
susyggt1t1 MEK (channelid array) processed 512 events across 6 channels { 2 : 128, 3 : 96, 4 : 96, 5 : 96, 6 : 96 }
susyggt1t1 MEK (no multichannel) processed 512 events across 6 channels { no-multichannel : 512 }
susyggtt MEK (channelid array) processed 512 events across 3 channels { 1 : 192, 2 : 160, 3 : 160 }
susyggtt MEK (no multichannel) processed 512 events across 3 channels { no-multichannel : 512 }
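
The per-channel counts above are consistent with (but do not prove) a scheme where the 512 events are split into groups of 32 and each group takes the next usable channel in turn. The sketch below reproduces the counts under that assumption. The group size of 32, the round-robin order and the helper name `countEventsPerChannel` are inferences or inventions for illustration, not statements from the log.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical helper: give each group of 'warpSize' consecutive events one
// channel, cycling through 'channels' (the assumed list of usable channel
// ids), and count how many events end up in each channel.
inline std::map<unsigned int, std::size_t> countEventsPerChannel( std::size_t nEvents, std::size_t warpSize, const std::vector<unsigned int>& channels )
{
  assert( warpSize > 0 && !channels.empty() );
  std::map<unsigned int, std::size_t> counts;
  for( std::size_t ievt = 0; ievt < nEvents; ++ievt )
  {
    const std::size_t iwarp = ievt / warpSize; // index of the group this event belongs to
    counts[channels[iwarp % channels.size()]]++; // round-robin over usable channels
  }
  return counts;
}
```

For example, `countEventsPerChannel( 512, 32, { 1, 2, 3 } )` yields 192, 160 and 160 events for channels 1, 2 and 3, matching the ggtt and susyggtt lines, and `{ 1, 2, 3, 4, 5 }` yields 128, 96, 96, 96, 96 as in the gqttq line.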
…e24 branch - everything ok

STARTED  AT Wed Aug 21 11:17:50 PM CEST 2024
(SM tests)
ENDED(1) AT Thu Aug 22 03:22:15 AM CEST 2024 [Status=0]
(BSM tests)
ENDED(1) AT Thu Aug 22 03:33:50 AM CEST 2024 [Status=0]

24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_eemumu_mad/log_eemumu_mad_d_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_eemumu_mad/log_eemumu_mad_f_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_eemumu_mad/log_eemumu_mad_m_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttggg_mad/log_ggttggg_mad_d_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttggg_mad/log_ggttggg_mad_f_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttggg_mad/log_ggttggg_mad_m_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttgg_mad/log_ggttgg_mad_d_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttgg_mad/log_ggttgg_mad_f_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttgg_mad/log_ggttgg_mad_m_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttg_mad/log_ggttg_mad_d_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttg_mad/log_ggttg_mad_f_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttg_mad/log_ggttg_mad_m_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggtt_mad/log_ggtt_mad_d_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggtt_mad/log_ggtt_mad_f_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggtt_mad/log_ggtt_mad_m_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_gqttq_mad/log_gqttq_mad_d_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_gqttq_mad/log_gqttq_mad_f_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_gqttq_mad/log_gqttq_mad_m_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_heftggbb_mad/log_heftggbb_mad_d_inl0_hrd0.txt
1 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_heftggbb_mad/log_heftggbb_mad_f_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_heftggbb_mad/log_heftggbb_mad_m_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_smeftggtttt_mad/log_smeftggtttt_mad_d_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_smeftggtttt_mad/log_smeftggtttt_mad_f_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_smeftggtttt_mad/log_smeftggtttt_mad_m_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_susyggt1t1_mad/log_susyggt1t1_mad_d_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_susyggt1t1_mad/log_susyggt1t1_mad_f_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_susyggt1t1_mad/log_susyggt1t1_mad_m_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_susyggtt_mad/log_susyggtt_mad_d_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_susyggtt_mad/log_susyggtt_mad_f_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_susyggtt_mad/log_susyggtt_mad_m_inl0_hrd0.txt

eemumu MEK processed 8192 events across 2 channels { 1 : 8192 }
eemumu MEK processed 90112 events across 2 channels { 1 : 90112 }
ggttggg MEK processed 8192 events across 1240 channels { 1 : 8192 }
ggttggg MEK processed 90112 events across 1240 channels { 1 : 90112 }
ggttgg MEK processed 8192 events across 123 channels { 112 : 8192 }
ggttgg MEK processed 90112 events across 123 channels { 112 : 90112 }
ggttg MEK processed 8192 events across 16 channels { 1 : 8192 }
ggttg MEK processed 90112 events across 16 channels { 1 : 90112 }
ggtt MEK processed 8192 events across 3 channels { 1 : 8192 }
ggtt MEK processed 90112 events across 3 channels { 1 : 90112 }
gqttq MEK processed 8192 events across 5 channels { 1 : 8192 }
gqttq MEK processed 90112 events across 5 channels { 1 : 90112 }
heftggbb MEK processed 8192 events across 4 channels { 1 : 8192 }
heftggbb MEK processed 90112 events across 4 channels { 1 : 90112 }
smeftggtttt MEK processed 8192 events across 72 channels { 1 : 8192 }
smeftggtttt MEK processed 90112 events across 72 channels { 1 : 90112 }
susyggt1t1 MEK processed 8192 events across 6 channels { 3 : 8192 }
susyggt1t1 MEK processed 90112 events across 6 channels { 3 : 90112 }
susyggtt MEK processed 8192 events across 3 channels { 1 : 8192 }
susyggtt MEK processed 90112 events across 3 channels { 1 : 90112 }
… and put ee_mumua to the sde=1 cross-section
@oliviermattelaer oliviermattelaer marked this pull request as ready for review August 30, 2024 14:24
@oliviermattelaer oliviermattelaer requested a review from a team as a code owner August 30, 2024 14:24
@oliviermattelaer
Member Author

@Andrea, we can decide what to do with this PR/branch.
I do not much like the ordering of the merges here, but the point is mainly to have a branch where everything is clean and that can serve as a base for moving forward on your changes with nb_warp_used.

This includes the CI (now fixed) and "your" version of june24.
It links to the (upstream) gpucpp_june24 branch, since that branch has both the changes needed for master_june24 and the changes that fix the CI, which therefore fixes issue #886.

So now I'm finally in business to work on the warp_used part.

@valassi
Member

valassi commented Sep 1, 2024

Hi @oliviermattelaer, I have instead fixed my june24 branch in #882.

I would propose to close this #981 and focus instead on #882. Tomorrow morning after I have run some tests we should be able to merge that.

Can I close this?
Thanks
Andrea

Member

@valassi valassi left a comment

I propose to close this and focus on #882

@valassi
Member

valassi commented Sep 3, 2024

Hi @oliviermattelaer as discussed: this is a duplicate of #882 (and it is not up to date), so I am closing this.

I am about to merge #882 into master_june24 instead, and will then merge master_june24 into master in #985.

CLOSING.

@valassi valassi closed this Sep 3, 2024
2 participants