master_june24: major bug in memory access for channelIds in C++ calculate_wavefunctions (missing ieventAccessRecordConst) #899

valassi · 2024-07-09T17:55:25Z

Another issue introduced in #830 and being reviewed in #882.

This is a big bug. Essentially, ievt0 in calculate_wavefunctions is ignored
https://github.com/valassi/madgraph4gpu/blob/8e312bc02d9d072615fcb1052b5db54754498517/epochX/cudacpp/gg_tt.mad/SubProcesses/P1_gg_ttx/CPPProcess.cc#L288

#ifdef MGONGPUCPP_GPUIMPL
#ifdef __CUDACC__
#pragma nv_diagnostic pop
#endif
      // CUDA kernels take input/output buffers with momenta/MEs for all events
      const fptype* momenta = allmomenta;
      const fptype* COUPs[nxcoup];
      for( size_t ixcoup = 0; ixcoup < nxcoup; ixcoup++ ) COUPs[ixcoup] = allCOUPs[ixcoup];
      fptype* MEs = allMEs;
#ifdef MGONGPU_SUPPORTS_MULTICHANNEL
      fptype* numerators = allNumerators;
      fptype* denominators = allDenominators;
#endif
#else
      // C++ kernels take input/output buffers with momenta/MEs for one specific event (the first in the current event page)
      const fptype* momenta = M_ACCESS::ieventAccessRecordConst( allmomenta, ievt0 );
      const fptype* COUPs[nxcoup];
      for( size_t idcoup = 0; idcoup < ndcoup; idcoup++ )
        COUPs[idcoup] = CD_ACCESS::ieventAccessRecordConst( allCOUPs[idcoup], ievt0 ); // dependent couplings, vary event-by-event
      //for( size_t iicoup = 0; iicoup < nicoup; iicoup++ ) // BUG #823
      for( size_t iicoup = 0; iicoup < nIPC; iicoup++ )     // FIX #823
        COUPs[ndcoup + iicoup] = allCOUPs[ndcoup + iicoup]; // independent couplings, fixed for all events
      fptype* MEs = E_ACCESS::ieventAccessRecord( allMEs, ievt0 );
#ifdef MGONGPU_SUPPORTS_MULTICHANNEL
      fptype* numerators = NUM_ACCESS::ieventAccessRecord( allNumerators, ievt0 );
      fptype* denominators = DEN_ACCESS::ieventAccessRecord( allDenominators, ievt0 );
#endif
#endif

      // Reset color flows (reset jamp_sv) at the beginning of a new event or event page
      for( int i = 0; i < ncolor; i++ ) { jamp_sv[i] = cxzero_sv(); }

#ifdef MGONGPU_SUPPORTS_MULTICHANNEL
      // Numerators and denominators for the current event (CUDA) or SIMD event page (C++)
      fptype_sv& numerators_sv = NUM_ACCESS::kernelAccess( numerators );
      fptype_sv& denominators_sv = DEN_ACCESS::kernelAccess( denominators );
#endif

      // *** DIAGRAM 1 OF 3 ***

      // Wavefunction(s) for diagram number 1
      vxxxxx<M_ACCESS, W_ACCESS>( momenta, 0., cHel[ihel][0], -1, w_fp[0], 0 );

      vxxxxx<M_ACCESS, W_ACCESS>( momenta, 0., cHel[ihel][1], -1, w_fp[1], 1 );

      oxxxxx<M_ACCESS, W_ACCESS>( momenta, cIPD[0], cHel[ihel][2], +1, w_fp[2], 2 );

      ixxxxx<M_ACCESS, W_ACCESS>( momenta, cIPD[0], cHel[ihel][3], -1, w_fp[3], 3 );

      VVV1P0_1<W_ACCESS, CD_ACCESS>( w_fp[0], w_fp[1], COUPs[0], 1.0, 0., 0., w_fp[4] );

      // Amplitude(s) for diagram number 1
      FFV1_0<W_ACCESS, A_ACCESS, CD_ACCESS>( w_fp[3], w_fp[2], w_fp[4], COUPs[1], 1.0, &amp_fp[0] );
#ifdef MGONGPU_SUPPORTS_MULTICHANNEL
      if( channelIds != 0 )
      {
        channelids_sv = CID_ACCESS::kernelAccessConst( channelIds );
#if defined __CUDACC__ or !defined MGONGPU_CPPSIMD
        if( channelids_sv == 1 ) numerators_sv += cxabs2( amp_sv[0] );
        denominators_sv += cxabs2( amp_sv[0] );
#else
        for( int i = 0; i < neppV; ++i )
        {
          if( channelids_sv[i] == 1 ) numerators_sv += cxabs2( amp_sv[0] );
          denominators_sv += cxabs2( amp_sv[0] );
        }
#endif
      }
#endif
      jamp_sv[0] += cxtype( 0, 1 ) * amp_sv[0];
      jamp_sv[1] -= cxtype( 0, 1 ) * amp_sv[0];

      // *** DIAGRAM 2 OF 3 ***

I already noted in #895 that the memory access should be called once for all diagrams instead of once per diagram. Essentially it could be moved higher up (and I am actually moving it elsewhere in sigmakin...).

The problem I report here however is that the current implementation does not reflect and diverges from the relevant design of memory access in the C++ case, look at numerators for instance

CUDA access essentially proceeds only through kernelAccess functions (because numerators = allNumerators;), as it looks up the relevant ievt from the 'which thread is this' functions
C++ instead MUST take into account ievt0, which is the result of a loop in sigmaKin, to know which fragment of the allNumerators array to look at, numerators = NUM_ACCESS::ieventAccessRecord( allNumerators, ievt0 );

Instead, for channelIds this is ignored, and the kernel function is applied directly to the full channelId array. This means in practice that only the very first events in the channelId array are accessed, over and over. Maybe I am wrong, but I would guess this is the case. As there are no specific tests in master_june24 for different channelIds in teh array (see #896), I think this went undetected.

The text was updated successfully, but these errors were encountered:

roiser · 2024-07-09T20:43:29Z

Hi, for the C++ implementation please regard, as also discussed this morning in the dev meeting, that the convention is that a channelID value will be the same for all entries in the mg5 warp. thanks Stefan

valassi · 2024-07-15T15:11:57Z

Hi, for the C++ implementation please regard, as also discussed this morning in the dev meeting, that the convention is that a channelID value will be the same for all entries in the mg5 warp. thanks Stefan

Thanks. This clarifies that the reimplementation of this feature can be considerably simplified (see #898)

This (major) bug however has nothing to do with the fact that channelId is meant eventually to be the same inside warps, i.e. what you said above is irrelevant here. The point is that there was a missing ieventAccessRecordConst call (which was not even implemented in MemoryAccessChannelIds.h).

Using debug printouts, for a channelId array that contains123123123123 etc, this is what is actually found

    BACKEND=none
    channelids_sv 0 1
    channelids_sv 1 1
    channelids_sv 2 1
    channelids_sv 3 1
    channelids_sv 4 1
    channelids_sv 5 1
    channelids_sv 6 1
    channelids_sv 7 1
    channelids_sv 8 1
    channelids_sv 9 1
    channelids_sv 10 1
    channelids_sv 11 1
    channelids_sv 12 1
    channelids_sv 13 1
    channelids_sv 14 1
    channelids_sv 15 1
    BACKEND=sse4
    channelids_sv 0 1 2
    channelids_sv 2 1 2
    channelids_sv 4 1 2
    channelids_sv 6 1 2
    channelids_sv 8 1 2
    channelids_sv 10 1 2
    channelids_sv 12 1 2
    channelids_sv 14 1 2
    BACKEND=avx2
    channelids_sv 0 1 2 3 1
    channelids_sv 4 1 2 3 1
    channelids_sv 8 1 2 3 1
    channelids_sv 12 1 2 3 1
    BACKEND=512y
    channelids_sv 0 1 2 3 1
    channelids_sv 4 1 2 3 1
    channelids_sv 8 1 2 3 1
    channelids_sv 12 1 2 3 1
    BACKEND=512z
    channelids_sv 0 1 2 3 1 2 3 1 2
    channelids_sv 8 1 2 3 1 2 3 1 2
    BACKEND=cuda

I have now fixed this. This is what it should print out.

    BACKEND=none
    channelids_sv 0 1
    channelids_sv 1 2
    channelids_sv 2 3
    channelids_sv 3 1
    channelids_sv 4 2
    channelids_sv 5 3
    channelids_sv 6 1
    channelids_sv 7 2
    channelids_sv 8 3
    channelids_sv 9 1
    channelids_sv 10 2
    channelids_sv 11 3
    channelids_sv 12 1
    channelids_sv 13 2
    channelids_sv 14 3
    channelids_sv 15 1
    BACKEND=sse4
    channelids_sv 0 1 2
    channelids_sv 2 3 1
    channelids_sv 4 2 3
    channelids_sv 6 1 2
    channelids_sv 8 3 1
    channelids_sv 10 2 3
    channelids_sv 12 1 2
    channelids_sv 14 3 1
    BACKEND=avx2
    channelids_sv 0 1 2 3 1
    channelids_sv 4 2 3 1 2
    channelids_sv 8 3 1 2 3
    channelids_sv 12 1 2 3 1
    BACKEND=512y
    channelids_sv 0 1 2 3 1
    channelids_sv 4 2 3 1 2
    channelids_sv 8 3 1 2 3
    channelids_sv 12 1 2 3 1
    BACKEND=512z
    channelids_sv 0 1 2 3 1 2 3 1 2
    channelids_sv 8 3 1 2 3 1 2 3 1
    BACKEND=cuda

valassi · 2024-07-15T15:17:59Z

I renamed this to indicate that this is a bug in calculate_wavefunctions. There is a similar bug in sigmaKin #911

oliviermattelaer · 2024-07-15T16:07:33Z

Sorry @valassi, I'm lost here, which line do you think is problematic?

Independently, I think that your test is not the correct one here.
The design for channelId is to contain values like "1111222233332222333333331111222" (when the warp size is set to 4)
i.e. channelId will contain warp_size identical value in a row (each of those values being "randomly" choose by fortran --depending of the PDF of the (set of) events.
So your example of "channelId array that contains123123123123" is not really testing something interesting (since this is possible if warp_size is set to 1)

Did you observe such content? or is it a pure assumption?

Thanks,

Olivier

valassi · 2024-07-15T16:19:16Z

Hi @oliviermattelaer well since the previous implementation was not even assuming/enforcing that 123123123 was not allowed (I am just implementing this now #898) then I used this as a test. This was mainly to show #894, i.e that the SIMD implementation was completely wrong (I just fixed that).

My comment here on memory access #899 is totally orthogonal to whether you assume that a warp contains the same channelid or not, There was a major bug (now fixed) because in practice the implementation was always checking channelids on the first SIMD vector. You see in the wrong implementation, all SIMD vectors are always 123123 etc, while they should be at least (eg SIMD-4) 1231, 2312, 3123. Now fixed. Now if you do put 111122223333, this means that you would get 1111, 1111, 1111, instead of 1111, 2222, 3333. Bug. Again, this is irrespective of whether you enforce or not that channelsids in a warp are the same.

The bug is:

There is a distinction in SIMD code between "allarray" and "array". The former includes all nevt events. The latter includes only a SIMD vector. The implementation form channelIDs #830 did not take this into account
To pass from allchannelids to channelids you need a call to ieventAccessRecordConst. This was missing. I have now put it in.

There are so many major bugs in #830 that indeed it's easy to get lost. Anyway, I am fixing them one by one and I file the info on each bug for info and for the record. But you may chhose to ignore individual bugs and just wait for the full thing. I should be done in a couple of days.

valassi · 2024-07-15T16:22:29Z

So your example of "channelId array that contains123123123123" is not really testing something interesting (since this is possible if warp_size is set to 1)

Did you observe such content? or is it a pure assumption?

PS What I did (#896) is that I added cudacpp tests, without madevent, for the functionality of cudacpp. This is a software functionality that is well defined and can (and MUST) be tested independently of madevent, so I just played around with 123123. Now I am removing this and moving to the 111122223333 (and adding sanity checks that this is the case).

The implementation in #830 was not using sanity checks, and in any case the explicit SIMD loops on neppV seem to indicate that it was ready to handle a more general case. Anyway, ignore. I am just testing, testing, testing, what was missing before.

oliviermattelaer · 2024-07-16T13:44:48Z

Ah ok, I guess what you refer to is what I just comment on #894, that it is not clear to me what happens if nb_warp=2 for the SIMD case, if yes then I think that I do understand the issue now, (but no clue how to fix it).

Would you be able to point a commit where you fix this, I think it is super important for me (and @roiser) to fully understand what you did such that we can maintain such tricky stuff in the long term. (If you do not have it yet, it might even be good if I or Stefan tries to fix those for the exact same reason)

Thanks,

Olivier

…and recreate txt ref for runTest (use cuda/double as the reference platform) CUDACPP_RUNTEST_DUMPEVENTS=1 ./runTest_cuda.exe \cp ../../test/ref/dump* ../../../CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/test/ref/ NB: the CUDA test succeeds with the new reference files, but the C++ multichannel test madgraph5#896 fails due to bugs madgraph5#894 and madgraph5#899

… bug madgraph5#899 make -j bldall -f cudacpp.mk for bck in none sse4 avx2 512y 512z cuda; do echo BACKEND=$bck; ./build.${bck}_d_inl0_hrd0/runTest_*.exe; done | egrep '(BACKEND|channelids_sv)' BACKEND=none channelids_sv 0 1 channelids_sv 1 1 channelids_sv 2 1 channelids_sv 3 1 channelids_sv 4 1 channelids_sv 5 1 channelids_sv 6 1 channelids_sv 7 1 channelids_sv 8 1 channelids_sv 9 1 channelids_sv 10 1 channelids_sv 11 1 channelids_sv 12 1 channelids_sv 13 1 channelids_sv 14 1 channelids_sv 15 1 BACKEND=sse4 channelids_sv 0 1 2 channelids_sv 2 1 2 channelids_sv 4 1 2 channelids_sv 6 1 2 channelids_sv 8 1 2 channelids_sv 10 1 2 channelids_sv 12 1 2 channelids_sv 14 1 2 BACKEND=avx2 channelids_sv 0 1 2 3 1 channelids_sv 4 1 2 3 1 channelids_sv 8 1 2 3 1 channelids_sv 12 1 2 3 1 BACKEND=512y channelids_sv 0 1 2 3 1 channelids_sv 4 1 2 3 1 channelids_sv 8 1 2 3 1 channelids_sv 12 1 2 3 1 BACKEND=512z channelids_sv 0 1 2 3 1 2 3 1 2 channelids_sv 8 1 2 3 1 2 3 1 2 BACKEND=cuda

…by cleanly separating allChannelIds and channelIds and adding a missing ieventAccessRecordConst call Also add the missing ieventAccessRecordConst function in MemoryAccessChannelIds.h and comment out unused non-const function kernelAccess The debug printouts show now that the issue is solved for bck in none sse4 avx2 512y 512z cuda; do echo BACKEND=$bck; ./build.${bck}_d_inl0_hrd0/runTest_*.exe; done | egrep '(BACKEND|channelids_sv)' BACKEND=none channelids_sv 0 1 channelids_sv 1 2 channelids_sv 2 3 channelids_sv 3 1 channelids_sv 4 2 channelids_sv 5 3 channelids_sv 6 1 channelids_sv 7 2 channelids_sv 8 3 channelids_sv 9 1 channelids_sv 10 2 channelids_sv 11 3 channelids_sv 12 1 channelids_sv 13 2 channelids_sv 14 3 channelids_sv 15 1 BACKEND=sse4 channelids_sv 0 1 2 channelids_sv 2 3 1 channelids_sv 4 2 3 channelids_sv 6 1 2 channelids_sv 8 3 1 channelids_sv 10 2 3 channelids_sv 12 1 2 channelids_sv 14 3 1 BACKEND=avx2 channelids_sv 0 1 2 3 1 channelids_sv 4 2 3 1 2 channelids_sv 8 3 1 2 3 channelids_sv 12 1 2 3 1 BACKEND=512y channelids_sv 0 1 2 3 1 channelids_sv 4 2 3 1 2 channelids_sv 8 3 1 2 3 channelids_sv 12 1 2 3 1 BACKEND=512z channelids_sv 0 1 2 3 1 2 3 1 2 channelids_sv 8 3 1 2 3 1 2 3 1 BACKEND=cuda NB: after fixing bug madgraph5#899, the SIMD tests still fail because of bug madgraph5#894 for bck in none sse4 avx2 512y 512z cuda; do echo BACKEND=$bck; ./build.${bck}_d_inl0_hrd0/runTest_*.exe; done | egrep '(BACKEND| ME |r.ME |In comparing)' BACKEND=none BACKEND=sse4 MadgraphTest.h:254: In comparing event 0 from iteration 0 ME 1.094026373218036e-01 r.ME 8.613813520483170e-02 BACKEND=avx2 MadgraphTest.h:254: In comparing event 0 from iteration 0 ME 1.914754413491214e-01 r.ME 8.613813520483170e-02 BACKEND=512y MadgraphTest.h:254: In comparing event 0 from iteration 0 ME 1.914754413491214e-01 r.ME 8.613813520483170e-02 BACKEND=512z MadgraphTest.h:254: In comparing event 0 from iteration 0 ME 1.972915668783644e-01 r.ME 8.613813520483170e-02 BACKEND=cuda

…s in MemoryAccessChannelIds.h after fixing madgraph5#899

…ts for bug madgraph5#899 after fixing the bug

…n channelIds and allChannelIds also in sigmaKin (fix madgraph5#899; wip on madgraph5#911)

…the fact madgraph5#898 that channelids are the same inside each warp This also fixes all pending CID_ACCESS::ieventAccessRecordConst calls madgraph5#899 and madgraph5#911 Note: CPPProcess.cc is now again very close to upstream/master (most of master_june24 changes from madgraph5#830 have been undone)

valassi · 2024-07-19T13:21:34Z

Ah ok, I guess what you refer to is what I just comment on #894, that it is not clear to me what happens if nb_warp=2 for the SIMD case, if yes then I think that I do understand the issue now, (but no clue how to fix it).

Would you be able to point a commit where you fix this, I think it is super important for me (and @roiser) to fully understand what you did such that we can maintain such tricky stuff in the long term. (If you do not have it yet, it might even be good if I or Stefan tries to fix those for the exact same reason)

Thanks,

Olivier

Thanks Olivier :-)

There are many independent bugs here.

This master_june24: major bug in memory access for channelIds in C++ calculate_wavefunctions (missing ieventAccessRecordConst) #899 is about ieventAccessRecordConst for channelIds missing in calculate_wavefunction. This was fixed essentially here 1cdd2f7. NOTE however that later on I removed all this stuff from calculate_wavefunction, going back to a calculate_wavefunction which only handles one or two SIMD vectors for a fixed channelId, as it was in upstream/master, which was out-of-the-box already fulfilling what was needed here. The channelId array decoding was moved only to sigmakin, where an ieventAccessRecordConst call was in any case already missing and had to be fixed (see 911 below).
Another master_june24: wrong memory access for channelIds in C++ sigmakin (missing ieventAccessRecordConst) #911 is about ieventAccessRecordConst for channelIds missing in sigmaKin. I essentially fixed it here ea2146b, as also discussed in master_june24: wrong memory access for channelIds in C++ sigmakin (missing ieventAccessRecordConst) #911 (comment). This was the minimal fix for FPTYPE=d,f.
Another master_june24: add LHE color/helicity comparison test for 2 warps with different channel, fix mixed mode (two SIMD vectors must have same channel) #924 was about a SECOND ieventAccessRecordConst for channelIds missing in sigmaKin, specifically for the second SIMD vector in mixed mode. I essentially fixed this here valassi@9e77b2b

I hope I gave you enough pointers, let me know if you need more information.

This specific 899 is fixed in 882. Linking it there and closing.

valassi self-assigned this Jul 9, 2024

valassi changed the title ~~master_june24: wrong memory access for channelIds in C++ implementation~~ master_june24: major bug in memory access for channelIds in C++ calculate_wavefunctions (missing ieventAccessRecordConst) Jul 15, 2024

valassi mentioned this issue Jul 15, 2024

master_june24: wrong memory access for channelIds in C++ sigmakin (missing ieventAccessRecordConst) #911

Closed

valassi added a commit to valassi/madgraph4gpu that referenced this issue Jul 17, 2024

[june24] in gg_tt.mad and CODEGEN, remove unused/unnecessary function…

fbd0cb2

…s in MemoryAccessChannelIds.h after fixing madgraph5#899

valassi added a commit to valassi/madgraph4gpu that referenced this issue Jul 17, 2024

[june24] in gg_tt.mad and CODEGEN CPPProcess.cc, remove debug printou…

5b35b1d

…ts for bug madgraph5#899 after fixing the bug

valassi added a commit to valassi/madgraph4gpu that referenced this issue Jul 17, 2024

[june24] in gg_tt.mad and CODEGEN CPPProcess.h/cc, distinguish betwee…

f296e16

…n channelIds and allChannelIds also in sigmaKin (fix madgraph5#899; wip on madgraph5#911)

oliviermattelaer added this to the warp milestone Jul 18, 2024

valassi closed this as completed Jul 19, 2024

valassi linked a pull request Jul 19, 2024 that will close this issue

Merge of master into master_june24 and channelid fixes/reimplementation #882

Merged

valassi mentioned this issue Jul 19, 2024

Merge of master into master_june24 and channelid fixes/reimplementation #882

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

master_june24: major bug in memory access for channelIds in C++ calculate_wavefunctions (missing ieventAccessRecordConst) #899

master_june24: major bug in memory access for channelIds in C++ calculate_wavefunctions (missing ieventAccessRecordConst) #899

valassi commented Jul 9, 2024

roiser commented Jul 9, 2024

valassi commented Jul 15, 2024

valassi commented Jul 15, 2024

oliviermattelaer commented Jul 15, 2024

valassi commented Jul 15, 2024

valassi commented Jul 15, 2024

oliviermattelaer commented Jul 16, 2024

valassi commented Jul 19, 2024

master_june24: major bug in memory access for channelIds in C++ calculate_wavefunctions (missing ieventAccessRecordConst) #899

master_june24: major bug in memory access for channelIds in C++ calculate_wavefunctions (missing ieventAccessRecordConst) #899

Comments

valassi commented Jul 9, 2024

roiser commented Jul 9, 2024

valassi commented Jul 15, 2024

valassi commented Jul 15, 2024

oliviermattelaer commented Jul 15, 2024

valassi commented Jul 15, 2024

valassi commented Jul 15, 2024

oliviermattelaer commented Jul 16, 2024

valassi commented Jul 19, 2024