master_june24: nomultichannel should be an array nullptr, not an array full of 0s #892

valassi · 2024-07-08T09:32:49Z

Another issue introduced in #830 and being reviewed in #882.

Before channelid array #830, the same channel was used for all events. The convention was to use channelid=0 to indicate nomultichannel in some fringe cases (like helicity filtering, or runTest.exe, of check.exe).

To prepare for channelid array #830, we had agreed that nomultichannel would be a nullptr to a channelid array. I was asked to prepare a specific function to handle one of these cases, fbridgesequence_nomultichannel (PR #796). This was already used in the code in some of the relevant places.

In channelid array #830 for master_june24, however, this has changed

the fbridgesequence_nomultichannel that I had been asked to create is no longer used
some bits of the code now seem to go back using channelid=0 (event by event) to indicate no multichannel
other bits of the code (e.g. BridgeKernels.cc) still use pchannelid=nullptr to indicate nonmultichannel

This is confusing, internally inconsistent, and differs from what we had agreed.

More fundamentally, when code is generated in multichannel mode, but a function is called in nomultichannel mode (e.g. because helicity filtering does not need single diagram enhancement, or because so far runTest.exe and check.exe use no multichannel), this is a global parameter of the function for all events: there is no use case where we use multichannel for some events, and nomultichannel for other events, in the same batch.

We should implement nomultichannel as nullptr to channelid.

valassi · 2024-07-08T10:17:41Z

Example in gg_tt.mad (regenerated with master_june24 as is).

This is completely internally inconsistent

https://github.com/valassi/madgraph4gpu/blob/8e312bc02d9d072615fcb1052b5db54754498517/epochX/cudacpp/gg_tt.mad/SubProcesses/P1_gg_ttx/CPPProcess.cc#L342

      // Amplitude(s) for diagram number 1
      FFV1_0<W_ACCESS, A_ACCESS, CD_ACCESS>( w_fp[3], w_fp[2], w_fp[4], COUPs[1], 1.0, &amp_fp[0] );
#ifdef MGONGPU_SUPPORTS_MULTICHANNEL
      if( channelIds != 0 )
      {
        channelids_sv = CID_ACCESS::kernelAccessConst( channelIds );
#if defined __CUDACC__ or !defined MGONGPU_CPPSIMD
        if( channelids_sv == 1 ) numerators_sv += cxabs2( amp_sv[0] );
        denominators_sv += cxabs2( amp_sv[0] );
#else
        for( int i = 0; i < neppV; ++i )
        {
          if( channelids_sv[i] == 1 ) numerators_sv += cxabs2( amp_sv[0] );
          denominators_sv += cxabs2( amp_sv[0] );
        }
#endif
      }
#endif

https://github.com/valassi/madgraph4gpu/blob/8e312bc02d9d072615fcb1052b5db54754498517/epochX/cudacpp/gg_tt.mad/SubProcesses/P1_gg_ttx/CPPProcess.cc#L1033

#ifdef MGONGPU_SUPPORTS_MULTICHANNEL
    // Event-by-event random choice of color #402
    if( channelIds[0] != 0 ) // no event-by-event choice of color if channelId == 0 (fix FPE #783)
    {
      const unsigned int iconfigC = mgOnGpu::channelId_to_iconfigC[CID_ACCESS::kernelAccessConst( channelIds )]; // coloramps.h uses a channel ordering not the diagram id
      fptype targetamp[ncolor] = { 0 };
      for( int icolC = 0; icolC < ncolor; icolC++ )
      {
        if( icolC == 0 )
          targetamp[icolC] = 0;
        else
          targetamp[icolC] = targetamp[icolC - 1];
        if( mgOnGpu::icolamp[iconfigC][icolC] ) targetamp[icolC] += jamp2_sv[icolC];
      }
      //printf( "sigmaKin: ievt=%4d rndcol=%f\n", ievt, allrndcol[ievt] );
      for( int icolC = 0; icolC < ncolor; icolC++ )
      {
        if( allrndcol[ievt] < ( targetamp[icolC] / targetamp[ncolor - 1] ) )
        {
          allselcol[ievt] = icolC + 1; // NB Fortran [1,ncolor], cudacpp [0,ncolor-1]
          break;
        }
      }
    }
#endif
    // *** END OF PART

The first code section correctly uses nullptr as a check, the second section uses individual channelids (the channelid of the first event...)

roiser · 2024-07-08T11:45:58Z

Hi, there are two places in CPPProcess where the channelID needs to be checked, one in wavefunctions (the above case) where this is all working fine and once in sigmaKin (the below case) where this was not working because our testing infrastructure was not supporting this case. I can try to dig out the case where this has happened. The way out was to check in sigmaKin the first value of the array, please note that I tried to make sure that the array in all cases contains 0s upon initialisation also for GPUs which I could not find a guarantuee for in the documentation.

valassi · 2024-07-08T11:49:38Z

The point is to have a uniform API. No multi channel with channelids is nullptr. I will fix this, faster

valassi · 2024-07-09T15:51:18Z

Hi, there are two places in CPPProcess where the channelID needs to be checked, one in wavefunctions (the above case) where this is all working fine and once in sigmaKin (the below case) where this was not working because our testing infrastructure was not supporting this case.

Thanks, I kind of see what you mean now. Indeed calculate_wavefunction and sigmaKin may handle this differently.

Or even more, I just realised that calculate_wavefunction and sigmaKin MUST hadnle this differently (to make it faster). I am modifying as follows (half way done and pending tests)

In calculate_wavefunction, I am going back to the old API. Essentially this function can be as-is what it is in upstream/master, more or less. The point is that calculate_)wavefunction (as specified in the comments above the signature) computes a single event (cuda, and c++ no simd), or a single event page/vector (c++ simd). Because of our assumptions (see discussion above and master_june24: missing sanity checks that all channelids are the same in a "warp" (or at least inside each SIMD vector) #898), a single scalar channelId is enough to be passed to calculate_wavefunctions.
In sigmakin instead there is the loop over SIMD vectors. This is where the decoding of the channelid for each specific vector must be done. And this is also where I am putting the sanity check that all events in a SIMD vector have the same channelid.

PS With respect to #894, ie the SIMD implementation, under the scanario above this becomes much simpler and extremely simple (again, calculate_wavefunctions goes back to upstream/master essentially)

…omputeMatrixElements in madgraph5#892 (now bool useChannelIds, was uint channelId in master)

…nt in computeMatrixElements in madgraph5#892 - bldall ok but runTest segfaults

…ol useChannelIds, was uint channelId in master) to MEK::computeMatrixElements in madgraph5#892 In CODEGEN, adapt Bridge*/runTest/check accordingly to add the useChannelIds argument in computeMatrixElements

…'channelIds != nullptr' (fix segfault madgraph5#892) The SIMD tests pass without crashing, the CUDA test fails returning ME=0

…nelIds[0] != 0' by 'channelIds != nullptr' (fix segfault madgraph5#892) Also add the missing fix for computeMatrixElements (I just added -Wunused-parameter otherwise this is not spotted) Also replace 0 by nullptr for clarity on the channelIDs array Now both the SIMD and cuda runTest.exe pass without crashing and with success results

valassi · 2024-07-19T13:34:53Z

This is completed in #882.

This issue was so pervasive that it is impossible to point a single commit which fixed this. Amongst others these are relevant

git log --oneline | grep \#892 
229efdf44 [june24] in CODEGEN (backport gg_tt.mad) CPPProcess.cc, replace 'channelIds[0] != 0' by 'channelIds != nullptr' (fix segfault #892)
3bdd88d1b [june24] in gg_tt.mad CPPProcess.cc, replace 'channelIds[0] != 0' by 'channelIds != nullptr' (fix segfault #892)
8ca4fb612 [june24] in CODEGEN (backport gg_tt.mad) add back an argument (now bool useChannelIds, was uint channelId in master) to MEK::computeMatrixElements in #892
416276891 [june24] in gg_tt.mad Bridge*/runTest/check, add useChannelIds argument in computeMatrixElements in #892 - bldall ok but runTest segfaults
a11f1d201 [june24] in gg_tt.mad MatrixElementKernels, add back an argument to computeMatrixElements in #892 (now bool useChannelIds, was uint channelId in master)

i.e. 229efdf 3bdd88d 8ca4fb6 4162768 a11f1d2

In addition, the two issues #914 and #916 are related to this, so also the commits for those two issues are relevant.

Fixed in 882. Linking it there and closing.

valassi self-assigned this Jul 8, 2024

valassi mentioned this issue Jul 8, 2024

master_june24: CID_ACCESS::kernelAccessConst should be called only once (it is now called once per diagram) #895

Closed

valassi mentioned this issue Jul 9, 2024

master_june24: missing sanity checks that all channelids are the same in a "warp" (or at least inside each SIMD vector) #898

Closed

valassi added a commit to valassi/madgraph4gpu that referenced this issue Jul 17, 2024

[june24] in gg_tt.mad MatrixElementKernels, add back an argument to c…

a11f1d2

…omputeMatrixElements in madgraph5#892 (now bool useChannelIds, was uint channelId in master)

valassi added a commit to valassi/madgraph4gpu that referenced this issue Jul 17, 2024

[june24] in gg_tt.mad Bridge*/runTest/check, add useChannelIds argume…

4162768

…nt in computeMatrixElements in madgraph5#892 - bldall ok but runTest segfaults

valassi mentioned this issue Jul 17, 2024

master_june24: fbridgesequence_nomultichannel is not used #914

Closed

oliviermattelaer added this to the warp milestone Jul 18, 2024

valassi mentioned this issue Jul 19, 2024

master_june24: fbridge.inc interface was not modified #916

Closed

valassi closed this as completed Jul 19, 2024

valassi linked a pull request Jul 19, 2024 that will close this issue

Merge of master into master_june24 and channelid fixes/reimplementation #882

Merged

valassi mentioned this issue Jul 19, 2024

Merge of master into master_june24 and channelid fixes/reimplementation #882

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

master_june24: nomultichannel should be an array nullptr, not an array full of 0s #892

master_june24: nomultichannel should be an array nullptr, not an array full of 0s #892

valassi commented Jul 8, 2024 •

edited

Loading

valassi commented Jul 8, 2024

roiser commented Jul 8, 2024

valassi commented Jul 8, 2024

valassi commented Jul 9, 2024 •

edited

Loading

valassi commented Jul 19, 2024

master_june24: nomultichannel should be an array nullptr, not an array full of 0s #892

master_june24: nomultichannel should be an array nullptr, not an array full of 0s #892

Comments

valassi commented Jul 8, 2024 • edited Loading

valassi commented Jul 8, 2024

roiser commented Jul 8, 2024

valassi commented Jul 8, 2024

valassi commented Jul 9, 2024 • edited Loading

valassi commented Jul 19, 2024

valassi commented Jul 8, 2024 •

edited

Loading

valassi commented Jul 9, 2024 •

edited

Loading