master_june24: major bug in memory access for channelIds in C++ calculate_wavefunctions (missing ieventAccessRecordConst) #899
Comments
Hi, for the C++ implementation please note, as also discussed this morning in the dev meeting, that by convention a channelID value is the same for all entries in the mg5 warp. Thanks, Stefan |
Thanks. This clarifies that the reimplementation of this feature can be considerably simplified (see #898). However, this (major) bug has nothing to do with the fact that channelId is eventually meant to be the same inside warps, i.e. what you said above is irrelevant here. The point is that there was a missing ieventAccessRecordConst call (which was not even implemented in MemoryAccessChannelIds.h). Using debug printouts, for a channelId array that contains 123123123123 etc, this is what is actually found
I have now fixed this. This is what it should print out.
|
I renamed this to indicate that this is a bug in calculate_wavefunctions. There is a similar bug in sigmaKin #911 |
Sorry @valassi, I'm lost here: which line do you think is problematic? Independently, I think that your test is not the correct one here. Did you observe such content, or is it a pure assumption? Thanks, Olivier |
Hi @oliviermattelaer, well, since the previous implementation was not even assuming/enforcing that 123123123 was not allowed (I am just implementing this now, #898), I used this as a test. This was mainly to show #894, i.e. that the SIMD implementation was completely wrong (I just fixed that). My comment here on memory access #899 is totally orthogonal to whether you assume that a warp contains the same channelid or not. There was a major bug (now fixed) because in practice the implementation was always checking channelids on the first SIMD vector. You see, in the wrong implementation all SIMD vectors are always 123123 etc, while they should be at least (e.g. SIMD-4) 1231, 2312, 3123. Now fixed. Now if you do put 111122223333, this means that you would get 1111, 1111, 1111, instead of 1111, 2222, 3333. Bug. Again, this is irrespective of whether or not you enforce that channelids in a warp are the same. The bug is:
There are so many major bugs in #830 that indeed it's easy to get lost. Anyway, I am fixing them one by one and I file the info on each bug for info and for the record. But you may choose to ignore individual bugs and just wait for the full thing. I should be done in a couple of days. |
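For illustration only, the symptom described above can be reproduced with a minimal sketch. It is not the actual cudacpp code: `readPage` is a hypothetical helper, a SIMD vector is modelled as a plain `std::vector` of `neppV` entries, and the record lookup is assumed to reduce to a pointer offset by `ievt0`.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for loading one SIMD vector ("page") of channel ids.
// applyOffset=false mimics the bug: the event offset ievt0 is ignored, so the
// first neppV entries of the full array are read for every page.
// applyOffset=true mimics the fix: the record for ievt0 is located first.
std::vector<unsigned int> readPage( const std::vector<unsigned int>& allChannelIds,
                                    int ievt0,
                                    int neppV,
                                    bool applyOffset )
{
  assert( ievt0 >= 0 && neppV > 0 && ievt0 + neppV <= (int)allChannelIds.size() );
  const unsigned int* channelIds = allChannelIds.data() + ( applyOffset ? ievt0 : 0 );
  return std::vector<unsigned int>( channelIds, channelIds + neppV );
}
```

With an input array 111122223333 and `neppV = 4`, calling this for `ievt0 = 0, 4, 8` gives 1111 every time when the offset is skipped, and 1111, 2222, 3333 when it is applied.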
PS What I did (#896) is that I added cudacpp tests, without madevent, for the functionality of cudacpp. This is a software functionality that is well defined and can (and MUST) be tested independently of madevent, so I just played around with 123123. Now I am removing this and moving to the 111122223333 (and adding sanity checks that this is the case). The implementation in #830 was not using sanity checks, and in any case the explicit SIMD loops on neppV seem to indicate that it was ready to handle a more general case. Anyway, ignore. I am just testing, testing, testing, what was missing before. |
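A sanity check of the kind mentioned above could look roughly like the following sketch. The function name is hypothetical, and it assumes that the "warp" coincides with one SIMD vector of `neppV` consecutive events; the real check may be organised differently.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sanity check: returns true if, within every block of neppV
// consecutive events, all channel ids are identical (e.g. 1111 2222 3333),
// and false otherwise (e.g. 1231 2312 3123).
bool channelIdsUniformPerPage( const std::vector<unsigned int>& allChannelIds, std::size_t neppV )
{
  assert( neppV > 0 && allChannelIds.size() % neppV == 0 );
  for( std::size_t ievt0 = 0; ievt0 < allChannelIds.size(); ievt0 += neppV )
    for( std::size_t ieppV = 1; ieppV < neppV; ++ieppV )
      if( allChannelIds[ievt0 + ieppV] != allChannelIds[ievt0] ) return false;
  return true;
}
```

Under these assumptions, 111122223333 passes for `neppV = 4`, while 123123123123 fails for `neppV = 4` and passes only in the scalar case `neppV = 1`.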
Ah ok, I guess what you refer to is what I just commented on in #894: it is not clear to me what happens if nb_warp=2 for the SIMD case. If so, then I think that I do understand the issue now (but have no clue how to fix it). Would you be able to point to a commit where you fix this? I think it is super important for me (and @roiser) to fully understand what you did, such that we can maintain such tricky stuff in the long term. (If you do not have it yet, it might even be good if I or Stefan try to fix those, for the exact same reason.) Thanks, Olivier |
…and recreate txt ref for runTest (use cuda/double as the reference platform)
CUDACPP_RUNTEST_DUMPEVENTS=1 ./runTest_cuda.exe
\cp ../../test/ref/dump* ../../../CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/test/ref/
NB: the CUDA test succeeds with the new reference files, but the C++ multichannel test madgraph5#896 fails due to bugs madgraph5#894 and madgraph5#899
… bug madgraph5#899
make -j bldall -f cudacpp.mk
for bck in none sse4 avx2 512y 512z cuda; do echo BACKEND=$bck; ./build.${bck}_d_inl0_hrd0/runTest_*.exe; done | egrep '(BACKEND|channelids_sv)'
BACKEND=none
channelids_sv 0 1
channelids_sv 1 1
channelids_sv 2 1
channelids_sv 3 1
channelids_sv 4 1
channelids_sv 5 1
channelids_sv 6 1
channelids_sv 7 1
channelids_sv 8 1
channelids_sv 9 1
channelids_sv 10 1
channelids_sv 11 1
channelids_sv 12 1
channelids_sv 13 1
channelids_sv 14 1
channelids_sv 15 1
BACKEND=sse4
channelids_sv 0 1 2
channelids_sv 2 1 2
channelids_sv 4 1 2
channelids_sv 6 1 2
channelids_sv 8 1 2
channelids_sv 10 1 2
channelids_sv 12 1 2
channelids_sv 14 1 2
BACKEND=avx2
channelids_sv 0 1 2 3 1
channelids_sv 4 1 2 3 1
channelids_sv 8 1 2 3 1
channelids_sv 12 1 2 3 1
BACKEND=512y
channelids_sv 0 1 2 3 1
channelids_sv 4 1 2 3 1
channelids_sv 8 1 2 3 1
channelids_sv 12 1 2 3 1
BACKEND=512z
channelids_sv 0 1 2 3 1 2 3 1 2
channelids_sv 8 1 2 3 1 2 3 1 2
BACKEND=cuda
…by cleanly separating allChannelIds and channelIds and adding a missing ieventAccessRecordConst call
Also add the missing ieventAccessRecordConst function in MemoryAccessChannelIds.h and comment out unused non-const function kernelAccess
The debug printouts show now that the issue is solved
for bck in none sse4 avx2 512y 512z cuda; do echo BACKEND=$bck; ./build.${bck}_d_inl0_hrd0/runTest_*.exe; done | egrep '(BACKEND|channelids_sv)'
BACKEND=none
channelids_sv 0 1
channelids_sv 1 2
channelids_sv 2 3
channelids_sv 3 1
channelids_sv 4 2
channelids_sv 5 3
channelids_sv 6 1
channelids_sv 7 2
channelids_sv 8 3
channelids_sv 9 1
channelids_sv 10 2
channelids_sv 11 3
channelids_sv 12 1
channelids_sv 13 2
channelids_sv 14 3
channelids_sv 15 1
BACKEND=sse4
channelids_sv 0 1 2
channelids_sv 2 3 1
channelids_sv 4 2 3
channelids_sv 6 1 2
channelids_sv 8 3 1
channelids_sv 10 2 3
channelids_sv 12 1 2
channelids_sv 14 3 1
BACKEND=avx2
channelids_sv 0 1 2 3 1
channelids_sv 4 2 3 1 2
channelids_sv 8 3 1 2 3
channelids_sv 12 1 2 3 1
BACKEND=512y
channelids_sv 0 1 2 3 1
channelids_sv 4 2 3 1 2
channelids_sv 8 3 1 2 3
channelids_sv 12 1 2 3 1
BACKEND=512z
channelids_sv 0 1 2 3 1 2 3 1 2
channelids_sv 8 3 1 2 3 1 2 3 1
BACKEND=cuda
NB: after fixing bug madgraph5#899, the SIMD tests still fail because of bug madgraph5#894
for bck in none sse4 avx2 512y 512z cuda; do echo BACKEND=$bck; ./build.${bck}_d_inl0_hrd0/runTest_*.exe; done | egrep '(BACKEND| ME |r.ME |In comparing)'
BACKEND=none
BACKEND=sse4
MadgraphTest.h:254: In comparing event 0 from iteration 0
ME 1.094026373218036e-01
r.ME 8.613813520483170e-02
BACKEND=avx2
MadgraphTest.h:254: In comparing event 0 from iteration 0
ME 1.914754413491214e-01
r.ME 8.613813520483170e-02
BACKEND=512y
MadgraphTest.h:254: In comparing event 0 from iteration 0
ME 1.914754413491214e-01
r.ME 8.613813520483170e-02
BACKEND=512z
MadgraphTest.h:254: In comparing event 0 from iteration 0
ME 1.972915668783644e-01
r.ME 8.613813520483170e-02
BACKEND=cuda
…s in MemoryAccessChannelIds.h after fixing madgraph5#899
…ts for bug madgraph5#899 after fixing the bug
…n channelIds and allChannelIds also in sigmaKin (fix madgraph5#899; wip on madgraph5#911)
…the fact madgraph5#898 that channelids are the same inside each warp
This also fixes all pending CID_ACCESS::ieventAccessRecordConst calls madgraph5#899 and madgraph5#911
Note: CPPProcess.cc is now again very close to upstream/master (most of master_june24 changes from madgraph5#830 have been undone)
Thanks Olivier :-) There are many independent bugs here.
I hope I gave you enough pointers, let me know if you need more information. This specific 899 is fixed in 882. Linking it there and closing. |
Another issue introduced in #830 and being reviewed in #882.
This is a big bug. Essentially, ievt0 in calculate_wavefunctions is ignored
https://github.com/valassi/madgraph4gpu/blob/8e312bc02d9d072615fcb1052b5db54754498517/epochX/cudacpp/gg_tt.mad/SubProcesses/P1_gg_ttx/CPPProcess.cc#L288
I already noted in #895 that the memory access should be called once for all diagrams instead of once per diagram. Essentially it could be moved higher up (and I am actually moving it elsewhere in sigmakin...).
The problem I report here, however, is that the current implementation diverges from the design of memory access in the C++ case. Look at numerators for instance: CUDA uses numerators = allNumerators; (as it looks up the relevant ievt from the 'which thread is this' functions), while C++ uses numerators = NUM_ACCESS::ieventAccessRecord( allNumerators, ievt0 );
Instead, for channelIds this is ignored, and the kernel function is applied directly to the full channelId array. This means in practice that only the very first events in the channelId array are accessed, over and over. Maybe I am wrong, but I would guess this is the case. As there are no specific tests in master_june24 for different channelIds in the array (see #896), I think this went undetected.
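The two-step access pattern discussed here can be sketched as follows. This is a simplified assumption, not the content of MemoryAccessChannelIds.h: the struct name, `kernelAccessConst` and `channelIdForPage` are hypothetical, and the buffer is assumed to be a flat array holding one channel id per event.

```cpp
#include <cassert>

// Hypothetical, simplified accessor (the real memory access class may differ).
struct ChannelIdAccessSketch
{
  // Step 1: locate the record of event ievt inside the full array.
  static const unsigned int* ieventAccessRecordConst( const unsigned int* buffer, int ievt )
  {
    assert( buffer != nullptr && ievt >= 0 );
    return buffer + ievt;
  }
  // Step 2: read the value from a record pointer. Applying this directly to
  // the full array always returns the entry of the very first event.
  static const unsigned int& kernelAccessConst( const unsigned int* record )
  {
    return *record;
  }
};

// Channel id seen by the event page starting at ievt0: record lookup first,
// then kernel access on the record (the order used for numerators in C++).
unsigned int channelIdForPage( const unsigned int* allChannelIds, int ievt0 )
{
  const unsigned int* channelIds = ChannelIdAccessSketch::ieventAccessRecordConst( allChannelIds, ievt0 );
  return ChannelIdAccessSketch::kernelAccessConst( channelIds );
}
```

In this sketch, skipping step 1 and calling `kernelAccessConst` on `allChannelIds` itself is the analogue of ignoring ievt0.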