WIP: gg to ttgggg (2->6 process) #601
Conversation
Later on, however (i.e. by now), clang has also increased in size.
And cuda too.
Update: the cuda build is still running after more than one day.
The clang++ build with inlining has been killed by the OOM killer.
The clang++ build without inlining was still running but seemed stuck: high memory but 0% CPU. I saw that the AFS token had expired in the meantime, so I stopped the process with ctrl-z, renewed the token and resumed it with fg, but this caused a crash immediately afterwards.
I will restart this one...
The clang++ build without inlining finally completed! It took 32 hours to compile CPPProcess.o on lxplus.
I will relaunch the build with inlining. Note that, by contrast, the cuda build is still ongoing...
The clang build with inlining never completed successfully (on lxplus, my interactive process was logged out every time within one or two days, which I suspect is a symptom of running out of memory). As for cuda, the build is still running after one week! I will kill the process; it is unreasonable to keep it going any longer.
…ocess.cc which is 32MB)
Note: the generation of gg_ttgggg.mad failed, killed by the out-of-memory (OOM) killer after ~1h30:
dmesg -T | egrep -i 'killed process'
[Fri Feb 24 21:45:56 2023] Out of memory: Killed process 2812622 (python3) total-vm:30208192kB, anon-rss:14254780kB, file-rss:4kB, shmem-rss:0kB, UID:14546 pgtables:58908kB oom_score_adj:0
…d (-makej -inl)
[root@itscrd90 cudacpp]# grep -i 'killed process' /var/log/messages
Feb 24 21:45:56 itscrd90.cern.ch kernel: Out of memory: Killed process 2812622 (python3) total-vm:30208192kB, anon-rss:14254780kB, file-rss:4kB, shmem-rss:0kB, UID:14546 pgtables:58908kB oom_score_adj:0
Feb 25 12:08:32 itscrd90.cern.ch kernel: Out of memory: Killed process 25738 (dbus-broker-lau) total-vm:19644kB, anon-rss:0kB, file-rss:0kB, shmem-rss:0kB, UID:14546 pgtables:60kB oom_score_adj:200
Feb 25 12:08:32 itscrd90.cern.ch kernel: Out of memory: Killed process 2859216 (cudafe++) total-vm:4439180kB, anon-rss:2533172kB, file-rss:0kB, shmem-rss:0kB, UID:14546 pgtables:8728kB oom_score_adj:0
Feb 25 12:09:59 itscrd90.cern.ch kernel: Out of memory: Killed process 2859218 (cudafe++) total-vm:4830956kB, anon-rss:2404060kB, file-rss:0kB, shmem-rss:0kB, UID:14546 pgtables:9504kB oom_score_adj:0
Feb 25 12:12:26 itscrd90.cern.ch kernel: Out of memory: Killed process 2859211 (cudafe++) total-vm:4830956kB, anon-rss:1651848kB, file-rss:0kB, shmem-rss:0kB, UID:14546 pgtables:9496kB oom_score_adj:0
Feb 25 12:17:51 itscrd90.cern.ch kernel: Out of memory: Killed process 2859172 (cc1plus) total-vm:5225996kB, anon-rss:3906132kB, file-rss:0kB, shmem-rss:0kB, UID:14546 pgtables:9800kB oom_score_adj:0
The first line is the failed generation of ggttgggg.mad yesterday. The next lines are the failed builds.
NB: the builds failed already with inl0. I only have gg_ttgggg.sa/SubProcesses/P1_Sigma_sm_gg_ttxgggg/build.*hrd0 and none has a complete CPPProcess.o.
Will retry one by one as ./tput/throughputX.sh -ggttgggg -sa -512yonly -makeclean
…FLAGS+= -freport-bug" to prepare bug reports for internal compiler errors
I have rebased over upstream/master... I will probably close this MR as unmerged, but at least it's updated now. And I will cherry-pick a few commits elsewhere.
…FVs and for compiling them as separate object files (related to splitting kernels)
…d MemoryAccessMomenta.h
…the P subdirectory (depends on npar) - build succeeds for cpp, link fails for cuda:
ccache /usr/local/cuda-12.0/bin/nvcc -I. -I../../src -Xcompiler -O3 -gencode arch=compute_70,code=compute_70 -gencode arch=compute_70,code=sm_70 -lineinfo -use_fast_math -I/usr/local/cuda-12.0/include/ -DUSE_NVTX -std=c++17 -ccbin /usr/lib64/ccache/g++ -DMGONGPU_FPTYPE_DOUBLE -DMGONGPU_FPTYPE2_DOUBLE -Xcompiler -fPIC -c -x cu CPPProcess.cc -o CPPProcess_cuda.o
ptxas fatal : Unresolved extern function '_ZN9mg5amcGpu14helas_VVV1P0_1EPKdS1_S1_dddPd'
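Side note (editorial, not from the PR): "ptxas fatal : Unresolved extern function" is typically what nvcc reports when a __device__ function is called in one translation unit but defined in another and relocatable device code is not enabled. A minimal sketch of that separate-compilation pattern, with hypothetical file and function names:

// HelAmpsDef.cu (hypothetical): device function defined in its own translation unit
__device__ double helas_sum( double a, double b ) { return a + b; }

// Process.cu (hypothetical): the caller only sees a declaration of the device function
extern __device__ double helas_sum( double a, double b );
__global__ void sigmaKinSketch( const double* in, double* out )
{
  out[threadIdx.x] = helas_sum( in[threadIdx.x], 1. );
}

// Both files need relocatable device code plus a device-link step, e.g.
//   nvcc -dc HelAmpsDef.cu Process.cu
//   nvcc -dlink HelAmpsDef.o Process.o -o devlink.o
// Without -dc (or -rdc=true), ptxas cannot resolve the extern device symbol.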
…cuda tests succeed. The build issues some warnings however:
nvlink warning : SM Arch ('sm_52') not found in './CPPProcess_cuda.o'
nvlink warning : SM Arch ('sm_52') not found in './HelAmps_cuda.o'
nvlink warning : SM Arch ('sm_52') not found in './CPPProcess_cuda.o'
nvlink warning : SM Arch ('sm_52') not found in './HelAmps_cuda.o'
…ption HELINL=L and '#ifdef MGONGPU_LINKER_HELAMPS'
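My reading of how this switch could work (a hedged sketch with an illustrative signature and placeholder body, not the actual generated HelAmps.h): with MGONGPU_LINKER_HELAMPS defined (HELINL=L), the header only declares the helas routines and their definitions are compiled once in a separate HelAmps translation unit that is then linked in; otherwise the definitions remain inline in the header.

#ifdef MGONGPU_LINKER_HELAMPS
// HELINL=L: declaration only; the definition lives in a separate HelAmps.cc/HelAmps_cuda.o
__device__ void helas_VVV1P0_1( const double* allV2, const double* allV3, const double coup, double* allV1 );
#else
// HELINL=0: the full definition is visible (and inlinable) wherever the header is included
__device__ inline void helas_VVV1P0_1( const double* allV2, const double* allV3, const double coup, double* allV1 )
{
  for( int i = 0; i < 6; i++ ) allV1[i] = coup * ( allV2[i] + allV3[i] ); // placeholder body, not the real VVV1P0_1 kinematics
}
#endif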
…nd -inlLonly options
… to ease code generation
…y in the HELINL=L mode
…c++, a factor 3 slower for cuda...
./tput/teeThroughputX.sh -ggtt -makej -makeclean -inlLonly
diff -u --color tput/logs_ggtt_mad/log_ggtt_mad_d_inl0_hrd0.txt tput/logs_ggtt_mad/log_ggtt_mad_d_inlL_hrd0.txt
-Process = SIGMA_SM_GG_TTX_CUDA [nvcc 12.0.140 (gcc 11.3.1)] [inlineHel=0] [hardcodePARAM=0]
+Process = SIGMA_SM_GG_TTX_CUDA [nvcc 12.0.140 (gcc 11.3.1)] [inlineHel=L] [hardcodePARAM=0]
Workflow summary = CUD:DBL+THX:CURDEV+RMBDEV+MESDEV/none+NAVBRK
FP precision = DOUBLE (NaN/abnormal=0, zero=0)
-EvtsPerSec[Rmb+ME] (23) = ( 4.589473e+07 ) sec^-1
-EvtsPerSec[MatrixElems] (3) = ( 1.164485e+08 ) sec^-1
-EvtsPerSec[MECalcOnly] (3a) = ( 1.280951e+08 ) sec^-1
-MeanMatrixElemValue = ( 2.086689e+00 +- 3.413217e-03 ) GeV^0
-TOTAL : 0.528239 sec
-INFO: No Floating Point Exceptions have been reported
- 2,222,057,027 cycles # 2.887 GHz
- 3,171,868,018 instructions # 1.43 insn per cycle
- 0.826440817 seconds time elapsed
-runNcu /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_tt.mad/SubProcesses/P1_gg_ttx/build.cuda_d_inl0_hrd0/check_cuda.exe -p 2048 256 1
-==PROF== Profiling "sigmaKin": launch__registers_per_thread 214
+EvtsPerSec[Rmb+ME] (23) = ( 2.667135e+07 ) sec^-1
+EvtsPerSec[MatrixElems] (3) = ( 4.116115e+07 ) sec^-1
+EvtsPerSec[MECalcOnly] (3a) = ( 4.251573e+07 ) sec^-1
+MeanMatrixElemValue = ( 2.086689e+00 +- 3.413217e-03 ) GeV^0
+TOTAL : 0.550450 sec
+INFO: No Floating Point Exceptions have been reported
+ 2,272,219,097 cycles # 2.889 GHz
+ 3,361,475,195 instructions # 1.48 insn per cycle
+ 0.842685843 seconds time elapsed
+runNcu /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_tt.mad/SubProcesses/P1_gg_ttx/build.cuda_d_inlL_hrd0/check_cuda.exe -p 2048 256 1
+==PROF== Profiling "sigmaKin": launch__registers_per_thread 190
==PROF== Profiling "sigmaKin": sm__sass_average_branch_targets_threads_uniform.pct 100%
…lates in HELINL=L mode
…t.mad of HelAmps.h in HELINL=L mode
…t.mad of CPPProcess.cc in HELINL=L mode
…P* (the source is the same but it must be compiled in each P* separately)
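For illustration only (hypothetical header and constant names, my guess at the reason): the shared source picks up process-specific compile-time constants such as the number of external particles, so the identical .cc text produces different object code in each P* directory and cannot be built just once.

// Shared HelAmps-like source (sketch): identical text in every P*, different objects
#include "processConstants.h" // hypothetical per-P* header, e.g. defining "constexpr int npar = 7;" for gg_ttxggg
double momentaBytesPerEvent()
{
  return npar * 4 * sizeof( double ); // depends on the per-process npar, hence the per-P* compilation
}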
… complete its backport
git add *.mad/*/HelAmps.cc *.mad/*/*/HelAmps.cc *.sa/*/HelAmps.cc *.sa/*/*/HelAmps.cc
…ild failed?
./tput/teeThroughputX.sh -ggttggg -makej -makeclean -inlL
ccache /usr/local/cuda-12.0/bin/nvcc -I. -I../../src -Xcompiler -O3 -gencode arch=compute_70,code=compute_70 -gencode arch=compute_70,code=sm_70 -lineinfo -use_fast_math -I/usr/local/cuda-12.0/include/ -DUSE_NVTX -std=c++17 -ccbin /usr/lib64/ccache/g++ -DMGONGPU_FPTYPE_DOUBLE -DMGONGPU_FPTYPE2_DOUBLE -DMGONGPU_INLINE_HELAMPS -Xcompiler -fPIC -c -x cu CPPProcess.cc -o build.cuda_d_inl1_hrd0/CPPProcess_cuda.o
nvcc error : 'ptxas' died due to signal 9 (Kill signal)
make[2]: *** [cudacpp.mk:754: build.cuda_d_inl1_hrd0/CPPProcess_cuda.o] Error 9
make[2]: Leaving directory '/data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_ttggg.mad/SubProcesses/P1_gg_ttxggg'
make[1]: *** [makefile:142: build.cuda_d_inl1_hrd0/.cudacpplibs] Error 2
make[1]: Leaving directory '/data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_ttggg.mad/SubProcesses/P1_gg_ttxggg'
make: *** [makefile:282: bldcuda] Error 2
make: *** Waiting for unfinished jobs....
… build time is from cache
./tput/teeThroughputX.sh -ggttggg -makej -makeclean
…mode (use that from the previous run, not from cache)
./tput/teeThroughputX.sh -ggttggg -makej -makeclean
…factor x2 faster (c++? cuda?), runtime is 5-10% slower in C++, but 5-10% faster in cuda!?
./tput/teeThroughputX.sh -ggttggg -makej -makeclean -inlLonly
diff -u --color tput/logs_ggttggg_mad/log_ggttggg_mad_d_inlL_hrd0.txt tput/logs_ggttggg_mad/log_ggttggg_mad_d_inl0_hrd0.txt
...
On itscrd90.cern.ch [CPU: Intel(R) Xeon(R) Silver 4216 CPU] [GPU: 1x Tesla V100S-PCIE-32GB]:
=========================================================================
-runExe /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_ttggg.mad/SubProcesses/P1_gg_ttxggg/build.cuda_d_inlL_hrd0/check_cuda.exe -p 1 256 2 OMP=
+runExe /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_ttggg.mad/SubProcesses/P1_gg_ttxggg/build.cuda_d_inl0_hrd0/check_cuda.exe -p 1 256 2 OMP=
INFO: The following Floating Point Exceptions will cause SIGFPE program aborts: FE_DIVBYZERO, FE_INVALID, FE_OVERFLOW
-Process = SIGMA_SM_GG_TTXGGG_CUDA [nvcc 12.0.140 (gcc 11.3.1)] [inlineHel=L] [hardcodePARAM=0]
+Process = SIGMA_SM_GG_TTXGGG_CUDA [nvcc 12.0.140 (gcc 11.3.1)] [inlineHel=0] [hardcodePARAM=0]
Workflow summary = CUD:DBL+THX:CURDEV+RMBDEV+MESDEV/none+NAVBRK
FP precision = DOUBLE (NaN/abnormal=0, zero=0)
-EvtsPerSec[Rmb+ME] (23) = ( 4.338149e+02 ) sec^-1
-EvtsPerSec[MatrixElems] (3) = ( 4.338604e+02 ) sec^-1
-EvtsPerSec[MECalcOnly] (3a) = ( 4.338867e+02 ) sec^-1
-MeanMatrixElemValue = ( 1.187066e-05 +- 9.825549e-06 ) GeV^-6
-TOTAL : 2.242693 sec
-INFO: No Floating Point Exceptions have been reported
- 7,348,976,543 cycles # 2.902 GHz
- 16,466,315,526 instructions # 2.24 insn per cycle
- 2.591057214 seconds time elapsed
-runNcu /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_ttggg.mad/SubProcesses/P1_gg_ttxggg/build.cuda_d_inlL_hrd0/check_cuda.exe -p 1 256 1
+EvtsPerSec[Rmb+ME] (23) = ( 4.063038e+02 ) sec^-1
+EvtsPerSec[MatrixElems] (3) = ( 4.063437e+02 ) sec^-1
+EvtsPerSec[MECalcOnly] (3a) = ( 4.063626e+02 ) sec^-1
+MeanMatrixElemValue = ( 1.187066e-05 +- 9.825549e-06 ) GeV^-6
+TOTAL : 2.552546 sec
+INFO: No Floating Point Exceptions have been reported
+ 7,969,059,552 cycles # 2.893 GHz
+ 17,401,037,642 instructions # 2.18 insn per cycle
+ 2.954791685 seconds time elapsed
+runNcu /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_ttggg.mad/SubProcesses/P1_gg_ttxggg/build.cuda_d_inl0_hrd0/check_cuda.exe -p 1 256 1
==PROF== Profiling "sigmaKin": launch__registers_per_thread 255
==PROF== Profiling "sigmaKin": sm__sass_average_branch_targets_threads_uniform.pct 100%
...
=========================================================================
-runExe /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_ttggg.mad/SubProcesses/P1_gg_ttxggg/build.512y_d_inlL_hrd0/check_cpp.exe -p 1 256 2 OMP=
+runExe /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_ttggg.mad/SubProcesses/P1_gg_ttxggg/build.512y_d_inl0_hrd0/check_cpp.exe -p 1 256 2 OMP=
INFO: The following Floating Point Exceptions will cause SIGFPE program aborts: FE_DIVBYZERO, FE_INVALID, FE_OVERFLOW
-Process = SIGMA_SM_GG_TTXGGG_CPP [gcc 11.3.1] [inlineHel=L] [hardcodePARAM=0]
+Process = SIGMA_SM_GG_TTXGGG_CPP [gcc 11.3.1] [inlineHel=0] [hardcodePARAM=0]
Workflow summary = CPP:DBL+CXS:CURHST+RMBHST+MESHST/512y+CXVBRK
FP precision = DOUBLE (NaN/abnormal=0, zero=0)
Internal loops fptype_sv = VECTOR[4] ('512y': AVX512, 256bit) [cxtype_ref=YES]
-EvtsPerSec[Rmb+ME] (23) = ( 3.459662e+02 ) sec^-1
-EvtsPerSec[MatrixElems] (3) = ( 3.460086e+02 ) sec^-1
-EvtsPerSec[MECalcOnly] (3a) = ( 3.460086e+02 ) sec^-1
+EvtsPerSec[Rmb+ME] (23) = ( 3.835352e+02 ) sec^-1
+EvtsPerSec[MatrixElems] (3) = ( 3.836003e+02 ) sec^-1
+EvtsPerSec[MECalcOnly] (3a) = ( 3.836003e+02 ) sec^-1
MeanMatrixElemValue = ( 1.187066e-05 +- 9.825549e-06 ) GeV^-6
-TOTAL : 1.528240 sec
+TOTAL : 1.378567 sec
INFO: No Floating Point Exceptions have been reported
- 4,140,408,789 cycles # 2.703 GHz
- 9,072,597,595 instructions # 2.19 insn per cycle
- 1.532357792 seconds time elapsed
-=Symbols in CPPProcess_cpp.o= (~sse4: 0) (avx2:94048) (512y: 91) (512z: 0)
+ 3,738,350,469 cycles # 2.705 GHz
+ 8,514,195,736 instructions # 2.28 insn per cycle
+ 1.382567882 seconds time elapsed
+=Symbols in CPPProcess_cpp.o= (~sse4: 0) (avx2:80619) (512y: 89) (512z: 0)
-------------------------------------------------------------------------
…10-15% slower in both C++ and cuda
diff -u --color tmad/logs_ggttggg_mad/log_ggttggg_mad_d_inlL_hrd0.txt tmad/logs_ggttggg_mad/log_ggttggg_mad_d_inl0_hrd0.txt
-Executing ' ./build.512y_d_inlL_hrd0/madevent_cpp < /tmp/avalassi/input_ggttggg_x10_cudacpp > /tmp/avalassi/output_ggttggg_x10_cudacpp'
+Executing ' ./build.512y_d_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_ggttggg_x10_cudacpp > /tmp/avalassi/output_ggttggg_x10_cudacpp'
[OPENMPTH] omp_get_max_threads/nproc = 1/4
[NGOODHEL] ngoodhel/ncomb = 128/128
[XSECTION] VECSIZE_USED = 8192
@@ -401,10 +401,10 @@
[XSECTION] ChannelId = 1
[XSECTION] Cross section = 2.332e-07 [2.3322993086656014E-007] fbridge_mode=1
[UNWEIGHT] Wrote 303 events (found 1531 events)
- [COUNTERS] PROGRAM TOTAL : 320.6913s
- [COUNTERS] Fortran Overhead ( 0 ) : 4.5138s
- [COUNTERS] CudaCpp MEs ( 2 ) : 316.1312s for 90112 events => throughput is 2.85E+02 events/s
- [COUNTERS] CudaCpp HEL ( 3 ) : 0.0463s
+ [COUNTERS] PROGRAM TOTAL : 288.3304s
+ [COUNTERS] Fortran Overhead ( 0 ) : 4.4909s
+ [COUNTERS] CudaCpp MEs ( 2 ) : 283.7968s for 90112 events => throughput is 3.18E+02 events/s
+ [COUNTERS] CudaCpp HEL ( 3 ) : 0.0426s
-Executing ' ./build.cuda_d_inlL_hrd0/madevent_cuda < /tmp/avalassi/input_ggttggg_x10_cudacpp > /tmp/avalassi/output_ggttggg_x10_cudacpp'
+Executing ' ./build.cuda_d_inl0_hrd0/madevent_cuda < /tmp/avalassi/input_ggttggg_x10_cudacpp > /tmp/avalassi/output_ggttggg_x10_cudacpp'
[OPENMPTH] omp_get_max_threads/nproc = 1/4
[NGOODHEL] ngoodhel/ncomb = 128/128
[XSECTION] VECSIZE_USED = 8192
@@ -557,10 +557,10 @@
[XSECTION] ChannelId = 1
[XSECTION] Cross section = 2.332e-07 [2.3322993086656006E-007] fbridge_mode=1
[UNWEIGHT] Wrote 303 events (found 1531 events)
- [COUNTERS] PROGRAM TOTAL : 19.6663s
- [COUNTERS] Fortran Overhead ( 0 ) : 4.9649s
- [COUNTERS] CudaCpp MEs ( 2 ) : 13.4667s for 90112 events => throughput is 6.69E+03 events/s
- [COUNTERS] CudaCpp HEL ( 3 ) : 1.2347s
+ [COUNTERS] PROGRAM TOTAL : 18.0242s
+ [COUNTERS] Fortran Overhead ( 0 ) : 4.9891s
+ [COUNTERS] CudaCpp MEs ( 2 ) : 11.9530s for 90112 events => throughput is 7.54E+03 events/s
+ [COUNTERS] CudaCpp HEL ( 3 ) : 1.0821s
…arnings and runtime test failures in HELINL=0. There are still build failures in HELINL=L.
…allCOUP2 instead of allCOUP) to FFV2_4_0 and FFV2_4_3, fixing build failures in HELINL=L
…d CI access, to fix the issues observed in ee_mumu. I did not find an easier way to do this, because the model is known in the aloha caller but not at the time of aloha codegen.
…one, COUP1/COUP2 instead of COUP; two, CI/CD instead of CD)
Fix conflicts: epochX/cudacpp/tput/teeThroughputX.sh epochX/cudacpp/tput/throughputX.sh
Fix conflicts: epochX/cudacpp/tput/teeThroughputX.sh epochX/cudacpp/tput/throughputX.sh
I regenerated gg_ttgggg with the helas codegen of PR #978. Using the HELINL=L option this still fails compilation on gcc. I guess it must be the color algebra that does not follow?
Also clang fails, with a different error (255).
Following the discussion at the last meeting, I started doing a few tests of gg to ttgggg. Here's a first WIP MR with some changes.
Note on codegen
NB: CPPProcess.cc is 32MB in size and contains 15495 Feynman diagrams and a 720x720 color matrix.
Note on builds of ggttgggg.sa:
PS1: the cuda build is currently running on itscrd90.
PS2: the clang build is currently running on lxplus9.