WIP: studies on CMS DY #946
Draft
valassi wants to merge 316 commits into madgraph5:master from valassi:cmsdy
…tch is not complete)
./tlau/lauX.sh -fortran gg_tt.mad -fromgridpack
…one (with backend switch)
./tlau/lauX.sh -fortran gg_tt.mad -fromgridpack
…LL backends (with backend switch)
./tlau/lauX.sh -ALL gg_tt.mad -fromgridpack
What remains TODO
- instrument a better profiling of the time spent
- add events.lhe comparison madgraph5#956 (once fortran/cpp mismatch and second helicity is fixed)
…n itgold91)
CUDACPP_RUNTIME_DISABLEFPE=1 ./tlau/lauX.sh -nomakeclean -fortran pp_dy012j.mad -fromgridpack
…ne (with backend switch)
./tlau/lauX.sh -cppnone gg_tt.mad -fromgridpack
…LL backends (with backend switch)
./tlau/lauX.sh -ALL gg_tt.mad -fromgridpack
What remains TODO
- instrument a better profiling of the time spent
- add events.lhe comparison madgraph5#956 (once fortran/cpp mismatch and second helicity is fixed)
…madevent_interface.py and prepare to modify it
cp -dpr gg_tt.mad/madevent/bin/internal/madevent_interface.py MG5aMC_patches/
It must then be symlinked in gg_tt.mad/madevent/bin/internal:
ln -sf ../../../../MG5aMC_patches/madevent_interface.py .
… with two P*)
./tlau/lauX.sh -fortran gq_ttq.mad -togridpack
…-format v15 from cvmfs if a more recent version is installed madgraph5#952
…ocess with two P* directories)
…one (with backend switch)
./tlau/lauX.sh -cppnone gq_ttq.mad -fromgridpack
…veral debug printouts
…extra debug printouts)
./tlau/lauX.sh -cppnone gq_ttq.mad -fromgridpack
…rnal/gen_ximprove.py
cp -dpr gg_tt.mad/madevent/bin/internal/gen_ximprove.py MG5aMC_patches
…_interface.py), add several debug printouts
…additional debug printouts in gen_ximprove.py)
./tlau/lauX.sh -cppnone gq_ttq.mad -fromgridpack
…rnal/gen_ximprove.py
cp -dpr gg_tt.mad/madevent/bin/internal/cluster.py MG5aMC_patches
…econds() call and go back to the old getTotalDurationSeconds
…mer overhead if CUDACPP_RUNTIME_REMOVETIMEROVERHEAD is set
However, test counters like sample_get_x need special handling
…UNTERS, remove special meaning of PROGRAM counters
…ng a TEST counter as included in a non-TEST counter, to subtract overheads
…ated counter overhead
…SpaceSampling
These are the first results where timer overhead is removed: looks nice, but the overhead should be computed in the counters.cc calls rather than in the individual timers (this would also make more sense with respect to timermap.h where this will not be possible - rename the env, too)

./build.cuda_d_inl0_hrd0/madevent_cuda < /tmp/avalassi/input_ggtt_x1_cudacpp
[COUNTERS] *** USING RDTSC-BASED TIMERS (do not remove timer overhead) ***
[COUNTERS] PROGRAM TOTAL : 4.4608s
[COUNTERS] Fortran Other ( 0 ) : 0.1171s
[COUNTERS] Fortran Initialise(I/O) ( 1 ) : 0.0690s
[COUNTERS] Fortran PhaseSpaceSampling ( 3 ) : 3.2317s for 1087437 events => throughput is 3.36E+05 events/s
[COUNTERS] Fortran PDFs ( 4 ) : 0.0917s for 32768 events => throughput is 3.57E+05 events/s
[COUNTERS] Fortran UpdateScaleCouplings ( 5 ) : 0.1719s for 16384 events => throughput is 9.53E+04 events/s
[COUNTERS] Fortran Reweight ( 6 ) : 0.0483s for 16384 events => throughput is 3.39E+05 events/s
[COUNTERS] Fortran Unweight(LHE-I/O) ( 7 ) : 0.0691s for 16384 events => throughput is 2.37E+05 events/s
[COUNTERS] Fortran SamplePutPoint ( 8 ) : 0.1276s for 1087437 events => throughput is 8.52E+06 events/s
[COUNTERS] CudaCpp Initialise ( 11 ) : 0.4718s
[COUNTERS] CudaCpp Finalise ( 12 ) : 0.0269s
[COUNTERS] CudaCpp MEs ( 19 ) : 0.0357s for 16384 events => throughput is 4.59E+05 events/s
[COUNTERS] TEST SampleGetX ( 21 ) : 2.3519s for 14136681 events => throughput is 6.01E+06 events/s
[COUNTERS] OVERALL NON-MEs ( 31 ) : 4.4251s
[COUNTERS] OVERALL MEs ( 32 ) : 0.0357s for 16384 events => throughput is 4.59E+05 events/s

CUDACPP_RUNTIME_USECHRONOTIMERS=1 \
  ./build.cuda_d_inl0_hrd0/madevent_cuda < /tmp/avalassi/input_ggtt_x1_cudacpp
[COUNTERS] *** USING STD::CHRONO TIMERS (do not remove timer overhead) ***
[COUNTERS] PROGRAM TOTAL : 5.2204s
[COUNTERS] Fortran Other ( 0 ) : 0.1550s
[COUNTERS] Fortran Initialise(I/O) ( 1 ) : 0.0697s
[COUNTERS] Fortran PhaseSpaceSampling ( 3 ) : 3.9335s for 1087437 events => throughput is 2.76E+05 events/s
[COUNTERS] Fortran PDFs ( 4 ) : 0.0924s for 32768 events => throughput is 3.55E+05 events/s
[COUNTERS] Fortran UpdateScaleCouplings ( 5 ) : 0.1722s for 16384 events => throughput is 9.52E+04 events/s
[COUNTERS] Fortran Reweight ( 6 ) : 0.0487s for 16384 events => throughput is 3.36E+05 events/s
[COUNTERS] Fortran Unweight(LHE-I/O) ( 7 ) : 0.0689s for 16384 events => throughput is 2.38E+05 events/s
[COUNTERS] Fortran SamplePutPoint ( 8 ) : 0.1401s for 1087437 events => throughput is 7.76E+06 events/s
[COUNTERS] CudaCpp Initialise ( 11 ) : 0.4779s
[COUNTERS] CudaCpp Finalise ( 12 ) : 0.0263s
[COUNTERS] CudaCpp MEs ( 19 ) : 0.0358s for 16384 events => throughput is 4.58E+05 events/s
[COUNTERS] TEST SampleGetX ( 21 ) : 2.8064s for 14136681 events => throughput is 5.04E+06 events/s
[COUNTERS] OVERALL NON-MEs ( 31 ) : 5.1846s
[COUNTERS] OVERALL MEs ( 32 ) : 0.0358s for 16384 events => throughput is 4.58E+05 events/s

CUDACPP_RUNTIME_REMOVETIMEROVERHEAD=1 \
  ./build.cuda_d_inl0_hrd0/madevent_cuda < /tmp/avalassi/input_ggtt_x1_cudacpp
INFO: RdtscTimer overhead : 0.0179s for 1M start/stop cycles
[COUNTERS] PROGRAM TOTAL+COUNTEROVERHEAD : 4.4668s
[COUNTERS] PROGRAM COUNTEROVERHEAD : 0.2924s
-------------------------------------------------------------
[COUNTERS] *** USING RDTSC-BASED TIMERS (remove timer overhead) ***
[COUNTERS] PROGRAM TOTAL : 4.1745s
[COUNTERS] Fortran Other ( 0 ) : 0.1190s
[COUNTERS] Fortran Initialise(I/O) ( 1 ) : 0.0696s
[COUNTERS] Fortran PhaseSpaceSampling ( 3 ) : 2.9612s for 1087437 events => throughput is 3.67E+05 events/s
[COUNTERS] Fortran PDFs ( 4 ) : 0.0913s for 32768 events => throughput is 3.59E+05 events/s
[COUNTERS] Fortran UpdateScaleCouplings ( 5 ) : 0.1709s for 16384 events => throughput is 9.59E+04 events/s
[COUNTERS] Fortran Reweight ( 6 ) : 0.0482s for 16384 events => throughput is 3.40E+05 events/s
[COUNTERS] Fortran Unweight(LHE-I/O) ( 7 ) : 0.0678s for 16384 events => throughput is 2.42E+05 events/s
[COUNTERS] Fortran SamplePutPoint ( 8 ) : 0.1125s for 1087437 events => throughput is 9.67E+06 events/s
[COUNTERS] CudaCpp Initialise ( 11 ) : 0.4716s
[COUNTERS] CudaCpp Finalise ( 12 ) : 0.0266s
[COUNTERS] CudaCpp MEs ( 19 ) : 0.0358s for 16384 events => throughput is 4.58E+05 events/s
[COUNTERS] TEST SampleGetX ( 21 ) : 2.0989s for 14136681 events => throughput is 6.74E+06 events/s
[COUNTERS] OVERALL NON-MEs ( 31 ) : 4.1387s
[COUNTERS] OVERALL MEs ( 32 ) : 0.0358s for 16384 events => throughput is 4.58E+05 events/s

CUDACPP_RUNTIME_USECHRONOTIMERS=1 CUDACPP_RUNTIME_REMOVETIMEROVERHEAD=1 \
  ./build.cuda_d_inl0_hrd0/madevent_cuda < /tmp/avalassi/input_ggtt_x1_cudacpp
INFO: ChronoTimer overhead : 0.0489s for 1M start/stop cycles
[COUNTERS] PROGRAM TOTAL+COUNTEROVERHEAD : 5.2779s
[COUNTERS] PROGRAM COUNTEROVERHEAD : 0.7998s
-------------------------------------------------------------
[COUNTERS] *** USING STD::CHRONO TIMERS (remove timer overhead) ***
[COUNTERS] PROGRAM TOTAL : 4.4781s
[COUNTERS] Fortran Other ( 0 ) : 0.1570s
[COUNTERS] Fortran Initialise(I/O) ( 1 ) : 0.0669s
[COUNTERS] Fortran PhaseSpaceSampling ( 3 ) : 3.2485s for 1087437 events => throughput is 3.35E+05 events/s
[COUNTERS] Fortran PDFs ( 4 ) : 0.0930s for 32768 events => throughput is 3.52E+05 events/s
[COUNTERS] Fortran UpdateScaleCouplings ( 5 ) : 0.1716s for 16384 events => throughput is 9.55E+04 events/s
[COUNTERS] Fortran Reweight ( 6 ) : 0.0474s for 16384 events => throughput is 3.46E+05 events/s
[COUNTERS] Fortran Unweight(LHE-I/O) ( 7 ) : 0.0681s for 16384 events => throughput is 2.41E+05 events/s
[COUNTERS] Fortran SamplePutPoint ( 8 ) : 0.0929s for 1087437 events => throughput is 1.17E+07 events/s
[COUNTERS] CudaCpp Initialise ( 11 ) : 0.4705s
[COUNTERS] CudaCpp Finalise ( 12 ) : 0.0266s
[COUNTERS] CudaCpp MEs ( 19 ) : 0.0357s for 16384 events => throughput is 4.59E+05 events/s
[COUNTERS] TEST SampleGetX ( 21 ) : 2.1629s for 14136681 events => throughput is 6.54E+06 events/s
[COUNTERS] OVERALL NON-MEs ( 31 ) : 4.4424s
[COUNTERS] OVERALL MEs ( 32 ) : 0.0357s for 16384 events => throughput is 4.59E+05 events/s

CUDACPP_RUNTIME_REMOVETIMEROVERHEAD=1 CUDACPP_RUNTIME_DISABLECALLTIMERS=1 \
  ./build.cuda_d_inl0_hrd0/madevent_cuda < /tmp/avalassi/input_ggtt_x1_cudacpp
[COUNTERS] PROGRAM TOTAL+COUNTEROVERHEAD : 3.8210s
[COUNTERS] PROGRAM COUNTEROVERHEAD : 0.0000s
-------------------------------------------------------------
[COUNTERS] *** USING RDTSC-BASED TIMERS (remove timer overhead) ***
[COUNTERS] PROGRAM TOTAL : 3.8210s

CUDACPP_RUNTIME_USECHRONOTIMERS=1 CUDACPP_RUNTIME_REMOVETIMEROVERHEAD=1 CUDACPP_RUNTIME_DISABLECALLTIMERS=1 \
  ./build.cuda_d_inl0_hrd0/madevent_cuda < /tmp/avalassi/input_ggtt_x1_cudacpp
[COUNTERS] PROGRAM TOTAL+COUNTEROVERHEAD : 3.8301s
[COUNTERS] PROGRAM COUNTEROVERHEAD : 0.0000s
-------------------------------------------------------------
[COUNTERS] *** USING STD::CHRONO TIMERS (remove timer overhead) ***
[COUNTERS] PROGRAM TOTAL : 3.8301s
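The relation between the three PROGRAM lines in the overhead-removal runs above can be checked with a few lines of Python. This is a minimal sketch, not the counters.cc code: it assumes that PROGRAM TOTAL is simply PROGRAM TOTAL+COUNTEROVERHEAD minus PROGRAM COUNTEROVERHEAD, and the helper name `net_total` is hypothetical. The numbers are copied from the RDTSC and std::chrono runs with CUDACPP_RUNTIME_REMOVETIMEROVERHEAD=1.

```python
def net_total(total_with_overhead, overhead):
    """Assumed relation: printed PROGRAM TOTAL = (TOTAL+COUNTEROVERHEAD) - COUNTEROVERHEAD."""
    return total_with_overhead - overhead


# (TOTAL+COUNTEROVERHEAD, COUNTEROVERHEAD, printed PROGRAM TOTAL), in seconds,
# copied from the two runs with overhead removal enabled.
runs = {
    "rdtsc": (4.4668, 0.2924, 4.1745),
    "chrono": (5.2779, 0.7998, 4.4781),
}

for name, (gross, overhead, printed) in runs.items():
    net = net_total(gross, overhead)
    # The log prints 4 decimals, so allow a last-digit rounding difference.
    assert abs(net - printed) < 2e-4
    print(f"{name}: {gross:.4f} - {overhead:.4f} = {net:.4f} (log: {printed:.4f})")
```

The tolerance is there because each printed value is rounded independently, so the difference of two printed numbers can be off by one unit in the last digit.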
…s: this will be moved to counters alone
Revert "[prof] in gux_taptamggux.mad timer.h, add instead a getTotalOverheadSeconds() call and go back to the old getTotalDurationSeconds"
This reverts commit ad9b747.
Revert "[prof] in gux_taptamggux.mad timer.h, add the option to remove overhead from getTotalDurationSeconds calls"
This reverts commit 5c0a2ed.
…unter overhead (remove it from timer.h: there will be none for timermap.h)
Rename the env as CUDACPP_RUNTIME_REMOVECOUNTEROVERHEAD to make it clear that this is in the counters.cc infrastructure

These are the results

(1) keep overhead

./build.cuda_d_inl0_hrd0/madevent_cuda < /tmp/avalassi/input_ggtt_x1_cudacpp
[COUNTERS] *** USING RDTSC-BASED TIMERS (do not remove timer overhead) ***
[COUNTERS] PROGRAM TOTAL : 4.5315s
[COUNTERS] Fortran Other ( 0 ) : 0.1198s
[COUNTERS] Fortran Initialise(I/O) ( 1 ) : 0.0678s
[COUNTERS] Fortran PhaseSpaceSampling ( 3 ) : 3.2691s for 1087437 events => throughput is 3.33E+05 events/s
[COUNTERS] Fortran PDFs ( 4 ) : 0.1044s for 32768 events => throughput is 3.14E+05 events/s
[COUNTERS] Fortran UpdateScaleCouplings ( 5 ) : 0.1757s for 16384 events => throughput is 9.33E+04 events/s
[COUNTERS] Fortran Reweight ( 6 ) : 0.0543s for 16384 events => throughput is 3.02E+05 events/s
[COUNTERS] Fortran Unweight(LHE-I/O) ( 7 ) : 0.0731s for 16384 events => throughput is 2.24E+05 events/s
[COUNTERS] Fortran SamplePutPoint ( 8 ) : 0.1322s for 1087437 events => throughput is 8.23E+06 events/s
[COUNTERS] CudaCpp Initialise ( 11 ) : 0.4719s
[COUNTERS] CudaCpp Finalise ( 12 ) : 0.0274s
[COUNTERS] CudaCpp MEs ( 19 ) : 0.0358s for 16384 events => throughput is 4.57E+05 events/s
[COUNTERS] TEST SampleGetX ( 21 ) : 2.3686s for 14136681 events => throughput is 5.97E+06 events/s
[COUNTERS] OVERALL NON-MEs ( 31 ) : 4.4957s
[COUNTERS] OVERALL MEs ( 32 ) : 0.0358s for 16384 events => throughput is 4.57E+05 events/s

CUDACPP_RUNTIME_USECHRONOTIMERS=1 \
  ./build.cuda_d_inl0_hrd0/madevent_cuda < /tmp/avalassi/input_ggtt_x1_cudacpp
[COUNTERS] *** USING STD::CHRONO TIMERS (do not remove timer overhead) ***
[COUNTERS] PROGRAM TOTAL : 5.2048s
[COUNTERS] Fortran Other ( 0 ) : 0.1559s
[COUNTERS] Fortran Initialise(I/O) ( 1 ) : 0.0673s
[COUNTERS] Fortran PhaseSpaceSampling ( 3 ) : 3.9265s for 1087437 events => throughput is 2.77E+05 events/s
[COUNTERS] Fortran PDFs ( 4 ) : 0.0993s for 32768 events => throughput is 3.30E+05 events/s
[COUNTERS] Fortran UpdateScaleCouplings ( 5 ) : 0.1648s for 16384 events => throughput is 9.94E+04 events/s
[COUNTERS] Fortran Reweight ( 6 ) : 0.0514s for 16384 events => throughput is 3.19E+05 events/s
[COUNTERS] Fortran Unweight(LHE-I/O) ( 7 ) : 0.0700s for 16384 events => throughput is 2.34E+05 events/s
[COUNTERS] Fortran SamplePutPoint ( 8 ) : 0.1365s for 1087437 events => throughput is 7.97E+06 events/s
[COUNTERS] CudaCpp Initialise ( 11 ) : 0.4711s
[COUNTERS] CudaCpp Finalise ( 12 ) : 0.0264s
[COUNTERS] CudaCpp MEs ( 19 ) : 0.0357s for 16384 events => throughput is 4.59E+05 events/s
[COUNTERS] TEST SampleGetX ( 21 ) : 2.8006s for 14136681 events => throughput is 5.05E+06 events/s
[COUNTERS] OVERALL NON-MEs ( 31 ) : 5.1691s
[COUNTERS] OVERALL MEs ( 32 ) : 0.0357s for 16384 events => throughput is 4.59E+05 events/s

(2) remove overhead

CUDACPP_RUNTIME_REMOVECOUNTEROVERHEAD=1 \
  ./build.cuda_d_inl0_hrd0/madevent_cuda < /tmp/avalassi/input_ggtt_x1_cudacpp
INFO: COUNTERS overhead : 0.0331s for 1M start/stop cycles
[COUNTERS] PROGRAM TOTAL+COUNTEROVERHEAD : 4.5208s
[COUNTERS] PROGRAM COUNTEROVERHEAD : 0.5413s
-------------------------------------------------------------
[COUNTERS] *** USING RDTSC-BASED TIMERS (remove timer overhead) ***
[COUNTERS] PROGRAM TOTAL : 3.9795s
[COUNTERS] Fortran Other ( 0 ) : 0.1548s
[COUNTERS] Fortran Initialise(I/O) ( 1 ) : 0.0670s
[COUNTERS] Fortran PhaseSpaceSampling ( 3 ) : 2.7547s for 1087437 events => throughput is 3.95E+05 events/s
[COUNTERS] Fortran PDFs ( 4 ) : 0.0988s for 32768 events => throughput is 3.32E+05 events/s
[COUNTERS] Fortran UpdateScaleCouplings ( 5 ) : 0.1639s for 16384 events => throughput is 1.00E+05 events/s
[COUNTERS] Fortran Reweight ( 6 ) : 0.0510s for 16384 events => throughput is 3.21E+05 events/s
[COUNTERS] Fortran Unweight(LHE-I/O) ( 7 ) : 0.0674s for 16384 events => throughput is 2.43E+05 events/s
[COUNTERS] Fortran SamplePutPoint ( 8 ) : 0.0898s for 1087437 events => throughput is 1.21E+07 events/s
[COUNTERS] CudaCpp Initialise ( 11 ) : 0.4700s
[COUNTERS] CudaCpp Finalise ( 12 ) : 0.0266s
[COUNTERS] CudaCpp MEs ( 19 ) : 0.0356s for 16384 events => throughput is 4.60E+05 events/s
[COUNTERS] TEST SampleGetX ( 21 ) : 1.8855s for 14136681 events => throughput is 7.50E+06 events/s
[COUNTERS] OVERALL NON-MEs ( 31 ) : 3.9439s
[COUNTERS] OVERALL MEs ( 32 ) : 0.0356s for 16384 events => throughput is 4.60E+05 events/s

CUDACPP_RUNTIME_USECHRONOTIMERS=1 CUDACPP_RUNTIME_REMOVECOUNTEROVERHEAD=1 \
  ./build.cuda_d_inl0_hrd0/madevent_cuda < /tmp/avalassi/input_ggtt_x1_cudacpp
INFO: COUNTERS overhead : 0.0640s for 1M start/stop cycles
[COUNTERS] PROGRAM TOTAL+COUNTEROVERHEAD : 5.3491s
[COUNTERS] PROGRAM COUNTEROVERHEAD : 1.0455s
-------------------------------------------------------------
[COUNTERS] *** USING STD::CHRONO TIMERS (remove timer overhead) ***
[COUNTERS] PROGRAM TOTAL : 4.3036s
[COUNTERS] Fortran Other ( 0 ) : 0.2216s
[COUNTERS] Fortran Initialise(I/O) ( 1 ) : 0.0692s
[COUNTERS] Fortran PhaseSpaceSampling ( 3 ) : 3.0230s for 1087437 events => throughput is 3.60E+05 events/s
[COUNTERS] Fortran PDFs ( 4 ) : 0.0992s for 32768 events => throughput is 3.30E+05 events/s
[COUNTERS] Fortran UpdateScaleCouplings ( 5 ) : 0.1652s for 16384 events => throughput is 9.92E+04 events/s
[COUNTERS] Fortran Reweight ( 6 ) : 0.0504s for 16384 events => throughput is 3.25E+05 events/s
[COUNTERS] Fortran Unweight(LHE-I/O) ( 7 ) : 0.0684s for 16384 events => throughput is 2.39E+05 events/s
[COUNTERS] Fortran SamplePutPoint ( 8 ) : 0.0716s for 1087437 events => throughput is 1.52E+07 events/s
[COUNTERS] CudaCpp Initialise ( 11 ) : 0.4727s
[COUNTERS] CudaCpp Finalise ( 12 ) : 0.0266s
[COUNTERS] CudaCpp MEs ( 19 ) : 0.0357s for 16384 events => throughput is 4.59E+05 events/s
[COUNTERS] TEST SampleGetX ( 21 ) : 1.9427s for 14136681 events => throughput is 7.28E+06 events/s
[COUNTERS] OVERALL NON-MEs ( 31 ) : 4.2679s
[COUNTERS] OVERALL MEs ( 32 ) : 0.0357s for 16384 events => throughput is 4.59E+05 events/s

(3) remove overhead, disable individual timers (so here the overhead is 0)

CUDACPP_RUNTIME_REMOVECOUNTEROVERHEAD=1 CUDACPP_RUNTIME_DISABLECALLTIMERS=1 \
  ./build.cuda_d_inl0_hrd0/madevent_cuda < /tmp/avalassi/input_ggtt_x1_cudacpp
INFO: COUNTERS overhead : 0.0039s for 1M start/stop cycles
[COUNTERS] PROGRAM TOTAL+COUNTEROVERHEAD : 3.7998s
[COUNTERS] PROGRAM COUNTEROVERHEAD : 0.0000s
-------------------------------------------------------------
[COUNTERS] *** USING RDTSC-BASED TIMERS (remove timer overhead) ***
[COUNTERS] PROGRAM TOTAL : 3.7998s

CUDACPP_RUNTIME_USECHRONOTIMERS=1 CUDACPP_RUNTIME_REMOVECOUNTEROVERHEAD=1 CUDACPP_RUNTIME_DISABLECALLTIMERS=1 \
  ./build.cuda_d_inl0_hrd0/madevent_cuda < /tmp/avalassi/input_ggtt_x1_cudacpp
INFO: COUNTERS overhead : 0.0038s for 1M start/stop cycles
[COUNTERS] PROGRAM TOTAL+COUNTEROVERHEAD : 3.9067s
[COUNTERS] PROGRAM COUNTEROVERHEAD : 0.0000s
-------------------------------------------------------------
[COUNTERS] *** USING STD::CHRONO TIMERS (remove timer overhead) ***
[COUNTERS] PROGRAM TOTAL : 3.9067s
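For comparing runs like the ones above, the numbered [COUNTERS] lines can be pulled into a dictionary. The following is a minimal sketch that assumes the line layout seen in these logs ("name ( id ) : seconds" with an optional "for N events"); the regular expression and the function name `parse_counters` are hypothetical and not part of the repository. It also recomputes one throughput as events divided by seconds.

```python
import re

# Matches numbered counters only; banner lines and PROGRAM TOTAL have no "( id )".
# The name may contain parentheses, e.g. "Fortran Initialise(I/O)", but no '[', ']' or ':'.
COUNTER_RE = re.compile(
    r"\[COUNTERS\]\s+(?P<name>[^\[\]:]+?)\s+\(\s*(?P<id>\d+)\s*\)\s*:\s*(?P<secs>[\d.]+)s"
    r"(?:\s+for\s+(?P<nevt>\d+)\s+events)?"
)


def parse_counters(text):
    """Return {id: (name, seconds, events or None)} for every numbered counter."""
    out = {}
    for m in COUNTER_RE.finditer(text):
        nevt = int(m.group("nevt")) if m.group("nevt") else None
        out[int(m.group("id"))] = (m.group("name"), float(m.group("secs")), nevt)
    return out


# A few lines copied from the first RDTSC run (keep overhead), flattened as in the log.
sample = (
    "[COUNTERS] PROGRAM TOTAL : 4.5315s "
    "[COUNTERS] Fortran PhaseSpaceSampling ( 3 ) : 3.2691s for 1087437 events"
    " => throughput is 3.33E+05 events/s "
    "[COUNTERS] CudaCpp Initialise ( 11 ) : 0.4719s "
    "[COUNTERS] CudaCpp MEs ( 19 ) : 0.0358s for 16384 events"
    " => throughput is 4.57E+05 events/s"
)
counters = parse_counters(sample)

name, secs, nevt = counters[3]
# Throughput recomputed from the printed, rounded time; for very short timers the
# last digit may differ from the log because the time is rounded to 4 decimals.
print(f"{name}: {nevt / secs:.2E} events/s")
```

Keying on the counter id rather than the name keeps the lookup stable if a counter is renamed between commits.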
…ter overhead

These are the results

(1) keep overhead

./build.cuda_d_inl0_hrd0/madevent_cuda < /tmp/avalassi/input_ggtt_x1_cudacpp
[COUNTERS] *** USING RDTSC-BASED TIMERS (do not remove timer overhead) ***
[COUNTERS] PROGRAM TOTAL : 4.4766s
[COUNTERS] Fortran Other ( 0 ) : 0.1202s
[COUNTERS] Fortran Initialise(I/O) ( 1 ) : 0.0685s
[COUNTERS] Fortran PhaseSpaceSampling ( 3 ) : 3.2400s for 1087437 events => throughput is 3.36E+05 events/s
[COUNTERS] Fortran PDFs ( 4 ) : 0.1007s for 32768 events => throughput is 3.25E+05 events/s
[COUNTERS] Fortran UpdateScaleCouplings ( 5 ) : 0.1673s for 16384 events => throughput is 9.79E+04 events/s
[COUNTERS] Fortran Reweight ( 6 ) : 0.0521s for 16384 events => throughput is 3.14E+05 events/s
[COUNTERS] Fortran Unweight(LHE-I/O) ( 7 ) : 0.0687s for 16384 events => throughput is 2.38E+05 events/s
[COUNTERS] Fortran SamplePutPoint ( 8 ) : 0.1237s for 1087437 events => throughput is 8.79E+06 events/s
[COUNTERS] CudaCpp Initialise ( 11 ) : 0.4728s
[COUNTERS] CudaCpp Finalise ( 12 ) : 0.0269s
[COUNTERS] CudaCpp MEs ( 19 ) : 0.0357s for 16384 events => throughput is 4.59E+05 events/s
[COUNTERS] TEST SampleGetX ( 21 ) : 2.3496s for 14136681 events => throughput is 6.02E+06 events/s
[COUNTERS] OVERALL NON-MEs ( 31 ) : 4.4409s
[COUNTERS] OVERALL MEs ( 32 ) : 0.0357s for 16384 events => throughput is 4.59E+05 events/s

CUDACPP_RUNTIME_USECHRONOTIMERS=1 \
  ./build.cuda_d_inl0_hrd0/madevent_cuda < /tmp/avalassi/input_ggtt_x1_cudacpp
[COUNTERS] *** USING STD::CHRONO TIMERS (do not remove timer overhead) ***
[COUNTERS] PROGRAM TOTAL : 5.3144s
[COUNTERS] Fortran Other ( 0 ) : 0.1588s
[COUNTERS] Fortran Initialise(I/O) ( 1 ) : 0.0674s
[COUNTERS] Fortran PhaseSpaceSampling ( 3 ) : 4.0191s for 1087437 events => throughput is 2.71E+05 events/s
[COUNTERS] Fortran PDFs ( 4 ) : 0.0996s for 32768 events => throughput is 3.29E+05 events/s
[COUNTERS] Fortran UpdateScaleCouplings ( 5 ) : 0.1660s for 16384 events => throughput is 9.87E+04 events/s
[COUNTERS] Fortran Reweight ( 6 ) : 0.0508s for 16384 events => throughput is 3.22E+05 events/s
[COUNTERS] Fortran Unweight(LHE-I/O) ( 7 ) : 0.0704s for 16384 events => throughput is 2.33E+05 events/s
[COUNTERS] Fortran SamplePutPoint ( 8 ) : 0.1482s for 1087437 events => throughput is 7.34E+06 events/s
[COUNTERS] CudaCpp Initialise ( 11 ) : 0.4718s
[COUNTERS] CudaCpp Finalise ( 12 ) : 0.0267s
[COUNTERS] CudaCpp MEs ( 19 ) : 0.0357s for 16384 events => throughput is 4.59E+05 events/s
[COUNTERS] TEST SampleGetX ( 21 ) : 2.8646s for 14136681 events => throughput is 4.94E+06 events/s
[COUNTERS] OVERALL NON-MEs ( 31 ) : 5.2787s
[COUNTERS] OVERALL MEs ( 32 ) : 0.0357s for 16384 events => throughput is 4.59E+05 events/s

(2) remove overhead

CUDACPP_RUNTIME_REMOVECOUNTEROVERHEAD=1 \
  ./build.cuda_d_inl0_hrd0/madevent_cuda < /tmp/avalassi/input_ggtt_x1_cudacpp
INFO: COUNTERS overhead : 0.0338s for 1M start/stop cycles
[COUNTERS] PROGRAM TOTAL+COUNTEROVERHEAD : 4.8244s
[COUNTERS] PROGRAM COUNTEROVERHEAD : 0.8905s
-------------------------------------------------------------
[COUNTERS] *** USING RDTSC-BASED TIMERS (remove timer overhead) ***
[COUNTERS] PROGRAM TOTAL : 3.9339s
[COUNTERS] Fortran Other ( 0 ) : 0.2954s
[COUNTERS] Fortran Initialise(I/O) ( 1 ) : 0.0674s
[COUNTERS] Fortran PhaseSpaceSampling ( 3 ) : 2.7332s for 1087437 events => throughput is 3.98E+05 events/s
[COUNTERS] Fortran PDFs ( 4 ) : 0.1003s for 32768 events => throughput is 3.27E+05 events/s
[COUNTERS] Fortran UpdateScaleCouplings ( 5 ) : 0.1688s for 16384 events => throughput is 9.71E+04 events/s
[COUNTERS] Fortran Reweight ( 6 ) : 0.0507s for 16384 events => throughput is 3.23E+05 events/s
[COUNTERS] Fortran Unweight(LHE-I/O) ( 7 ) : 0.0695s for 16384 events => throughput is 2.36E+05 events/s
[COUNTERS] Fortran SamplePutPoint ( 8 ) : 0.0924s for 1087437 events => throughput is 1.18E+07 events/s
[COUNTERS] CudaCpp Initialise ( 11 ) : 0.4692s
[COUNTERS] CudaCpp Finalise ( 12 ) : 0.0263s
[COUNTERS] CudaCpp MEs ( 19 ) : 0.0357s for 16384 events => throughput is 4.59E+05 events/s
[COUNTERS] TEST SampleGetX ( 21 ) : 1.8723s for 14136681 events => throughput is 7.55E+06 events/s
[COUNTERS] OVERALL NON-MEs ( 31 ) : 3.8982s
[COUNTERS] OVERALL MEs ( 32 ) : 0.0357s for 16384 events => throughput is 4.59E+05 events/s

CUDACPP_RUNTIME_USECHRONOTIMERS=1 CUDACPP_RUNTIME_REMOVECOUNTEROVERHEAD=1 \
  ./build.cuda_d_inl0_hrd0/madevent_cuda < /tmp/avalassi/input_ggtt_x1_cudacpp
INFO: COUNTERS overhead : 0.0637s for 1M start/stop cycles
[COUNTERS] PROGRAM TOTAL+COUNTEROVERHEAD : 5.8826s
[COUNTERS] PROGRAM COUNTEROVERHEAD : 1.6786s
-------------------------------------------------------------
[COUNTERS] *** USING STD::CHRONO TIMERS (remove timer overhead) ***
[COUNTERS] PROGRAM TOTAL : 4.2040s
[COUNTERS] Fortran Other ( 0 ) : 0.4831s
[COUNTERS] Fortran Initialise(I/O) ( 1 ) : 0.0691s
[COUNTERS] Fortran PhaseSpaceSampling ( 3 ) : 2.9924s for 1087437 events => throughput is 3.63E+05 events/s
[COUNTERS] Fortran PDFs ( 4 ) : 0.0983s for 32768 events => throughput is 3.33E+05 events/s
[COUNTERS] Fortran UpdateScaleCouplings ( 5 ) : 0.1669s for 16384 events => throughput is 9.81E+04 events/s
[COUNTERS] Fortran Reweight ( 6 ) : 0.0506s for 16384 events => throughput is 3.24E+05 events/s
[COUNTERS] Fortran Unweight(LHE-I/O) ( 7 ) : 0.0676s for 16384 events => throughput is 2.42E+05 events/s
[COUNTERS] Fortran SamplePutPoint ( 8 ) : 0.0698s for 1087437 events => throughput is 1.56E+07 events/s
[COUNTERS] CudaCpp Initialise ( 11 ) : 0.4712s
[COUNTERS] CudaCpp Finalise ( 12 ) : 0.0267s
[COUNTERS] CudaCpp MEs ( 19 ) : 0.0350s for 16384 events => throughput is 4.68E+05 events/s
[COUNTERS] TEST SampleGetX ( 21 ) : 1.9227s for 14136681 events => throughput is 7.35E+06 events/s
[COUNTERS] OVERALL NON-MEs ( 31 ) : 4.1690s
[COUNTERS] OVERALL MEs ( 32 ) : 0.0350s for 16384 events => throughput is 4.68E+05 events/s

(3) remove overhead, disable individual timers (so here the overhead is 0)

CUDACPP_RUNTIME_REMOVECOUNTEROVERHEAD=1 CUDACPP_RUNTIME_DISABLECALLTIMERS=1 \
  ./build.cuda_d_inl0_hrd0/madevent_cuda < /tmp/avalassi/input_ggtt_x1_cudacpp
INFO: COUNTERS overhead : 0.0333s for 1M start/stop cycles
[COUNTERS] PROGRAM TOTAL+COUNTEROVERHEAD : 4.1897s
[COUNTERS] PROGRAM COUNTEROVERHEAD : 0.3330s
-------------------------------------------------------------
[COUNTERS] *** USING RDTSC-BASED TIMERS (remove timer overhead) ***
[COUNTERS] PROGRAM TOTAL : 3.8567s

CUDACPP_RUNTIME_USECHRONOTIMERS=1 CUDACPP_RUNTIME_REMOVECOUNTEROVERHEAD=1 CUDACPP_RUNTIME_DISABLECALLTIMERS=1 \
  ./build.cuda_d_inl0_hrd0/madevent_cuda < /tmp/avalassi/input_ggtt_x1_cudacpp
INFO: COUNTERS overhead : 0.0659s for 1M start/stop cycles
[COUNTERS] PROGRAM TOTAL+COUNTEROVERHEAD : 4.5119s
[COUNTERS] PROGRAM COUNTEROVERHEAD : 0.6594s
-------------------------------------------------------------
[COUNTERS] *** USING STD::CHRONO TIMERS (remove timer overhead) ***
[COUNTERS] PROGRAM TOTAL : 3.8525s

(4) do not remove overhead, disable individual timers (remove also the overhead from the estimation of the overhead)
(this test was done on another day on the same machine and build, but the results are compatible with the previous ones)

CUDACPP_RUNTIME_DISABLECALLTIMERS=1 \
  ./build.cuda_d_inl0_hrd0/madevent_cuda < /tmp/avalassi/input_ggtt_x1_cudacpp
[COUNTERS] *** USING RDTSC-BASED TIMERS (do not remove timer overhead) ***
[COUNTERS] PROGRAM TOTAL : 3.8072s

CUDACPP_RUNTIME_USECHRONOTIMERS=1 CUDACPP_RUNTIME_DISABLECALLTIMERS=1 \
  ./build.cuda_d_inl0_hrd0/madevent_cuda < /tmp/avalassi/input_ggtt_x1_cudacpp
[COUNTERS] *** USING STD::CHRONO TIMERS (do not remove timer overhead) ***
[COUNTERS] PROGRAM TOTAL : 3.8214s
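One way to read the PROGRAM TOTAL values above is to compare each configuration against test (4), where the individual call timers are disabled. The sketch below is only illustrative: it assumes that the test (4) total is a fair "untimed" baseline for the same timer type, the helper name `excess_percent` is hypothetical, and each number comes from a single run (with test (4) taken on another day), so the percentages are indicative at best.

```python
def excess_percent(total, baseline):
    """Relative excess of a PROGRAM TOTAL over the untimed baseline, in percent."""
    return 100.0 * (total - baseline) / baseline


# PROGRAM TOTAL in seconds, copied from the logs above:
# (1) keep overhead, (2) overhead removed, (4) call timers disabled (baseline).
totals = {
    "rdtsc": (4.4766, 3.9339, 3.8072),
    "chrono": (5.3144, 4.2040, 3.8214),
}

for timer, (kept, removed, baseline) in totals.items():
    kept_pct = excess_percent(kept, baseline)
    removed_pct = excess_percent(removed, baseline)
    print(f"{timer}: keep overhead +{kept_pct:.1f}%, remove overhead +{removed_pct:.1f}%")
```

Under these assumptions, a smaller percentage for the "remove overhead" column means the subtraction brings the reported total closer to what the program takes when no per-call timing is done.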
…r merging
git checkout upstream/master $(git ls-tree --name-only upstream/master */CODEGEN*txt)
…Source/makefile madgraph5#980) into prof
(Checked that regenerating gg_tt.mad is all ok)
…r merging
git checkout upstream/master $(git ls-tree --name-only upstream/master */CODEGEN*txt)
…Source/makefile madgraph5#980) into grid
…er merging
git checkout upstream/master $(git ls-tree --name-only upstream/master */CODEGEN*txt)
…adgraph5#980) into cmsdy
Fix conflicts:
- epochX/cudacpp/CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/MG5aMC_patches/PROD/patch.common (remove Source/makefile)
- epochX/cudacpp/CODEGEN/allGenerateAndCompare.sh (add processes from both branches)
(Checked that regenerating gg_tt.mad is ok)
…ier merging
git checkout upstream/master $(git ls-tree --name-only HEAD tput/logs* tmad/logs*)
…nerated code except gg_tt.mad for easier merging
git checkout upstream/master $(git ls-tree --name-only upstream/master *.mad/SubProcesses/P*/auto_dsig1.f | grep -v ^gg_tt.mad)
…dhel, for360) into prof
Fix conflicts:
- epochX/cudacpp/gg_tt.mad/SubProcesses/P1_gg_ttx/auto_dsig1.f (use upstream/master, will add back all counters as in prof)
- epochX/cudacpp/CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/MG5aMC_patches/PROD/patch.P1 (use upstream/master, will regenerate this)
- epochX/cudacpp/CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/MG5aMC_patches/PROD/patch.common (use upstream/master, will regenerate this)
…f branch before merging upstream/master (fix conflicts)
…pstream/master including june24, goodhel, for360
The only files that still need to be patched are
- 2 in patch.common: Source/dsample.f, SubProcesses/makefile
- 4 in patch.P1: auto_dsig1.f, auto_dsig.f, driver.f, matrix1.f
Note: this is 3 files more than those needed in upstream/master (added Source/dsample.f, auto_dsig1.f, auto_dsig.f)
./CODEGEN/generateAndCompare.sh gg_tt --mad --nopatch
git diff --no-ext-diff -R gg_tt.mad/SubProcesses/makefile gg_tt.mad/Source/dsample.f > CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/MG5aMC_patches/PROD/patch.common
git diff --no-ext-diff -R gg_tt.mad/SubProcesses/P1_gg_ttx/auto_dsig1.f gg_tt.mad/SubProcesses/P1_gg_ttx/auto_dsig.f gg_tt.mad/SubProcesses/P1_gg_ttx/driver.f gg_tt.mad/SubProcesses/P1_gg_ttx/matrix1.f > CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/MG5aMC_patches/PROD/patch.P1
git checkout gg_tt.mad
(Later checked that gg_tt.mad can be regenerated ok)
…' (including june24, goodhel, for360) into prof
Also add to the repo a few missing files in gux_taptamggux.mad and nobm_pp_ttW.mad
…ging
git checkout upstream/master $(git ls-tree --name-only upstream/master */CODEGEN*txt)
…ated code except gg_tt.mad for easier merging
git checkout upstream/master $(git ls-tree --name-only upstream/master *.mad/Source/dsample.f | grep -v ^gg_tt.mad)
…also amd and v1.00.01 fixes) into prof
Fix conflicts (use upstream/master version):
epochX/cudacpp/gg_tt.mad/Source/dsample.f
Will then regenerate patches from this gg_tt.mad
…/master including v1.00.00 and also amd and v1.00.01 fixes
The only files that still need to be patched are
- 2 in patch.common: Source/dsample.f, SubProcesses/makefile
- 4 in patch.P1: auto_dsig1.f, auto_dsig.f, driver.f, matrix1.f
Note: this is 3 files more than those needed in upstream/master (added Source/dsample.f, auto_dsig1.f, auto_dsig.f)
./CODEGEN/generateAndCompare.sh gg_tt --mad --nopatch
git diff --no-ext-diff -R gg_tt.mad/SubProcesses/makefile gg_tt.mad/Source/dsample.f > CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/MG5aMC_patches/PROD/patch.common
git diff --no-ext-diff -R gg_tt.mad/SubProcesses/P1_gg_ttx/auto_dsig1.f gg_tt.mad/SubProcesses/P1_gg_ttx/auto_dsig.f gg_tt.mad/SubProcesses/P1_gg_ttx/driver.f gg_tt.mad/SubProcesses/P1_gg_ttx/matrix1.f > CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/MG5aMC_patches/PROD/patch.P1
git checkout gg_tt.mad
(Later checked that regenerating gg_tt.mad gives no change)
…and also amd and v1.00.01 fixes)
… v1.00.00 and with AMD and v1.00.01 fixes) into cmsdy
Fix conflicts:
- epochX/cudacpp/CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/MG5aMC_patches/PROD/patch.P1 (manual attempt, will regenerate anyway)
- epochX/cudacpp/CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/MG5aMC_patches/PROD/patch.common (manual attempt, will regenerate anyway)
- epochX/cudacpp/CODEGEN/recreateRefs.sh (use profs version)
…est prof (with upstream/master v1.00.00 and AMD/v1.00.01 fixes) into cmsdy
The only files that still need to be patched are
- 2 in patch.common: Source/dsample.f, SubProcesses/makefile
- 4 in patch.P1: auto_dsig1.f, auto_dsig.f, driver.f, matrix1.f
Note: this is 3 files more than those needed in upstream/master (added Source/dsample.f, auto_dsig1.f, auto_dsig.f)
./CODEGEN/generateAndCompare.sh gg_tt --mad --nopatch
git diff --no-ext-diff -R gg_tt.mad/SubProcesses/makefile gg_tt.mad/Source/dsample.f > CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/MG5aMC_patches/PROD/patch.common
git diff --no-ext-diff -R gg_tt.mad/SubProcesses/P1_gg_ttx/auto_dsig1.f gg_tt.mad/SubProcesses/P1_gg_ttx/auto_dsig.f gg_tt.mad/SubProcesses/P1_gg_ttx/driver.f gg_tt.mad/SubProcesses/P1_gg_ttx/matrix1.f > CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/MG5aMC_patches/PROD/patch.P1
git checkout gg_tt.mad
This is a WIP PR with various studies on CMS Drell Yan, addressing various issues