Vlasiator gpu #1074

Open
wants to merge 1,087 commits into
base: dev
5324780
Fix to gpu moments with empty cells
markusbattarbee Apr 23, 2024
1862a7b
Reduce block padding
markusbattarbee Apr 25, 2024
10ad0cc
Merge branch 'dev' into gpu_loop_blockadjust_uptodatedev
markusbattarbee Apr 25, 2024
f06e2fb
Cleanup
markusbattarbee Apr 25, 2024
2c18d37
Implement spatial cell class moving PR in GPU branch as well
markusbattarbee Apr 26, 2024
995fe00
Merge branch 'dev' into gpu_loop_blockadjust_uptodatedev
markusbattarbee Apr 26, 2024
056b773
revert some accidental cleanup from spatial_cell_cpu
markusbattarbee Apr 26, 2024
69136cf
Fix one testpackage script, revert some pointless changes
markusbattarbee Apr 26, 2024
818fafd
Fix passage of bulk velocities to arch second moments.
markusbattarbee Apr 29, 2024
b9c9f65
re-implement block adjustment without deletion
markusbattarbee Apr 30, 2024
82dddf9
Fix CI compilations
markusbattarbee Apr 30, 2024
4d99980
more fixes to CI
markusbattarbee Apr 30, 2024
8b08b9f
Fix ionosphere mini-app compilation and squelch some warnings
markusbattarbee Apr 30, 2024
e006221
More verbose logging in ioread.cpp
markusbattarbee Apr 30, 2024
4644696
update hashinator commit
markusbattarbee May 2, 2024
991f3a0
Fix shared memory access issue in block content evaluation
markusbattarbee May 2, 2024
286c9bc
Merge branch 'vlasiator_gpu' into gpu_loop_blockadjust_uptodatedev
markusbattarbee May 3, 2024
ab43396
Merge pull request #958 from markusbattarbee/gpu_loop_blockadjust_upt…
markusbattarbee May 3, 2024
e953461
debugging crashes in remote neighbour contribution after LB
markusbattarbee May 7, 2024
2a2a526
Got rid of map_add completely, instead extracting to-be-added GIDs fr…
markusbattarbee May 8, 2024
f61c0fd
Remove references to map_add also in gpu_acc_map
markusbattarbee May 8, 2024
0a36ba8
Merge pull request #962 from markusbattarbee/gpu_remove_map_add
markusbattarbee May 8, 2024
d44a162
fix debug compilation and lumi pointless warnings
markusbattarbee May 9, 2024
f361082
Merge branch 'vlasiator_gpu' into gpu_debug_mpi
markusbattarbee May 13, 2024
5a7d147
Fixed GPU MPI crash
markusbattarbee May 13, 2024
f50022c
Merge pull request #965 from markusbattarbee/gpu_debug_mpi
markusbattarbee May 13, 2024
2660e67
Merge branch 'dev' into vlasiator_gpu
markusbattarbee May 13, 2024
fd0d6f3
Printouts and changes to get a larger buffer for profiling the transl…
markusbattarbee May 17, 2024
d33ce6f
Added multiplier for main vlasov buffer for translation. fixed transl…
markusbattarbee May 17, 2024
6079f40
Add OMP support for host backend
hokkanen May 21, 2024
bda1442
Fix copysphere issue with uninitialized memory
markusbattarbee May 22, 2024
97c9eae
Allow building CPU version on GPU makefiles by turning off USE_GPU en…
markusbattarbee May 22, 2024
5f4e3d1
Merge branch 'vlasiator_gpu' into vlasiator_gpu_profile_trans_kernel
markusbattarbee May 23, 2024
a697b1a
re-fix broken ifdefs for CPU vs GPU compilation
markusbattarbee May 24, 2024
0af6cdf
Streamlined translation, getting rid of needless transposition. Helps…
markusbattarbee May 24, 2024
dbe6b31
Merge branch 'gpu_trans_buffers_fixes' into vlasiator_gpu_profile_tra…
markusbattarbee May 24, 2024
6dbad6a
Merge branch 'dev' into vlasiator_gpu_profile_trans_kernel
markusbattarbee May 24, 2024
7b57cc3
Merge branch 'arch_omp_support' into vlasiator_gpu_trans_archOpenMP
markusbattarbee May 24, 2024
959a6ac
Merge pull request #972 from markusbattarbee/vlasiator_gpu_profile_tr…
markusbattarbee May 24, 2024
e95ccb7
Merge branch 'vlasiator_gpu' of https://github.com/fmihpc/vlasiator i…
hokkanen May 27, 2024
cc41e60
Port a few more reducers to arch
hokkanen May 27, 2024
9110a89
Update Hashinator interface to use templated prefetch deactivations
markusbattarbee May 28, 2024
875a798
Updated Hashinator version to v1.0.1 released on 30.5.2024
markusbattarbee May 30, 2024
cc78189
Move to fmihpc hashinator repo, remove a few compile warnings
markusbattarbee May 30, 2024
2d82aac
Merge pull request #979 from fmihpc/vlasiator_gpu_hashinator_update
markusbattarbee May 30, 2024
0140c21
Removed block offset evaluation kernel in ACC; included functionality…
markusbattarbee Jun 3, 2024
e7c1685
Merge branch 'vlasiator_gpu' into vlasiator_gpu_trans_archOpenMP
markusbattarbee Jun 3, 2024
a9c8714
Merge branch 'arch_omp_support' into vlasiator_gpu_trans_archOpenMP
markusbattarbee Jun 3, 2024
a1d783f
Merge branch 'vlasiator_gpu' of https://github.com/fmihpc/vlasiator i…
hokkanen Jun 3, 2024
c8821f2
Fix CUDA reducer issue
hokkanen Jun 3, 2024
262b3af
Merge pull request #980 from markusbattarbee/gpu_no_block_offsets_kernel
markusbattarbee Jun 3, 2024
115fe27
Merge branch 'vlasiator_gpu' into vlasiator_gpu_trans_archOpenMP
markusbattarbee Jun 3, 2024
34861d0
Data reducer cleanups, pass device VBC pointer if using GPU code
markusbattarbee Jun 3, 2024
69ab044
Purge min/max distribution function value diagnostic reducers
markusbattarbee Jun 3, 2024
b30d8e7
Merge pull request #969 from hokkanen/arch_omp_support
markusbattarbee Jun 3, 2024
78a27bc
Merge branch 'amr-test' into vlasiator_gpu_trans_archOpenMP
markusbattarbee Jun 3, 2024
33a5e7d
Merge branch 'vlasiator_gpu' into vlasiator_gpu_trans_archOpenMP
markusbattarbee Jun 3, 2024
e6318c7
Fixing dAMR in gpu branch, both cpu and gpu versions
markusbattarbee Jun 3, 2024
aa33958
Merge branch 'dev' into vlasiator_gpu_fix_damr
markusbattarbee Jun 3, 2024
236741f
Merge pull request #981 from markusbattarbee/vlasiator_gpu_fix_damr
markusbattarbee Jun 4, 2024
e700da3
Purged config references to the GPUBLOCKS / CUDABLOCKS parameter and …
markusbattarbee Jun 4, 2024
afba532
Merge pull request #983 from markusbattarbee/gpu_purge_gpublocks
markusbattarbee Jun 4, 2024
f843e38
Converted translation kernel to use a single launch instead of one pe…
markusbattarbee Jun 5, 2024
03260e6
Fix translation buffer allocation call and kernel launch grid
markusbattarbee Jun 5, 2024
509c577
Revert mahti_cuda makefile settings
markusbattarbee Jun 5, 2024
5d5e5ae
Move batch calls to update blocks into separate gpu/cpu files
markusbattarbee Jun 6, 2024
dcbecdb
Implement block content evaluation kernel as batch operation. Clears …
markusbattarbee Jun 10, 2024
8425589
Hash map resetting done via batch kernel
markusbattarbee Jun 10, 2024
43eac5a
Batch loop reduction of content blocks from map to list
markusbattarbee Jun 11, 2024
11f4f13
Moved batch operation kernels into header file, returned to using aut…
markusbattarbee Jun 11, 2024
d39cb77
got rule to work
markusbattarbee Jun 11, 2024
af2d059
added missing clearing of temp buffers in block update
markusbattarbee Jun 11, 2024
aa6f4a8
Changed order of maps passed to batch kernels in preparation for more…
markusbattarbee Jun 11, 2024
abd4bbd
Moved also block adjustment calls to batch files
markusbattarbee Jun 12, 2024
e3f4882
fix cpu compilation of batch file
markusbattarbee Jun 12, 2024
d623faa
Chopped up block adjustment on GPU into separate loops
markusbattarbee Jun 12, 2024
62bed74
Moved to using per-cell block adjustment lists. Some memory footprint…
markusbattarbee Jun 12, 2024
3d0c56a
Ported velocity halo addition to batch mode
markusbattarbee Jun 12, 2024
ee0143a
actually turn off the single-cell v-halo update when running batch mode
markusbattarbee Jun 12, 2024
eb399a0
Moved neighbour halo handling to batch mode
markusbattarbee Jun 12, 2024
0827cf7
Moved first loop reduction to batch mode
markusbattarbee Jun 12, 2024
547f250
Move to re-using allocations for batch block adjustment
markusbattarbee Jun 13, 2024
2e34ef1
move some e.g. reservation handling around in gpu batch operations
markusbattarbee Jun 13, 2024
f24acea
Fix to caller with confusion between doDeleteEmpty and accountForNeig…
markusbattarbee Jun 13, 2024
299be6a
Fixes to neighbour management in batch operations
markusbattarbee Jun 13, 2024
1283c95
Pass extra values to extraction wrapper and kernel in preparation for…
markusbattarbee Jun 14, 2024
b52bd04
Converted three more extraction kernels to batch mode. Trial of launc…
markusbattarbee Jun 14, 2024
15dd384
revert to old stream approach
markusbattarbee Jun 14, 2024
01c6d3b
put size threshold reduction checks behind constexpr check
markusbattarbee Jun 14, 2024
98deb7e
Merge branch 'dev' into is-dt-changed
markusbattarbee Jun 18, 2024
1eeff59
Merge branch 'is-dt-changed' into vlasiator_gpu_damr
markusbattarbee Jun 18, 2024
3705c3e
remove trailing whitespace
markusbattarbee Jun 18, 2024
6b6bf27
Merge branch 'vlasiator_gpu_damr' into gpu_bulk_contentblocks
markusbattarbee Jun 18, 2024
709ed2a
First attempt at moving actual add and delete blocks as well as resiz…
markusbattarbee Jun 18, 2024
d0a52e2
Revert "put size threshold reduction checks behind constexpr check"
markusbattarbee Jun 18, 2024
f65f972
Merge branch 'dev' into gpu_bulk_contentblocks
markusbattarbee Jun 18, 2024
f1bc34d
Some clarifying renames, and adding early exits or skips as needed
markusbattarbee Jun 18, 2024
d4972c6
Merge branch 'vlasiator_gpu_trans_archOpenMP' into gpu_larger_kernel_…
markusbattarbee Jun 18, 2024
80fd114
Attempt making use of warp accessors sit behind an ifdef. Does not ye…
markusbattarbee Jun 19, 2024
3c2bdd2
put warp accessor use behind an ifdef. Non-WA methods break somewhere.
markusbattarbee Jun 19, 2024
1fbe6e5
rename spatial_cell_kernels to spatial_batch_kernels
markusbattarbee Jun 19, 2024
9bf9525
First run-through with some single-thread accessors successful
markusbattarbee Jun 25, 2024
c7fc2b3
Switch to already existing get_tombstone() and get_emptybucket()
markusbattarbee Jun 25, 2024
7d8e609
fixes to going without warpaccessors
markusbattarbee Jun 25, 2024
724d7cc
More fixes to when not using warpaccessors
markusbattarbee Jun 25, 2024
b39f037
Further fixes to neighbours and warpaccessors, seems to pass TP now
markusbattarbee Jun 26, 2024
8acc4d2
more parallel launch for union of blocks kernel
markusbattarbee Jun 26, 2024
5663928
Merge branch 'gpu_bulk_contentblocks' into gpu_optional_warp_accessors
markusbattarbee Jun 26, 2024
678e6cf
Cleanup of warp accessor flag use, now not applied in vmesh
markusbattarbee Jun 26, 2024
e70978c
Consolidated warp accessor toggle in translation
markusbattarbee Jun 26, 2024
90af95a
Reduce Translation buffer allocation factor (was causing swapping of …
markusbattarbee Jul 2, 2024
f571262
Reduce Translation buffer allocation factor (was causing swapping of …
markusbattarbee Jul 2, 2024
80a6ee1
Merge branch 'dev' into gpu_single_trans_kernel_launch_dev
markusbattarbee Aug 23, 2024
57536a9
Merge pull request #1021 from markusbattarbee/gpu_single_trans_kernel…
markusbattarbee Aug 23, 2024
cf7b8cf
Update LUMI-G hipcc makefile for system 24.03
markusbattarbee Oct 2, 2024
858b108
Update also lumi CPU project 358 makefile to 24.03
markusbattarbee Oct 2, 2024
672263b
Merge branch 'vlasiator_gpu_trans_archOpenMP' into vlasiator_gpu
markusbattarbee Oct 2, 2024
069ae18
update comments, ceil int cleanup
markusbattarbee Oct 2, 2024
015d702
Merge branch 'vlasiator_gpu' into gpu_optional_warp_accessors_better_2
markusbattarbee Oct 2, 2024
5161e0b
Better launch parameters for halo kernels
markusbattarbee Oct 4, 2024
1375d7a
Optional early loop exit in content calculations
markusbattarbee Oct 4, 2024
6d2b8c3
Update mass loss in batch block adjustment
markusbattarbee Oct 4, 2024
1c5ccfb
Move vmesh hash map cleaning into batch operations. Requires hashinat…
markusbattarbee Oct 4, 2024
b7d5fab
back to WID4 default
markusbattarbee Oct 4, 2024
7593bfe
Fix re-insertion of overflown elements launch parameters
markusbattarbee Oct 7, 2024
7708072
Actually fix re-insertion launch parameters. fix issue in communicati…
markusbattarbee Oct 7, 2024
be9c003
Make overflow count memcpy async after all.
markusbattarbee Oct 7, 2024
fa902ad
moments zeroing memset in separate API call
markusbattarbee Oct 7, 2024
4458f66
more parallelism for batch moment kernels
markusbattarbee Oct 7, 2024
45ba25b
Fix max content value evaluation for acc substepping
markusbattarbee Oct 7, 2024
371da27
Debug and cached size fixes
markusbattarbee Oct 18, 2024
368aaa8
Rework population replacement to not page fault, switch to pinned hos…
markusbattarbee Oct 18, 2024
df2ec3f
Ensure more space in hashmaps, and fix uninitialized memory in batch …
markusbattarbee Oct 18, 2024
8782727
Switch around batch operation launch parameters: max blocks cannot be…
markusbattarbee Oct 20, 2024
bbddb58
Adjust some launch parameters
markusbattarbee Oct 21, 2024
3a249c8
Reworking initialization. Loops are still CPU loops, but ARCH loops s…
markusbattarbee Oct 24, 2024
3ca4eca
Fix? to Flowthrough init.
markusbattarbee Oct 24, 2024
cdf3d9a
Move project initializing loops to ARCH-interface
markusbattarbee Oct 24, 2024
a35655d
Add new null operator to arch reductions. Not in use yet.
markusbattarbee Oct 24, 2024
b798418
Activate arch::null method in all project inits
markusbattarbee Oct 25, 2024
d4d9415
Larger initial allocations to reduce re-allocations during grid setup
markusbattarbee Oct 25, 2024
cc3caa3
Simplify call syntax to filling phase space for all projects
markusbattarbee Oct 25, 2024
391f5c0
attempt at utilizing velocity block container cached sizes and capaci…
markusbattarbee Oct 25, 2024
1b7e6eb
add buffer pointer update
markusbattarbee Oct 26, 2024
5ab717f
Fix TriAxisSearch missing set insert, some cleanup of unused commente…
markusbattarbee Oct 27, 2024
7b55b44
Makefile change revert, optimize single multipeak peak
markusbattarbee Oct 28, 2024
8862ad4
implement some gpuFreeAsync calls, fix types (GID/LID buffers)
markusbattarbee Oct 28, 2024
962be59
Switch over both vmesh and vbc to use regular allocations for splitve…
markusbattarbee Oct 28, 2024
70f1be3
New reworking of initialization, again. vmesh and vbc resizes still a…
markusbattarbee Oct 28, 2024
62ed1c5
turn off debug check of empty gtl map for now
markusbattarbee Oct 28, 2024
e132f42
squelch compiler warning about getPerBVol
markusbattarbee Oct 28, 2024
c4a074a
Larger initial allocations and reservations, allowing for faster init…
markusbattarbee Oct 29, 2024
e6db7f9
work towards removing page faults during IO of VDFs
markusbattarbee Oct 29, 2024
a9f85ea
Fix CPU compilation
markusbattarbee Oct 29, 2024
403b6d5
Further iowrite GPU optimization
markusbattarbee Oct 30, 2024
6449a82
Make maxwellian sysboundaries init on-device
markusbattarbee Oct 30, 2024
3f207f4
Better memory management and better setSizeClear
markusbattarbee Oct 30, 2024
0da547b
adjust vector uploads and prefetches
markusbattarbee Oct 30, 2024
781b180
Add forgotten spatial_cell_kernels.hpp file
markusbattarbee Oct 30, 2024
830b609
Better place for reservation application for maxwellian sysb cells du…
markusbattarbee Oct 30, 2024
3b476a2
Fix WID3 error in iowrite buffers
markusbattarbee Oct 31, 2024
e24a096
Move Maxwellian and TriMaxwellian distribution calls to projects.h ou…
markusbattarbee Oct 31, 2024
33b74a3
Move findMaxwellanBlocksToInit to sysboundarycondition.cpp
markusbattarbee Oct 31, 2024
5eeeccb
Better host-device-managing of translation and dt containers
markusbattarbee Oct 31, 2024
0810eb2
also gpu_dt.cpp
markusbattarbee Oct 31, 2024
b759e4f
extract emptybucket value in-kernel
markusbattarbee Oct 31, 2024
cce90b3
initialize sysboundary objects on host memory
markusbattarbee Oct 31, 2024
55952f9
better storage of capacity for union of blocks set
markusbattarbee Oct 31, 2024
c14cf13
Move auxiliary block lists for all cells from unified to host/device …
markusbattarbee Oct 31, 2024
c56f351
use stored capacity values in clearing of hashmaps in velocity mesh
markusbattarbee Oct 31, 2024
8ea3182
Better sysboundary-related phiprofs
markusbattarbee Nov 1, 2024
de59dd0
Transition GPU trans pencils to use pure GPU buffers, switch some pur…
markusbattarbee Nov 1, 2024
ac9e766
Move prefetching into cell construction, remove some stream synchroniza…
markusbattarbee Nov 1, 2024
a0b8834
Fix some initialization velocity evaluation typos
markusbattarbee Nov 1, 2024
f50c3e4
Migrate ionosphere and copysphere to new GPU initialization. NOTE: io…
markusbattarbee Nov 1, 2024
0b8e923
Fix compile warnings on LUMI
markusbattarbee Nov 1, 2024
e83bee2
fix numbering in testpackage tests lists
markusbattarbee Nov 4, 2024
48fb404
Fix cpu compilation
markusbattarbee Dec 2, 2024
3d6ede1
Added Leonardo makefiles (booster tested, dcgp compiles but not tested)
markusbattarbee Dec 2, 2024
ef0daf9
Merge branch 'dev' into arch_init_dev
markusbattarbee Dec 2, 2024
6890247
Trying to get Leonardo DCGP Intel OneAPI compiler to work
markusbattarbee Dec 4, 2024
a702db5
Almost merged dev into arch_init: CPU compiles, GPU fails in projects…
markusbattarbee Dec 5, 2024
1b6aad1
Fix CUDA compilation in projects/project.cpp
markusbattarbee Dec 5, 2024
f674d2e
Fixes to ordering of projects::MaxwellianPhaseSpaceDensity arguments
markusbattarbee Dec 11, 2024
b825b43
Minor cleanup
markusbattarbee Dec 12, 2024
1f03ea4
Fix substepping tests. Remove GPU destructors now that we have destru…
markusbattarbee Dec 12, 2024
bc22015
Add missing cache value update to ResizeClear, elaborate on some debu…
markusbattarbee Dec 13, 2024
38eb711
Deprecate use of Realv, reduce type conversions inside Vlasov solver …
markusbattarbee Dec 13, 2024
b51a6db
Compression of debug checks for spatial cells
markusbattarbee Dec 16, 2024
8d09511
Further debug cleanup, fix mistaken cached size setting in copy assign
markusbattarbee Dec 16, 2024
815c0fd
Merge branch 'arch_init' into arch_init_references_in_vlasovsolver
markusbattarbee Dec 16, 2024
40e04dd
Merge branch 'arch_init' into arch_init_dev
markusbattarbee Dec 16, 2024
cde20b0
fix compilation with passing of dt and intersection values
markusbattarbee Dec 16, 2024
49b02ca
Fixes to memory clearings. Found that sysBoundaryLayerNew is somehow …
markusbattarbee Dec 16, 2024
3d34c86
Fix error in iowrite buffer offset movement, improve ioread gpu buffe…
markusbattarbee Dec 17, 2024
024ee5a
Update instructions for good core thread placement on KAROLINA
markusbattarbee Dec 17, 2024
3c48938
Karolina cuda makefile rename
markusbattarbee Dec 17, 2024
76353dd
Fix some clearing and reallocation & caching issues
markusbattarbee Dec 18, 2024
2701ff4
Switch VBC and vmesh accessors to use pass-by-value instead of pass-b…
markusbattarbee Dec 18, 2024
e91482a
Fix allocation call short-circuit evaluation and some CPU compilation
markusbattarbee Dec 19, 2024
fc46311
Merge branch 'arch_init' into arch_init_dev
markusbattarbee Dec 19, 2024
3633b53
Merge branch 'vlasiator_gpu' into gpu_leonardo_intel
markusbattarbee Dec 19, 2024
bbdaa87
Compilation help for Intel OneAPI on Leonardo from their helpdesk. Th…
markusbattarbee Dec 19, 2024
cd93744
fix numbering in testpackage tests lists
markusbattarbee Nov 4, 2024
d1c6b39
Fixes to ordering of projects::MaxwellianPhaseSpaceDensity arguments
markusbattarbee Dec 11, 2024
22fd35e
Minor cleanup
markusbattarbee Dec 12, 2024
b4bd600
Fix substepping tests. Remove GPU destructors now that we have destru…
markusbattarbee Dec 12, 2024
5f5e231
Add missing cache value update to ResizeClear, elaborate on some debu…
markusbattarbee Dec 13, 2024
fd342f3
Compression of debug checks for spatial cells
markusbattarbee Dec 16, 2024
269933e
Further debug cleanup, fix mistaken cached size setting in copy assign
markusbattarbee Dec 16, 2024
2d9e256
Fixes to memory clearings. Found that sysBoundaryLayerNew is somehow …
markusbattarbee Dec 16, 2024
f833326
Fix error in iowrite buffer offset movement, improve ioread gpu buffe…
markusbattarbee Dec 17, 2024
a0c147a
Fix some clearing and reallocation & caching issues
markusbattarbee Dec 18, 2024
b184a28
Switch VBC and vmesh accessors to use pass-by-value instead of pass-b…
markusbattarbee Dec 18, 2024
0f07539
Fix allocation call short-circuit evaluation and some CPU compilation
markusbattarbee Dec 19, 2024
9a3f6c8
First attempt at GPU memory reporting
markusbattarbee Dec 20, 2024
3550d2e
Better memory reporting, again smaller default allocations
markusbattarbee Jan 8, 2025
5dab351
implement mass loss correction on-device
markusbattarbee Jan 8, 2025
f61f090
Fix some buffer types, add batch mass loss correction
markusbattarbee Jan 8, 2025
f81279c
Keep track of largest attained velocity mesh size for each cell (for …
markusbattarbee Jan 9, 2025
63bd99d
Added missing allocation verification, some debug improvements
markusbattarbee Jan 10, 2025
cf46458
Merge branch 'arch_init' into arch_init_references_in_vlasovsolver
markusbattarbee Jan 10, 2025
082b645
turn off debug
markusbattarbee Jan 10, 2025
20d6869
Merge branch 'arch_init_memreport' into arch_init_mem_references_in_v…
markusbattarbee Jan 10, 2025
9263550
Merge branch 'arch_init_mem_references_in_vlasovsolver' into arch_ini…
markusbattarbee Jan 13, 2025
5a1054d
submodule update
markusbattarbee Jan 13, 2025
a55d645
Debugging and cleaning up
markusbattarbee Jan 14, 2025
ba66564
Merge branch 'vlasiator_gpu' into arch_init_dev
markusbattarbee Jan 14, 2025
fb2404e
LUMI compile fixes, update hashinator subcommit
markusbattarbee Jan 16, 2025
9e2e3e2
Makefile for Ukko's GPUs
ursg Jan 16, 2025
7503a24
Add CI stage to build testpackage on ukko
ursg Jan 16, 2025
5e93d7f
Add CI testpackage run step for GPUs on Ukko
ursg Jan 16, 2025
e10b9a3
Clean up ukko_cuda makefile a bit.
ursg Jan 16, 2025
ab99c17
Clean up ukko testpackage run script a bit.
ursg Jan 16, 2025
5bd1534
Merge pull request #19 from ursg/gpu_ukko_CI
markusbattarbee Jan 16, 2025
f4b5b88
Merge branch 'dev' into vlasiator_gpu_mergedev
markusbattarbee Jan 16, 2025
8fbe9d2
fix conflicts
markusbattarbee Jan 16, 2025
21d40db
Try to fix GPU CI yaml
markusbattarbee Jan 16, 2025
eb26fcf
Fix job and file namings in GPU ci yaml
markusbattarbee Jan 16, 2025
7b6277b
Fix vlsv building arch
markusbattarbee Jan 16, 2025
f5aec77
vlsv building serial
markusbattarbee Jan 16, 2025
9b9abb6
Try to fix Ukko GPU
markusbattarbee Jan 17, 2025
906536d
Makefile tab fix, try to get CI running
markusbattarbee Jan 17, 2025
a776764
Merge branch 'dev' into vlasiator_gpu
markusbattarbee Jan 17, 2025
e3b4edb
actually use correct ukko-cuda modules in CI runner
markusbattarbee Jan 17, 2025
43b54c9
Merge branch 'vlasiator_gpu' of github.com:markusbattarbee/vlasiator-…
markusbattarbee Jan 17, 2025
23168b5
update TP and linking on ukko-cuda to use c++20
markusbattarbee Jan 17, 2025
ef15479
Fix build libraries script for jemalloc on risc-v, ukko-gpu testpacka…
markusbattarbee Jan 17, 2025
33df293
Fix arch_host c++ type system details to be compatible with c++20
ursg Jan 17, 2025
6e317ca
Merge pull request #20 from ursg/arch_c++20
markusbattarbee Jan 17, 2025
103 changes: 102 additions & 1 deletion .github/workflows/github-ci.yml
@@ -182,7 +182,6 @@ jobs:
        path: vlasiator
        if-no-files-found: error


  build_testpackage:
    # Build Vlasiator with testpackage flags, on the carrington cluster
    # (for subsequent running of the integration test package)
@@ -228,6 +227,46 @@ jobs:
    # name: Testpackage build log
    # path: build.log

  build_testpackage_ukkoGPU:
    # Build Vlasiator with testpackage flags, on the ukko cluster using
    # its NVIDIA GPUs
    runs-on: carrington

    steps:
    - name: Clean workspace
      run: |
        RUN_STRING=$( cat << MORO
        rm -rf libraries library-build testpackage
        rm -f libraries.tar.zst testpackage_check_description.txt testpackage-output.tar.gz metrics.txt stdout.txt stderr.txt testpackage_output_variables.txt
        rm -f *.xml
        MORO
        )
        srun -M ukko bash -c "$RUN_STRING"
    - name: Checkout source
      uses: actions/checkout@v4
      with:
        submodules: true
    - name: Make clean
      run: VLASIATOR_ARCH=ukko_cuda make clean
    - uses: ursg/gcc-problem-matcher@master
    - name: Compile vlasiator (Testpackage build w/ CUDA)
      run: |
        export VLASIATOR_ARCH=ukko_cuda
        srun -Mukko -pgpu-oversub --cpus-per-gpu=8 --mem-per-gpu=20G --job-name CI_tp_compile --interactive --nodes=1 -n 1 -c 16 -t 1:00:0 bash -c 'module purge; ml GCC/11.2.0; ml OpenMPI/4.1.1-GCC-11.2.0; ml PMIx/4.1.0-GCCcore-11.2.0; ml PAPI/6.0.0.1-GCCcore-11.2.0; ml CUDA; ml Boost/1.55.0-GCC-11.2.0; export VLASIATOR_ARCH=ukko_cuda; make -j 9 testpackage; sleep 10s'
    - name: Make sure the output binary is visible in lustre
      uses: nick-fields/retry@v3
      with:
        timeout_seconds: 15
        max_attempts: 3
        retry_on: error
        command: ls vlasiator
    - name: Upload testpackage binary
      uses: actions/upload-artifact@v4
      with:
        name: vlasiator-testpackage-gpu
        path: vlasiator
        if-no-files-found: error

  build_riscv:
    runs-on: risc-v
    needs: build_libraries_riscv
@@ -394,6 +433,68 @@ jobs:
    # Note: Testpackage output is further processed in the pr_report.yml workflow
    # (to produce Checks against pull requests)

  run_testpackage_gpu:
    # Run the GPU testpackage on the ukko cluster (submitted from the carrington runner)
    runs-on: carrington
    needs: [build_testpackage_ukkoGPU, build_tools]
    continue-on-error: true

    steps:
    - name: Checkout source
      uses: actions/checkout@v4
      with:
        submodules: false
    - name: Download testpackage binary
      uses: actions/download-artifact@v4
      with:
        name: vlasiator-testpackage-gpu
    - name: Download tools
      uses: actions/download-artifact@v4
      with:
        name: vlasiator-tools
    - name: Run testpackage
      id: run
      run: |
        chmod +x $GITHUB_WORKSPACE/vlasiator
        chmod +x $GITHUB_WORKSPACE/vlsv*_DP
        cd testpackage
        export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$GITHUB_WORKSPACE/libraries/lib
        sbatch -W -o testpackage_run_output.txt ./small_test_ukko_gpu_github_ci.sh
        PARSE_OUTPUT_CMD=$( cat << MORO
        echo "Job finished, checking output."
        cat testpackage_run_output.txt
        cat $GITHUB_STEP_SUMMARY > $GITHUB_WORKSPACE/testpackage_check_description.txt
        cd $GITHUB_WORKSPACE
        ls -halB testpackage_check_description.txt
        tar -czf testpackage-output-gpu.tar.gz testpackage_check_description.txt testpackage_output_variables.txt
        MORO
        )
        srun --job-name CI_package_results -M ukko -N 1 -c 1 --mem=1G bash -c "$PARSE_OUTPUT_CMD"
        if [ -f $GITHUB_WORKSPACE/testpackage_failed ]; then
          # Fail this step if any test failed.
          exit 1
        fi
    - name: Scancel dangling job upon cancellation
      if: cancelled()
      run: |
        # Try accessing the job id echoed by the job script.
        scancel ${{ steps.run.outputs.SLURM_JOB_ID }}
    - name: Make sure the output tarball is visible in lustre
      uses: nick-fields/retry@v3
      with:
        timeout_seconds: 15
        max_attempts: 3
        retry_on: error
        command: ls $GITHUB_WORKSPACE/testpackage-output-gpu.tar.gz
    - name: Upload testpackage output
      uses: actions/upload-artifact@v4
      if: always()
      with:
        name: testpackage-output-gpu
        path: testpackage-output-gpu.tar.gz
    # Note: Testpackage output is further processed in the pr_report.yml workflow
    # (to produce Checks against pull requests)

  build_ionosphereTests:
    # Build IonosphereSolverTests miniApp
    runs-on: ubuntu-latest
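
The "Make sure the output ... is visible in lustre" steps in the GPU jobs above retry a plain `ls` until the file shows up on the shared filesystem. A minimal shell sketch of the same idea, for use outside GitHub Actions, could look like this. The function name `wait_for_file` and its arguments are hypothetical and not part of this workflow; the default of three attempts only mirrors the `max_attempts: 3` setting, and the delay between attempts is an assumption.

```shell
# Hypothetical helper (not part of the workflow): poll for a file that may
# become visible late on a shared filesystem.
# Usage: wait_for_file PATH [MAX_ATTEMPTS] [DELAY_SECONDS]
wait_for_file() {
  path=$1
  max_attempts=${2:-3}
  delay=${3:-5}
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    # Same check as the workflow step: succeed as soon as ls can see the file.
    if ls "$path" > /dev/null 2>&1; then
      return 0
    fi
    # Sleep only between attempts, not after the final one.
    if [ "$attempt" -lt "$max_attempts" ]; then
      sleep "$delay"
    fi
    attempt=$((attempt + 1))
  done
  echo "wait_for_file: $path not visible after $max_attempts attempts" >&2
  return 1
}
```

A call such as `wait_for_file vlasiator 3 5 || exit 1` would then fail the surrounding script if the binary never appears.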
4 changes: 4 additions & 0 deletions .gitignore
@@ -5,6 +5,10 @@ diagnostic.txt
*.vlsv
*.silo
*.o
*.ll
*.gpu
*.ptx
*.s
*.d
vscode/
.vscode/
4 changes: 4 additions & 0 deletions .gitmodules
@@ -19,3 +19,7 @@
[submodule "submodules/vectorclass-addon"]
path = submodules/vectorclass-addon
url = https://github.com/vectorclass/add-on

[submodule "submodules/hashinator"]
path = submodules/hashinator
url = https://github.com/fmihpc/hashinator.git
5 changes: 5 additions & 0 deletions MAKE/Makefile.Freezer
@@ -28,6 +28,7 @@ endif
# MATHFLAGS are for special math etc. flags, these are only applied on solver functions
# LDFLAGS flags for linker

# -march=native -fpermissive
CXXFLAGS += -O3 -fopenmp -funroll-loops -std=c++20 -W -Wall -Wno-unused -fabi-version=0 -mfma -mavx2 -Wno-unknown-pragmas -Wno-sign-compare
testpackage: CXXFLAGS = -g -ggdb -O2 -fopenmp -funroll-loops -std=c++20 -fabi-version=0 -mno-avx -mno-fma -fno-unsafe-math-optimizations

@@ -68,3 +69,7 @@ LIB_VLSV = -L$(LIBRARY_PREFIX)/vlsv -lvlsv -Wl,-rpath=$(LIBRARY_PREFIX)/vlsv/lib

LIB_PROFILE = -L$(LIBRARY_PREFIX)/phiprof/lib -lphiprof -Wl,-rpath=$(LIBRARY_PREFIX)/phiprof/lib
INC_PROFILE = -I$(LIBRARY_PREFIX)/phiprof/include

#enable nvtx on cpu mode as well
#LIB_PROFILE = -L$(LIBRARY_PREFIX)/phiprof_nvcc/lib -lphiprof -Wl,-rpath=$(LIBRARY_PREFIX)/phiprof_nvcc/lib
#INC_PROFILE = -I$(LIBRARY_PREFIX)/phiprof_nvcc/include
101 changes: 101 additions & 0 deletions MAKE/Makefile.Freezer_cuda
@@ -0,0 +1,101 @@
# Markus' desktop computer, CUDA
# Can be used as a sample on how to generate local CUDA makefiles
#
# Note: CUDA versions before 11.6 will complain when compiling backgroundfields
# (error: parameter packs not expanded with ‘...’:)
# this is fixed by installing at least version 11.6

#======== Vectorization ==========
#Set vector backend type for vlasov solvers, sets precision and length.
#Options:
# AVX: VEC4D_AGNER, VEC4F_AGNER, VEC8F_AGNER
# AVX512: VEC8D_AGNER, VEC16F_AGNER
# Fallback: VECTORCLASS = VEC_FALLBACK_GENERIC (Defaults to VECL8)

ifeq ($(DISTRIBUTION_FP_PRECISION),SPF)
#Single-precision
VECTORCLASS = VEC_FALLBACK_GENERIC
else
#Double-precision
VECTORCLASS = VEC_FALLBACK_GENERIC
endif

#===== Vector Lengths ====
# Default for VEC_FALLBACK_GENERIC is WID=4, VECL=8
# NOTE: A bug currently results in garbage data already on cell init if VECL is not equal to WID2 (WID squared)
#WID=8
#VECL=64
WID=4
VECL=16

#======= Compiler and compilation flags =========
# NOTES on compiler flags:
# CXXFLAGS is for compiler flags, they are always used
# MATHFLAGS are for special math etc. flags, these are only applied on solver functions
# LDFLAGS flags for linker

USE_CUDA=1

# Tell mpic++ to use nvcc for all compiling
CMP = OMPI_CXX='nvcc' OMPI_CXXFLAGS='' mpic++

# Also tell the linker to use nvcc
# These are found with mpic++ --showme:link
# The line below indeed uses OMPI_CXX, not OMPI_LD
LNK = OMPI_CXX='nvcc' OMPI_CXXFLAGS='-arch=sm_60' OMPI_LIBS='-L/usr/lib/x86_64-linux-gnu/openmpi/lib' OMPI_LDFLAGS='-lmpi_cxx -lmpi' mpic++

#-G (device debug) overrides --generate-line-info -line-info
# but also requires more device-side resources to run
# use "-Xptxas -v" for verbose output of ptx compilation

# GeForce GTX 1060 6GB has compute capability 6.1 (code built for sm_60 also runs on it)
# https://arnon.dk/matching-sm-architectures-arch-and-gencode-for-various-nvidia-cards/

CXXFLAGS = -g -O3 -x cu -std=c++17 -Xcompiler -std=c++17 --extended-lambda --expt-relaxed-constexpr -gencode arch=compute_60,code=sm_60 -Xcompiler -fopenmp --generate-line-info -line-info -Xcompiler="-fpermissive" --extra-device-vectorization
testpackage: CXXFLAGS = -g -O2 -x cu -std=c++17 --extended-lambda --expt-relaxed-constexpr -gencode arch=compute_60,code=sm_60 -Xcompiler -fopenmp --generate-line-info -line-info -Xcompiler="-fpermissive"


MATHFLAGS = --use_fast_math
# nvcc fast_math does not assume only finite math
testpackage: MATHFLAGS = --prec-sqrt=true --prec-div=true --ftz=false --fmad=false

LDFLAGS = -O2 -g -lnvToolsExt
LIB_MPI = -lgomp

LIB_CUDA = -L/usr/local/cuda/lib64
INC_CUDA = -isystem /usr/local/cuda/include

#======== PAPI ==========
#Add PAPI_MEM define to use papi to report memory consumption?
#CXXFLAGS += -DPAPI_MEM
#testpackage: CXXFLAGS += -DPAPI_MEM

#======== Allocator =========
#jemalloc is CPU only

#======== Libraries ===========
LIBRARY_PREFIX = /home/markusb/git/vlasiator-lib

INC_BOOST = -isystem /usr/include/boost
LIB_BOOST = -L/usr/include/boost -lboost_program_options

INC_ZOLTAN = -isystem /usr/include/trilinos
LIB_ZOLTAN = -L/usr/lib/x86_64-linux-gnu -ltrilinos_zoltan

# INC_PAPI = -I$(LIBRARY_PREFIX)/papi/include
# LIB_PAPI = -I$(LIBRARY_PREFIX)/papi/lib -Wl,-rpath=$(LIBRARY_PREFIX)/papi/lib

INC_VLSV = -I$(LIBRARY_PREFIX)/vlsv
LIB_VLSV = -L$(LIBRARY_PREFIX)/vlsv -lvlsv -Xlinker=-rpath=$(LIBRARY_PREFIX)/vlsv/lib

LIB_PROFILE = -L$(LIBRARY_PREFIX)/phiprof/lib -lphiprof -Xlinker=-rpath=$(LIBRARY_PREFIX)/phiprof/lib
INC_PROFILE = -I$(LIBRARY_PREFIX)/phiprof/include

#======== Header-only Libraries ===========

INC_EIGEN = -isystem ./submodules/eigen
INC_DCCRG = -I./submodules/dccrg
INC_FSGRID = -I./submodules/fsgrid
INC_HASHINATOR = -isystem ./submodules/hashinator/
# Vectorclass only for CPU mode
# INC_VECTORCLASS = -I ./submodules/vectorclass/ -I ./submodules/vectorclass-addon/vector3d/
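
The note in the "Vector Lengths" section of this makefile says that cell initialization produces garbage data unless VECL equals WID2. A small pre-build sanity check along those lines could be sketched in shell as follows. The function name `check_vecl` is hypothetical and not part of the build system, and the sketch assumes that WID2 means WID*WID, which matches the WID=4, VECL=16 and WID=8, VECL=64 pairs listed above. The CPU makefiles in this pull request use WID=4 with VECL=8, so the check is only meant for the CUDA configuration this note refers to.

```shell
# Hypothetical pre-build check (not part of the makefile).
# Assumption: WID2 in the makefile note stands for WID*WID.
# Usage: check_vecl WID VECL
check_vecl() {
  wid=$1
  vecl=$2
  expected=$((wid * wid))
  if [ "$vecl" -eq "$expected" ]; then
    return 0
  fi
  echo "VECL=$vecl does not match WID*WID=$expected" >&2
  return 1
}
```

For example, `check_vecl 4 16` succeeds, while `check_vecl 4 8` prints a warning to stderr and returns a non-zero status.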
16 changes: 9 additions & 7 deletions MAKE/Makefile.appleM1
@@ -7,20 +7,22 @@ LNK = mpic++
 #Options:
 # AVX: VEC4D_AGNER, VEC4F_AGNER, VEC8F_AGNER
 # AVX512: VEC8D_AGNER, VEC16F_AGNER
-# Fallback: VEC4D_FALLBACK, VEC4F_FALLBACK, VEC8F_FALLBACK
+# AVX512: VEC8D_AGNER, VEC16F_AGNER
+# Fallback: VEC_FALLBACK_GENERIC

 ifeq ($(DISTRIBUTION_FP_PRECISION),SPF)
 #Single-precision
-#VECTORCLASS = VEC_FALLBACK_GENERIC
-#VECTORCLASS = VEC8F_AGNER
-VECTORCLASS = VEC8F_FALLBACK
+VECTORCLASS = VEC_FALLBACK_GENERIC
 else
 #Double-precision
-#VECTORCLASS = VEC4D_AGNER
-#VECTORCLASS = VEC_FALLBACK_GENERIC
-VECTORCLASS = VEC8D_FALLBACK
+VECTORCLASS = VEC_FALLBACK_GENERIC
 endif

+#===== Vector Lengths ====
+# Default for VEC_FALLBACK_GENERIC is WID=4, VECL=8
+WID=4
+VECL=8
+
 #======== PAPI ==========
 #Add PAPI_MEM define to use papi to report memory consumption?
 #CXXFLAGS += -DPAPI_MEM # Papi does not work on MacOS, see https://stackoverflow.com/questions/69531604/installing-papi-on-macos
15 changes: 8 additions & 7 deletions MAKE/Makefile.arriesgado
@@ -14,20 +14,21 @@ LNK = mpic++
 #Options:
 # AVX: VEC4D_AGNER, VEC4F_AGNER, VEC8F_AGNER
 # AVX512: VEC8D_AGNER, VEC16F_AGNER
-# Fallback: VEC4D_FALLBACK, VEC4F_FALLBACK, VEC8F_FALLBACK
+# Fallback: VEC_FALLBACK_GENERIC

 ifeq ($(DISTRIBUTION_FP_PRECISION),SPF)
 #Single-precision
-VECTORCLASS = VEC8F_FALLBACK
-INC_VECTORCLASS = -I$(LIBRARY_PREFIX)/../vlasiator/vlasovsolver
+VECTORCLASS = VEC_FALLBACK_GENERIC
 else
 #Double-precision
-# VECTORCLASS = VEC4D_AGNER
-# INC_VECTORCLASS = -I$(LIBRARY_PREFIX)/vectorclass
-VECTORCLASS = VEC4D_FALLBACK
-INC_VECTORCLASS = -I$(LIBRARY_PREFIX)/../vlasiator/vlasovsolver
+VECTORCLASS = VEC_FALLBACK_GENERIC
 endif

+#===== Vector Lengths ====
+# Default for VEC_FALLBACK_GENERIC is WID=4, VECL=8
+WID=4
+VECL=8
+
 FLAGS =
 # note: std was c++11
 CXXFLAGS = -O1 -std=c++20 -W -Wall -pedantic -Wno-unused -Wno-unused-parameter -Wno-missing-braces -fopenmp -march=rv64imafdc -isystem /usr/lib/gcc/riscv64-linux-gnu/11/include/
2 changes: 1 addition & 1 deletion MAKE/Makefile.hawk_intel_mpt
@@ -26,7 +26,7 @@ FLAGS =
 #GNU flags:
 CC_BRAND = intel
 CC_BRAND_VERSION = 19.1.0
-# note: std was not updated to c++17
+# note: std was not updated to c++20
 CXXFLAGS += -traceback -g -O3 -qopenmp -std=c++14 -W -Wall -Wno-unused -march=core-avx2 -qopt-zmm-usage=high
 testpackage: CXXFLAGS = -g -traceback -O2 -qopenmp -std=c++14 -W -Wno-unused -march=core-avx2
 not_parallel_tools: CXXFLAGS += -march=native -mno-avx2 -mavx