-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rewrite @threaded
macro to work with GPUs
#534
Conversation
For some reason it doesn't run on the CPU otherwise.
4a54721
to
7514ac5
Compare
@threaded
macro to work with GPUs@threaded
macro to work with GPUs
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #534 +/- ##
==========================================
+ Coverage 68.68% 69.07% +0.38%
==========================================
Files 76 75 -1
Lines 4548 4582 +34
==========================================
+ Hits 3124 3165 +41
+ Misses 1424 1417 -7
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
This PR is ready now. I still haven't added docs for @lchristm It would be great if you could take a look at the macro stuff. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I only reviewed the macro itself, not the rest of the code.
I think this is a very clever approach to add GPU support in a minimally invasive way, well done! However, as we already briefly discussed before, the current approach basically uses the code optimized for CPUs and adds the KernelAbstractions.jl boilerplate around it to run it on a GPU which may not be optimal performance-wise. So it would be interesting to compare the performance of this approach to a highly optimized GPU implementation sometime in the future (e.g. for a single kernel) to see if there is a performance gap and how large it is.
Excellent point! I think in the future we may actually go the other way around: Optimize our code for GPU use and keep 🤞 it will be still OK-ish for CPUs :-) |
We can still write explicit kernels for performance-relevant code. But the big question is: how do we optimize the code for the GPU? Through macros in the kernel signature? Then we need explicit kernels. Or maybe only through macros inside the kernel? Those might just work with the @threaded loops. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
* add .md and .bib files * add description of system interaction * revise [skip ci] * The grand reopening Part 2: Add overview page (#432) * implement * Revert "implement" This reverts commit 171505a. * some improvements to the documentation * some more fixes * add install documentation * add more info * update * Update get_started.md * add tutorial * fix * add stubs * update * include the example files * update * update * format * add more info to error message * fix path * add overview page * add diagram * add text * Update overview.md * add development page * refine text * [skip ci] * update * fix * fix * rm get_started * update URL of diagram * fix error * update * update * fix * update * update * rename * rename * update * change to img tags * remove minimal * update images * update * remove old stuff * fix images * fix * make wider * format * format * Update docs/src/development.md Co-authored-by: Erik Faulhaber <[email protected]> * review * rm css * add new picture * update time * move back --------- Co-authored-by: Erik Faulhaber <[email protected]> Co-authored-by: Niklas Neher <[email protected]> * add build workflow * add workflow name [skip ci] * test workflow * test workflow again * change event * fix paper.big bug * Bump crate-ci/typos from 1.18.2 to 1.19.0 (#480) Bumps [crate-ci/typos](https://github.com/crate-ci/typos) from 1.18.2 to 1.19.0. - [Release notes](https://github.com/crate-ci/typos/releases) - [Changelog](https://github.com/crate-ci/typos/blob/master/CHANGELOG.md) - [Commits](crate-ci/typos@v1.18.2...v1.19.0) --- updated-dependencies: - dependency-name: crate-ci/typos dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Fix another closures error on macOS-ARM (#489) * test images storage * add images * add oss to .bib * fix bullet list * fix formatting * literature fix * modify image * add missing references * description: TP vs other OSS * remove table * update * Add wrapper for internal use of interpolation functions (#466) * add wrapper * fix tests * fix tests * implement suggestions * quick dirty fix --------- Co-authored-by: Sven Berger <[email protected]> * add beam validation * modify order * figure side by side * rework images * refs * image size * Reopen #382: Extrude shapes/geometry (#437) * implement `ExtrudeFace` * use `sample_face` in `interpolate_plane_3d` * add docs * add tests * fix bug * fix bug * add `tlsph=true` * fix tuple bug * fix typo * fix again * fix layers * calculate particle spacing differently * change `floor` to `ceil` * change `round` to `isapprox` * adapt docs * implement suggestions * add doctests * try to fix doctest * implement suggestions * implement suggestion * add svens summary suggestion * amend refs and minor stuff * Optimization bnd pressure computation (#475) * optimization * optimization * fix calc for 1 thread * format * update describtion * remove +1 * update reference files * update error bounds * fix * Update src/schemes/boundary/dummy_particles/dummy_particles.jl Co-authored-by: Erik Faulhaber <[email protected]> * Update src/schemes/boundary/dummy_particles/dummy_particles.jl Co-authored-by: Erik Faulhaber <[email protected]> * Update src/schemes/boundary/dummy_particles/dummy_particles.jl Co-authored-by: Erik Faulhaber <[email protected]> * Update src/schemes/boundary/dummy_particles/dummy_particles.jl Co-authored-by: Erik Faulhaber <[email protected]> * review --------- Co-authored-by: Erik Faulhaber <[email protected]> * Add `show` for `SolutionSavingCallback` with `save_times` (#485) * Add hacky `show` for solution saving callback with `save_times` * Unify comments * Add tests for saving callback show * Reformat * handle TODOs and revise text * rework * change image format * change image size * Remove TrixiParticles dependency in docs environment (#501) * Fix doc string of solution_saving.jl (#499) * fix doc string * missing comma * Update solution_saving.jl Fix indent --------- Co-authored-by: Erik Faulhaber <[email protected]> * implement suggestion * fix refs * apply italic * update * remove corrections * Make array types in structs generic (#486) * Make array types in structs generic * Use `Ref{Bool}` instead of `Vector{Bool}` * Make array types of EDAC system generic * Reformat * Fix tests * Make `DensityDiffusionAntuono` generic * Fix tests * Implement suggestions * Bump crate-ci/typos from 1.19.0 to 1.21.0 (#507) * Bump crate-ci/typos from 1.19.0 to 1.21.0 Bumps [crate-ci/typos](https://github.com/crate-ci/typos) from 1.19.0 to 1.21.0. - [Release notes](https://github.com/crate-ci/typos/releases) - [Changelog](https://github.com/crate-ci/typos/blob/master/CHANGELOG.md) - [Commits](crate-ci/typos@v1.19.0...v1.21.0) --- updated-dependencies: - dependency-name: crate-ci/typos dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <[email protected]> * Fix/ignore typos --------- Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Michael Schlottke-Lakemper <[email protected]> * Bump julia-actions/setup-julia from 1 to 2 (#506) Bumps [julia-actions/setup-julia](https://github.com/julia-actions/setup-julia) from 1 to 2. - [Release notes](https://github.com/julia-actions/setup-julia/releases) - [Commits](julia-actions/setup-julia@v1...v2) --- updated-dependencies: - dependency-name: julia-actions/setup-julia dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * The grand reopening Part 3: Add basic DEM (#433) * implement basic dem * fix * fix bnd * get basic simulation working * fix * format * fix spellcheck * add docs * update readme and news * format * add to examples run list * add basic test * ignore author name * remove * most of the review stuff * format * forgot to remove one thing * one more * rework interaction * format * update * typo * fix doctest * fix test * fix doc * fix tests again * fix doc and test * fix test * fix incorrect test * review * format * typo * review * review * format * realign line * fix test * update * Document proper release management (#508) * Mark Release V0.1.1 (#510) * Set to development version (#511) * Recursively convert (adapt) `Semidiscretization` object to GPU types with Adapt.jl (#492) * Adapt `Semidiscretization` before running the simulation * Modify constructors to be compatible with Adapt.jl * Only adapt the semidiscretization when using an explicit data type * Adapt `DensityDiffusionAntuono` * Adapt `BoundaryModelMonaghanKajtar` to GPUs * Add comments * Remove unnecessary types * CompatHelper: add new compat entry for Adapt at version 4, (keep existing compat) (#515) Co-authored-by: CompatHelper Julia <[email protected]> * Change compat of Adapt to 3 (#517) * Reopen #390: Update callback (#440) * add `UpdateCallback` * fix typo * apply formatter * remove update bool * adapt docstring * implement suggestions * remove redundant comment * first update after michaels review * adapt graphics * CompatHelper: bump compat for Adapt to 4, (keep existing compat) (#519) Co-authored-by: CompatHelper Julia <[email protected]> * embed pdf image * undo embed pdf * big rework * update * resize image * change image order * add blank line * fix typos * fix references * try to fix references * fix again * Create output directory for post process callback (#527) * Manually increase CairoMakie Version (#528) * Move neighborhood search to a new standalone package (#516) * Move neighborhood search to a new standalone package * Remove Morton.jl dependency * Remove tests for neighborhood search * Remove ThreadingUtilities.jl dependency * Fix interpolation * Add docs for neighborhood search * Dispatch `update_nhs!` instead of `nhs_coords` * Reformat code * Update NHS with coordinates of both systems * Initialize NHS with coordinates of both systems * Fix broken functions * Rename package to PointNeighbors.jl * Reformat code * Resort Project.toml * Change PointNeighbors compat to 0.2 * Fix `InitialCondition` set operations * Fix interpolation and tests * Fix docs * Fix example tests * Fix doctests * Fix n-body * Add more comments * update * Remove `pinv` in correction matrix inversion step (#526) * Remove `pinv` in correction matrix inversion step * Improve comment * Update validation files for dam break * Revert "Update validation files for dam break" This reverts commit d08c06d. * Update WCSPH validation files * Update WCSPH validation files * Add single-threaded validation files (see #481) * adapt summary * syntax * CFD abbreviation * adapt wording * avoid single-physics * Adapt some code to GPUs (#533) * Statically return zero verlocity and acceleration without `movement` * Make `write_v0!` and `write_u0!` GPU-compatible * Remove hardcoded float type in rectangular tank setup * Replace more hardcoded `Float64` types * Simplify zero SVector even more * Modify error message (#531) * change message * fix tests --------- Co-authored-by: Erik Faulhaber <[email protected]> * The Grand ReReReReReopening the final straw part 9: Intra-Particle-Force Surface Tension model (#434) * merge by hand * fix merge problems * fix merge * fix merge * fix merge * fix merge * fix merge * more merge fixes * format * fix * fix * fix * fix * fix * more fixes * fix * fix * fix * more fixes * fix * fix * format * running * update * update * optimization * update * update * update * update * format * up * update * remove unused code * switch to support radius * update * fix doc * add adhesion coefficient * update_broken * update * format * add to examples * fix tests * fix merge * fix typos * add basic test * format * rename * format * rename container system * reduce example run time * update news and readme * format * fix test * reduce run time * format * fix * correct merge * update doc test * revert * fix tests * fix * fix * review comments * fix * fix * correct some stuff * init with empty initial condition * review update * rename function * remove unnecessary if * docs * fix doc * revert one change * fix typo * update * try to avoid allocs * fix mem allocs * review update * update docs * update * review * fix * format * make examples smaller * reduce resolution * review * rename * use trixi_include * format * use trixi_include * update * update * review update * format * fix test errors * update * fix surface normal calculation * adjust testcases and review comments * update examples * remove saving interval * fix comment * remove dt * Fix n-body benchmark (#525) * Fix n-body benchmark * suppress warning * fix formatting * for real --------- Co-authored-by: Sven Berger <[email protected]> * Push new version number (#536) Co-authored-by: Erik Faulhaber <[email protected]> * Mark 0.1.3-dev (#537) Co-authored-by: Erik Faulhaber <[email protected]> * Add type `GPUSystem` to determine if a system is stored on the CPU or GPU (#532) * Add type `GPUSystem` to determine if a system is stored on the CPU or GPU * Fix systems and tests * Reformat * Fix tests * Fix n-body system * Reformat * Clean up callback conditions (#538) * Bump julia-actions/cache from 1 to 2 (#541) Bumps [julia-actions/cache](https://github.com/julia-actions/cache) from 1 to 2. - [Release notes](https://github.com/julia-actions/cache/releases) - [Changelog](https://github.com/julia-actions/cache/blob/main/devdocs/making_a_new_release.md) - [Commits](julia-actions/cache@v1...v2) --- updated-dependencies: - dependency-name: julia-actions/cache dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * final update * fix bib * test * test * fix bug * modify image * fine tuning * fine tuning * fine tuning * image size * fix typo * Update Michael's affiliation to U of Augsburg (#545) * Precompute git hash to avoid overhead (#542) * Precompute git hash * Store git hash in callbacks * Fix --------- Co-authored-by: Niklas Neher <[email protected]> * Move timers and `@trixi_timeit` to TrixiBase.jl (#543) * Move timers and `@trixi_timeit` to TrixiBase.jl * Require latest version of TrixiBase.jl * Fix tests --------- Co-authored-by: Niklas Neher <[email protected]> * Add `trixi2vtk` wrapper for `InitialCondition` (#546) * add wrapper * modify kwarg * implement suggestions * Example on how to save interpolated planes using postprocessing callback (#462) * add example * replace MSE with MRE * Revert "replace MSE with MRE" This reverts commit 1e00907. * revert * update * format * fix and reduce runtime * Fix comment * implement suggestions * format --------- Co-authored-by: Erik Faulhaber <[email protected]> * Add open boundaries (#442) * implement `ExtrudeFace` * use `sample_face` in `interpolate_plane_3d` * add docs * add tests * fix bug * create `OpenBoundarySPHSystem` * export OpenBoundarySPHSystem * fix bug * add `tlsph=true` * fix tuple bug * fix typo * fix again * fix for tests * fix layers * calculate particle spacing differently * add show-tests * add `within_boundary_zone` * change `floor` to `ceil` * boundary zone tests * add `evaluate_characteristics` * add tests * add reference functions * add `update_quantities!` * add buffer * add `SystemBuffer` tests * cosmetic change in tests * change `round` to `isapprox` * add update callback * add `UpdateCallback` * add `update_open_boundaries` * fix bugs * add timers * write prescribed quantities * improve dispatching * add example `pipe_flow_2d.jl` * add docs `UpdateCallback` * add `UpdateCallback` * fix typo * apply formatter * modify system buffer * add docstring * add docs * fix tests * fix typos * apply formatter * fix tests * add comment * fix tests * remove update bool * adapt docstring * implement suggestions * fix test * add check if callback is used * add comments in example file * generic types * merge main * undo combining compact support with DEM * remove density and pressure setter * implement suggestions * fix tests * Revert "remove density and pressure setter" This reverts commit 0203a3a. * implement suggestions * rework in- and outflow * rename setter functions * rename function * adapt tests * add docs * rename function * apply formatter * fix tests * fix bug in plot recipes * apply formatter * implement suggestions for `boundary_zone.jl` * rename kwarg * implement suggestions * implement suggestions * modify `check_domain!` * add comments and check for fluid system with buffer * modify error message * fix tests * fix `callback_used` * using Measurements * fix tests * sqrt(eps()) * fix tests again * add doc to `boundary.md` * link single fluid system * fix tests * implement suggestions * fix tests * change the order of functions * modify example * fix tests * fix test again * implement doc suggestions * modify dos * fix typos * implement suggestions * implement suggestions * add EDAC to the example * implement suggestions * modify tests * adapt docs * add `initial_callback_flag` * add to `NEWS.md` and `README.md` * implement suggestions * fix typo * New Feature Release (#548) Open Boundaries have been added * Set to development version 0.1.4 (#549) * Set to development version 0.1.4 * Update Project.toml * Rewrite `@threaded` macro to work with GPUs (#534) * Fix wrapping for GPU arrays * Adapt `@threaded` macro to work with GPUs * Use new `@threaded` macro * Fix NHS to run GPU code with CPU arrays * Import `@index` from KernelAbstractions.jl For some reason it doesn't run on the CPU otherwise. * Add proper docs for `@threaded` * Fix `foreach_neighbor` * Reformat * Add test for GPU code on the CPU * Fix unit tests * Add unit tests for GPU interaction * Also add tests for GPU versions of `wrap_v` and `wrap_u` * Add kwarg `neighborhood_search` to dam break example * Fix tests * Reformat * Add function `wrap_array` * Fix merge * Fix open boundary code * Implement suggestions * Fix syntax error from merge * Fix `wrap_u` and `wrap_v` for GPU array types * Fix constructor of `BoundarySPHSystem` * CompatHelper: add new compat entry for KernelAbstractions at version 0.9, (keep existing compat) (#551) Co-authored-by: CompatHelper Julia <[email protected]> * fix name (#554) * Manually set GPU_ARRAY compat (#555) Co-authored-by: Erik Faulhaber <[email protected]> * Add Viscosity Formulation as per Morris et al. (#504) * Add Viscosity Moris * format * fix * implement suggestions * other suggestions * fix equation * format * review comments * fixes * fix * update test * update * fix open boundary * update reference files * update validation criteria * update reference file * new reference files * ups * update ref file * update validation error bounds * update error bound * update error bound * add comment and correct setting of 'v' * final review part 1 * Feature highlights -> Features * Cleanup Viscosity/Boundary Model (#566) * extract changes from #564 * fix * add `custom_quantities...` (#567) Co-authored-by: Erik Faulhaber <[email protected]> * add acknowledgement * add link * change link * change link again --------- Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: Sven Berger <[email protected]> Co-authored-by: Erik Faulhaber <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Michael Schlottke-Lakemper <[email protected]> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: CompatHelper Julia <[email protected]>
See #484.
This is basically an alternative to #493.
Instead of defining a separate kernel for every threaded loop, I redefine the
@threaded
macro to run code multithreaded on the CPU or the GPU, depending on the type of the passed system.To run a simulation on an NVidia GPU: