Additional backend types to support Base.Threads #61

Merged 5 commits on Jul 24, 2024

kaipartmann
Contributor

As discussed last week at JuliaCon, letting users choose the multithreading backend would be very useful. Because the @threaded macro currently uses Polyester, a user must either use Polyester or fall back to a serial update. The performance impact of mixing different multithreading backends is described in kaipartmann/Peridynamics.jl#110.

This PR adds types that can be specified as the parallelization backend. I tested the code with Peridynamics.jl and noticed a slight performance improvement with Threads.@threads :static on 64 threads compared to Polyester.@batch.

I did not add unit tests because I was unsure where they belong in your current test setup. If you point me in the right direction, I will add them.
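
For readers unfamiliar with the pattern, the idea of selecting a threading backend by type can be sketched as follows. This is only a minimal illustration using Base.Threads, not the code from this PR: `AbstractBackend`, `SerialBackend`, `ThreadsDynamicBackend`, and `parallel_foreach` are hypothetical names chosen for the example, and only the name `ThreadsStaticBackend` appears later in this thread (its real definition may differ).

```julia
using Base.Threads: @threads

# Hypothetical backend types for illustration only. The real types added by
# this PR may be named or structured differently.
abstract type AbstractBackend end
struct SerialBackend <: AbstractBackend end
struct ThreadsStaticBackend <: AbstractBackend end
struct ThreadsDynamicBackend <: AbstractBackend end

# Dispatching on the backend type selects the loop flavor.
function parallel_foreach(f, iterator, ::SerialBackend)
    for i in iterator
        f(i)
    end
    return nothing
end

function parallel_foreach(f, iterator, ::ThreadsStaticBackend)
    # One contiguous chunk of iterations per thread, fixed up front.
    @threads :static for i in iterator
        f(i)
    end
    return nothing
end

function parallel_foreach(f, iterator, ::ThreadsDynamicBackend)
    # Iterations are scheduled dynamically across the available threads.
    @threads :dynamic for i in iterator
        f(i)
    end
    return nothing
end

# Usage: every index is written by exactly one iteration, so there is no data race.
squares = zeros(Int, 100)
parallel_foreach(i -> squares[i] = i^2, eachindex(squares), ThreadsStaticBackend())
```

Because the backend is an ordinary argument, a caller can switch between serial and threaded execution without touching the loop body, which is what makes it possible to avoid mixing different threading libraries in one application.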

codecov bot commented Jul 15, 2024

Codecov Report

Attention: Patch coverage is 0% with 9 lines in your changes missing coverage. Please review.

Project coverage is 88.34%. Comparing base (2b5d3ce) to head (ac5afc3).

File                         Patch %   Lines
src/util.jl                  0.00%     8 Missing ⚠️
src/neighborhood_search.jl   0.00%     1 Missing ⚠️
@@            Coverage Diff             @@
##             main      #61      +/-   ##
==========================================
- Coverage   89.81%   88.34%   -1.47%     
==========================================
  Files          16       16              
  Lines         481      489       +8     
==========================================
  Hits          432      432              
- Misses         49       57       +8     
Flag   Coverage Δ
unit   88.34% <0.00%> (-1.47%) ⬇️


@svchb
Collaborator

svchb commented Jul 15, 2024

Is the performance increase a general improvement or only in the mixed Polyester and Threads case?

@kaipartmann
Contributor Author

kaipartmann commented Jul 16, 2024

@svchb I used the following benchmark with 64 threads:

using Peridynamics

function sphere_impact(; ns=6, np=3, path)
    Ø = 0.15
    ΔX0_sphere = Ø / ns
    pos_sphere, vol_sphere = uniform_sphere(Ø, ΔX0_sphere; center_z=Ø / 2 + ΔX0_sphere)
    sphere = Body(BBMaterial{EnergySurfaceCorrection}(), pos_sphere, vol_sphere)
    failure_permit!(sphere, false)
    material!(sphere; horizon=3.1ΔX0_sphere, E=210e9, rho=8000, Gc=2000)
    velocity_ic!(sphere, :all_points, :z, -20)
    lxy, lz = 2.0, 0.1
    ΔX0_plate = lz / np
    pos_plate, vol_plate = uniform_box(lxy, lxy, lz, ΔX0_plate; center_z=-lz / 2)
    plate = Body(BBMaterial{EnergySurfaceCorrection}(), pos_plate, vol_plate)
    material!(plate; horizon=3.1ΔX0_plate, E=27e9, rho=2700, Gc=10)
    ms = MultibodySetup(:sphere => sphere, :plate => plate)
    contact!(ms, :sphere, :plate; radius=min(ΔX0_sphere, ΔX0_plate))
    vv = VelocityVerlet(steps=2000)
    job = Job(ms, vv; path=path)
    @time submit(job)
    return nothing
end

##
path = "results/benchmarks/sphere_impact"
rm(path; recursive=true, force=true)
sphere_impact(; path) # compilation
sphere_impact(; ns=20, np=8, path) # work

Because of the Polyester-Threads mixing issue, I thought it would be a good idea to use Polyester.@batch in Peridynamics.jl as well, so v0.3.0 uses it for every multithreaded loop in the package:

## Peridynamics.jl v0.3.0 - everywhere using Polyester.@batch
# compilation:
# 89.256836 seconds (47.14 M allocations: 14.914 GiB, 0.66% gc time, 7.14% compilation time)
# work:
# 45.514108 seconds (40.16 M allocations: 128.786 GiB, 8.94% gc time)

However, I found that this was the reason the parallel performance of my single-body simulations was worse than before. When all Polyester.@batch statements in Peridynamics.jl are changed to @threads :static calls, the performance of contact simulations also improves (provided PointNeighbors.jl also uses @threads :static):

## Peridynamics.jl v0.3.1-DEV - everywhere using Threads.@threads :static
## PointNeighbors.jl v0.4.5-dev: update_grid!(...; parallelization_backend=ThreadsStaticBackend())
# compilation:
# 106.717161 seconds (54.34 M allocations: 15.806 GiB, 0.61% gc time, 5.13% compilation time)
# work:
# 39.184317 seconds (48.59 M allocations: 129.752 GiB, 9.53% gc time)

@efaulhaber, I also tried specifying KernelAbstractions.CPU(static=false), as you suggested on Slack. It is only slightly slower than @threads :static, but compilation takes longer:

## Peridynamics.jl v0.3.1-DEV - everywhere using Threads.@threads :static
## PointNeighbors.jl v0.4.5-dev: update_grid!(...; parallelization_backend=KernelAbstractions.CPU(static=false))
# compilation:
# 122.136295 seconds (52.11 M allocations: 15.556 GiB, 0.55% gc time, 4.67% compilation time)
# work (2 work runs necessary to bench compiled version)
# 40.164468 seconds (47.48 M allocations: 129.602 GiB, 8.91% gc time, 11.92% compilation time)
# 39.528993 seconds (47.38 M allocations: 129.596 GiB, 8.36% gc time)
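
To put the three "work" timings above side by side, the relative saving against the Polyester baseline can be computed as in the small sketch below. The times are copied from the comments above; the helper `relative_saving` is a hypothetical name introduced for this example. Each configuration was timed once, so the resulting percentages (on the order of 13 to 14 %) should be read as rough indications rather than statistically robust measurements.

```julia
# Wall-clock times (seconds) of the "work" runs, copied from the benchmark comments.
t_polyester      = 45.514108  # Polyester.@batch everywhere (v0.3.0)
t_threads_static = 39.184317  # @threads :static everywhere
t_ka_cpu         = 39.528993  # PointNeighbors.jl with KernelAbstractions.CPU(static=false), second work run

# Fraction of the reference runtime that the new configuration saves.
relative_saving(t_new, t_ref) = (t_ref - t_new) / t_ref

saving_static = relative_saving(t_threads_static, t_polyester)
saving_ka     = relative_saving(t_ka_cpu, t_polyester)

println("@threads :static saves ", round(100 * saving_static; digits=1), " % vs. Polyester.@batch")
println("KA CPU backend saves   ", round(100 * saving_ka; digits=1), " % vs. Polyester.@batch")
```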

Member

@efaulhaber efaulhaber left a comment

Thanks a lot for this PR ;)

Review comments: 7 resolved threads on src/util.jl, 3 resolved threads on src/neighborhood_search.jl.
Member

@efaulhaber efaulhaber left a comment

Thanks!

@efaulhaber efaulhaber enabled auto-merge (squash) July 24, 2024 09:45
@efaulhaber efaulhaber merged commit b095e4b into trixi-framework:main Jul 24, 2024
7 of 9 checks passed