Adding more scaling results.
Johannes Markert committed Nov 11, 2024
1 parent 8b5757a commit 22f86c2
Showing 6 changed files with 77 additions and 12 deletions.
8 changes: 8 additions & 0 deletions paper.bib
@@ -608,3 +608,11 @@ @article{geuzaine2009gmsh
year={2009},
publisher={Wiley Online Library}
}

@misc{terrabyte.lrz.de,
title = {terrabyte supercomputer},
author = {Leibniz Supercomputing Centre},
url = {https://docs.terrabyte.lrz.de},
urldate = {2024-11-11},
publisher = {LRZ},
}
46 changes: 34 additions & 12 deletions paper.md
@@ -201,18 +201,30 @@ on the JUQUEEN and the JUWELS supercomputers at the Jülich Supercomputing
Center. In \autoref{tab:t8code_runtimes} [@holke_optimized_2021], we show that
`t8code`'s ghost routine is exceptionally fast, with proper scaling up to 1.1
trillion mesh elements. Computing ghost layers around parallel domains is
usually the most expensive of all mesh operations. To put these results into
perspective, we conducted scaling tests on the terrabyte cluster
[@terrabyte.lrz.de] at the Leibniz Supercomputing Centre, comparing the ghost
layer creation runtimes of `p4est` and `t8code`; see
\autoref{fig:ghost_layer_runtimes} for the results. The `p4est` library is
established as one of the most performant meshing libraries
[@BursteddeWilcoxGhattas11], specializing in adaptive quadrilateral and
hexahedral meshes. `t8code` shows near-perfect scaling for tetrahedral meshes,
on par with `p4est`. Measured per ghost element, the absolute runtime of
`t8code` is up to roughly 1.5 times that of `p4est`. This is expected, since
`t8code`'s ghost layer algorithm is more complex and thus somewhat less
optimized, owing to its support for a wide range of element types.
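
The per-ghost-element normalization used in \autoref{fig:ghost_layer_runtimes}
can be recomputed from the committed data files. A minimal sketch in Python,
assuming (based on the gnuplot script and the figure caption) that the columns
are MPI ranks, ghost layer runtime in seconds, and average ghosts per rank:

```python
# Sketch: recompute per-ghost-element runtimes and the t8code/p4est runtime
# ratio from the committed data files. The column meaning is an assumption
# inferred from plot-timings-per-num-ghosts.gpi: MPI ranks, runtime [s],
# average number of ghost elements per rank.

def load(path):
    rows = []
    with open(path) as f:
        for line in f:
            ranks, runtime, ghosts = line.split()
            rows.append((int(ranks), float(runtime), int(ghosts)))
    return rows

p4 = load("pics/timings-p4.dat")
t8 = load("pics/timings-t8-tet.dat")

for (ranks, t_p4, ghosts), (_, t_t8, _) in zip(p4, t8):
    us_p4 = t_p4 / ghosts * 1e6  # microseconds per ghost element
    us_t8 = t_t8 / ghosts * 1e6
    print(f"{ranks:5d} ranks: p4est {us_p4:.2f} us/ghost, "
          f"t8code {us_t8:.2f} us/ghost, ratio {t_t8 / t_p4:.2f}")
```

For the committed data this yields ratios between about 1.1 (80 ranks) and
1.5 (320 ranks).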

Furthermore, in a prototype code [@Dreyer2021] implementing a high-order
discontinuous Galerkin (DG) method for advection-diffusion equations on
dynamically adaptive hexahedral meshes, we report a twelvefold speed-up
compared to non-AMR meshes, with `t8code` contributing only around 15\% of the
overall runtime. In \autoref{fig:t8code_runtimes} we compare, over the number
of processes, the runtimes of the DG solver and the summed mesh operations
performed by `t8code`: ghost computation, ghost data exchange, partitioning
(load balancing), refinement and coarsening, as well as balancing, which
ensures a difference of at most one refinement level between face-neighboring
elements. The graphs in \autoref{fig:t8code_runtimes} clearly show that
`t8code` accounts for only around 15\% to 20\% of the overall runtime relative
to the solver.
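
To make the list of measured operations concrete, the following is a purely
illustrative sketch of one adaptive time step. All helper names are
hypothetical no-op placeholders, not `t8code`'s actual API, and the ordering
is one typical arrangement rather than the prototype's exact sequence:

```python
# Hypothetical placeholders for the mesh operations summed in the
# measurements; the bodies are no-ops so the sketch stays self-contained.
def refine_and_coarsen(forest): return forest  # adapt via an error indicator
def balance(forest): return forest             # limit face neighbors to one
                                               # refinement level difference
def partition(forest): return forest           # redistribute for load balance
def compute_ghost_layer(forest): return []     # collect remote face neighbors
def exchange_ghost_data(forest, ghosts): pass  # sync solution data on ghosts
def dg_solver_step(forest): pass               # the dominant cost (~80-85%)

def adaptive_time_step(forest):
    forest = refine_and_coarsen(forest)
    forest = balance(forest)
    forest = partition(forest)
    ghosts = compute_ghost_layer(forest)
    exchange_ghost_data(forest, ghosts)
    dg_solver_step(forest)
    return forest
```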

+----------------+-------------------+--------------------+--------+
| \# Process | \# Elements | \# Elem. / process | Ghost |
@@ -226,6 +238,16 @@ to the solver.
| elements. \label{tab:t8code_runtimes} |
+================+===================+====================+========+

![Runtimes of ghost layer creation on the terrabyte cluster
[@terrabyte.lrz.de] for `p4est` and `t8code`. The meshes have been refined
into a Menger sponge for the hexahedral mesh with `p4est` (max. level 12) and
a Sierpinski sponge for the tetrahedral mesh with `t8code` (max. level 13),
creating a fractal pattern with billions of elements as a stress test. To
make the two runs comparable, the runtimes have been divided by the average
local number of ghost elements on an MPI rank.
\label{fig:ghost_layer_runtimes}
](pics/plot-timings-per-num-ghosts.png){width="90%"}

![Runtimes on JUQUEEN of the solver and summed mesh operations of our DG
prototype code coupled with `t8code`. Mesh operations are ghost computation,
ghost data exchange, partitioning (load balancing), refinement and coarsening
25 changes: 25 additions & 0 deletions pics/plot-timings-per-num-ghosts.gpi
@@ -0,0 +1,25 @@
# Gnuplot script for the ghost layer scaling plot.
# Regenerate with: gnuplot plot-timings-per-num-ghosts.gpi
set terminal pngcairo enhanced

set output "plot-timings-per-num-ghosts.png"

set encoding utf8

set grid

# Data columns: MPI ranks, ghost layer runtime [s], avg. ghosts per rank.
p4hex = "timings-p4.dat"
t8tet = "timings-t8-tet.dat"

set xlabel "number of MPI ranks"
set ylabel "ghost layer runtime over #ghosts [μs/#ghosts]"

set logscale x 2

set yrange [0:7]

set key bottom

set title "Runtimes of ghost layer creation per ghost element over num. of proc."

# Convert seconds per ghost element to microseconds (factor 1e6).
plot \
p4hex using 1:($2/$3 * 1e6) with lp lw 2 ps 2 title "p4est with hexahedral mesh (218 billion elements)", \
t8tet using 1:($2/$3 * 1e6) with lp lw 2 ps 2 title "t8code with tetrahedral mesh (93 billion elements)"
Binary file added pics/plot-timings-per-num-ghosts.png
5 changes: 5 additions & 0 deletions pics/timings-p4.dat
@@ -0,0 +1,5 @@
80 2.7853 620022
160 1.47469 295267
320 0.795606 206677
640 0.431872 98434
1280 0.246186 68897
5 changes: 5 additions & 0 deletions pics/timings-t8-tet.dat
@@ -0,0 +1,5 @@
80 2.9717 620022
160 1.79744 295267
320 1.17688 206677
640 0.591025 98434
1280 0.332564 68897
