test/bench: osu-style bcast benchmark
The barrier often does not exit uniformly, especially when node topology is
in play. This affects different collective algorithms differently, so a
single combined latency hides too much detail for algorithm comparisons.

The OSU micro-benchmarks measure the collective latency on each process
individually, then reduce the results to min, max, and average. While this
is still susceptible to barrier behavior, it provides more detail and some
insight when comparing different algorithms.
hzhou committed Nov 3, 2024
1 parent de7212f commit 8da66ac
26 changes: 24 additions & 2 deletions test/mpi/bench/bcast.def
@@ -5,6 +5,28 @@ include: macros/mtest.def
 page: bcast, bench_frame
     data: buf, size, MPI_CHAR

-    &call measure_with_barrier
-        MPI_Bcast($(data), 0, comm)
+    $global root=0
+    $(if:0)
+        &call measure_with_barrier
+            MPI_Bcast($(data), root, comm)
+    $(else)
+        $if grank == 0
+            $call header_coll_latency
+        &call foreach_size
+            $my tf_min, tf_max, tf_avg, tf_sigma
+            $(set:MIN_ITER=0.001/tf_max)
+            &call coll_warmup
+                measure_bcast(iter, root, comm, buf, size, &tf_min, &tf_max, &tf_avg, &tf_sigma)
+                tf_dur = tf_max
+            $if iter < 100
+                iter = 100
+            measure_bcast(iter, root, comm, buf, size, &tf_min, &tf_max, &tf_avg, &tf_sigma)
+            $if grank == 0
+                $call report_coll_latency, size
+
+fncode: measure_bcast(int iter, int root, comm, buf, size, pf_min, pf_max, pf_avg, pf_sigma)
+    &call measure_coll_latency, iter
+        MPI_Bcast($(data), root, comm)
+    $(for:min,max,avg,sigma)
+        *pf_$1 = tf_$1
