I am trying to measure the time spent in the reduction operation of RCCL Allreduce. I found that it eventually calls this part of the code in common_kernel.h:
#pragma unroll Unroll
for (int u=0; u < Unroll; u++) {
  if (s < PreOpSrcs) tmp[u] = applyPreOp(preFn, tmp[u]);
  acc[u] = applyReduce(redFn, acc[u], tmp[u]);
}
How can we measure the time spent in the applyReduce function? I tried _clock64 and wall_clock64, but they were not helpful.
I am using RCCL tests only. But they report the complete time for the allreduce application, which involves both communication and computation. I want to measure only the time spent in computation, i.e. the reduction operation.
unsigned long long start = clock64();
acc[u] = applyReduce(redFn, acc[u], tmp[u]);
unsigned long long end = clock64();
This tracks the time spent in applyReduce(): the difference end - start is the number of clock ticks that elapsed across the call.
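A single applyReduce call on one element is likely too short to time reliably on its own, so one option is to read the counter once around the whole unrolled loop and accumulate the difference per thread. Below is a minimal host-side sketch of that pattern, not RCCL code. readTicks(), ReduceTimer, reduceSum() and timedReduce() are hypothetical names introduced for illustration; in a real HIP/CUDA kernel readTicks() would be replaced by clock64(), and reduceSum() stands in for applyReduce(redFn, ...).

```cpp
#include <chrono>

// Stand-in for the device counter (assumption): inside a HIP/CUDA kernel
// this would be clock64(). On the host we read a monotonic clock in
// nanoseconds so the sketch can be compiled and tried without a GPU.
inline long long readTicks() {
  using namespace std::chrono;
  return duration_cast<nanoseconds>(
             steady_clock::now().time_since_epoch()).count();
}

// Hypothetical per-thread accumulator. In a kernel, each thread would keep
// one of these in registers and write it to a global buffer at the end.
struct ReduceTimer {
  long long ticks = 0;  // total ticks spent inside timed regions
  long long calls = 0;  // number of timed regions
};

// Sum reduction standing in for applyReduce(redFn, acc, tmp).
template <typename T>
inline T reduceSum(T a, T b) { return a + b; }

// Time the whole unrolled loop once instead of each element separately,
// so the cost of reading the counter is paid once per Unroll elements.
template <int Unroll, typename T>
inline void timedReduce(T (&acc)[Unroll], const T (&tmp)[Unroll],
                        ReduceTimer& timer) {
  long long start = readTicks();
  for (int u = 0; u < Unroll; u++) {
    acc[u] = reduceSum(acc[u], tmp[u]);
  }
  long long end = readTicks();
  timer.ticks += end - start;
  timer.calls += 1;
}
```

Dividing timer.ticks by timer.calls gives an average per loop. On a GPU the measured interval would also include any stalls while operands are loaded, and the compiler may move instructions across the counter reads, so the numbers are best treated as an estimate.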
As you may be aware, in large-scale systems reduction operations are performed asynchronously and in parallel. As a result, the computation and communication phases are interdependent and are highly optimized in libraries such as RCCL and NCCL.