The result shows some potential improvements in our timing functions in lib/common.hh.
The current get_usec() uses CLOCK_MONOTONIC_RAW, but we should change it to CLOCK_MONOTONIC, which is much faster.
CLOCK_MONOTONIC_COARSE is the fastest, but its resolution is only about 4 ms.
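The clock choice above can be sketched as follows. This is a minimal illustration rather than the actual code in lib/common.hh: the helper names `get_usec_from()` and `clock_resolution_nsec()` are hypothetical, and the sketch assumes Linux, where `CLOCK_MONOTONIC_RAW` and `CLOCK_MONOTONIC_COARSE` are available.

```cpp
#include <cstdint>
#include <time.h>   // clock_gettime(), clock_getres(), CLOCK_* ids (POSIX/Linux)

// Hypothetical helper: microsecond timestamp read from the given clock.
// The real get_usec() in lib/common.hh may look different.
static inline uint64_t get_usec_from(clockid_t clock_id)
{
    struct timespec ts;
    clock_gettime(clock_id, &ts);
    return static_cast<uint64_t>(ts.tv_sec) * 1000000ull
         + static_cast<uint64_t>(ts.tv_nsec) / 1000ull;
}

// Proposed default: CLOCK_MONOTONIC instead of CLOCK_MONOTONIC_RAW.
static inline uint64_t get_usec()
{
    return get_usec_from(CLOCK_MONOTONIC);
}

// Hypothetical helper: resolution of a clock in nanoseconds, useful for
// checking how coarse CLOCK_MONOTONIC_COARSE is on a given machine.
static inline uint64_t clock_resolution_nsec(clockid_t clock_id)
{
    struct timespec res;
    clock_getres(clock_id, &res);
    return static_cast<uint64_t>(res.tv_sec) * 1000000000ull
         + static_cast<uint64_t>(res.tv_nsec);
}
```

Calling `clock_resolution_nsec(CLOCK_MONOTONIC_COARSE)` on the target machine shows whether the coarse clock is fine-grained enough for a given measurement before adopting it there.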
To measure the PPC values, we have so far used the rdtsc() + cpuid() combination, where cpuid() is there to prevent out-of-order execution. We should avoid cpuid(), however, and use a lighter synchronization mechanism such as a memory fence instead.
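One possible shape for a fenced timestamp read is sketched below. The issue only says "memory fence", so choosing `lfence` specifically is an assumption, as is the helper name `rdtsc_fenced()`. The sketch relies on the GCC/Clang intrinsics `__rdtsc()` and `_mm_lfence()` and builds only on x86. How strongly `lfence` orders `rdtsc` depends on the CPU vendor and configuration, so it should be validated on the target hardware.

```cpp
#include <cstdint>
#include <x86intrin.h>  // __rdtsc(), _mm_lfence() (GCC/Clang, x86 only)

// Hypothetical helper: read the time-stamp counter with lfence around it
// instead of the much heavier cpuid instruction.
static inline uint64_t rdtsc_fenced()
{
    _mm_lfence();               // let earlier instructions finish before reading
    uint64_t tsc = __rdtsc();   // read the time-stamp counter
    _mm_lfence();               // keep later instructions from starting early
    return tsc;
}
```

A measurement would take one `rdtsc_fenced()` reading before and one after the code under test and subtract the two.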
* The device index 0 is now the CPU, and the indices of coprocessors
start from 1. This change eliminates unnecessary if statements in
load balancers and the system inspector.
* Moved per-metric smoothing code into the system inspector.
I performed a small experiment, running each timing function 1M times on a Sandy Bridge server.