Use fast performance counter to measure clock cycles #10

Open
achimnol opened this issue Jul 2, 2015 · 0 comments
achimnol commented Jul 2, 2015

I performed a small experiment, calling each timing function 1M times on a Sandy Bridge server:

clock_gettime(CLOCK_MONOTONIC)          0.031977 sec
clock_gettime(CLOCK_MONOTONIC_RAW)      0.518126 sec
clock_gettime(CLOCK_MONOTONIC_COARSE)   0.007629 sec
clock_gettime(CLOCK_PROCESS_CPUTIME_ID) 0.648347 sec
clock_gettime(CLOCK_THREAD_CPUTIME_ID)  0.580868 sec
gettimeofday()                          0.032326 sec
rdpmc()                                 0.019933 sec
rdpmc() + memfence()                    0.029345 sec
rdtsc()                                 0.010432 sec
rdtsc() + memfence()                    0.018088 sec
rdtsc() + cpuid()                       0.052463 sec

The results suggest some potential improvements to our timing functions in lib/common.hh.

  • The current get_usec() uses CLOCK_MONOTONIC_RAW, but we should change it to CLOCK_MONOTONIC, which is about 16x faster in this experiment (0.032 sec vs. 0.518 sec).
  • CLOCK_MONOTONIC_COARSE is the fastest, but its resolution is only about 4 ms.
  • For measuring the PPC values, we have used the rdtsc() + cpuid() combination before, where cpuid() serves to prevent out-of-order execution. We should stop using cpuid() for this purpose and use a lighter synchronization mechanism, such as a memory fence, instead.
achimnol added a commit that referenced this issue Jul 4, 2015
 * The device index 0 is now the CPU, and the indices of coprocessors
   starts from 1.  This change eliminates unnecessary if statements in
   load balancers and the system inspector.

 * Moved per-metric smoothing codes into the system inspector.
achimnol added a commit that referenced this issue Jul 6, 2015
 * Also replace rte_rdtsc() with rdtscp().
achimnol added a commit that referenced this issue Jul 6, 2015
@achimnol achimnol self-assigned this Jul 6, 2015