Skip to content

Modules to benchmark heap operations in kernel/sched/cpudeadline.c

License

Notifications You must be signed in to change notification settings

tomcucinotta/cpudl-bench

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

7 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

cpudl-bench - Modules to benchmark/test sched/deadline heap operations
              by Tommaso Cucinotta <[email protected]>

This project compiles into a set of modules that benchmark heap operations
in kernel/sched/cpudeadline.c and are also useful for debugging purposes,
to check that the changes do not result in a change in behavior, as well
as performing checks on the heap health/consistency and invariant preservation.
Each benchmarking module contains a copy of some version of the cpudeadline.c
source, after a few modifications as described below.

Applied changes are:
1. change all identifiers prefixes from cpudl_ to mycpudl_ (to avoid conflicts
   with in-kernel identifiers)
2. skip the cpu_present() check wherever present (to allow for benchmarking
   with more CPUs than actually present in the system)
3. add nr_cpu_ids as explicit param at init time, and replace call to
   for_each_possible_cpu() with a for(i = 0; i < nr_cpu_ids; i++)
4. cpudeadline wraparound bugfix (*-bugfix*.c versions)
5. cpudeadline wraparound bugfix + speed-up patch (*-bugfix-fast-*.c versions)
6. additional printk() and dump of the whole heap status after each operation,
   for debugging purposes

The actual benchmarking is done by the cpudl_bench.c module, which:
a. initializes the mycpudl_* module with a number of CPUs as from the num_cpus
   module param
b. performs a number of insert and delete operations on the heap, as from the
   num_ops module param
c. computes the duration of each operation in cycles via the rdtsc, and stores
   each value in a pre-allocated in-memory array
d. at the end of the num_ops measurements, dumps the whole content of the
   array via printk().

Due to the last step, the module needs a large in-kernel ring buffer, if a large
num_ops module param is used.

The cpudeadline*-debug.c files, along with the *debug.ko modules variant, are
used to double-check that, step by step, the heap produced by the *-fast
version is exactly the same as the one produced by the original code.
The check can be done by diff-ing the dmesg output after insertion of the
*-fast.ko module vs the original one.

When running these tests for benchmarking, I recommend:

0. increase the in-kernel ring buffer:
   add log_buf_len=67108864 as kernel param at boot

1. exit the X11 environment:
   /etc/init.d/lightdm stop

2. stop the NetworkManager:
   /etc/init.d/network-manager stop

3. block the CPU frequency at the maximum allowed one:
   for c in $(seq 0 $[$(grep -c processor /proc/cpuinfo) - 1]); do
     sudo cpufreq-set -c $c -g performance;
   done

4. load the module cpudl-bugfix.ko, save the dmesg output and unload it

5. load the module cpudl-bugfix-fast.ko, save the dmesg output and unload it

6. process and compare the obtained dmesg outputs

In a set of runs performed on an Intel(R) Core(TM) i5-5200U CPU @ 2.20GHz,
I've seen up to ~14% of average speed-up, e.g., with num_ops=100000 and up
to 256 CPUs:

  https://github.com/tomcucinotta/cpudl-bench/blob/master/cpudl-100000.pdf

and this is a similar prior measurement on a different laptop:

  https://github.com/tomcucinotta/cpudl-bench/blob/master/cpudl.pdf

About

Modules to benchmark heap operations in kernel/sched/cpudeadline.c

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages