test(vm): Improve instruction-counting VM benchmark #3105

slowli · 2024-10-16T07:28:18Z

What ❔

Replaces iai with an alternative (yab) and brushes up instruction counting in general.

Why ❔

The library currently used for the benchmark (iai) is unmaintained.
It doesn't work with newer valgrind versions.
It doesn't allow measuring parts of program execution, only the entire program run.
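Because iai reports counts for the whole process, setup such as VM initialization is always part of the measurement. The idea of measuring only one region of execution can be sketched with wall-clock time standing in for instruction counts. The actual benchmarks count instructions / cycles under valgrind, which this sketch does not attempt, and `measure_part` is a hypothetical helper, not the API of yab or any other benchmarking crate.

```rust
use std::hint::black_box;
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical helper: runs `setup` outside the measured region and
/// times only `routine`, which receives the prepared input.
fn measure_part<T, O, S, R>(setup: S, routine: R) -> Duration
where
    S: FnOnce() -> T,
    R: FnOnce(T) -> O,
{
    // Not measured (in the real benchmarks, think VM initialization).
    let input = setup();
    let start = Instant::now();
    // `black_box` discourages the compiler from optimizing the routine away.
    let _output = black_box(routine(black_box(input)));
    // `_output` is dropped after the timer is read, so deallocation is not measured.
    start.elapsed()
}

fn main() {
    let elapsed = measure_part(
        || {
            // Simulated expensive setup; excluded from the measurement.
            thread::sleep(Duration::from_millis(50));
            vec![1_u64; 1_000]
        },
        |data| data.iter().sum::<u64>(),
    );
    println!("measured region took {elapsed:?}");
}
```

The 50 ms sleep in setup never shows up in the reported duration, which is exactly what a whole-program counter like iai cannot express.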

Checklist

PR title corresponds to the body of PR (we generate changelog entries from PRs).
Tests for the changes have been added / updated.
Documentation comments have been added / updated.
Code has been formatted via zkstack dev fmt and zkstack dev lint.

slowli · 2024-10-16T11:28:46Z

Observations so far:

Completely subjectively, the new approach has better DevEx: e.g., it allows filtering which benches are run and integrating reporters directly into the benchmark logic (see code).
Instruction / cycle counts measured with the new approach seem to match the old approach once the ~90M-instruction overhead of general and VM initialization is subtracted.
The new approach seems to correlate better with wall-clock benchmarks (more so for instructions than for cycles), although there are still outliers. E.g., here are the results on my M2 MacBook:

bench                               time         cycles   instructions  cycles/s (×10⁹)  instr/s (×10⁹)
fast/deploy_simple_contract    1.4662 ms      148594653       13479007            101.4            9.19
legacy/deploy_simple_contract  2.4865 ms      175808750       31190368             70.7            12.5
fast/access_memory             39.774 ms     1080974146      715607457             27.1            18.0
legacy/access_memory           615.71 ms    11989387670     7320044515             19.5            11.9
fast/call_far                  31.002 ms      538142795      419103438             17.4            13.5
legacy/call_far                123.89 ms     2214133575     1252371237             17.9            10.1
fast/decode_shl_sub            22.284 ms      638405039      462780306             28.6            20.8
legacy/decode_shl_sub          513.15 ms    11170751394     6856193817             21.8            13.4
fast/event_spam                38.736 ms      804507408      517321574             20.8            13.5
legacy/event_spam              335.45 ms     6875852503     4141145639             20.5            12.3

So the number of instructions per second is roughly the same for all benches (about 9 to 21 B/s), and it has the expected order of magnitude 🙃
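The derived throughput columns in the table are simply counts divided by wall-clock time. A minimal sketch of that arithmetic (the function name is hypothetical), reproducing the `call_far` rows:

```rust
/// Converts a count (cycles or instructions) observed over `time_ms`
/// milliseconds into billions per second, as in the `cycles/s` and
/// `instr/s` columns of the table.
fn billions_per_sec(count: u64, time_ms: f64) -> f64 {
    let seconds = time_ms / 1_000.0;
    count as f64 / seconds / 1e9
}

fn main() {
    // (bench, time in ms, cycles, instructions), copied from the table.
    let rows = [
        ("fast/call_far", 31.002, 538_142_795_u64, 419_103_438_u64),
        ("legacy/call_far", 123.89, 2_214_133_575, 1_252_371_237),
    ];
    for (name, time_ms, cycles, instructions) in rows {
        println!(
            "{name}: {:.1} B cycles/s, {:.1} B instr/s",
            billions_per_sec(cycles, time_ms),
            billions_per_sec(instructions, time_ms),
        );
    }
}
```

This prints `fast/call_far: 17.4 B cycles/s, 13.5 B instr/s` and `legacy/call_far: 17.9 B cycles/s, 10.1 B instr/s`. Since the times in the table are rounded, recomputing some other rows may differ slightly in the last digit.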

Not so good observations:

As expected, since only parts of program execution are measured, the benches are more sensitive to the setup logic. I've observed ~1% instruction / cycle changes caused by trivial changes in the benchmark source (e.g., iterating over benchmarks in reverse order, or running the init bench before / after other benches or not at all). To be fair, the old approach also produced fluctuating results, though probably to a lesser degree. The results might be more stable with cachegrind instrumentation enabled, but that would require installing a newer version of valgrind.
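With ~1% swings caused by setup changes alone, comparing two runs needs a tolerance. A minimal sketch, assuming counts are compared as a relative change against a base run; the 2% value mirrors the commit "Set reported diff threshold to 2%" (9b6d748), but the function names and the exact rule here are hypothetical.

```rust
/// Relative change of `new` against `base` (0.02 means +2%).
/// Returns `None` when `base` is zero, since the ratio is undefined.
fn relative_change(base: u64, new: u64) -> Option<f64> {
    if base == 0 {
        return None;
    }
    Some((new as f64 - base as f64) / base as f64)
}

/// Hypothetical noise filter: flags a change only if its magnitude
/// exceeds `threshold` in either direction.
fn exceeds_noise(base: u64, new: u64, threshold: f64) -> bool {
    relative_change(base, new).is_some_and(|change| change.abs() > threshold)
}

fn main() {
    const THRESHOLD: f64 = 0.02; // 2%
    let base = 1_000_000_u64;
    // +1% (within noise), -3% and +3% (reported).
    for new in [1_010_000_u64, 970_000, 1_030_000] {
        println!("{base} -> {new}: report = {}", exceeds_noise(base, new, THRESHOLD));
    }
}
```

A threshold a couple of times larger than the observed noise keeps trivial benchmark-source changes from being reported as regressions, at the cost of hiding genuinely small changes.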

core/tests/vm-benchmark/src/vm.rs

core/tests/vm-benchmark/src/bin/instruction_counts.rs

core/tests/vm-benchmark/benches/instructions.rs

slowli added 10 commits October 16, 2024 09:06

Sketch yab benchmarking (f42469b)
Add benchmark reporters (88d23b3)
Update benchmark CI workflows (8fc1a23)
Remove iai from workspace (6355f95)
Improve instruction counting (991f76a)
Update instruction_counts binary (d5c3592)
Run instruction_counts in CI (7c6bbfa)
Fix / improve docs (c1fb81d)
Handle case with missing benchmark in base (86e64df)
Fix bench ordering & empty outputs (8c55d2e)
Set reported diff threshold to 2% (9b6d748)
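Commits 86e64df ("Handle case with missing benchmark in base") and 8c55d2e ("Fix bench ordering & empty outputs") point at comparing PR counts against a base run. A sketch of such a comparison, assuming both runs are collected into maps from bench name to instruction count; `compare` and `Entry` are hypothetical and not the actual code of the `instruction_counts` binary.

```rust
use std::collections::BTreeMap;

/// A line of a hypothetical base-vs-PR report.
#[derive(Debug, PartialEq)]
enum Entry {
    /// The bench exists in the PR run but not in the base run.
    NotInBase { name: String, new: u64 },
    /// The relative change exceeds the threshold.
    Changed { name: String, base: u64, new: u64 },
}

fn compare(base: &BTreeMap<String, u64>, new: &BTreeMap<String, u64>, threshold: f64) -> Vec<Entry> {
    let mut report = Vec::new();
    // `BTreeMap` iterates in key order, so the report order is deterministic.
    for (name, &new_count) in new {
        match base.get(name) {
            None => report.push(Entry::NotInBase { name: name.clone(), new: new_count }),
            Some(&base_count) if base_count > 0 => {
                let change = (new_count as f64 - base_count as f64) / base_count as f64;
                if change.abs() > threshold {
                    report.push(Entry::Changed { name: name.clone(), base: base_count, new: new_count });
                }
            }
            // A zero base count has no meaningful relative change; skipped here.
            Some(_) => {}
        }
    }
    report
}

fn main() {
    // Placeholder bench names and made-up counts.
    let base = BTreeMap::from([("a".to_owned(), 100_u64), ("b".to_owned(), 100)]);
    let new = BTreeMap::from([("a".to_owned(), 101_u64), ("b".to_owned(), 110), ("c".to_owned(), 50)]);
    let report = compare(&base, &new, 0.02);
    if report.is_empty() {
        println!("No significant changes");
    }
    for entry in &report {
        println!("{entry:?}");
    }
}
```

Benches present in the base run but absent from the PR run are not reported in this sketch; a real report would likely want to list them too.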

slowli changed the title ~~test: Improve instruction-counting VM benchmark~~ test(vm): Improve instruction-counting VM benchmark Oct 16, 2024

Fix commenting steps in CI (4dc0f13)

slowli force-pushed the aov-pla-1049-improve-instructions-vm-benchmark branch from d279c32 to 4dc0f13 on October 16, 2024 12:57

slowli marked this pull request as ready for review October 16, 2024 13:56

slowli requested a review from a team as a code owner October 16, 2024 13:56

slowli requested review from yorik, alexandrst88, artmakh, hatemosphere, onyxet, otani88, iluwaa and joonazan October 16, 2024 13:56

joonazan reviewed Oct 16, 2024

core/tests/vm-benchmark/src/vm.rs (outdated, resolved)

joonazan reviewed Oct 16, 2024

core/tests/vm-benchmark/src/bin/instruction_counts.rs (resolved)

Clarify differing instruction counts message (5631a0f)

slowli requested a review from joonazan October 18, 2024 06:50