CI for benchmarks online #10

Open
lukego opened this issue Oct 9, 2016 · 13 comments

Comments

@lukego

lukego commented Oct 9, 2016

This repo is cool! I am really happy to have a test suite. This seems great for people who want to maintain their own branches and keep track of how they compare with everybody else's. Like, have I broken something? Have my optimizations worked? Has somebody else made some optimizations that I should merge? etc. Just now I would like to maintain a branch called lowlevel to soak up things like intrinsics and DynASM Lua-mode so this is right on target for me.

I whipped up a Continuous Integration job to help. The CI downloads the latest code for some well-known branches, runs the benchmark suite 100 times for each branch, and reports the results. This updates automatically when any of the branches change (including the benchmark definitions).

The reason I run the benchmarks 100 times is to support tests that use randomness to exercise non-determinism in the JIT, like roulette (#9). Repeated tests mean that we can quantify how consistent the benchmark results are between runs, and once we have a metric for consistency then it is more straightforward to optimize (see LuaJIT/LuaJIT#218).
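
One way to put a number on "how consistent the results are between runs" is the relative standard deviation: the standard deviation of the run times divided by their mean. The following is a minimal Python sketch and not the CI's actual code; the function name `relative_std_dev` and the sample timings are made up for illustration.

```python
import statistics


def relative_std_dev(samples):
    """Return the sample standard deviation as a percentage of the mean."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    mean = statistics.mean(samples)
    if mean == 0:
        raise ValueError("mean is zero; relative standard deviation is undefined")
    return 100.0 * statistics.stdev(samples) / mean


# Hypothetical wall-clock times (seconds) for repeated runs of one benchmark.
steady = [2.00, 2.02, 1.98, 2.01, 1.99]
erratic = [2.00, 2.02, 4.10, 2.01, 1.99]  # one run hit a much slower path

print(f"steady:  {relative_std_dev(steady):.1f}%")
print(f"erratic: {relative_std_dev(erratic):.1f}%")
```

A single slow outlier raises the figure sharply, which is the kind of run-to-run variation a randomized benchmark such as roulette is meant to expose.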

The branches I am testing now are master, v2.1, agentzh-v2.1, corsix/x64, and lukego/lowlevel. If anybody would like a branch added (or removed) just drop me a comment here. Currently the benchmark definitions are coming from my fork because I wanted to include roulette to check that variation is measured correctly.

Screenshot of the first graph:

[Image: benchmarks]

and links:

Hope somebody else finds this useful, too! Feedback & pull requests welcome. I plan to keep this operational.

@corsix
Collaborator

corsix commented Oct 9, 2016

corsix/x64 was effectively merged into v2.1, so I don't expect to be making any more commits to it. corsix/newgc on the other hand...

@lukego
Author

lukego commented Oct 9, 2016

@corsix Roger. I updated the config to test newgc instead of x64. The results will automatically go up on the permalink above.

@lukego
Author

lukego commented Oct 9, 2016

Is it hopelessly naive to simply run the benchmarks by evaluating them with no arguments? https://github.com/lukego/LuaJIT-branch-tests/blob/5043523d6cb59d35e7ecf5ee51f2253ab75d8675/default.nix#L57. I suppose that I should at least save the output to check if they are really working. Some execute very quickly.

@corsix do you need any special build options for newgc?

@MikePall
Member

@lukego Maybe you missed those bench/PARAM* files that contain the N arguments to each benchmark? Scale as appropriate to give a run time of a couple seconds each. No point in running these more than a dozen times.

Consider verifying the checksum of the benchmark output against known good checksums for each N. E.g. generated with plain Lua or the C equivalents of the tests (you really need this for larger N).

Note that mandelbrot suffers from numerical instability and may give different results depending on fused vs. unfused FP arithmetic on some platforms (the JIT-compiled, i.e. fused, result is actually more accurate). And partialsums depends on the accuracy of a couple of math library functions, which isn't very good on some platforms.
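
The checksum idea above could look roughly like the following Python sketch. It is an illustration under assumptions, not part of the test suite: SHA-256 is an arbitrary choice (the thread does not name a hash), and `output_checksum`, `verify_output`, the `known_good` table layout, and the sample output bytes are all hypothetical.

```python
import hashlib


def output_checksum(output: bytes) -> str:
    """Hex digest of a benchmark's captured standard output."""
    return hashlib.sha256(output).hexdigest()


def verify_output(name, n, output, known_good):
    """Compare output against the reference digest recorded for (name, n).

    Returns True or False, or None when no reference digest is recorded.
    """
    expected = known_good.get((name, n))
    if expected is None:
        return None
    return output_checksum(output) == expected


# Hypothetical reference output, e.g. captured once from plain Lua for N=1000.
reference_output = b"-0.169075164\n-0.169087605\n"
known_good = {("nbody", 1000): output_checksum(reference_output)}

print(verify_output("nbody", 1000, reference_output, known_good))
print(verify_output("nbody", 1000, b"garbage\n", known_good))
print(verify_output("nbody", 5000, reference_output, known_good))
```

Keying the table on both the benchmark name and N matters because the expected output changes with N. For benchmarks with platform-dependent floating-point results, such as the mandelbrot and partialsums cases described above, an exact digest match may be too strict and a per-platform table or a numeric tolerance might be needed instead.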

@lukego
Author

lukego commented Oct 12, 2016

@MikePall Aha! Thanks for pointing out bench/PARAM*. Just the thing.

For me it is important to run tests 100+ times and to seed them with entropy. While we have issues like LuaJIT/LuaJIT#218 to contend with I think that benchmark results need to be interpreted as probability distributions rather than scalar values.

(The non-determinism is perhaps more important to me than to others. In the Snabb context we absolutely cannot have a situation where you deploy 100 routers and expect 5 of them to have half the capacity of the others. People are currently using lousy workarounds like detecting system overload and calling jit.flush() to roll the dice on a new trace. I need to find a proper solution to this & the CI has to show me improvements and regressions in how dependable performance is in the presence of workload entropy.)
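
To make "a distribution rather than a scalar" concrete, here is a small Python sketch that summarizes a set of run times and reports what fraction of runs were at least twice as slow as the median, mirroring the "5 out of 100 routers at half capacity" scenario. The `summarize` function, the `slow_factor` threshold, and the sample data are assumptions made for illustration only.

```python
import statistics


def summarize(samples, slow_factor=2.0):
    """Summarize run times and the share of runs >= slow_factor * median."""
    ordered = sorted(samples)
    median = statistics.median(ordered)
    slow = [t for t in ordered if t >= slow_factor * median]
    return {
        "median": median,
        "min": ordered[0],
        "max": ordered[-1],
        "slow_fraction": len(slow) / len(ordered),
    }


# Hypothetical data: 100 runs, 5 of which take a little over twice as long.
runs = [1.0] * 95 + [2.1] * 5
summary = summarize(runs)
print(summary)
```

A mean alone would hide this: the mean of these made-up runs is only about 5% above the median, while one run in twenty is more than twice as slow.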

@lukego
Author

lukego commented Oct 17, 2016

I have updated the CI to run from PARAM_x86_CI.txt from my branchmarks branch. This is closely based on PARAM_x86_CI.txt but I removed a couple of benchmarks that seemed to fail or hang.

The results permalink is the same. Hopefully the report is beginning to be meaningful. Now each benchmark takes between 0.1s and 10s which is hopefully a reasonable range for getting stable and meaningful results.
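
A quick way to check whether a chosen N lands a benchmark in that 0.1s to 10s window is to time the command and classify the result. This is a hedged Python sketch rather than what the CI does: `time_command` and `classify` are hypothetical helpers, and the Python one-liner stands in for a real benchmark invocation.

```python
import subprocess
import sys
import time

MIN_SECONDS = 0.1   # below this, constant overhead dominates the measurement
MAX_SECONDS = 10.0  # above this, repeated runs become expensive


def time_command(argv):
    """Run a command once and return its wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(argv, check=True, stdout=subprocess.DEVNULL)
    return time.perf_counter() - start


def classify(seconds, lo=MIN_SECONDS, hi=MAX_SECONDS):
    """Say whether a run time is inside the target window."""
    if seconds < lo:
        return "too short: raise N"
    if seconds > hi:
        return "too long: lower N"
    return "ok"


# Stand-in workload; a real check would run the benchmark with its N argument.
elapsed = time_command([sys.executable, "-c", "sum(range(10**6))"])
print(f"{elapsed:.3f}s -> {classify(elapsed)}")
```

Note that the measured time includes process startup, which is one reason very short runs give unreliable comparisons.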

I have pulled the iteration count down to 12 from 100. The Relative Standard Deviation graph probably needs to be taken with a grain of salt. I will revisit this when time permits. (Just now I am running all the iterations in a bash loop which ties up a test server continuously. I should make each run into a separate Nix derivation so that the CI will schedule them intelligently e.g. parallelize across more servers and interleave with other CI tasks instead of blocking them.)

One notable difference, by eyeball, is that the report no longer flags corsix/newgc as slower on the binary-trees benchmark. Previously this benchmark ran for only around 0.001 seconds, so the difference may well have been due to some tiny constant factor.

@SameeraDes

I am trying to run the benchmarks in a continuous integration job for the AArch64 port, which is in v2.1. Is there a central CI system to which the AArch64 tests can be added, or do I need to set up a completely new CI job?

@nico-abram

@lukego
https://hydra.snabb.co/build/3807227 errors with "Aborted: cannot connect to ‘[email protected]’: ssh: connect to host murren-1.snabb.co port 22: Connection timed out (propagated from build 3807225) "
This (https://hydra.snabb.co/build/3803719) seems to be the most recent passing build

@lukego
Author

lukego commented Jan 22, 2019

@nico-abram ah yes! The compute hosts running these LuaJIT benchmarks have recently been retired. I didn't think of this job because I haven't seen much activity here over the past few years and don't know how much interest there is.

If you want to run the benchmarks locally and generate the report, you can use the instructions in the RaptorJIT README, which I hope will work with standard LuaJIT too. I'm happy to advise if someone wants to troubleshoot a local setup or run a new CI.

If someone wants to sponsor running and updating a benchmark CI for LuaJIT then I'm also happy to help with that in my professional capacity at Snabb Solutions.

P.S. Here are some of the other ways that I put these tests to use while exploring the contribution of individual optimizations to overall performance:

That last one turned up a potentially important micro-optimization:

Surprisingly interesting to take simple benchmarks and use them to make systematic experiments!

@lukego
Author

lukego commented Jan 22, 2019

@SameeraDes Good question. This CI is based on Nix and Nix seems to support ARM these days. So it should be possible to add an ARM server onto the backend but I don't know how much hassle to expect. The sticky-tape solution could also be for random machines to post results to Git repos in plain text and for this CI to download those and build/publish the reports.
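
The plain-text variant might be as simple as one "benchmark seconds" pair per line, which a report job could parse and group per benchmark. The Python sketch below assumes that file format purely for illustration; nothing in this thread defines one, and `parse_results` and the sample text are hypothetical.

```python
from collections import defaultdict


def parse_results(text):
    """Parse 'name seconds' lines into {name: [seconds, ...]}.

    Blank lines and lines starting with '#' are ignored.
    """
    results = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, seconds = line.split()
        results[name].append(float(seconds))
    return dict(results)


# Hypothetical results file as one machine might post it.
sample = """\
# machine: example-arm64
binary-trees 2.31
binary-trees 2.35
nbody 1.02
"""

print(parse_results(sample))
```

Keeping every individual run, rather than a pre-computed average, leaves the report free to compute whatever spread statistics it needs.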

I am meaning to migrate over to https://www.hercules-ci.com/ but haven't made time for that yet.

@SameeraDes

Thanks for your response, @lukego
I have added a Jenkins-based CI for ARM64 for now. It would be great if we could have a central CI for all LuaJIT perf runs; I am willing to contribute for the ARM64 port.

@siddhesh

siddhesh commented Mar 5, 2019

@lukego we have set up a CI loop for luajit on the Linaro CI to run tests on commits to v2.1 on arm64:

https://ci.linaro.org/job/luajit-aarch64-perf/

We'll be happy to add an x86_64 node to it if you have one, or add an x86_64 node ourselves.

As for other architectures, please feel free to ping me either on this issue or personally to have more nodes added to the trigger. At some point we also need to figure out a place to report the results.

@lukego
Author

lukego commented Mar 5, 2019

@siddhesh Cool!

I am running a CI for RaptorJIT and related projects that sometimes covers LuaJIT too. I don't have spare machines to contribute to other CIs like yours though so please go ahead with your own.
