Add JMH microbenchmarks #35
Conversation
Hi @szpak, thanks a lot for finding the time to work on it! 👍 Since BlockHound works with any JVM code, not just Reactor, I believe Reactor is not necessary here. We should measure the overhead of calling a blocking method in the following scenarios:
Also, make sure that you override the callback (it throws an exception by default; we don't need that in the benchmarks).
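Something along these lines should be enough (a minimal sketch, assuming the Builder-style installation; the no-op body is only meant for the benchmarks):

```java
import reactor.blockhound.BlockHound;

// Install BlockHound with a callback that does nothing instead of throwing,
// so detected blocking calls do not abort the benchmark run.
BlockHound.builder()
        .blockingMethodCallback(method -> {
            // no-op; a counter or a log line could go here if needed
        })
        .install();
```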
example/src/jmh/java/reactor/blockhound/BlockHoundBlockingBenchmark.java
Thanks @bsideup for your comments. I will address them after Devoxx Poland (which starts next week).
No Reactor code. Dummy blocking calls. And more.
I've reworked the benchmarks to cover the mentioned cases. I removed all references to Reactor and changed the (potentially) blocking calls to dummy ones. Because of #38 it is problematic to have only non-blocking calls with BlockHound installed from a thread that should be used for non-blocking operations. I ran the benchmarks with 5 iterations (+ 3 warmup iterations) in 3 forks. The full results are there. The summary is here:
(I rearranged the order in the output and shortened it.) As expected, the impact on "blocking-friendly" threads is negligible (the first four lines). We could also run a test on a more complex call to check how that 10-20% overhead (in the negative case) behaves as a function of the number of accompanying non-blocking calls. Nevertheless, please write first how you see those use cases and the received numbers. Btw, when merging, I propose to merge it without squashing, to keep a reference to the first version with the code for Reactor, just in case we would like to come back to that idea in the future.
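For reference, the measurement setup above corresponds roughly to the following JMH annotations (an illustrative sketch; the class and method names are placeholders, not the exact code from this branch):

```java
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.*;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Fork(3)                       // 3 forks
@Warmup(iterations = 3)        // 3 warmup iterations
@Measurement(iterations = 5)   // 5 measurement iterations
@State(Scope.Benchmark)
public class DummyBlockingCallBenchmark {

    @Benchmark
    public long dummyBlockingCall() throws InterruptedException {
        // a cheap, (potentially) blocking call; returning a value prevents dead-code elimination
        Thread.sleep(0);
        return System.nanoTime();
    }
}
```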
Hi @szpak, thanks a lot for your work! I pushed a change to your branch that simplifies the benchmarks and focuses on the actual overhead, I hope you don't mind :) I would appreciate your review in case I accidentally broke something. I also changed the "blocking method". Here are the results I got:
While the percentages may look scary, I would like to focus on the differences instead :) We have 3 scenarios here:

1. Non-blocking call: the executed method is not marked as blocking. As we can see from the numbers, there is zero overhead from BlockHound on non-blocking methods. Expected, but good to verify.
2. Blocking call, but not in a non-blocking thread: a regular blocking call inside a thread that is not marked as non-blocking. This one is not a problem, yet it has to be intercepted, hence the overhead. The benchmark reports an overhead of ~0.3us per blocking method invocation. Given that the call is blocking, it will most probably block for a duration measured in milliseconds, which means that the total overhead is low.
3. Blocking call inside a non-blocking thread, but marked as "allowed": the heaviest one. A blocking call is detected but whitelisted (e.g. some logging). The difference is big (~5.8 times), but again, the overhead is ~2.3us per blocking call.

The bottom line is:
P.S. After running the benchmarks with async-profiler, I identified a few possible optimizations in the native agent that we may consider (e.g. fail fast if the native method is never marked as "allowed in some cases"). This will require some API changes and should be considered separately.
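To make the scenario setup concrete, marking threads as non-blocking and whitelisting one call could look roughly like this (the class, method and thread names are made up for illustration, not taken from this PR):

```java
import reactor.blockhound.BlockHound;

// Scenario 2/3 setup: treat threads with a given name prefix as non-blocking,
// and allow one specific blocking call inside them (e.g. some logging).
BlockHound.builder()
        .nonBlockingThreadPredicate(current ->
                current.or(t -> t.getName().startsWith("non-blocking-")))
        .allowBlockingCallsInside("com.example.SomeLogger", "log")
        .install();
```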
}

@Benchmark
public void measureNonBlockingCall(BlockHoundInstalledState state) throws Exception {
According to the JMH documentation, benchmark methods that do not return a value are easier for the JVM to optimize away (e.g. via inlining and dead-code elimination). Therefore, my tests either returned a value or used a Blackhole
to prevent that. However, I haven't analyzed the calls using threads or sleep(0)/incrementAndGet() with a profiler, so maybe you were able to ensure that it doesn't happen. A similar situation with optimizations occurs when running the call in a loop.
In general, it's nice that you were able to simplify the benchmarks themselves.
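For illustration, the two standard ways to guard against dead-code elimination look like this (hypothetical benchmark bodies, not the code from this PR):

```java
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.infra.Blackhole;

public class DeadCodeEliminationSafeBenchmarks {

    @Benchmark
    public long returnTheComputedValue() {
        // returning the result keeps the measured computation alive
        return System.nanoTime();
    }

    @Benchmark
    public void sinkIntoBlackhole(Blackhole bh) {
        // alternatively, consume the result with a JMH Blackhole
        bh.consume(System.nanoTime());
    }
}
```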
the benchmark starts a Thread now, hence the inlining cannot affect the results, I guess
@szpak I merged the PR 🎉 Thanks a lot for the initial implementation and the feedback 👍 While evaluating the benchmarks, I already discovered a few easy ways to reduce the overhead and will provide follow-up PRs with the optimizations 💯
General description
As discussed with @bsideup at GeeCON, I benchmarked the impact of the BlockHound integration to check how (un)safe it could be to enable it in production. I started with the following cases:
- a (potentially) blocking and a non-blocking Flux with BlockHound installed, on the single() scheduler
- the same variants without BlockHound installed (as a baseline), on the single() scheduler

I tried to provide a baseline with a variant without the integration enabled (installed). As I haven't found a sensible reason to test a blocking Flux which throws an exception and aborts the execution, I applied a custom blockingMethodCallback() to simulate a warning printed on every blocking method detection (e.g. during execution in production).

In my tests, I took the worst-case scenario where there is no custom logic in the Flux element processing. In many cases the relative BlockHound impact will be lower.
All benchmarks are performed in a single-threaded environment. It could potentially be extended to process multiple streams in multiple threads, but then more external factors could make the results less reliable.
A Reactor-related question
Talking about reliability and reproducibility, I've occasionally been getting some variation in results, even for Reactor itself (without BlockHound). For example, for the naive baseline benchmark:
with numbers being a non-blocking Flux:
numbers = Flux.range(0, 100_000);
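A sketch of what such a naive baseline could look like (a hypothetical reconstruction; the actual benchmark in the branch may differ):

```java
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;
import reactor.core.publisher.Flux;
import reactor.core.scheduler.Schedulers;

@State(Scope.Benchmark)
public class NaiveReactorBaselineBenchmark {

    final Flux<Integer> numbers = Flux.range(0, 100_000);

    @Benchmark
    public Integer processOnSingleScheduler() {
        // publish the elements on the single() scheduler and wait for the last one
        return numbers.publishOn(Schedulers.single()).blockLast();
    }
}
```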
I get results with quite a large standard deviation (~20% of the average). It's all with BlockHound disabled. I chose single() to play with just one thread. It flattens nicely with multiple JVM forks, but anyway, having:
for the same operation puzzles me a little bit. It can be an effect of my environment, but maybe you have some idea what in Reactor itself could make the passes so distinct? Maybe there is something I could tune in the scheduling (or something else) to make the results more similar?
Summary
To conclude, I've got some raw results, but before posting them here and drawing conclusions it would be good to test more cases where you suspect the BlockHound impact could be high (and also repeat my tests with a higher number of iterations/forks). @bsideup, what would you like to test?
Btw, the tests can be executed with
gw jmh
or (with some minimal configuration tuning) with the JMH plugin for IntelliJ IDEA (less accurate, but more convenient when working on the test cases).