Optimize observability and debugging experience #3901
Added some comments with context. This PR relates to #3900.
  || stackTraceRow.startsWith("reactor.core.publisher.Mono.onAssembly")
- || stackTraceRow.equals("reactor.core.publisher.Mono.onAssembly")
  || stackTraceRow.equals("reactor.core.publisher.Flux.onAssembly")
This line removal is unrelated to the rest of the PR. The dropped line is redundant, as it is preceded by a predicate that matches in strictly more contexts. Question: should the `Flux.onAssembly` line (and some others below) be updated to also use `startsWith`?
I think they can all use `startsWith` instead of `equals`.
Alright, will do (and update the PR description to match this additional change).
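To illustrate why an `equals` check is redundant next to a `startsWith` check on the same literal, here is a minimal sketch. The class and method names (`AssemblyPrefixDemo`, `isAssemblyRow`) are hypothetical and not Reactor's actual code; only the two string literals are taken from the diff above.

```java
final class AssemblyPrefixDemo {
    // Hypothetical helper: true if the row refers to one of the onAssembly hooks.
    static boolean isAssemblyRow(String stackTraceRow) {
        return stackTraceRow.startsWith("reactor.core.publisher.Mono.onAssembly")
                || stackTraceRow.startsWith("reactor.core.publisher.Flux.onAssembly");
    }

    public static void main(String[] args) {
        // An exact match is also a prefix match, so a separate equals check adds nothing.
        System.out.println(isAssemblyRow("reactor.core.publisher.Mono.onAssembly"));
        // A row carrying a location suffix is only matched by startsWith.
        System.out.println(isAssemblyRow("reactor.core.publisher.Flux.onAssembly(Flux.java:1)"));
        // Rows outside the publisher package are not matched.
        System.out.println(isAssemblyRow("com.example.MyService.handle(MyService.java:42)"));
    }
}
```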
  static String[] extractOperatorAssemblyInformationParts(String source) {
-     String[] uncleanTraces = source.split("\n");
-     final List<String> traces = Stream.of(uncleanTraces)
-             .map(String::trim)
-             .filter(s -> !s.isEmpty())
-             .collect(Collectors.toList());
+     Iterator<String> traces = trimmedNonemptyLines(source);
The largest improvements in this PR come from these changes.
Key contributors to performance of the old implementation:
- `String#split` accepts a regular expression. We're no longer performing the comparatively expensive operation of compiling regular expressions.
- `String#split` allocates an array and substrings proportional to the provided input, covering a potentially large part of the input that does not at all influence the result of this method.
- The `Stream` operation likewise processes irrelevant lines, and allocates a potentially large list.
The new implementation instead lazily iterates over the input, processing only relevant lines, and tracking only the two most-recently-seen lines.
if (isUserCode(currentLine)) {
    // No line is a Reactor API line.
    return new String[]{currentLine};
}
Some logic was moved around, but existing comments were relocated with it. This should aid review. It's nice that there was already good test coverage.
/**
 * Returns an iterator over all trimmed non-empty lines in the given source string.
 *
 * @implNote This implementation attempts to minimize allocations.
 */
private static Iterator<String> trimmedNonemptyLines(String source) {
    return new Iterator<String>() {
This manually-crafted iterator feels a bit like "coding like it's 1999", but I didn't find a less-verbose alternative that doesn't impact overall code readability. (Had Guava been on the classpath, then I'd have opted to extend `AbstractIterator`.) Open to suggestions!
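For readers who want to see the shape of such an iterator, below is a minimal sketch of one way to iterate lazily over trimmed non-empty lines. It is not the code from this PR: the enclosing class name `LazyLines` and the internal `advance` helper are assumptions made for illustration, and only the `trimmedNonemptyLines` signature mirrors the diff above.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

final class LazyLines {
    /** Returns an iterator over all trimmed non-empty lines in the given source string. */
    static Iterator<String> trimmedNonemptyLines(String source) {
        return new Iterator<String>() {
            // Offset of the first character of the next unread line.
            private int index = 0;
            // The next line to hand out, or null once the input is exhausted.
            private String next = advance();

            @Override
            public boolean hasNext() {
                return next != null;
            }

            @Override
            public String next() {
                if (next == null) {
                    throw new NoSuchElementException();
                }
                String current = next;
                next = advance();
                return current;
            }

            // Scans forward to the next non-empty trimmed line; returns null at end of input.
            private String advance() {
                while (index < source.length()) {
                    int end = source.indexOf('\n', index);
                    if (end < 0) {
                        end = source.length();
                    }
                    String line = source.substring(index, end).trim();
                    index = end + 1;
                    if (!line.isEmpty()) {
                        return line;
                    }
                }
                return null;
            }
        };
    }
}
```

A caller that only needs the first one or two lines stops calling `next()` early, so the remainder of the input is never scanned or copied.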
Thanks for the PR. I will have a closer look. However, bear in mind this change won't make it into 3.7.0-RC1 but would target directly 3.7.0. I think that's ok as it's not an API change. Together with #3900 it is a behaviour change, but unless there were side effects in code they would also not be observed aside from the performance gains. I think it's worth noting some sort of warning for the
Traces#extractOperatorAssemblyInformationParts
Scannable#name()
and related logic
@chemicL I just realized that the impact of this PR is likely over-stated, as in practice the input stacktrace appears to always be generated by a So perhaps we can optimize the code further by skipping the intermediate single-string representation. I might have time for a closer look into that this weekend.
There's one other code path:
I now have a POC for this locally. Against a (ReactorDebugAgent-using) benchmark of local code it isn't yet faster than the current PR (surprisingly; requires more investigation), but for the cleanest and likely fastest code, it'd be better if we can have
Hey, @Stephan202. I've been a bit busy lately but would like to revisit your PRs. Can you give an update on the above considerations? How impactful is this change or an alternative change from your PoC? I assume we would be able to alter the
Thanks for the ping on this PR @chemicL; I meant to report back here. I did try the alternative approach mentioned (including the I can look into polishing that code and pushing it to an alternative branch for a second opinion; will try to find some time. That said, based on the above, my tentative suggestion would be to proceed with this PR as-is. (Except perhaps for trimming the JMH benchmark inputs, because as mentioned, in practice the code will generally parse at most two stack frames, rather than 1000.) If the alternative approach can be made more performant after all, that can be tackled in a follow-up PR.
Force-pushed the branch from f7ebe1c to f470638.
Rebased branch; applied 100% cleanly.
The experiments I tried are on this messy branch. If desired I can clean it up, though it's a bit TBD when I'll have time to dive back into this topic.
@Stephan202 I am just trying to understand whether the change is actually needed. As I understand it, you discovered that this will only be triggered for processing two stack frames (can you point to where it's limited to only 2?), and therefore the benchmark is not relevant to the expected usage of this API, yes? Also, there is the risk of touching and changing a stable code base for not much benefit and potential regressions. Is there a possibility that these optimizations will have an actual effect on real-world applications?
@chemicL fair question! Based on earlier testing the answer is "yes, this is an improvement", but let me get back to you in the coming days with some more hard data. (Exact timing TBD.)
This logic may be executed many times, e.g. if a hot code path uses `{Mono,Flux}#log` or Micrometer instrumentation. The added benchmark shows that for large stack traces the new implementation is several orders of magnitude more efficient in terms of compute and memory resource utilization. While there, improve two existing benchmarks by utilizing the black hole to which benchmark method return values are implicitly sent.
(cherry picked from commit 009ec89)
Force-pushed the branch from f470638 to c3e519a.
@chemicL alright, I rebased the branch on For the
Benchmark before the changes
Benchmark after the changes
In short, this means a >3x speedup and more than halving of allocated memory for the most common 2-line case.
I also had another look at a "more representative" Picnic-internal benchmark, where some longer reactive chains are subscribed to. I can't easily share this code, but with the Reactor Debug Agent enabled (as we do in production) there's an 18-22% speedup and ~40% reduction in allocated memory:
Internal benchmark before the changes
Internal benchmark after the changes
Our main store application is a modular monolith that makes very heavy use of Reactor, with some key request flows creating very long reactive chains. I'm reasonably confident we'll see a noticeable latency improvement with this change.
Wow, that's really impressive @Stephan202 🥇
Added a commit. Tnx for the review @chemicL!
if (index >= source.length()) {
    return null;
}
Just realized that we can drop this case 👁️
Ah, yes. I didn't finish reviewing yesterday and it seems we noticed the same thing :)
The Java 11 tests failed, but
@@ -29,6 +28,7 @@
 * @author Sergei Egorov
 */
final class Traces {
    private static final String PUBLISHER_PACKAGE_PREFIX = "reactor.core.publisher.";
Let's move the private constant below the package-private ones.
Thanks. I have some minor comments and we're good to go. The test failure is a flaky test indeed, unrelated to this change. I think we can merge this since this is a great improvement. As a side discussion - have you considered using
}

@Nullable
private String getNextLine() {
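Here's an outline of an idea to avoid using String and get the next line:
// Assume the entire input (String source) is wrapped in a CharBuffer.
private final CharBuffer cb = CharBuffer.wrap(source);
// Offset of the next unread character; kept across calls.
private int i = 0;

private CharBuffer getNextLine() {
    while (i < cb.length()) {
        if (Character.isWhitespace(cb.charAt(i))) {
            i++; // Skip leading whitespace, including empty lines.
            continue;
        }
        int end = i + 1;
        while (end < cb.length() && cb.charAt(end) != '\n') {
            end++;
        }
        // A view on the same backing characters; nothing is copied.
        CharBuffer line = cb.subSequence(i, end);
        i = end + 1;
        return line;
    }
    return null; // No further non-empty lines.
}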
The match for the reactor package name can also be done in the same linear scanning manner. I wonder if it'd be faster.
I had a look at this, and went down the rabbit hole. The TL;DR is that I added three commits that further improve performance:
- One that avoids `String#join`, unrelated to your suggestion here.
- One that replaces `String#trim`, such that the original string's underlying `char[]` is always reused, thanks to the implementation of `String#substring`. This is IIUC an alternative to your suggestion to use `CharBuffer`; I couldn't use the latter, as it lacks operations such as `.startsWith` and `indexOf`.
- One that introduces a custom `Substring` class and is "somehow" even more performant.
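The `Substring` class itself is not shown in this thread. As a rough guess at the general idea, the sketch below models a line as a pair of offsets into the original string, so that trimming and prefix checks allocate no character data. Everything here (the name `SubstringView`, its fields and methods) is hypothetical and may differ substantially from the class in the actual commit.

```java
final class SubstringView {
    private final String source;
    private final int start; // inclusive
    private final int end;   // exclusive

    SubstringView(String source, int start, int end) {
        this.source = source;
        this.start = start;
        this.end = end;
    }

    /** Returns a view that excludes leading and trailing whitespace; no characters are copied. */
    SubstringView trim() {
        int s = start;
        int e = end;
        // Same whitespace definition as String#trim: any char <= ' '.
        while (s < e && source.charAt(s) <= ' ') {
            s++;
        }
        while (e > s && source.charAt(e - 1) <= ' ') {
            e--;
        }
        return new SubstringView(source, s, e);
    }

    boolean isEmpty() {
        return start == end;
    }

    /** Prefix check that compares directly against the backing string. */
    boolean startsWith(String prefix) {
        return prefix.length() <= end - start
                && source.regionMatches(start, prefix, 0, prefix.length());
    }

    /** Materializes the view as a String; the only place where characters are copied. */
    @Override
    public String toString() {
        return source.substring(start, end);
    }
}
```

With a view like this, a line only needs to be turned into a real `String` once it is known to be part of the result.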
Benchmark of the code on `main`
Benchmark (reactorLeadingLines) (trailingLines) Mode Cnt Score Error Units
TracesBenchmark.measureThroughput 0 0 avgt 5 91.389 ± 1.255 ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate 0 0 avgt 5 8765.340 ± 120.723 MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm 0 0 avgt 5 840.000 ± 0.001 B/op
TracesBenchmark.measureThroughput:gc.count 0 0 avgt 5 119.000 counts
TracesBenchmark.measureThroughput:gc.time 0 0 avgt 5 179.000 ms
TracesBenchmark.measureThroughput 0 1 avgt 5 122.102 ± 3.312 ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate 0 1 avgt 5 8060.272 ± 216.623 MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm 0 1 avgt 5 1032.000 ± 0.001 B/op
TracesBenchmark.measureThroughput:gc.count 0 1 avgt 5 109.000 counts
TracesBenchmark.measureThroughput:gc.time 0 1 avgt 5 158.000 ms
TracesBenchmark.measureThroughput 0 2 avgt 5 148.867 ± 2.750 ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate 0 2 avgt 5 7892.060 ± 145.950 MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm 0 2 avgt 5 1232.000 ± 0.001 B/op
TracesBenchmark.measureThroughput:gc.count 0 2 avgt 5 107.000 counts
TracesBenchmark.measureThroughput:gc.time 0 2 avgt 5 156.000 ms
TracesBenchmark.measureThroughput 1 0 avgt 5 122.145 ± 1.654 ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate 1 0 avgt 5 8057.222 ± 109.248 MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm 1 0 avgt 5 1032.000 ± 0.001 B/op
TracesBenchmark.measureThroughput:gc.count 1 0 avgt 5 109.000 counts
TracesBenchmark.measureThroughput:gc.time 1 0 avgt 5 165.000 ms
TracesBenchmark.measureThroughput 1 1 avgt 5 145.273 ± 0.361 ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate 1 1 avgt 5 8087.325 ± 20.159 MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm 1 1 avgt 5 1232.000 ± 0.001 B/op
TracesBenchmark.measureThroughput:gc.count 1 1 avgt 5 110.000 counts
TracesBenchmark.measureThroughput:gc.time 1 1 avgt 5 158.000 ms
TracesBenchmark.measureThroughput 1 2 avgt 5 177.756 ± 2.771 ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate 1 2 avgt 5 7639.569 ± 119.523 MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm 1 2 avgt 5 1424.000 ± 0.001 B/op
TracesBenchmark.measureThroughput:gc.count 1 2 avgt 5 104.000 counts
TracesBenchmark.measureThroughput:gc.time 1 2 avgt 5 152.000 ms
TracesBenchmark.measureThroughput 2 0 avgt 5 147.853 ± 2.101 ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate 2 0 avgt 5 8049.391 ± 114.613 MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm 2 0 avgt 5 1248.000 ± 0.001 B/op
TracesBenchmark.measureThroughput:gc.count 2 0 avgt 5 109.000 counts
TracesBenchmark.measureThroughput:gc.time 2 0 avgt 5 151.000 ms
TracesBenchmark.measureThroughput 2 1 avgt 5 176.972 ± 2.701 ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate 2 1 avgt 5 7759.568 ± 118.381 MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm 2 1 avgt 5 1440.000 ± 0.001 B/op
TracesBenchmark.measureThroughput:gc.count 2 1 avgt 5 106.000 counts
TracesBenchmark.measureThroughput:gc.time 2 1 avgt 5 154.000 ms
TracesBenchmark.measureThroughput 2 2 avgt 5 199.950 ± 0.792 ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate 2 2 avgt 5 8203.243 ± 32.542 MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm 2 2 avgt 5 1720.000 ± 0.001 B/op
TracesBenchmark.measureThroughput:gc.count 2 2 avgt 5 111.000 counts
TracesBenchmark.measureThroughput:gc.time 2 2 avgt 5 162.000 ms
Benchmark of the already-reviewed code
Benchmark (reactorLeadingLines) (trailingLines) Mode Cnt Score Error Units
TracesBenchmark.measureThroughput 0 0 avgt 5 27.709 ± 1.321 ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate 0 0 avgt 5 12390.947 ± 590.685 MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm 0 0 avgt 5 360.000 ± 0.001 B/op
TracesBenchmark.measureThroughput:gc.count 0 0 avgt 5 168.000 counts
TracesBenchmark.measureThroughput:gc.time 0 0 avgt 5 226.000 ms
TracesBenchmark.measureThroughput 0 1 avgt 5 47.622 ± 1.357 ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate 0 1 avgt 5 11534.798 ± 330.546 MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm 0 1 avgt 5 576.000 ± 0.001 B/op
TracesBenchmark.measureThroughput:gc.count 0 1 avgt 5 157.000 counts
TracesBenchmark.measureThroughput:gc.time 0 1 avgt 5 217.000 ms
TracesBenchmark.measureThroughput 0 2 avgt 5 46.575 ± 2.868 ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate 0 2 avgt 5 11796.026 ± 725.031 MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm 0 2 avgt 5 576.000 ± 0.001 B/op
TracesBenchmark.measureThroughput:gc.count 0 2 avgt 5 160.000 counts
TracesBenchmark.measureThroughput:gc.time 0 2 avgt 5 222.000 ms
TracesBenchmark.measureThroughput 1 0 avgt 5 48.489 ± 3.267 ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate 1 0 avgt 5 11330.727 ± 758.029 MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm 1 0 avgt 5 576.000 ± 0.001 B/op
TracesBenchmark.measureThroughput:gc.count 1 0 avgt 5 154.000 counts
TracesBenchmark.measureThroughput:gc.time 1 0 avgt 5 208.000 ms
TracesBenchmark.measureThroughput 1 1 avgt 5 46.214 ± 2.285 ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate 1 1 avgt 5 11887.215 ± 584.466 MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm 1 1 avgt 5 576.000 ± 0.001 B/op
TracesBenchmark.measureThroughput:gc.count 1 1 avgt 5 162.000 counts
TracesBenchmark.measureThroughput:gc.time 1 1 avgt 5 212.000 ms
TracesBenchmark.measureThroughput 1 2 avgt 5 46.943 ± 2.083 ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate 1 2 avgt 5 11702.329 ± 513.790 MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm 1 2 avgt 5 576.000 ± 0.001 B/op
TracesBenchmark.measureThroughput:gc.count 1 2 avgt 5 159.000 counts
TracesBenchmark.measureThroughput:gc.time 1 2 avgt 5 230.000 ms
TracesBenchmark.measureThroughput 2 0 avgt 5 48.431 ± 1.966 ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate 2 0 avgt 5 10712.378 ± 436.213 MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm 2 0 avgt 5 544.000 ± 0.001 B/op
TracesBenchmark.measureThroughput:gc.count 2 0 avgt 5 146.000 counts
TracesBenchmark.measureThroughput:gc.time 2 0 avgt 5 210.000 ms
TracesBenchmark.measureThroughput 2 1 avgt 5 46.974 ± 3.057 ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate 2 1 avgt 5 11046.458 ± 731.622 MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm 2 1 avgt 5 544.000 ± 0.001 B/op
TracesBenchmark.measureThroughput:gc.count 2 1 avgt 5 151.000 counts
TracesBenchmark.measureThroughput:gc.time 2 1 avgt 5 210.000 ms
TracesBenchmark.measureThroughput 2 2 avgt 5 50.708 ± 0.980 ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate 2 2 avgt 5 10230.724 ± 197.935 MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm 2 2 avgt 5 544.000 ± 0.001 B/op
TracesBenchmark.measureThroughput:gc.count 2 2 avgt 5 139.000 counts
TracesBenchmark.measureThroughput:gc.time 2 2 avgt 5 187.000 ms
Benchmark after the first improvement
Benchmark (reactorLeadingLines) (trailingLines) Mode Cnt Score Error Units
TracesBenchmark.measureThroughput 0 0 avgt 5 21.666 ± 0.360 ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate 0 0 avgt 5 10563.716 ± 175.257 MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm 0 0 avgt 5 240.000 ± 0.001 B/op
TracesBenchmark.measureThroughput:gc.count 0 0 avgt 5 144.000 counts
TracesBenchmark.measureThroughput:gc.time 0 0 avgt 5 212.000 ms
TracesBenchmark.measureThroughput 0 1 avgt 5 36.057 ± 1.438 ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate 0 1 avgt 5 10791.558 ± 429.161 MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm 0 1 avgt 5 408.000 ± 0.001 B/op
TracesBenchmark.measureThroughput:gc.count 0 1 avgt 5 146.000 counts
TracesBenchmark.measureThroughput:gc.time 0 1 avgt 5 206.000 ms
TracesBenchmark.measureThroughput 0 2 avgt 5 35.516 ± 1.274 ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate 0 2 avgt 5 10955.809 ± 391.095 MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm 0 2 avgt 5 408.000 ± 0.001 B/op
TracesBenchmark.measureThroughput:gc.count 0 2 avgt 5 149.000 counts
TracesBenchmark.measureThroughput:gc.time 0 2 avgt 5 209.000 ms
TracesBenchmark.measureThroughput 1 0 avgt 5 37.723 ± 1.266 ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate 1 0 avgt 5 10719.145 ± 360.313 MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm 1 0 avgt 5 424.000 ± 0.001 B/op
TracesBenchmark.measureThroughput:gc.count 1 0 avgt 5 145.000 counts
TracesBenchmark.measureThroughput:gc.time 1 0 avgt 5 201.000 ms
TracesBenchmark.measureThroughput 1 1 avgt 5 35.600 ± 2.316 ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate 1 1 avgt 5 11360.236 ± 738.477 MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm 1 1 avgt 5 424.000 ± 0.001 B/op
TracesBenchmark.measureThroughput:gc.count 1 1 avgt 5 154.000 counts
TracesBenchmark.measureThroughput:gc.time 1 1 avgt 5 221.000 ms
TracesBenchmark.measureThroughput 1 2 avgt 5 36.167 ± 1.605 ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate 1 2 avgt 5 11180.857 ± 492.758 MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm 1 2 avgt 5 424.000 ± 0.001 B/op
TracesBenchmark.measureThroughput:gc.count 1 2 avgt 5 152.000 counts
TracesBenchmark.measureThroughput:gc.time 1 2 avgt 5 213.000 ms
TracesBenchmark.measureThroughput 2 0 avgt 5 34.945 ± 0.914 ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate 2 0 avgt 5 10697.778 ± 280.941 MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm 2 0 avgt 5 392.000 ± 0.001 B/op
TracesBenchmark.measureThroughput:gc.count 2 0 avgt 5 145.000 counts
TracesBenchmark.measureThroughput:gc.time 2 0 avgt 5 210.000 ms
TracesBenchmark.measureThroughput 2 1 avgt 5 35.215 ± 1.076 ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate 2 1 avgt 5 10615.784 ± 325.177 MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm 2 1 avgt 5 392.000 ± 0.001 B/op
TracesBenchmark.measureThroughput:gc.count 2 1 avgt 5 144.000 counts
TracesBenchmark.measureThroughput:gc.time 2 1 avgt 5 198.000 ms
TracesBenchmark.measureThroughput 2 2 avgt 5 35.754 ± 1.788 ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate 2 2 avgt 5 10456.676 ± 524.036 MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm 2 2 avgt 5 392.000 ± 0.001 B/op
TracesBenchmark.measureThroughput:gc.count 2 2 avgt 5 142.000 counts
TracesBenchmark.measureThroughput:gc.time 2 2 avgt 5 200.000 ms
Benchmark after the second improvement
Benchmark (reactorLeadingLines) (trailingLines) Mode Cnt Score Error Units
TracesBenchmark.measureThroughput 0 0 avgt 5 16.695 ± 0.223 ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate 0 0 avgt 5 6854.558 ± 91.285 MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm 0 0 avgt 5 120.000 ± 0.001 B/op
TracesBenchmark.measureThroughput:gc.count 0 0 avgt 5 94.000 counts
TracesBenchmark.measureThroughput:gc.time 0 0 avgt 5 130.000 ms
TracesBenchmark.measureThroughput 0 1 avgt 5 27.538 ± 2.173 ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate 0 1 avgt 5 7482.351 ± 582.492 MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm 0 1 avgt 5 216.000 ± 0.001 B/op
TracesBenchmark.measureThroughput:gc.count 0 1 avgt 5 102.000 counts
TracesBenchmark.measureThroughput:gc.time 0 1 avgt 5 143.000 ms
TracesBenchmark.measureThroughput 0 2 avgt 5 26.589 ± 1.541 ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate 0 2 avgt 5 7748.273 ± 448.220 MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm 0 2 avgt 5 216.000 ± 0.001 B/op
TracesBenchmark.measureThroughput:gc.count 0 2 avgt 5 106.000 counts
TracesBenchmark.measureThroughput:gc.time 0 2 avgt 5 154.000 ms
TracesBenchmark.measureThroughput 1 0 avgt 5 28.482 ± 0.514 ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate 1 0 avgt 5 7500.140 ± 135.789 MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm 1 0 avgt 5 224.000 ± 0.001 B/op
TracesBenchmark.measureThroughput:gc.count 1 0 avgt 5 102.000 counts
TracesBenchmark.measureThroughput:gc.time 1 0 avgt 5 150.000 ms
TracesBenchmark.measureThroughput 1 1 avgt 5 26.180 ± 0.323 ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate 1 1 avgt 5 8159.420 ± 101.176 MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm 1 1 avgt 5 224.000 ± 0.001 B/op
TracesBenchmark.measureThroughput:gc.count 1 1 avgt 5 111.000 counts
TracesBenchmark.measureThroughput:gc.time 1 1 avgt 5 163.000 ms
TracesBenchmark.measureThroughput 1 2 avgt 5 26.194 ± 0.395 ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate 1 2 avgt 5 8155.085 ± 122.896 MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm 1 2 avgt 5 224.000 ± 0.001 B/op
TracesBenchmark.measureThroughput:gc.count 1 2 avgt 5 111.000 counts
TracesBenchmark.measureThroughput:gc.time 1 2 avgt 5 157.000 ms
TracesBenchmark.measureThroughput 2 0 avgt 5 26.485 ± 0.255 ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate 2 0 avgt 5 7489.392 ± 72.106 MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm 2 0 avgt 5 208.000 ± 0.001 B/op
TracesBenchmark.measureThroughput:gc.count 2 0 avgt 5 102.000 counts
TracesBenchmark.measureThroughput:gc.time 2 0 avgt 5 146.000 ms
TracesBenchmark.measureThroughput 2 1 avgt 5 26.530 ± 0.298 ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate 2 1 avgt 5 7476.572 ± 83.959 MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm 2 1 avgt 5 208.000 ± 0.001 B/op
TracesBenchmark.measureThroughput:gc.count 2 1 avgt 5 102.000 counts
TracesBenchmark.measureThroughput:gc.time 2 1 avgt 5 149.000 ms
TracesBenchmark.measureThroughput 2 2 avgt 5 26.710 ± 0.566 ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate 2 2 avgt 5 7426.339 ± 157.301 MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm 2 2 avgt 5 208.000 ± 0.001 B/op
TracesBenchmark.measureThroughput:gc.count 2 2 avgt 5 101.000 counts
TracesBenchmark.measureThroughput:gc.time 2 2 avgt 5 146.000 ms
Benchmark after the third improvement
Benchmark (reactorLeadingLines) (trailingLines) Mode Cnt Score Error Units
TracesBenchmark.measureThroughput 0 0 avgt 5 18.096 ± 0.304 ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate 0 0 avgt 5 7588.502 ± 127.338 MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm 0 0 avgt 5 144.000 ± 0.001 B/op
TracesBenchmark.measureThroughput:gc.count 0 0 avgt 5 103.000 counts
TracesBenchmark.measureThroughput:gc.time 0 0 avgt 5 145.000 ms
TracesBenchmark.measureThroughput 0 1 avgt 5 22.850 ± 0.378 ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate 0 1 avgt 5 5008.182 ± 83.316 MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm 0 1 avgt 5 120.000 ± 0.001 B/op
TracesBenchmark.measureThroughput:gc.count 0 1 avgt 5 68.000 counts
TracesBenchmark.measureThroughput:gc.time 0 1 avgt 5 98.000 ms
TracesBenchmark.measureThroughput 0 2 avgt 5 20.251 ± 0.517 ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate 0 2 avgt 5 5651.105 ± 144.835 MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm 0 2 avgt 5 120.000 ± 0.001 B/op
TracesBenchmark.measureThroughput:gc.count 0 2 avgt 5 77.000 counts
TracesBenchmark.measureThroughput:gc.time 0 2 avgt 5 114.000 ms
TracesBenchmark.measureThroughput 1 0 avgt 5 23.276 ± 0.700 ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate 1 0 avgt 5 4261.171 ± 128.182 MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm 1 0 avgt 5 104.000 ± 0.001 B/op
TracesBenchmark.measureThroughput:gc.count 1 0 avgt 5 58.000 counts
TracesBenchmark.measureThroughput:gc.time 1 0 avgt 5 79.000 ms
TracesBenchmark.measureThroughput 1 1 avgt 5 19.319 ± 0.161 ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate 1 1 avgt 5 5133.773 ± 43.098 MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm 1 1 avgt 5 104.000 ± 0.001 B/op
TracesBenchmark.measureThroughput:gc.count 1 1 avgt 5 70.000 counts
TracesBenchmark.measureThroughput:gc.time 1 1 avgt 5 96.000 ms
TracesBenchmark.measureThroughput 1 2 avgt 5 19.673 ± 0.342 ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate 1 2 avgt 5 5041.454 ± 87.186 MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm 1 2 avgt 5 104.000 ± 0.001 B/op
TracesBenchmark.measureThroughput:gc.count 1 2 avgt 5 68.000 counts
TracesBenchmark.measureThroughput:gc.time 1 2 avgt 5 89.000 ms
TracesBenchmark.measureThroughput 2 0 avgt 5 18.855 ± 0.652 ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate 2 0 avgt 5 5260.401 ± 184.048 MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm 2 0 avgt 5 104.000 ± 0.001 B/op
TracesBenchmark.measureThroughput:gc.count 2 0 avgt 5 72.000 counts
TracesBenchmark.measureThroughput:gc.time 2 0 avgt 5 99.000 ms
TracesBenchmark.measureThroughput 2 1 avgt 5 18.931 ± 0.206 ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate 2 1 avgt 5 5238.827 ± 56.895 MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm 2 1 avgt 5 104.000 ± 0.001 B/op
TracesBenchmark.measureThroughput:gc.count 2 1 avgt 5 71.000 counts
TracesBenchmark.measureThroughput:gc.time 2 1 avgt 5 103.000 ms
TracesBenchmark.measureThroughput 2 2 avgt 5 18.723 ± 0.983 ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate 2 2 avgt 5 5297.753 ± 279.570 MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm 2 2 avgt 5 104.000 ± 0.001 B/op
TracesBenchmark.measureThroughput:gc.count 2 2 avgt 5 72.000 counts
TracesBenchmark.measureThroughput:gc.time 2 2 avgt 5 100.000 ms
So for the common (1, 1) case, we see the following timing and memory usage differences:
Variant Speed Normalized garbage
============= ============================ ==========================
Baseline 145.273 ± 0.361 ns/op 1232.000 ± 0.001 B/op
Reviewed code 46.214 ± 2.285 ns/op (-68%) 576.000 ± 0.001 B/op (-53%)
Speedup 1 35.600 ± 2.316 ns/op (-23%) 424.000 ± 0.001 B/op (-26%)
Speedup 2 26.180 ± 0.323 ns/op (-26%) 224.000 ± 0.001 B/op (-47%)
Speedup 3 19.319 ± 0.161 ns/op (-26%) 104.000 ± 0.001 B/op (-53%)
I can see how "Speedup 3" is controversial from a maintainability point of view. I guess the only way to justify it is by realizing that we're dealing with a very hot code path here (at least for Reactor Debug Agent users). Up to you :)
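Note that each percentage in the summary table is relative to the preceding row rather than to the baseline. The sketch below shows that arithmetic using the (1, 1) scores from the table; the class and method names are made up for illustration, and the printed values may differ in the last digit from the table depending on rounding.

```java
final class RelativeChange {
    /** Returns the change from {@code before} to {@code after} as a percentage of {@code before}. */
    static double percentChange(double before, double after) {
        return (after - before) / before * 100.0;
    }

    public static void main(String[] args) {
        // Scores for the (1, 1) case, copied from the summary table (ns/op).
        double[] nsPerOp = {145.273, 46.214, 35.600, 26.180, 19.319};
        for (int i = 1; i < nsPerOp.length; i++) {
            // Each step is compared against the row directly above it.
            System.out.printf("step %d: %.1f%%%n", i, percentChange(nsPerOp[i - 1], nsPerOp[i]));
        }
        // Cumulative change from the baseline to the final variant.
        System.out.printf("overall: %.1f%%%n", percentChange(nsPerOp[0], nsPerOp[nsPerOp.length - 1]));
    }
}
```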
Added four commits, as described in the comments :)
TracesBenchmark.measureThroughput:gc.alloc.rate.norm 2 2 avgt 5 1720.000 ± 0.001 B/op
TracesBenchmark.measureThroughput:gc.count 2 2 avgt 5 111.000 counts
TracesBenchmark.measureThroughput:gc.time 2 2 avgt 5 162.000 ms
Benchmark of the already-reviewed code
Benchmark (reactorLeadingLines) (trailingLines) Mode Cnt Score Error Units
TracesBenchmark.measureThroughput 0 0 avgt 5 27.709 ± 1.321 ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate 0 0 avgt 5 12390.947 ± 590.685 MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm 0 0 avgt 5 360.000 ± 0.001 B/op
TracesBenchmark.measureThroughput:gc.count 0 0 avgt 5 168.000 counts
TracesBenchmark.measureThroughput:gc.time 0 0 avgt 5 226.000 ms
TracesBenchmark.measureThroughput 0 1 avgt 5 47.622 ± 1.357 ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate 0 1 avgt 5 11534.798 ± 330.546 MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm 0 1 avgt 5 576.000 ± 0.001 B/op
TracesBenchmark.measureThroughput:gc.count 0 1 avgt 5 157.000 counts
TracesBenchmark.measureThroughput:gc.time 0 1 avgt 5 217.000 ms
TracesBenchmark.measureThroughput 0 2 avgt 5 46.575 ± 2.868 ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate 0 2 avgt 5 11796.026 ± 725.031 MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm 0 2 avgt 5 576.000 ± 0.001 B/op
TracesBenchmark.measureThroughput:gc.count 0 2 avgt 5 160.000 counts
TracesBenchmark.measureThroughput:gc.time 0 2 avgt 5 222.000 ms
TracesBenchmark.measureThroughput 1 0 avgt 5 48.489 ± 3.267 ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate 1 0 avgt 5 11330.727 ± 758.029 MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm 1 0 avgt 5 576.000 ± 0.001 B/op
TracesBenchmark.measureThroughput:gc.count 1 0 avgt 5 154.000 counts
TracesBenchmark.measureThroughput:gc.time 1 0 avgt 5 208.000 ms
TracesBenchmark.measureThroughput 1 1 avgt 5 46.214 ± 2.285 ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate 1 1 avgt 5 11887.215 ± 584.466 MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm 1 1 avgt 5 576.000 ± 0.001 B/op
TracesBenchmark.measureThroughput:gc.count 1 1 avgt 5 162.000 counts
TracesBenchmark.measureThroughput:gc.time 1 1 avgt 5 212.000 ms
TracesBenchmark.measureThroughput 1 2 avgt 5 46.943 ± 2.083 ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate 1 2 avgt 5 11702.329 ± 513.790 MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm 1 2 avgt 5 576.000 ± 0.001 B/op
TracesBenchmark.measureThroughput:gc.count 1 2 avgt 5 159.000 counts
TracesBenchmark.measureThroughput:gc.time 1 2 avgt 5 230.000 ms
TracesBenchmark.measureThroughput 2 0 avgt 5 48.431 ± 1.966 ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate 2 0 avgt 5 10712.378 ± 436.213 MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm 2 0 avgt 5 544.000 ± 0.001 B/op
TracesBenchmark.measureThroughput:gc.count 2 0 avgt 5 146.000 counts
TracesBenchmark.measureThroughput:gc.time 2 0 avgt 5 210.000 ms
TracesBenchmark.measureThroughput 2 1 avgt 5 46.974 ± 3.057 ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate 2 1 avgt 5 11046.458 ± 731.622 MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm 2 1 avgt 5 544.000 ± 0.001 B/op
TracesBenchmark.measureThroughput:gc.count 2 1 avgt 5 151.000 counts
TracesBenchmark.measureThroughput:gc.time 2 1 avgt 5 210.000 ms
TracesBenchmark.measureThroughput 2 2 avgt 5 50.708 ± 0.980 ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate 2 2 avgt 5 10230.724 ± 197.935 MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm 2 2 avgt 5 544.000 ± 0.001 B/op
TracesBenchmark.measureThroughput:gc.count 2 2 avgt 5 139.000 counts
TracesBenchmark.measureThroughput:gc.time 2 2 avgt 5 187.000 ms
Benchmark after the first improvement
Benchmark (reactorLeadingLines) (trailingLines) Mode Cnt Score Error Units
TracesBenchmark.measureThroughput 0 0 avgt 5 21.666 ± 0.360 ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate 0 0 avgt 5 10563.716 ± 175.257 MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm 0 0 avgt 5 240.000 ± 0.001 B/op
TracesBenchmark.measureThroughput:gc.count 0 0 avgt 5 144.000 counts
TracesBenchmark.measureThroughput:gc.time 0 0 avgt 5 212.000 ms
TracesBenchmark.measureThroughput 0 1 avgt 5 36.057 ± 1.438 ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate 0 1 avgt 5 10791.558 ± 429.161 MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm 0 1 avgt 5 408.000 ± 0.001 B/op
TracesBenchmark.measureThroughput:gc.count 0 1 avgt 5 146.000 counts
TracesBenchmark.measureThroughput:gc.time 0 1 avgt 5 206.000 ms
TracesBenchmark.measureThroughput 0 2 avgt 5 35.516 ± 1.274 ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate 0 2 avgt 5 10955.809 ± 391.095 MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm 0 2 avgt 5 408.000 ± 0.001 B/op
TracesBenchmark.measureThroughput:gc.count 0 2 avgt 5 149.000 counts
TracesBenchmark.measureThroughput:gc.time 0 2 avgt 5 209.000 ms
TracesBenchmark.measureThroughput 1 0 avgt 5 37.723 ± 1.266 ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate 1 0 avgt 5 10719.145 ± 360.313 MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm 1 0 avgt 5 424.000 ± 0.001 B/op
TracesBenchmark.measureThroughput:gc.count 1 0 avgt 5 145.000 counts
TracesBenchmark.measureThroughput:gc.time 1 0 avgt 5 201.000 ms
TracesBenchmark.measureThroughput 1 1 avgt 5 35.600 ± 2.316 ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate 1 1 avgt 5 11360.236 ± 738.477 MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm 1 1 avgt 5 424.000 ± 0.001 B/op
TracesBenchmark.measureThroughput:gc.count 1 1 avgt 5 154.000 counts
TracesBenchmark.measureThroughput:gc.time 1 1 avgt 5 221.000 ms
TracesBenchmark.measureThroughput 1 2 avgt 5 36.167 ± 1.605 ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate 1 2 avgt 5 11180.857 ± 492.758 MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm 1 2 avgt 5 424.000 ± 0.001 B/op
TracesBenchmark.measureThroughput:gc.count 1 2 avgt 5 152.000 counts
TracesBenchmark.measureThroughput:gc.time 1 2 avgt 5 213.000 ms
TracesBenchmark.measureThroughput 2 0 avgt 5 34.945 ± 0.914 ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate 2 0 avgt 5 10697.778 ± 280.941 MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm 2 0 avgt 5 392.000 ± 0.001 B/op
TracesBenchmark.measureThroughput:gc.count 2 0 avgt 5 145.000 counts
TracesBenchmark.measureThroughput:gc.time 2 0 avgt 5 210.000 ms
TracesBenchmark.measureThroughput 2 1 avgt 5 35.215 ± 1.076 ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate 2 1 avgt 5 10615.784 ± 325.177 MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm 2 1 avgt 5 392.000 ± 0.001 B/op
TracesBenchmark.measureThroughput:gc.count 2 1 avgt 5 144.000 counts
TracesBenchmark.measureThroughput:gc.time 2 1 avgt 5 198.000 ms
TracesBenchmark.measureThroughput 2 2 avgt 5 35.754 ± 1.788 ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate 2 2 avgt 5 10456.676 ± 524.036 MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm 2 2 avgt 5 392.000 ± 0.001 B/op
TracesBenchmark.measureThroughput:gc.count 2 2 avgt 5 142.000 counts
TracesBenchmark.measureThroughput:gc.time 2 2 avgt 5 200.000 ms
Benchmark after the second improvement
Benchmark (reactorLeadingLines) (trailingLines) Mode Cnt Score Error Units
TracesBenchmark.measureThroughput 0 0 avgt 5 16.695 ± 0.223 ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate 0 0 avgt 5 6854.558 ± 91.285 MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm 0 0 avgt 5 120.000 ± 0.001 B/op
TracesBenchmark.measureThroughput:gc.count 0 0 avgt 5 94.000 counts
TracesBenchmark.measureThroughput:gc.time 0 0 avgt 5 130.000 ms
TracesBenchmark.measureThroughput 0 1 avgt 5 27.538 ± 2.173 ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate 0 1 avgt 5 7482.351 ± 582.492 MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm 0 1 avgt 5 216.000 ± 0.001 B/op
TracesBenchmark.measureThroughput:gc.count 0 1 avgt 5 102.000 counts
TracesBenchmark.measureThroughput:gc.time 0 1 avgt 5 143.000 ms
TracesBenchmark.measureThroughput 0 2 avgt 5 26.589 ± 1.541 ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate 0 2 avgt 5 7748.273 ± 448.220 MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm 0 2 avgt 5 216.000 ± 0.001 B/op
TracesBenchmark.measureThroughput:gc.count 0 2 avgt 5 106.000 counts
TracesBenchmark.measureThroughput:gc.time 0 2 avgt 5 154.000 ms
TracesBenchmark.measureThroughput 1 0 avgt 5 28.482 ± 0.514 ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate 1 0 avgt 5 7500.140 ± 135.789 MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm 1 0 avgt 5 224.000 ± 0.001 B/op
TracesBenchmark.measureThroughput:gc.count 1 0 avgt 5 102.000 counts
TracesBenchmark.measureThroughput:gc.time 1 0 avgt 5 150.000 ms
TracesBenchmark.measureThroughput 1 1 avgt 5 26.180 ± 0.323 ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate 1 1 avgt 5 8159.420 ± 101.176 MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm 1 1 avgt 5 224.000 ± 0.001 B/op
TracesBenchmark.measureThroughput:gc.count 1 1 avgt 5 111.000 counts
TracesBenchmark.measureThroughput:gc.time 1 1 avgt 5 163.000 ms
TracesBenchmark.measureThroughput 1 2 avgt 5 26.194 ± 0.395 ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate 1 2 avgt 5 8155.085 ± 122.896 MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm 1 2 avgt 5 224.000 ± 0.001 B/op
TracesBenchmark.measureThroughput:gc.count 1 2 avgt 5 111.000 counts
TracesBenchmark.measureThroughput:gc.time 1 2 avgt 5 157.000 ms
TracesBenchmark.measureThroughput 2 0 avgt 5 26.485 ± 0.255 ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate 2 0 avgt 5 7489.392 ± 72.106 MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm 2 0 avgt 5 208.000 ± 0.001 B/op
TracesBenchmark.measureThroughput:gc.count 2 0 avgt 5 102.000 counts
TracesBenchmark.measureThroughput:gc.time 2 0 avgt 5 146.000 ms
TracesBenchmark.measureThroughput 2 1 avgt 5 26.530 ± 0.298 ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate 2 1 avgt 5 7476.572 ± 83.959 MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm 2 1 avgt 5 208.000 ± 0.001 B/op
TracesBenchmark.measureThroughput:gc.count 2 1 avgt 5 102.000 counts
TracesBenchmark.measureThroughput:gc.time 2 1 avgt 5 149.000 ms
TracesBenchmark.measureThroughput 2 2 avgt 5 26.710 ± 0.566 ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate 2 2 avgt 5 7426.339 ± 157.301 MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm 2 2 avgt 5 208.000 ± 0.001 B/op
TracesBenchmark.measureThroughput:gc.count 2 2 avgt 5 101.000 counts
TracesBenchmark.measureThroughput:gc.time 2 2 avgt 5 146.000 ms
Benchmark after the third improvement
Benchmark (reactorLeadingLines) (trailingLines) Mode Cnt Score Error Units
TracesBenchmark.measureThroughput 0 0 avgt 5 18.096 ± 0.304 ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate 0 0 avgt 5 7588.502 ± 127.338 MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm 0 0 avgt 5 144.000 ± 0.001 B/op
TracesBenchmark.measureThroughput:gc.count 0 0 avgt 5 103.000 counts
TracesBenchmark.measureThroughput:gc.time 0 0 avgt 5 145.000 ms
TracesBenchmark.measureThroughput 0 1 avgt 5 22.850 ± 0.378 ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate 0 1 avgt 5 5008.182 ± 83.316 MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm 0 1 avgt 5 120.000 ± 0.001 B/op
TracesBenchmark.measureThroughput:gc.count 0 1 avgt 5 68.000 counts
TracesBenchmark.measureThroughput:gc.time 0 1 avgt 5 98.000 ms
TracesBenchmark.measureThroughput 0 2 avgt 5 20.251 ± 0.517 ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate 0 2 avgt 5 5651.105 ± 144.835 MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm 0 2 avgt 5 120.000 ± 0.001 B/op
TracesBenchmark.measureThroughput:gc.count 0 2 avgt 5 77.000 counts
TracesBenchmark.measureThroughput:gc.time 0 2 avgt 5 114.000 ms
TracesBenchmark.measureThroughput 1 0 avgt 5 23.276 ± 0.700 ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate 1 0 avgt 5 4261.171 ± 128.182 MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm 1 0 avgt 5 104.000 ± 0.001 B/op
TracesBenchmark.measureThroughput:gc.count 1 0 avgt 5 58.000 counts
TracesBenchmark.measureThroughput:gc.time 1 0 avgt 5 79.000 ms
TracesBenchmark.measureThroughput 1 1 avgt 5 19.319 ± 0.161 ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate 1 1 avgt 5 5133.773 ± 43.098 MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm 1 1 avgt 5 104.000 ± 0.001 B/op
TracesBenchmark.measureThroughput:gc.count 1 1 avgt 5 70.000 counts
TracesBenchmark.measureThroughput:gc.time 1 1 avgt 5 96.000 ms
TracesBenchmark.measureThroughput 1 2 avgt 5 19.673 ± 0.342 ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate 1 2 avgt 5 5041.454 ± 87.186 MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm 1 2 avgt 5 104.000 ± 0.001 B/op
TracesBenchmark.measureThroughput:gc.count 1 2 avgt 5 68.000 counts
TracesBenchmark.measureThroughput:gc.time 1 2 avgt 5 89.000 ms
TracesBenchmark.measureThroughput 2 0 avgt 5 18.855 ± 0.652 ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate 2 0 avgt 5 5260.401 ± 184.048 MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm 2 0 avgt 5 104.000 ± 0.001 B/op
TracesBenchmark.measureThroughput:gc.count 2 0 avgt 5 72.000 counts
TracesBenchmark.measureThroughput:gc.time 2 0 avgt 5 99.000 ms
TracesBenchmark.measureThroughput 2 1 avgt 5 18.931 ± 0.206 ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate 2 1 avgt 5 5238.827 ± 56.895 MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm 2 1 avgt 5 104.000 ± 0.001 B/op
TracesBenchmark.measureThroughput:gc.count 2 1 avgt 5 71.000 counts
TracesBenchmark.measureThroughput:gc.time 2 1 avgt 5 103.000 ms
TracesBenchmark.measureThroughput 2 2 avgt 5 18.723 ± 0.983 ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate 2 2 avgt 5 5297.753 ± 279.570 MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm 2 2 avgt 5 104.000 ± 0.001 B/op
TracesBenchmark.measureThroughput:gc.count 2 2 avgt 5 72.000 counts
TracesBenchmark.measureThroughput:gc.time 2 2 avgt 5 100.000 ms
So for the common (1, 1) case, we see the following timing and memory usage differences:
Variant Speed Normalized garbage
============= ============================ ==========================
Baseline 145.273 ± 0.361 ns/op 1232.000 ± 0.001 B/op
Reviewed code 46.214 ± 2.285 ns/op (-68%) 576.000 ± 0.001 B/op (-53%)
Speedup 1 35.600 ± 2.316 ns/op (-23%) 424.000 ± 0.001 B/op (-26%)
Speedup 2 26.180 ± 0.323 ns/op (-26%) 224.000 ± 0.001 B/op (-47%)
Speedup 3 19.319 ± 0.161 ns/op (-26%) 104.000 ± 0.001 B/op (-54%)
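For readers who want to re-derive the percentages, here is a small sketch that recomputes the step-by-step relative changes and the cumulative factors from the (1, 1) rows of the benchmark output. The class and method names are illustrative; only the numbers are taken from the tables above.

```java
import java.util.Locale;

final class RelativeChange {
    /** Percentage change from before to after, rounded to the nearest integer. */
    static long percentChange(double before, double after) {
        return Math.round((after / before - 1.0) * 100.0);
    }

    public static void main(String[] args) {
        // (1, 1) rows copied from the benchmark output: ns/op and B/op per variant.
        String[] variants = {"Baseline", "Reviewed code", "Speedup 1", "Speedup 2", "Speedup 3"};
        double[] nsPerOp = {145.273, 46.214, 35.600, 26.180, 19.319};
        double[] bytesPerOp = {1232, 576, 424, 224, 104};

        // Each variant is compared against the one directly before it.
        for (int i = 1; i < variants.length; i++) {
            System.out.printf(Locale.ROOT, "%-13s time %d%%, garbage %d%%%n", variants[i],
                    percentChange(nsPerOp[i - 1], nsPerOp[i]),
                    percentChange(bytesPerOp[i - 1], bytesPerOp[i]));
        }

        // Cumulative effect of all steps relative to the baseline.
        System.out.printf(Locale.ROOT, "Overall: %.1fx faster, %.1fx less garbage%n",
                nsPerOp[0] / nsPerOp[4], bytesPerOp[0] / bytesPerOp[4]);
    }
}
```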
I can see how "Speedup 3" is controversial from a maintainability point of view. I guess the only way to justify it is by realizing that we're dealing with a very hot code path here (at least for Reactor Debug Agent users). Up to you :)
NB: One further speed-up could be to avoid the string trimming altogether: IIUC the whitespace is introduced only in these places:
That last one is part of the Reactor Debug Agent, shipped as a separate JAR. So the question is whether we can stop trimming before the next major release. (Unless users are required to use Trimming of trailing whitespace can already be dropped (it's never introduced), though that would require updating the unit tests. Happy to do so in this or another PR; just let me know.
} | ||
|
||
boolean startsWith(String prefix) { | ||
return str.startsWith(prefix, start); |
For our specific use case it probably won't be a problem, but in general this is incomplete, as the `end` index is not considered. Perhaps a simple check for `start + prefix.length() <= end` is also required to make it correct?
With that, I'd argue a bunch of unit tests for this inner class would be helpful. It can be made package-private and we can test it in `TracesTest.java`.
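To make the concern concrete, here is a minimal sketch of a view over a backing `String` whose `startsWith` respects the view's end index. It is an assumption-laden illustration, not the class from this PR: `LineView` and its field names are hypothetical. Note that the bound has to be inclusive, since a prefix that exactly fills the view should still match.

```java
final class LineView {
    private final String underlying;
    private final int start; // inclusive
    private final int end;   // exclusive

    LineView(String underlying, int start, int end) {
        if (start < 0 || end > underlying.length() || start > end) {
            throw new IndexOutOfBoundsException("Invalid range [" + start + ", " + end + ")");
        }
        this.underlying = underlying;
        this.start = start;
        this.end = end;
    }

    /** True only if the whole prefix fits inside [start, end) and matches there. */
    boolean startsWith(String prefix) {
        // Without the length check, a match could run past the end of the view.
        return start + prefix.length() <= end && underlying.startsWith(prefix, start);
    }

    @Override
    public String toString() {
        // Copies only when a real String is requested.
        return underlying.substring(start, end);
    }

    public static void main(String[] args) {
        LineView view = new LineView("reactor.core.publisher.Mono", 0, 7); // views "reactor"
        System.out.println(view.startsWith("reactor"));      // fits exactly inside the view
        System.out.println(view.startsWith("reactor.core")); // matches the backing string only
    }
}
```

The second call is the case the unit tests should pin down: the backing string matches, but the view itself is too short.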
} | ||
|
||
// XXX: Explain. | ||
private static final class Substring { |
Consider renaming it to `StringView`. This would communicate to the reader that we are only wrapping the underlying `String` and not making any copies of it.
Or even `StackLineView`, since it exposes methods specific to the lines in the stack trace.
|
||
// XXX: Explain. | ||
private static final class Substring { | ||
private final String str; |
Consider renaming `str` to `underlying`, `actual`, or `backingString` to indicate its nature.
@Stephan202 the numbers you show look excellent. I think it's worth integrating the current proposal, since in the (1, 1) case you get about a 7.5x speedup and about 12x less memory pressure! Regarding avoiding the trimming: I think we can stop here. With the above result, I believe changing the behaviour is not justified. I suspect the tabs are useful in some form when printing the stack trace and are only trimmed when finding the "trace-back", right? Anyways, if you want to spend more time on it and show we're not breaking anything, we can discuss that in another issue/PR and aim for closing this one so we can release it soon :) Please add a unit test, potentially apply some refactoring suggestions and add a comment in place of
I'll merge and follow up applying the most recent feedback. I'd like for this optimization to be part of the upcoming release.
Thanks a lot @Stephan202, it has been a true pleasure to work with your contributions. Please do come back with more ideas!
Thanks for the thoughtful code review and thanks for filing #3949! I too am curious to see how this fares in production. If I have details to share, I'll report back here :) (As for avoiding the trimming: if I have spare cycles in the future I don't mind having a closer look at that; TBD!)
`Scannable#name()` and related logic
Follow-up to #3901:
* Renamed `Substring` to `StackLineView`
* Implemented tests for `StackLineView`
* Corrected `contains` and `startsWith` implementations
@Stephan202 I was curious if you had a chance to deploy the newest version?
Hey @chemicL! Sorry for not following up. A colleague did test this change in isolation in production, but unfortunately no clear impact was measured. TBH, slightly surprising and disappointing. We're using Datadog in production, but due to Reactor's deep call stacks, DD's JFR-based profiling functionality can't correlate CPU and memory usage with some of our hottest reactive code. This makes it hard to do a more fine-grained before-and-after comparison. (We currently run with One caveat is that I didn't find time to do a deeper analysis myself, and now with Christmas coming up there's no appetite to run more experiments (e.g. by doing a temporary downgrade) in production 😬.
@Stephan202 thanks for the heads up. I also took a longer break now, hence the delayed response :-) Sorry to hear there was no clear improvement. Perhaps Amdahl's Law strikes again. Anyways, thanks for the effort. I am certain some workloads will see an improvement from this!
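To illustrate the Amdahl's Law remark: if only a fraction `p` of total run time is spent in the optimized code, and that code becomes `s` times faster, the overall speedup is `1 / ((1 - p) + p / s)`. The sketch below plugs in the micro-benchmark ratio for the (1, 1) case. The fractions are made-up values for illustration only, not measurements from any production system.

```java
import java.util.Locale;

final class AmdahlEstimate {
    /** Overall speedup when a fraction p of total time is accelerated by a factor s. */
    static double overallSpeedup(double p, double s) {
        return 1.0 / ((1.0 - p) + p / s);
    }

    public static void main(String[] args) {
        // Micro-benchmark speedup for the (1, 1) case: baseline vs. final variant.
        double s = 145.273 / 19.319;
        // Hypothetical fractions of total time spent in the optimized code path.
        for (double p : new double[] {0.01, 0.05, 0.25}) {
            System.out.printf(Locale.ROOT, "p=%.2f -> %.3fx overall%n", p, overallSpeedup(p, s));
        }
    }
}
```

With small values of `p`, the overall effect stays within a few percent, which would be hard to distinguish from noise in production metrics.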
Performance is improved in two ways:
- `Scannable#stepName()` only when `Attr.NAME` is not explicitly set.
- `Traces#extractOperatorAssemblyInformationParts`, to which several `Scannable#stepName()` implementations delegate.
The `Scannable#name()` logic may be executed many times, e.g. if a hot code path uses `{Mono,Flux}#log` or Micrometer instrumentation. The added benchmark shows that for large stack traces, the new `Traces` implementation is several orders of magnitude more efficient in terms of compute and memory resource utilization.
Deferral of invocation of `Scannable#stepName()` assumes that said method does not have side effects. This is true for all built-in implementations.
While there:
- benchmark method return values are implicitly sent.
- `reactor.core.publisher` package matching to prefix matching.
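The deferral described above can be sketched as follows. This is a simplified stand-in, not Reactor's actual implementation: `DeferredName`, `resolveName`, and `explicitName` are hypothetical names, and the `Supplier` plays the role of a potentially expensive `Scannable#stepName()` call.

```java
import java.util.function.Supplier;

final class DeferredName {
    /** Returns the explicit name when present; otherwise computes the fallback on demand. */
    static String resolveName(String explicitName, Supplier<String> stepName) {
        return explicitName != null ? explicitName : stepName.get();
    }

    public static void main(String[] args) {
        int[] calls = {0};
        Supplier<String> expensiveStepName = () -> {
            calls[0]++; // stands in for costly work such as stack trace parsing
            return "map";
        };

        // An explicit name is set, so the expensive fallback is never invoked.
        System.out.println(resolveName("myPipeline", expensiveStepName));
        // No explicit name: the fallback is computed.
        System.out.println(resolveName(null, expensiveStepName));
        System.out.println("fallback invocations: " + calls[0]);
    }
}
```

As the description notes, skipping the call is only safe when the fallback has no side effects that callers rely on.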