Optimize observability and debugging experience #3901

Contributor

@Stephan202 Stephan202 commented Oct 3, 2024

Performance is improved in two ways:

  • By invoking Scannable#stepName() only when Attr.NAME is not
    explicitly set.
  • By optimizing Traces#extractOperatorAssemblyInformationParts, to
    which several Scannable#stepName() implementations delegate.

The Scannable#name() logic may be executed many times, e.g. if a hot
code path uses {Mono,Flux}#log or Micrometer instrumentation. The
added benchmark shows that for large stack traces, the new Traces
implementation is several orders of magnitude more efficient in terms of
compute and memory resource utilization.

Deferral of invocation of Scannable#stepName() assumes that said
method does not have side-effects. This is true for all built-in
implementations.
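
The deferral described above can be illustrated with a minimal sketch. The names `LazyNameSketch` and `resolveName`, and the `Supplier`-based signature, are assumptions made for illustration; this is not the actual `Scannable` code. The point is only that the fallback computation runs when no explicit name is set.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical illustration; not the actual Scannable implementation.
final class LazyNameSketch {
    // Returns the explicit name when present; otherwise computes the fallback on demand.
    static String resolveName(String explicitName, Supplier<String> stepName) {
        return explicitName != null ? explicitName : stepName.get();
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        // Stand-in for a potentially expensive step name computation.
        Supplier<String> expensiveStepName = () -> {
            calls.incrementAndGet();
            return "map";
        };

        System.out.println(resolveName("myName", expensiveStepName)); // explicit name wins
        System.out.println(resolveName(null, expensiveStepName));     // falls back to the step name
        System.out.println("stepName invocations: " + calls.get());
    }
}
```

Because the supplier is skipped whenever an explicit name exists, any side effect inside it would no longer be observed in that case, which is why the no-side-effects assumption matters.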

While there:

  • Improve two existing benchmarks by utilizing the black hole to which
    benchmark method return values are implicitly sent.
  • Unify reactor.core.publisher package matching to prefix matching.

@Stephan202 Stephan202 requested a review from a team as a code owner October 3, 2024 10:34
Contributor Author

@Stephan202 Stephan202 left a comment

Added some comments with context. This PR relates to #3900.

Comment on lines 59 to 60
|| stackTraceRow.startsWith("reactor.core.publisher.Mono.onAssembly")
|| stackTraceRow.equals("reactor.core.publisher.Mono.onAssembly")
|| stackTraceRow.equals("reactor.core.publisher.Flux.onAssembly")
Contributor Author

This line removal is unrelated to the rest of the PR. The dropped line is redundant, as it is preceded by a predicate that matches in strictly more contexts. Question: should the Flux.onAssembly line (and some others below) be updated to also use startsWith?

Member

I think they can all use startsWith instead of equals.

Contributor Author

Alright, will do (and update the PR description to match this additional change).
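
The redundancy discussed in this thread can be checked with a small sketch: any row that equals a given string also starts with it, so an `equals` check following a `startsWith` check on the same string never changes the outcome. The class name and the sample rows below are made up for illustration.

```java
import java.util.function.Predicate;

// Hypothetical illustration of why an equals check is redundant after a startsWith check.
final class PrefixMatchSketch {
    private static final String MONO_ON_ASSEMBLY = "reactor.core.publisher.Mono.onAssembly";

    // The shape of the original condition: startsWith followed by equals on the same string.
    static final Predicate<String> WITH_REDUNDANT_EQUALS =
            row -> row.startsWith(MONO_ON_ASSEMBLY) || row.equals(MONO_ON_ASSEMBLY);

    // The simplified condition.
    static final Predicate<String> STARTS_WITH_ONLY = row -> row.startsWith(MONO_ON_ASSEMBLY);

    public static void main(String[] args) {
        // Sample rows are invented for this sketch.
        String[] rows = {
                "reactor.core.publisher.Mono.onAssembly",
                "reactor.core.publisher.Mono.onAssembly(Mono.java)",
                "reactor.core.publisher.Flux.onAssembly",
                "com.example.MyService.load(MyService.java)"
        };
        for (String row : rows) {
            boolean same = STARTS_WITH_ONLY.test(row) == WITH_REDUNDANT_EQUALS.test(row);
            System.out.println(row + " -> " + STARTS_WITH_ONLY.test(row) + " (predicates agree: " + same + ")");
        }
    }
}
```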

Comment on lines 130 to 131
static String[] extractOperatorAssemblyInformationParts(String source) {
String[] uncleanTraces = source.split("\n");
final List<String> traces = Stream.of(uncleanTraces)
.map(String::trim)
.filter(s -> !s.isEmpty())
.collect(Collectors.toList());
Iterator<String> traces = trimmedNonemptyLines(source);
Contributor Author

The largest improvements in this PR come from these changes.

Key contributors to the poor performance of the old implementation:

  • String#split accepts a regular expression, and compiling regular expressions is comparatively expensive. The new implementation no longer does this.
  • String#split allocates an array and substrings proportional to the provided input, covering a potentially large part of the input that does not influence the result of this method at all.
  • The Stream operation likewise processes irrelevant lines, and allocates a potentially large list.

The new implementation instead lazily iterates over the input, processing only relevant lines, and tracking only the two most-recently-seen lines.
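
As a rough illustration of the lazy approach, the sketch below scans the input line by line, remembers only the previous and the current line, and stops at the first user code line. It is a simplified, hypothetical version: the `isUserCode` rule used here (anything outside `reactor.core.publisher.`) and the method name are assumptions, and the real `Traces` logic handles more cases.

```java
// Simplified, hypothetical sketch; the real Traces logic handles more cases.
final class LazyTraceScanSketch {
    private static final String PUBLISHER_PACKAGE_PREFIX = "reactor.core.publisher.";

    // Assumption for illustration: anything outside reactor.core.publisher is user code.
    static boolean isUserCode(String line) {
        return !line.startsWith(PUBLISHER_PACKAGE_PREFIX);
    }

    // Returns {previousLine, userCodeLine}, or {userCodeLine} if it comes first,
    // or an empty array if the input contains no user code line.
    static String[] firstUserCodeLineWithPredecessor(String source) {
        String previous = null;
        int start = 0;
        while (start < source.length()) {
            int end = source.indexOf('\n', start);
            if (end < 0) {
                end = source.length();
            }
            String current = source.substring(start, end).trim();
            start = end + 1;
            if (current.isEmpty()) {
                continue;
            }
            if (isUserCode(current)) {
                // Stop here: the remaining lines are never inspected.
                return previous == null ? new String[] {current} : new String[] {previous, current};
            }
            previous = current;
        }
        return new String[0];
    }

    public static void main(String[] args) {
        String trace = "  reactor.core.publisher.Flux.map(Flux.java:1)\n\n"
                + "  com.example.Foo.bar(Foo.java:2)\n"
                + "  com.example.Baz.qux(Baz.java:3)";
        for (String part : firstUserCodeLineWithPredecessor(trace)) {
            System.out.println(part);
        }
    }
}
```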

Comment on lines 140 to 143
if (isUserCode(currentLine)) {
// No line is a Reactor API line.
return new String[]{currentLine};
}
Contributor Author

Some logic was moved around, but existing comments were relocated with it. This should aid review. It's nice that there was already good test coverage.

Comment on lines 172 to 178
/**
* Returns an iterator over all trimmed non-empty lines in the given source string.
*
* @implNote This implementation attempts to minimize allocations.
*/
private static Iterator<String> trimmedNonemptyLines(String source) {
return new Iterator<String>() {
Contributor Author

This manually-crafted iterator feels a bit like "coding like it's 1999", but I didn't find a less-verbose alternative that doesn't impact overall code readability. (Had Guava been on the classpath, then I'd have opted to extend AbstractIterator.) Open to suggestions!
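
For readers who have not seen the diff, a hand-written iterator of this kind might look roughly like the following. This is a hypothetical sketch that uses only the JDK, not the code from this PR; the class name and the `advance` helper are invented, and unlike the PR it still relies on `String#trim`.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical sketch of a hand-written lazy iterator; not the code from this PR.
final class TrimmedLinesSketch {
    static Iterator<String> trimmedNonemptyLines(String source) {
        return new Iterator<String>() {
            private int index = 0;
            // One line of look-ahead, so that hasNext() can answer without side effects.
            private String next = advance();

            @Override
            public boolean hasNext() {
                return next != null;
            }

            @Override
            public String next() {
                if (next == null) {
                    throw new NoSuchElementException();
                }
                String result = next;
                next = advance();
                return result;
            }

            // Finds the next trimmed non-empty line, or returns null at end of input.
            private String advance() {
                while (index < source.length()) {
                    int end = source.indexOf('\n', index);
                    if (end < 0) {
                        end = source.length();
                    }
                    String line = source.substring(index, end).trim();
                    index = end + 1;
                    if (!line.isEmpty()) {
                        return line;
                    }
                }
                return null;
            }
        };
    }
}
```

Calling `trimmedNonemptyLines("  a \n\n b\n")` yields the lines `a` and `b`. Note that the look-ahead field means one line beyond the returned one is always parsed.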

@chemicL
Member

chemicL commented Oct 4, 2024

Thanks for the PR. I will have a closer look. However, bear in mind this change won't make it into 3.7.0-RC1 but would target directly 3.7.0. I think that's ok as it's not an API change. Together with #3900 it is a behaviour change, but unless there were side effects in code they would also not be observed aside from the performance gains. I think it's worth noting some sort of warning for the stepName lazy evaluation when we do release notes.

@chemicL chemicL added the area/performance label Oct 4, 2024
@Stephan202 Stephan202 changed the title from "Optimize Traces#extractOperatorAssemblyInformationParts" to "Optimize Scannable#name() and related logic" Oct 4, 2024
@Stephan202
Contributor Author

Thanks for the feedback @chemicL! I cherry-picked the commit from #3900 into this branch and rewrote the PR title and summary.

@Stephan202
Contributor Author

@chemicL I just realized that the impact of this PR is likely over-stated, as in practice the input stacktrace appears to always be generated by a CallSiteSupplierFactory implementation, both of which output at most two lines. (In my defense: the unit tests of the modified code also seem to indicate that more lines may be expected.)

So perhaps we can optimize the code further by skipping the intermediate single-string representation. I might have time for a closer look into that this weekend.

@Stephan202
Contributor Author

Stephan202 commented Oct 7, 2024

> as in practice the input stacktrace appears to always be generated by a CallSiteSupplierFactory implementation, both of which output at most two lines.

There's one other code path: CallSiteInfoAddingMethodVisitor passes a manually constructed two-line stack trace to Hooks#addCallSiteInfo.

> So perhaps we can optimize the code further by skipping the intermediate single-string representation. I might have time for a closer look into that this weekend.

I now have a POC for this locally. Against a (ReactorDebugAgent-using) benchmark of local code it isn't yet faster than the current PR (surprisingly; requires more investigation), but for the cleanest and likely fastest code, it'd be better if we can have CallSiteInfoAddingMethodVisitor pass the two constructed stack frames separately. Doing this requires adding (or modifying) a public Hooks method, which causes :reactor-core:japicmp to report an API compatibility failure. Is that acceptable, and if so, how can I make that change without failing the build?

@chemicL
Member

chemicL commented Nov 21, 2024

Hey, @Stephan202. I've been a bit busy lately but would like to revisit your PRs. Can you give an update on the above considerations? How impactful is this change or an alternative change from your PoC? I assume we would be able to alter the Hooks methods that are marked as deprecated with a huge warning they're for internal use. You only need to find japicmp configuration in build.gradle and add an entry to the methodExcludes = [] array.

@Stephan202
Contributor Author

Thanks for the ping on this PR @chemicL; I meant to report back here. :shame:

I did try the alternative approach mentioned (including the Hooks customization), but testing it against an internal benchmark (one that contains some Picnic-specific code, but mostly causes Reactor logic to be executed, with Reactor Debug Agent enabled), I consistently found the alternative approach to perform slightly worse. I lost quite some time over that, as it really defied (and still defies) my intuitions.

I can look into polishing that code and pushing it to an alternative branch for a second opinion; will try to find some time. That said, based on the above, my tentative suggestion would be to proceed with this PR as-is. (Except perhaps for trimming the JMH benchmark inputs, because as mentioned, in practice the code will generally parse at most two stack frames, rather than 1000.) If the alternative approach can be made more performant after all, that can be tackled in a follow-up PR.

@Stephan202 Stephan202 force-pushed the sschroevers/traces-performance-improvement branch from f7ebe1c to f470638 Compare November 21, 2024 17:52
@Stephan202
Contributor Author

Rebased branch; applied 100% cleanly.

@Stephan202
Contributor Author

The experiments I tried are on this messy branch. If desired I can clean it up, though it's a bit TBD when I'll have time to dive back into this topic.

@chemicL
Member

chemicL commented Nov 25, 2024

@Stephan202 I am just trying to understand whether the change is actually needed. As I understand you discovered that this will only be triggered for processing two stack frames (can you point to where it's limited to only 2?), therefore the benchmark is not relevant to the expected usage of this API, yes? And also, there is the risk of touching and changing a stable code base for not much benefit and potential regressions. Is there a possibility that these optimizations will have an actual effect on real world applications?

@Stephan202
Contributor Author

@chemicL fair question! Based on earlier testing the answer is "yes, this is an improvement", but let me get back to you in the coming days with some more hard data. (Exact timing TBD.)

This logic may be executed many times, e.g. if a hot code path uses
`{Mono,Flux}#log` or Micrometer instrumentation. The added benchmark
shows that for large stack traces the new implementation is several
orders of magnitude more efficient in terms of compute and memory
resource utilization.

While there, improve two existing benchmarks by utilizing the black hole
to which benchmark method return values are implicitly sent.
(cherry picked from commit 009ec89)
@Stephan202 Stephan202 force-pushed the sschroevers/traces-performance-improvement branch from f470638 to c3e519a Compare November 27, 2024 21:49
@Stephan202
Contributor Author

@chemicL alright, I rebased the branch on main and added a small commit to make the benchmark more realistic. In the remainder of this post I'm comparing this branch to the current HEAD of main (7cc701c), such that improvements of #3902 apply in each case / don't bias the result.

For the TracesBenchmark in this PR, I locally get the following results:

Benchmark before the changes
Benchmark                                             (reactorLeadingLines)  (trailingLines)  Mode  Cnt     Score     Error   Units
TracesBenchmark.measureThroughput                                         0                0  avgt    5    93.857 ±   2.071   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           0                0  avgt    5  8859.998 ± 195.529  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      0                0  avgt    5   872.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                0                0  avgt    5   121.000            counts
TracesBenchmark.measureThroughput:gc.time                                 0                0  avgt    5   155.000                ms
TracesBenchmark.measureThroughput                                         0                1  avgt    5   122.244 ±   3.167   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           0                1  avgt    5  8050.975 ± 209.703  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      0                1  avgt    5  1032.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                0                1  avgt    5   109.000            counts
TracesBenchmark.measureThroughput:gc.time                                 0                1  avgt    5   150.000                ms
TracesBenchmark.measureThroughput                                         0                2  avgt    5   143.698 ±   1.597   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           0                2  avgt    5  8175.946 ±  91.208  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      0                2  avgt    5  1232.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                0                2  avgt    5   111.000            counts
TracesBenchmark.measureThroughput:gc.time                                 0                2  avgt    5   169.000                ms
TracesBenchmark.measureThroughput                                         1                0  avgt    5   121.232 ±   2.857   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           1                0  avgt    5  8117.946 ± 190.510  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      1                0  avgt    5  1032.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                1                0  avgt    5   110.000            counts
TracesBenchmark.measureThroughput:gc.time                                 1                0  avgt    5   162.000                ms
TracesBenchmark.measureThroughput                                         1                1  avgt    5   155.369 ±   3.385   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           1                1  avgt    5  7561.889 ± 165.250  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      1                1  avgt    5  1232.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                1                1  avgt    5   103.000            counts
TracesBenchmark.measureThroughput:gc.time                                 1                1  avgt    5   160.000                ms
TracesBenchmark.measureThroughput                                         1                2  avgt    5   166.787 ±   3.910   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           1                2  avgt    5  8142.041 ± 191.441  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      1                2  avgt    5  1424.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                1                2  avgt    5   111.000            counts
TracesBenchmark.measureThroughput:gc.time                                 1                2  avgt    5   166.000                ms
TracesBenchmark.measureThroughput                                         2                0  avgt    5   144.471 ±   2.355   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           2                0  avgt    5  8237.969 ± 134.557  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      2                0  avgt    5  1248.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                2                0  avgt    5   112.000            counts
TracesBenchmark.measureThroughput:gc.time                                 2                0  avgt    5   164.000                ms
TracesBenchmark.measureThroughput                                         2                1  avgt    5   166.155 ±   1.974   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           2                1  avgt    5  8264.689 ±  98.186  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      2                1  avgt    5  1440.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                2                1  avgt    5   112.000            counts
TracesBenchmark.measureThroughput:gc.time                                 2                1  avgt    5   147.000                ms
TracesBenchmark.measureThroughput                                         2                2  avgt    5   197.339 ±   3.674   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           2                2  avgt    5  7925.318 ± 147.817  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      2                2  avgt    5  1640.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                2                2  avgt    5   108.000            counts
TracesBenchmark.measureThroughput:gc.time                                 2                2  avgt    5   146.000                ms
Benchmark after the changes
Benchmark                                             (reactorLeadingLines)  (trailingLines)  Mode  Cnt      Score     Error   Units
TracesBenchmark.measureThroughput                                         0                0  avgt    5     27.853 ±   1.429   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           0                0  avgt    5  12327.398 ± 629.891  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      0                0  avgt    5    360.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                0                0  avgt    5    168.000            counts
TracesBenchmark.measureThroughput:gc.time                                 0                0  avgt    5    251.000                ms
TracesBenchmark.measureThroughput                                         0                1  avgt    5     47.004 ±   3.899   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           0                1  avgt    5  11690.405 ± 971.380  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      0                1  avgt    5    576.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                0                1  avgt    5    159.000            counts
TracesBenchmark.measureThroughput:gc.time                                 0                1  avgt    5    216.000                ms
TracesBenchmark.measureThroughput                                         0                2  avgt    5     46.116 ±   3.718   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           0                2  avgt    5  11915.112 ± 943.275  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      0                2  avgt    5    576.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                0                2  avgt    5    162.000            counts
TracesBenchmark.measureThroughput:gc.time                                 0                2  avgt    5    209.000                ms
TracesBenchmark.measureThroughput                                         1                0  avgt    5     56.545 ±   1.064   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           1                0  avgt    5   9714.213 ± 182.537  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      1                0  avgt    5    576.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                1                0  avgt    5    133.000            counts
TracesBenchmark.measureThroughput:gc.time                                 1                0  avgt    5    196.000                ms
TracesBenchmark.measureThroughput                                         1                1  avgt    5     45.934 ±   2.003   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           1                1  avgt    5  11959.417 ± 520.615  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      1                1  avgt    5    576.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                1                1  avgt    5    163.000            counts
TracesBenchmark.measureThroughput:gc.time                                 1                1  avgt    5    221.000                ms
TracesBenchmark.measureThroughput                                         1                2  avgt    5     49.027 ±   3.465   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           1                2  avgt    5  11206.856 ± 785.266  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      1                2  avgt    5    576.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                1                2  avgt    5    153.000            counts
TracesBenchmark.measureThroughput:gc.time                                 1                2  avgt    5    214.000                ms
TracesBenchmark.measureThroughput                                         2                0  avgt    5     47.588 ±   2.095   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           2                0  avgt    5  10902.505 ± 477.988  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      2                0  avgt    5    544.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                2                0  avgt    5    148.000            counts
TracesBenchmark.measureThroughput:gc.time                                 2                0  avgt    5    200.000                ms
TracesBenchmark.measureThroughput                                         2                1  avgt    5     47.097 ±   2.317   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           2                1  avgt    5  11016.262 ± 539.726  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      2                1  avgt    5    544.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                2                1  avgt    5    149.000            counts
TracesBenchmark.measureThroughput:gc.time                                 2                1  avgt    5    182.000                ms
TracesBenchmark.measureThroughput                                         2                2  avgt    5     45.345 ±   2.225   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           2                2  avgt    5  11441.877 ± 557.287  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      2                2  avgt    5    544.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                2                2  avgt    5    156.000            counts
TracesBenchmark.measureThroughput:gc.time                                 2                2  avgt    5    214.000                ms

In short, this means a >3x speedup and more than halving of allocated memory for the most common 2-line case.
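
The arithmetic behind this summary can be reproduced from the JMH scores listed above. The sketch below assumes that "the 2-line case" refers to the rows whose two parameters add up to two lines, which is an interpretation rather than something stated explicitly; the class and method names are invented for illustration.

```java
// Worked arithmetic using the JMH scores quoted above for inputs totalling two lines.
final class SpeedupArithmetic {
    // Ratio of average time per operation before and after the change.
    static double speedup(double beforeNs, double afterNs) {
        return beforeNs / afterNs;
    }

    // Percentage by which the normalized allocation rate (B/op) dropped.
    static double allocationReductionPercent(double beforeBytes, double afterBytes) {
        return 100.0 * (beforeBytes - afterBytes) / beforeBytes;
    }

    public static void main(String[] args) {
        // {reactorLeadingLines, trailingLines, ns/op before, ns/op after, B/op before, B/op after}
        double[][] rows = {
                {0, 2, 143.698, 46.116, 1232.0, 576.0},
                {1, 1, 155.369, 45.934, 1232.0, 576.0},
                {2, 0, 144.471, 47.588, 1248.0, 544.0},
        };
        for (double[] row : rows) {
            System.out.printf("(%d, %d): %.2fx faster, %.1f%% less allocation%n",
                    (int) row[0], (int) row[1],
                    speedup(row[2], row[3]),
                    allocationReductionPercent(row[4], row[5]));
        }
    }
}
```

Under that interpretation, each of the three rows comes out above a 3x speedup and above a 50% reduction in bytes allocated per operation.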

I also had another look at a "more representative" Picnic-internal benchmark, where some longer reactive chains are subscribed to. I can't easily share this code, but with the Reactor Debug Agent enabled (as we do in production) there's an 18-22% speedup and ~40% reduction in allocated memory:

Internal benchmark before the changes
Benchmark                                             (orderDepth)  (transformationStepCount)  Mode  Cnt       Score      Error   Units
TransformationBenchmark.transform                                5                         20  avgt    5     150.550 ±    2.812   us/op
TransformationBenchmark.transform:gc.alloc.rate                  5                         20  avgt    5    1830.314 ±   26.565  MB/sec
TransformationBenchmark.transform:gc.alloc.rate.norm             5                         20  avgt    5  288942.054 ± 1734.845    B/op
TransformationBenchmark.transform:gc.count                       5                         20  avgt    5     150.000             counts
TransformationBenchmark.transform:gc.time                        5                         20  avgt    5     184.000                 ms
Internal benchmark after the changes
Benchmark                                             (orderDepth)  (transformationStepCount)  Mode  Cnt       Score    Error   Units
TransformationBenchmark.transform                                5                         20  avgt    5     120.198 ±  0.804   us/op
TransformationBenchmark.transform:gc.alloc.rate                  5                         20  avgt    5    1358.328 ±  9.011  MB/sec
TransformationBenchmark.transform:gc.alloc.rate.norm             5                         20  avgt    5  171202.277 ± 18.866    B/op
TransformationBenchmark.transform:gc.count                       5                         20  avgt    5     111.000           counts
TransformationBenchmark.transform:gc.time                        5                         20  avgt    5     141.000               ms

Our main store application is a modular monolith that makes very heavy use of Reactor, with some key request flows creating very long reactive chains. I'm reasonably confident we'll see a noticeable latency improvement with this change.

@chemicL
Member

chemicL commented Nov 28, 2024

Wow, that's really impressive @Stephan202 🥇
Thanks for collecting and sharing the results, I'll get to the review then 🚀

Contributor Author

@Stephan202 Stephan202 left a comment

Added a commit. Tnx for the review @chemicL!

Comment on lines 200 to 202
if (index >= source.length()) {
return null;
}
Contributor Author

Just realized that we can drop this case 👁️

Member

Ah, yes. I didn't finish reviewing yesterday and it seems we noticed the same thing :)

@Stephan202
Contributor Author

The Java 11 tests failed, but ./gradlew :reactor-core:java11Test --no-daemon passes for me locally. Perhaps a flaky test?

@@ -29,6 +28,7 @@
* @author Sergei Egorov
*/
final class Traces {
private static final String PUBLISHER_PACKAGE_PREFIX = "reactor.core.publisher.";
Member

Let's move the private constant below the package-private ones.

@chemicL
Member

chemicL commented Nov 29, 2024

Thanks. I have some minor comments and we're good to go. The test failure is a flaky test indeed, unrelated to this change.

I think we can merge this since this is a great improvement. As a side discussion: have you considered using CharBuffer to avoid String allocations altogether? Currently, if there are more lines in the stack trace, they will be parsed even if the previous line is the one eventually used, due to the nature of the iterator. If we were able to avoid allocating another String, this would be even faster. If you think there's room for improvement, we can explore this further in another PR.

}

@Nullable
private String getNextLine() {
Member

Here's an outline of an idea to avoid using String and get the next line:

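// Assume the entire input (String source) is wrapped in a CharBuffer.
// CharBuffer#charAt and #subSequence are relative to the buffer's position,
// which stays at 0 here.
CharBuffer cb = CharBuffer.wrap(source);
// Index of the first character that has not been consumed yet.
int index = 0;

@Nullable
private CharBuffer getNextLine() {
	// Skip leading whitespace, including blank lines.
	while (index < cb.length() && Character.isWhitespace(cb.charAt(index))) {
		index++;
	}
	if (index >= cb.length()) {
		return null;
	}

	int end = index + 1;
	while (end < cb.length() && cb.charAt(end) != '\n') {
		end++;
	}

	// Trailing whitespace is not trimmed in this outline.
	CharBuffer line = cb.subSequence(index, end);
	index = end + 1;
	return line;
}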

Member

The match for the reactor package name can also be done in the same linear scanning manner. I wonder if it'd be faster.
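
A linear-scan prefix match of the kind suggested here could be sketched as follows. It works on any `CharSequence` (including a `CharBuffer` slice), so no intermediate `String` is needed for the comparison. The class and method names are hypothetical, and whether this is actually faster would need to be benchmarked.

```java
import java.nio.CharBuffer;

// Hypothetical sketch: prefix matching on a CharSequence without creating a String.
final class CharSequencePrefixSketch {
    private static final String PUBLISHER_PACKAGE_PREFIX = "reactor.core.publisher.";

    // Compares character by character and stops at the first mismatch.
    static boolean startsWith(CharSequence line, String prefix) {
        if (line.length() < prefix.length()) {
            return false;
        }
        for (int i = 0; i < prefix.length(); i++) {
            if (line.charAt(i) != prefix.charAt(i)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String text = "reactor.core.publisher.Flux.map(Flux.java:1)\ncom.example.Foo.bar(Foo.java:2)";
        int newline = text.indexOf('\n');
        CharBuffer buffer = CharBuffer.wrap(text);
        // Both slices share the underlying characters of the wrapped String.
        CharBuffer first = buffer.subSequence(0, newline);
        CharBuffer second = buffer.subSequence(newline + 1, buffer.length());

        System.out.println(startsWith(first, PUBLISHER_PACKAGE_PREFIX));
        System.out.println(startsWith(second, PUBLISHER_PACKAGE_PREFIX));
    }
}
```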

Contributor Author

I had a look at this, and went down the rabbit hole. The TL;DR is that I added three commits that further improve performance:

  1. One that avoids String#join, unrelated to your suggestion here.
  2. One that replaces String#trim, such that the original string's underlying char[] is always reused, thanks to the implementation of String#substring. This is IIUC an alternative to your suggestion to use CharBuffer; I couldn't use the latter, as it lacks operations such as .startsWith and indexOf.
  3. One that introduces a custom Substring class and is "somehow" even more performant.
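
The `Substring` class from the third commit is not shown in this conversation. Purely as an illustration of the general idea, an index-based view over a `String` might look like the sketch below; every name in it is hypothetical, and it should not be read as the PR's implementation. Trimming only moves two indices, and a new `String` is created only when `toString()` is called.

```java
// Hypothetical index-based view over a String; not the Substring class from this PR.
final class SubstringViewSketch {
    private final String source;
    private final int start;
    private final int end;

    SubstringViewSketch(String source, int start, int end) {
        this.source = source;
        this.start = start;
        this.end = end;
    }

    // Narrows the view past leading and trailing whitespace without copying characters.
    // Uses Character.isWhitespace, which differs slightly from the rule used by String#trim.
    SubstringViewSketch trimmed() {
        int from = start;
        int to = end;
        while (from < to && Character.isWhitespace(source.charAt(from))) {
            from++;
        }
        while (to > from && Character.isWhitespace(source.charAt(to - 1))) {
            to--;
        }
        return new SubstringViewSketch(source, from, to);
    }

    boolean isEmpty() {
        return start == end;
    }

    // Materializes a String only when one is actually needed.
    @Override
    public String toString() {
        return source.substring(start, end);
    }
}
```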
Benchmark of the code on `main`
Benchmark                                             (reactorLeadingLines)  (trailingLines)  Mode  Cnt     Score     Error   Units
TracesBenchmark.measureThroughput                                         0                0  avgt    5    91.389 ±   1.255   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           0                0  avgt    5  8765.340 ± 120.723  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      0                0  avgt    5   840.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                0                0  avgt    5   119.000            counts
TracesBenchmark.measureThroughput:gc.time                                 0                0  avgt    5   179.000                ms
TracesBenchmark.measureThroughput                                         0                1  avgt    5   122.102 ±   3.312   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           0                1  avgt    5  8060.272 ± 216.623  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      0                1  avgt    5  1032.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                0                1  avgt    5   109.000            counts
TracesBenchmark.measureThroughput:gc.time                                 0                1  avgt    5   158.000                ms
TracesBenchmark.measureThroughput                                         0                2  avgt    5   148.867 ±   2.750   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           0                2  avgt    5  7892.060 ± 145.950  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      0                2  avgt    5  1232.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                0                2  avgt    5   107.000            counts
TracesBenchmark.measureThroughput:gc.time                                 0                2  avgt    5   156.000                ms
TracesBenchmark.measureThroughput                                         1                0  avgt    5   122.145 ±   1.654   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           1                0  avgt    5  8057.222 ± 109.248  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      1                0  avgt    5  1032.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                1                0  avgt    5   109.000            counts
TracesBenchmark.measureThroughput:gc.time                                 1                0  avgt    5   165.000                ms
TracesBenchmark.measureThroughput                                         1                1  avgt    5   145.273 ±   0.361   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           1                1  avgt    5  8087.325 ±  20.159  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      1                1  avgt    5  1232.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                1                1  avgt    5   110.000            counts
TracesBenchmark.measureThroughput:gc.time                                 1                1  avgt    5   158.000                ms
TracesBenchmark.measureThroughput                                         1                2  avgt    5   177.756 ±   2.771   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           1                2  avgt    5  7639.569 ± 119.523  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      1                2  avgt    5  1424.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                1                2  avgt    5   104.000            counts
TracesBenchmark.measureThroughput:gc.time                                 1                2  avgt    5   152.000                ms
TracesBenchmark.measureThroughput                                         2                0  avgt    5   147.853 ±   2.101   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           2                0  avgt    5  8049.391 ± 114.613  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      2                0  avgt    5  1248.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                2                0  avgt    5   109.000            counts
TracesBenchmark.measureThroughput:gc.time                                 2                0  avgt    5   151.000                ms
TracesBenchmark.measureThroughput                                         2                1  avgt    5   176.972 ±   2.701   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           2                1  avgt    5  7759.568 ± 118.381  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      2                1  avgt    5  1440.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                2                1  avgt    5   106.000            counts
TracesBenchmark.measureThroughput:gc.time                                 2                1  avgt    5   154.000                ms
TracesBenchmark.measureThroughput                                         2                2  avgt    5   199.950 ±   0.792   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           2                2  avgt    5  8203.243 ±  32.542  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      2                2  avgt    5  1720.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                2                2  avgt    5   111.000            counts
TracesBenchmark.measureThroughput:gc.time                                 2                2  avgt    5   162.000                ms
Benchmark of the already-reviewed code
Benchmark                                             (reactorLeadingLines)  (trailingLines)  Mode  Cnt      Score     Error   Units
TracesBenchmark.measureThroughput                                         0                0  avgt    5     27.709 ±   1.321   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           0                0  avgt    5  12390.947 ± 590.685  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      0                0  avgt    5    360.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                0                0  avgt    5    168.000            counts
TracesBenchmark.measureThroughput:gc.time                                 0                0  avgt    5    226.000                ms
TracesBenchmark.measureThroughput                                         0                1  avgt    5     47.622 ±   1.357   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           0                1  avgt    5  11534.798 ± 330.546  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      0                1  avgt    5    576.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                0                1  avgt    5    157.000            counts
TracesBenchmark.measureThroughput:gc.time                                 0                1  avgt    5    217.000                ms
TracesBenchmark.measureThroughput                                         0                2  avgt    5     46.575 ±   2.868   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           0                2  avgt    5  11796.026 ± 725.031  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      0                2  avgt    5    576.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                0                2  avgt    5    160.000            counts
TracesBenchmark.measureThroughput:gc.time                                 0                2  avgt    5    222.000                ms
TracesBenchmark.measureThroughput                                         1                0  avgt    5     48.489 ±   3.267   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           1                0  avgt    5  11330.727 ± 758.029  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      1                0  avgt    5    576.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                1                0  avgt    5    154.000            counts
TracesBenchmark.measureThroughput:gc.time                                 1                0  avgt    5    208.000                ms
TracesBenchmark.measureThroughput                                         1                1  avgt    5     46.214 ±   2.285   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           1                1  avgt    5  11887.215 ± 584.466  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      1                1  avgt    5    576.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                1                1  avgt    5    162.000            counts
TracesBenchmark.measureThroughput:gc.time                                 1                1  avgt    5    212.000                ms
TracesBenchmark.measureThroughput                                         1                2  avgt    5     46.943 ±   2.083   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           1                2  avgt    5  11702.329 ± 513.790  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      1                2  avgt    5    576.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                1                2  avgt    5    159.000            counts
TracesBenchmark.measureThroughput:gc.time                                 1                2  avgt    5    230.000                ms
TracesBenchmark.measureThroughput                                         2                0  avgt    5     48.431 ±   1.966   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           2                0  avgt    5  10712.378 ± 436.213  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      2                0  avgt    5    544.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                2                0  avgt    5    146.000            counts
TracesBenchmark.measureThroughput:gc.time                                 2                0  avgt    5    210.000                ms
TracesBenchmark.measureThroughput                                         2                1  avgt    5     46.974 ±   3.057   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           2                1  avgt    5  11046.458 ± 731.622  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      2                1  avgt    5    544.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                2                1  avgt    5    151.000            counts
TracesBenchmark.measureThroughput:gc.time                                 2                1  avgt    5    210.000                ms
TracesBenchmark.measureThroughput                                         2                2  avgt    5     50.708 ±   0.980   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           2                2  avgt    5  10230.724 ± 197.935  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      2                2  avgt    5    544.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                2                2  avgt    5    139.000            counts
TracesBenchmark.measureThroughput:gc.time                                 2                2  avgt    5    187.000                ms
Benchmark after the first improvement
Benchmark                                             (reactorLeadingLines)  (trailingLines)  Mode  Cnt      Score     Error   Units
TracesBenchmark.measureThroughput                                         0                0  avgt    5     21.666 ±   0.360   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           0                0  avgt    5  10563.716 ± 175.257  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      0                0  avgt    5    240.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                0                0  avgt    5    144.000            counts
TracesBenchmark.measureThroughput:gc.time                                 0                0  avgt    5    212.000                ms
TracesBenchmark.measureThroughput                                         0                1  avgt    5     36.057 ±   1.438   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           0                1  avgt    5  10791.558 ± 429.161  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      0                1  avgt    5    408.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                0                1  avgt    5    146.000            counts
TracesBenchmark.measureThroughput:gc.time                                 0                1  avgt    5    206.000                ms
TracesBenchmark.measureThroughput                                         0                2  avgt    5     35.516 ±   1.274   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           0                2  avgt    5  10955.809 ± 391.095  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      0                2  avgt    5    408.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                0                2  avgt    5    149.000            counts
TracesBenchmark.measureThroughput:gc.time                                 0                2  avgt    5    209.000                ms
TracesBenchmark.measureThroughput                                         1                0  avgt    5     37.723 ±   1.266   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           1                0  avgt    5  10719.145 ± 360.313  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      1                0  avgt    5    424.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                1                0  avgt    5    145.000            counts
TracesBenchmark.measureThroughput:gc.time                                 1                0  avgt    5    201.000                ms
TracesBenchmark.measureThroughput                                         1                1  avgt    5     35.600 ±   2.316   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           1                1  avgt    5  11360.236 ± 738.477  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      1                1  avgt    5    424.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                1                1  avgt    5    154.000            counts
TracesBenchmark.measureThroughput:gc.time                                 1                1  avgt    5    221.000                ms
TracesBenchmark.measureThroughput                                         1                2  avgt    5     36.167 ±   1.605   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           1                2  avgt    5  11180.857 ± 492.758  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      1                2  avgt    5    424.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                1                2  avgt    5    152.000            counts
TracesBenchmark.measureThroughput:gc.time                                 1                2  avgt    5    213.000                ms
TracesBenchmark.measureThroughput                                         2                0  avgt    5     34.945 ±   0.914   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           2                0  avgt    5  10697.778 ± 280.941  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      2                0  avgt    5    392.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                2                0  avgt    5    145.000            counts
TracesBenchmark.measureThroughput:gc.time                                 2                0  avgt    5    210.000                ms
TracesBenchmark.measureThroughput                                         2                1  avgt    5     35.215 ±   1.076   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           2                1  avgt    5  10615.784 ± 325.177  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      2                1  avgt    5    392.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                2                1  avgt    5    144.000            counts
TracesBenchmark.measureThroughput:gc.time                                 2                1  avgt    5    198.000                ms
TracesBenchmark.measureThroughput                                         2                2  avgt    5     35.754 ±   1.788   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           2                2  avgt    5  10456.676 ± 524.036  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      2                2  avgt    5    392.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                2                2  avgt    5    142.000            counts
TracesBenchmark.measureThroughput:gc.time                                 2                2  avgt    5    200.000                ms
Benchmark after the second improvement
Benchmark                                             (reactorLeadingLines)  (trailingLines)  Mode  Cnt     Score     Error   Units
TracesBenchmark.measureThroughput                                         0                0  avgt    5    16.695 ±   0.223   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           0                0  avgt    5  6854.558 ±  91.285  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      0                0  avgt    5   120.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                0                0  avgt    5    94.000            counts
TracesBenchmark.measureThroughput:gc.time                                 0                0  avgt    5   130.000                ms
TracesBenchmark.measureThroughput                                         0                1  avgt    5    27.538 ±   2.173   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           0                1  avgt    5  7482.351 ± 582.492  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      0                1  avgt    5   216.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                0                1  avgt    5   102.000            counts
TracesBenchmark.measureThroughput:gc.time                                 0                1  avgt    5   143.000                ms
TracesBenchmark.measureThroughput                                         0                2  avgt    5    26.589 ±   1.541   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           0                2  avgt    5  7748.273 ± 448.220  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      0                2  avgt    5   216.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                0                2  avgt    5   106.000            counts
TracesBenchmark.measureThroughput:gc.time                                 0                2  avgt    5   154.000                ms
TracesBenchmark.measureThroughput                                         1                0  avgt    5    28.482 ±   0.514   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           1                0  avgt    5  7500.140 ± 135.789  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      1                0  avgt    5   224.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                1                0  avgt    5   102.000            counts
TracesBenchmark.measureThroughput:gc.time                                 1                0  avgt    5   150.000                ms
TracesBenchmark.measureThroughput                                         1                1  avgt    5    26.180 ±   0.323   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           1                1  avgt    5  8159.420 ± 101.176  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      1                1  avgt    5   224.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                1                1  avgt    5   111.000            counts
TracesBenchmark.measureThroughput:gc.time                                 1                1  avgt    5   163.000                ms
TracesBenchmark.measureThroughput                                         1                2  avgt    5    26.194 ±   0.395   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           1                2  avgt    5  8155.085 ± 122.896  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      1                2  avgt    5   224.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                1                2  avgt    5   111.000            counts
TracesBenchmark.measureThroughput:gc.time                                 1                2  avgt    5   157.000                ms
TracesBenchmark.measureThroughput                                         2                0  avgt    5    26.485 ±   0.255   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           2                0  avgt    5  7489.392 ±  72.106  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      2                0  avgt    5   208.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                2                0  avgt    5   102.000            counts
TracesBenchmark.measureThroughput:gc.time                                 2                0  avgt    5   146.000                ms
TracesBenchmark.measureThroughput                                         2                1  avgt    5    26.530 ±   0.298   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           2                1  avgt    5  7476.572 ±  83.959  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      2                1  avgt    5   208.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                2                1  avgt    5   102.000            counts
TracesBenchmark.measureThroughput:gc.time                                 2                1  avgt    5   149.000                ms
TracesBenchmark.measureThroughput                                         2                2  avgt    5    26.710 ±   0.566   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           2                2  avgt    5  7426.339 ± 157.301  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      2                2  avgt    5   208.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                2                2  avgt    5   101.000            counts
TracesBenchmark.measureThroughput:gc.time                                 2                2  avgt    5   146.000                ms
Benchmark after the third improvement
Benchmark                                             (reactorLeadingLines)  (trailingLines)  Mode  Cnt     Score     Error   Units
TracesBenchmark.measureThroughput                                         0                0  avgt    5    18.096 ±   0.304   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           0                0  avgt    5  7588.502 ± 127.338  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      0                0  avgt    5   144.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                0                0  avgt    5   103.000            counts
TracesBenchmark.measureThroughput:gc.time                                 0                0  avgt    5   145.000                ms
TracesBenchmark.measureThroughput                                         0                1  avgt    5    22.850 ±   0.378   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           0                1  avgt    5  5008.182 ±  83.316  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      0                1  avgt    5   120.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                0                1  avgt    5    68.000            counts
TracesBenchmark.measureThroughput:gc.time                                 0                1  avgt    5    98.000                ms
TracesBenchmark.measureThroughput                                         0                2  avgt    5    20.251 ±   0.517   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           0                2  avgt    5  5651.105 ± 144.835  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      0                2  avgt    5   120.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                0                2  avgt    5    77.000            counts
TracesBenchmark.measureThroughput:gc.time                                 0                2  avgt    5   114.000                ms
TracesBenchmark.measureThroughput                                         1                0  avgt    5    23.276 ±   0.700   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           1                0  avgt    5  4261.171 ± 128.182  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      1                0  avgt    5   104.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                1                0  avgt    5    58.000            counts
TracesBenchmark.measureThroughput:gc.time                                 1                0  avgt    5    79.000                ms
TracesBenchmark.measureThroughput                                         1                1  avgt    5    19.319 ±   0.161   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           1                1  avgt    5  5133.773 ±  43.098  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      1                1  avgt    5   104.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                1                1  avgt    5    70.000            counts
TracesBenchmark.measureThroughput:gc.time                                 1                1  avgt    5    96.000                ms
TracesBenchmark.measureThroughput                                         1                2  avgt    5    19.673 ±   0.342   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           1                2  avgt    5  5041.454 ±  87.186  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      1                2  avgt    5   104.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                1                2  avgt    5    68.000            counts
TracesBenchmark.measureThroughput:gc.time                                 1                2  avgt    5    89.000                ms
TracesBenchmark.measureThroughput                                         2                0  avgt    5    18.855 ±   0.652   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           2                0  avgt    5  5260.401 ± 184.048  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      2                0  avgt    5   104.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                2                0  avgt    5    72.000            counts
TracesBenchmark.measureThroughput:gc.time                                 2                0  avgt    5    99.000                ms
TracesBenchmark.measureThroughput                                         2                1  avgt    5    18.931 ±   0.206   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           2                1  avgt    5  5238.827 ±  56.895  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      2                1  avgt    5   104.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                2                1  avgt    5    71.000            counts
TracesBenchmark.measureThroughput:gc.time                                 2                1  avgt    5   103.000                ms
TracesBenchmark.measureThroughput                                         2                2  avgt    5    18.723 ±   0.983   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           2                2  avgt    5  5297.753 ± 279.570  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      2                2  avgt    5   104.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                2                2  avgt    5    72.000            counts
TracesBenchmark.measureThroughput:gc.time                                 2                2  avgt    5   100.000                ms

So for the common (1, 1) case, we see the following timing and memory usage differences (each percentage is relative to the row above):

Variant        Time per operation             Normalized allocation
=============  =============================  ===========================
Baseline       145.273 ± 0.361 ns/op          1232.000 ± 0.001 B/op
Reviewed code   46.214 ± 2.285 ns/op (-68%)    576.000 ± 0.001 B/op (-53%)
Speedup 1       35.600 ± 2.316 ns/op (-23%)    424.000 ± 0.001 B/op (-26%)
Speedup 2       26.180 ± 0.323 ns/op (-26%)    224.000 ± 0.001 B/op (-47%)
Speedup 3       19.319 ± 0.161 ns/op (-26%)    104.000 ± 0.001 B/op (-54%)
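
For reference, the percentages can be recomputed from the benchmark scores with a few lines of Java. This is only a sketch: the class and method names are made up for illustration, and the numbers are the (1, 1) rows copied from the benchmark output above.

```java
import java.util.Locale;

final class RelativeChange {
    /** Percentage change from before to after, rounded to the nearest integer. */
    static long percentChange(double before, double after) {
        return Math.round((after - before) / before * 100.0);
    }

    public static void main(String[] args) {
        String[] variants = {"Baseline", "Reviewed code", "Speedup 1", "Speedup 2", "Speedup 3"};
        double[] nsPerOp = {145.273, 46.214, 35.600, 26.180, 19.319};
        double[] bytesPerOp = {1232, 576, 424, 224, 104};

        // Step-over-step change: each variant compared with the one before it.
        for (int i = 1; i < variants.length; i++) {
            System.out.printf(Locale.ROOT, "%-13s time %+d%%, allocation %+d%%%n",
                    variants[i],
                    percentChange(nsPerOp[i - 1], nsPerOp[i]),
                    percentChange(bytesPerOp[i - 1], bytesPerOp[i]));
        }

        // Cumulative change: last variant compared with the baseline.
        int last = variants.length - 1;
        System.out.printf(Locale.ROOT, "%-13s time %+d%%, allocation %+d%%%n",
                "Overall",
                percentChange(nsPerOp[0], nsPerOp[last]),
                percentChange(bytesPerOp[0], bytesPerOp[last]));
    }
}
```

Comparing the last row directly with the baseline gives the cumulative effect of all four changes: roughly -87% time and -92% allocation per operation.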

I can see how "Speedup 3" is controversial from a maintainability point of view. I guess the only way to justify it is by realizing that we're dealing with a very hot code path here (at least for Reactor Debug Agent users). Up to you :)

@Stephan202 Stephan202 left a comment

Added four commits, as described in the comments :)

}

@Nullable
private String getNextLine() {

I had a look at this, and went down the rabbit hole. The TL;DR is that I added three commits that further improve performance:

  1. One that avoids String#join, unrelated to your suggestion here.
  2. One that replaces String#trim, such that the original string's underlying char[] is always reused, thanks to the implementation of String#substring. IIUC, this is an alternative to your suggestion to use CharBuffer; I couldn't use the latter, as it lacks operations such as startsWith and indexOf.
  3. One that introduces a custom Substring class and is "somehow" even more performant.
Benchmark of the code on `main`
Benchmark                                             (reactorLeadingLines)  (trailingLines)  Mode  Cnt     Score     Error   Units
TracesBenchmark.measureThroughput                                         0                0  avgt    5    91.389 ±   1.255   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           0                0  avgt    5  8765.340 ± 120.723  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      0                0  avgt    5   840.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                0                0  avgt    5   119.000            counts
TracesBenchmark.measureThroughput:gc.time                                 0                0  avgt    5   179.000                ms
TracesBenchmark.measureThroughput                                         0                1  avgt    5   122.102 ±   3.312   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           0                1  avgt    5  8060.272 ± 216.623  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      0                1  avgt    5  1032.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                0                1  avgt    5   109.000            counts
TracesBenchmark.measureThroughput:gc.time                                 0                1  avgt    5   158.000                ms
TracesBenchmark.measureThroughput                                         0                2  avgt    5   148.867 ±   2.750   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           0                2  avgt    5  7892.060 ± 145.950  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      0                2  avgt    5  1232.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                0                2  avgt    5   107.000            counts
TracesBenchmark.measureThroughput:gc.time                                 0                2  avgt    5   156.000                ms
TracesBenchmark.measureThroughput                                         1                0  avgt    5   122.145 ±   1.654   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           1                0  avgt    5  8057.222 ± 109.248  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      1                0  avgt    5  1032.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                1                0  avgt    5   109.000            counts
TracesBenchmark.measureThroughput:gc.time                                 1                0  avgt    5   165.000                ms
TracesBenchmark.measureThroughput                                         1                1  avgt    5   145.273 ±   0.361   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           1                1  avgt    5  8087.325 ±  20.159  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      1                1  avgt    5  1232.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                1                1  avgt    5   110.000            counts
TracesBenchmark.measureThroughput:gc.time                                 1                1  avgt    5   158.000                ms
TracesBenchmark.measureThroughput                                         1                2  avgt    5   177.756 ±   2.771   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           1                2  avgt    5  7639.569 ± 119.523  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      1                2  avgt    5  1424.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                1                2  avgt    5   104.000            counts
TracesBenchmark.measureThroughput:gc.time                                 1                2  avgt    5   152.000                ms
TracesBenchmark.measureThroughput                                         2                0  avgt    5   147.853 ±   2.101   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           2                0  avgt    5  8049.391 ± 114.613  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      2                0  avgt    5  1248.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                2                0  avgt    5   109.000            counts
TracesBenchmark.measureThroughput:gc.time                                 2                0  avgt    5   151.000                ms
TracesBenchmark.measureThroughput                                         2                1  avgt    5   176.972 ±   2.701   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           2                1  avgt    5  7759.568 ± 118.381  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      2                1  avgt    5  1440.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                2                1  avgt    5   106.000            counts
TracesBenchmark.measureThroughput:gc.time                                 2                1  avgt    5   154.000                ms
TracesBenchmark.measureThroughput                                         2                2  avgt    5   199.950 ±   0.792   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           2                2  avgt    5  8203.243 ±  32.542  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      2                2  avgt    5  1720.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                2                2  avgt    5   111.000            counts
TracesBenchmark.measureThroughput:gc.time                                 2                2  avgt    5   162.000                ms
Benchmark of the already-reviewed code
Benchmark                                             (reactorLeadingLines)  (trailingLines)  Mode  Cnt      Score     Error   Units
TracesBenchmark.measureThroughput                                         0                0  avgt    5     27.709 ±   1.321   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           0                0  avgt    5  12390.947 ± 590.685  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      0                0  avgt    5    360.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                0                0  avgt    5    168.000            counts
TracesBenchmark.measureThroughput:gc.time                                 0                0  avgt    5    226.000                ms
TracesBenchmark.measureThroughput                                         0                1  avgt    5     47.622 ±   1.357   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           0                1  avgt    5  11534.798 ± 330.546  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      0                1  avgt    5    576.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                0                1  avgt    5    157.000            counts
TracesBenchmark.measureThroughput:gc.time                                 0                1  avgt    5    217.000                ms
TracesBenchmark.measureThroughput                                         0                2  avgt    5     46.575 ±   2.868   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           0                2  avgt    5  11796.026 ± 725.031  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      0                2  avgt    5    576.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                0                2  avgt    5    160.000            counts
TracesBenchmark.measureThroughput:gc.time                                 0                2  avgt    5    222.000                ms
TracesBenchmark.measureThroughput                                         1                0  avgt    5     48.489 ±   3.267   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           1                0  avgt    5  11330.727 ± 758.029  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      1                0  avgt    5    576.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                1                0  avgt    5    154.000            counts
TracesBenchmark.measureThroughput:gc.time                                 1                0  avgt    5    208.000                ms
TracesBenchmark.measureThroughput                                         1                1  avgt    5     46.214 ±   2.285   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           1                1  avgt    5  11887.215 ± 584.466  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      1                1  avgt    5    576.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                1                1  avgt    5    162.000            counts
TracesBenchmark.measureThroughput:gc.time                                 1                1  avgt    5    212.000                ms
TracesBenchmark.measureThroughput                                         1                2  avgt    5     46.943 ±   2.083   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           1                2  avgt    5  11702.329 ± 513.790  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      1                2  avgt    5    576.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                1                2  avgt    5    159.000            counts
TracesBenchmark.measureThroughput:gc.time                                 1                2  avgt    5    230.000                ms
TracesBenchmark.measureThroughput                                         2                0  avgt    5     48.431 ±   1.966   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           2                0  avgt    5  10712.378 ± 436.213  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      2                0  avgt    5    544.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                2                0  avgt    5    146.000            counts
TracesBenchmark.measureThroughput:gc.time                                 2                0  avgt    5    210.000                ms
TracesBenchmark.measureThroughput                                         2                1  avgt    5     46.974 ±   3.057   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           2                1  avgt    5  11046.458 ± 731.622  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      2                1  avgt    5    544.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                2                1  avgt    5    151.000            counts
TracesBenchmark.measureThroughput:gc.time                                 2                1  avgt    5    210.000                ms
TracesBenchmark.measureThroughput                                         2                2  avgt    5     50.708 ±   0.980   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           2                2  avgt    5  10230.724 ± 197.935  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      2                2  avgt    5    544.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                2                2  avgt    5    139.000            counts
TracesBenchmark.measureThroughput:gc.time                                 2                2  avgt    5    187.000                ms
Benchmark after the first improvement
Benchmark                                             (reactorLeadingLines)  (trailingLines)  Mode  Cnt      Score     Error   Units
TracesBenchmark.measureThroughput                                         0                0  avgt    5     21.666 ±   0.360   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           0                0  avgt    5  10563.716 ± 175.257  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      0                0  avgt    5    240.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                0                0  avgt    5    144.000            counts
TracesBenchmark.measureThroughput:gc.time                                 0                0  avgt    5    212.000                ms
TracesBenchmark.measureThroughput                                         0                1  avgt    5     36.057 ±   1.438   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           0                1  avgt    5  10791.558 ± 429.161  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      0                1  avgt    5    408.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                0                1  avgt    5    146.000            counts
TracesBenchmark.measureThroughput:gc.time                                 0                1  avgt    5    206.000                ms
TracesBenchmark.measureThroughput                                         0                2  avgt    5     35.516 ±   1.274   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           0                2  avgt    5  10955.809 ± 391.095  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      0                2  avgt    5    408.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                0                2  avgt    5    149.000            counts
TracesBenchmark.measureThroughput:gc.time                                 0                2  avgt    5    209.000                ms
TracesBenchmark.measureThroughput                                         1                0  avgt    5     37.723 ±   1.266   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           1                0  avgt    5  10719.145 ± 360.313  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      1                0  avgt    5    424.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                1                0  avgt    5    145.000            counts
TracesBenchmark.measureThroughput:gc.time                                 1                0  avgt    5    201.000                ms
TracesBenchmark.measureThroughput                                         1                1  avgt    5     35.600 ±   2.316   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           1                1  avgt    5  11360.236 ± 738.477  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      1                1  avgt    5    424.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                1                1  avgt    5    154.000            counts
TracesBenchmark.measureThroughput:gc.time                                 1                1  avgt    5    221.000                ms
TracesBenchmark.measureThroughput                                         1                2  avgt    5     36.167 ±   1.605   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           1                2  avgt    5  11180.857 ± 492.758  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      1                2  avgt    5    424.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                1                2  avgt    5    152.000            counts
TracesBenchmark.measureThroughput:gc.time                                 1                2  avgt    5    213.000                ms
TracesBenchmark.measureThroughput                                         2                0  avgt    5     34.945 ±   0.914   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           2                0  avgt    5  10697.778 ± 280.941  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      2                0  avgt    5    392.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                2                0  avgt    5    145.000            counts
TracesBenchmark.measureThroughput:gc.time                                 2                0  avgt    5    210.000                ms
TracesBenchmark.measureThroughput                                         2                1  avgt    5     35.215 ±   1.076   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           2                1  avgt    5  10615.784 ± 325.177  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      2                1  avgt    5    392.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                2                1  avgt    5    144.000            counts
TracesBenchmark.measureThroughput:gc.time                                 2                1  avgt    5    198.000                ms
TracesBenchmark.measureThroughput                                         2                2  avgt    5     35.754 ±   1.788   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           2                2  avgt    5  10456.676 ± 524.036  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      2                2  avgt    5    392.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                2                2  avgt    5    142.000            counts
TracesBenchmark.measureThroughput:gc.time                                 2                2  avgt    5    200.000                ms
Benchmark after the second improvement
Benchmark                                             (reactorLeadingLines)  (trailingLines)  Mode  Cnt     Score     Error   Units
TracesBenchmark.measureThroughput                                         0                0  avgt    5    16.695 ±   0.223   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           0                0  avgt    5  6854.558 ±  91.285  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      0                0  avgt    5   120.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                0                0  avgt    5    94.000            counts
TracesBenchmark.measureThroughput:gc.time                                 0                0  avgt    5   130.000                ms
TracesBenchmark.measureThroughput                                         0                1  avgt    5    27.538 ±   2.173   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           0                1  avgt    5  7482.351 ± 582.492  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      0                1  avgt    5   216.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                0                1  avgt    5   102.000            counts
TracesBenchmark.measureThroughput:gc.time                                 0                1  avgt    5   143.000                ms
TracesBenchmark.measureThroughput                                         0                2  avgt    5    26.589 ±   1.541   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           0                2  avgt    5  7748.273 ± 448.220  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      0                2  avgt    5   216.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                0                2  avgt    5   106.000            counts
TracesBenchmark.measureThroughput:gc.time                                 0                2  avgt    5   154.000                ms
TracesBenchmark.measureThroughput                                         1                0  avgt    5    28.482 ±   0.514   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           1                0  avgt    5  7500.140 ± 135.789  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      1                0  avgt    5   224.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                1                0  avgt    5   102.000            counts
TracesBenchmark.measureThroughput:gc.time                                 1                0  avgt    5   150.000                ms
TracesBenchmark.measureThroughput                                         1                1  avgt    5    26.180 ±   0.323   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           1                1  avgt    5  8159.420 ± 101.176  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      1                1  avgt    5   224.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                1                1  avgt    5   111.000            counts
TracesBenchmark.measureThroughput:gc.time                                 1                1  avgt    5   163.000                ms
TracesBenchmark.measureThroughput                                         1                2  avgt    5    26.194 ±   0.395   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           1                2  avgt    5  8155.085 ± 122.896  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      1                2  avgt    5   224.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                1                2  avgt    5   111.000            counts
TracesBenchmark.measureThroughput:gc.time                                 1                2  avgt    5   157.000                ms
TracesBenchmark.measureThroughput                                         2                0  avgt    5    26.485 ±   0.255   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           2                0  avgt    5  7489.392 ±  72.106  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      2                0  avgt    5   208.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                2                0  avgt    5   102.000            counts
TracesBenchmark.measureThroughput:gc.time                                 2                0  avgt    5   146.000                ms
TracesBenchmark.measureThroughput                                         2                1  avgt    5    26.530 ±   0.298   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           2                1  avgt    5  7476.572 ±  83.959  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      2                1  avgt    5   208.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                2                1  avgt    5   102.000            counts
TracesBenchmark.measureThroughput:gc.time                                 2                1  avgt    5   149.000                ms
TracesBenchmark.measureThroughput                                         2                2  avgt    5    26.710 ±   0.566   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           2                2  avgt    5  7426.339 ± 157.301  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      2                2  avgt    5   208.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                2                2  avgt    5   101.000            counts
TracesBenchmark.measureThroughput:gc.time                                 2                2  avgt    5   146.000                ms
Benchmark after the third improvement
Benchmark                                             (reactorLeadingLines)  (trailingLines)  Mode  Cnt     Score     Error   Units
TracesBenchmark.measureThroughput                                         0                0  avgt    5    18.096 ±   0.304   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           0                0  avgt    5  7588.502 ± 127.338  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      0                0  avgt    5   144.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                0                0  avgt    5   103.000            counts
TracesBenchmark.measureThroughput:gc.time                                 0                0  avgt    5   145.000                ms
TracesBenchmark.measureThroughput                                         0                1  avgt    5    22.850 ±   0.378   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           0                1  avgt    5  5008.182 ±  83.316  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      0                1  avgt    5   120.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                0                1  avgt    5    68.000            counts
TracesBenchmark.measureThroughput:gc.time                                 0                1  avgt    5    98.000                ms
TracesBenchmark.measureThroughput                                         0                2  avgt    5    20.251 ±   0.517   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           0                2  avgt    5  5651.105 ± 144.835  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      0                2  avgt    5   120.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                0                2  avgt    5    77.000            counts
TracesBenchmark.measureThroughput:gc.time                                 0                2  avgt    5   114.000                ms
TracesBenchmark.measureThroughput                                         1                0  avgt    5    23.276 ±   0.700   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           1                0  avgt    5  4261.171 ± 128.182  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      1                0  avgt    5   104.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                1                0  avgt    5    58.000            counts
TracesBenchmark.measureThroughput:gc.time                                 1                0  avgt    5    79.000                ms
TracesBenchmark.measureThroughput                                         1                1  avgt    5    19.319 ±   0.161   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           1                1  avgt    5  5133.773 ±  43.098  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      1                1  avgt    5   104.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                1                1  avgt    5    70.000            counts
TracesBenchmark.measureThroughput:gc.time                                 1                1  avgt    5    96.000                ms
TracesBenchmark.measureThroughput                                         1                2  avgt    5    19.673 ±   0.342   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           1                2  avgt    5  5041.454 ±  87.186  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      1                2  avgt    5   104.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                1                2  avgt    5    68.000            counts
TracesBenchmark.measureThroughput:gc.time                                 1                2  avgt    5    89.000                ms
TracesBenchmark.measureThroughput                                         2                0  avgt    5    18.855 ±   0.652   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           2                0  avgt    5  5260.401 ± 184.048  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      2                0  avgt    5   104.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                2                0  avgt    5    72.000            counts
TracesBenchmark.measureThroughput:gc.time                                 2                0  avgt    5    99.000                ms
TracesBenchmark.measureThroughput                                         2                1  avgt    5    18.931 ±   0.206   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           2                1  avgt    5  5238.827 ±  56.895  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      2                1  avgt    5   104.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                2                1  avgt    5    71.000            counts
TracesBenchmark.measureThroughput:gc.time                                 2                1  avgt    5   103.000                ms
TracesBenchmark.measureThroughput                                         2                2  avgt    5    18.723 ±   0.983   ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate                           2                2  avgt    5  5297.753 ± 279.570  MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm                      2                2  avgt    5   104.000 ±   0.001    B/op
TracesBenchmark.measureThroughput:gc.count                                2                2  avgt    5    72.000            counts
TracesBenchmark.measureThroughput:gc.time                                 2                2  avgt    5   100.000                ms

So for the common (1, 1) case, we see the following timing and memory usage differences (each percentage is relative to the row above it):

Variant       Speed                           Normalized garbage
============= ============================    ==========================
Baseline      145.273 ± 0.361 ns/op           1232.000 ± 0.001 B/op
Reviewed code  46.214 ± 2.285 ns/op (-68%)     576.000 ± 0.001 B/op (-53%)
Speedup 1      35.600 ± 2.316 ns/op (-23%)     424.000 ± 0.001 B/op (-26%)
Speedup 2      26.180 ± 0.323 ns/op (-26%)     224.000 ± 0.001 B/op (-47%)
Speedup 3      19.319 ± 0.161 ns/op (-26%)     104.000 ± 0.001 B/op (-54%)
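
As a cross-check of the table, the following sketch recomputes each step's change relative to the row before it, and the overall factor from the baseline to the last row. It uses only the (1, 1) numbers listed above; the class and method names are made up for illustration.

```java
/** Illustrative helper for recomputing the table's percentages; names are hypothetical. */
final class SpeedupMath {
    /** Relative change from 'before' to 'after' in percent (negative means a reduction). */
    static double relativeChangePercent(double before, double after) {
        return (after - before) / before * 100.0;
    }

    /** How many times smaller 'after' is than 'before'. */
    static double factor(double before, double after) {
        return before / after;
    }

    public static void main(String[] args) {
        // Rows: baseline, reviewed code, speedup 1, speedup 2, speedup 3.
        double[] nsPerOp = {145.273, 46.214, 35.600, 26.180, 19.319};
        double[] bytesPerOp = {1232.0, 576.0, 424.0, 224.0, 104.0};

        for (int i = 1; i < nsPerOp.length; i++) {
            System.out.printf("step %d: time %.1f%%, garbage %.1f%%%n", i,
                relativeChangePercent(nsPerOp[i - 1], nsPerOp[i]),
                relativeChangePercent(bytesPerOp[i - 1], bytesPerOp[i]));
        }
        System.out.printf("overall: %.2fx faster, %.2fx less garbage%n",
            factor(nsPerOp[0], nsPerOp[nsPerOp.length - 1]),
            factor(bytesPerOp[0], bytesPerOp[bytesPerOp.length - 1]));
    }
}
```

End to end, that is 145.273 / 19.319 ≈ 7.5 times less time and 1232 / 104 ≈ 11.8 times less garbage per operation.
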

I can see how "Speedup 3" is controversial from a maintainability point of view. I guess the only way to justify it is by realizing that we're dealing with a very hot code path here (at least for Reactor Debug Agent users). Up to you :)

@Stephan202
Contributor Author

NB: One further speed-up could be to avoid the string trimming altogether: IIUC the whitespace is introduced only in these places:

  1. CallSiteSupplierFactory:103-106
  2. CallSiteSupplierFactory:149-152
  3. CallSiteSupplierFactory:83 (Java 11)
  4. CallSiteSupplierFactory:103-111 (Java 11)
  5. CallSiteInfoAddingMethodVisitor:L120

That last one is part of the Reactor Debug Agent, shipped as a separate JAR. So the question is whether we can stop trimming before the next major release. (Unless users are required to use reactor-core and reactor-tools versions that match down to the patch level, but given that the Java agent may be configured outside of the application in which reactor-core is bundled, that would seem like a rather strict requirement.) But we could already stop creating the whitespace.

Trimming of trailing whitespace can already be dropped (it's never introduced), though that would require updating the unit tests.

Happy to do this in this PR or another; just let me know.

}

boolean startsWith(String prefix) {
return str.startsWith(prefix, start);
Member

For our specific use case it probably won't be a problem, but in general this is incomplete, as the end index is not considered. Perhaps a simple check for start + prefix.length() <= end is also required to make it correct?
With that, I'd argue a bunch of unit tests for this inner class would be helpful. It can be made package-private and we can test it in TracesTest.java.
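
A sketch of what an end-aware `startsWith` could look like, together with two quick checks. The class name `BoundedView` and its fields are illustrative assumptions, not the PR's code. The bound is written as `start + prefix.length() <= end`, so that a prefix that exactly fills the view still matches.

```java
/** Hypothetical view class used only to illustrate the end-index check. */
final class BoundedView {
    private final String backing;
    private final int start; // inclusive
    private final int end;   // exclusive

    BoundedView(String backing, int start, int end) {
        this.backing = backing;
        this.start = start;
        this.end = end;
    }

    /** True only if the prefix matches at 'start' and fits entirely before 'end'. */
    boolean startsWith(String prefix) {
        return start + prefix.length() <= end && backing.startsWith(prefix, start);
    }

    public static void main(String[] args) {
        String s = "reactor.core.publisher.Mono";
        BoundedView view = new BoundedView(s, 0, 7); // covers "reactor"
        System.out.println(view.startsWith("reactor"));      // true: exactly fills the view
        System.out.println(view.startsWith("reactor.core")); // false: runs past the end
    }
}
```

Without the length check, the second call would return `true`, because `String#startsWith(String, int)` only knows about the backing string and not about the view's end.
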

}

// XXX: Explain.
private static final class Substring {
Member

Consider renaming it to StringView. This would communicate to the reader that we are only wrapping the underlying String and not making any copies of it.

Member

Or even StackLineView, since it exposes methods specific to the lines in the stack trace.
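
For readers unfamiliar with the "view" idea behind these naming suggestions, the following is a rough sketch of what such a wrapper can look like. Everything here (the class name LineViewSketch, its methods, the exclusive end index) is hypothetical and not taken from the PR; the point is only that trimming can be expressed as index adjustments over the shared backing string, with a copy made only when a String is finally requested.

```java
// Hypothetical sketch of a non-copying view over a stack trace line.
final class LineViewSketch {
    private final String backing; // shared with the caller; never copied by the view
    private final int start;      // inclusive
    private final int end;        // exclusive

    LineViewSketch(String backing, int start, int end) {
        if (start < 0 || start > end || end > backing.length()) {
            throw new IndexOutOfBoundsException("start=" + start + ", end=" + end);
        }
        this.backing = backing;
        this.start = start;
        this.end = end;
    }

    // Returns a narrower view instead of a new String. Uses the same
    // "code point <= ' '" rule as String.trim().
    LineViewSketch trim() {
        int s = start;
        int e = end;
        while (s < e && backing.charAt(s) <= ' ') {
            s++;
        }
        while (e > s && backing.charAt(e - 1) <= ' ') {
            e--;
        }
        return new LineViewSketch(backing, s, e);
    }

    int length() {
        return end - start;
    }

    // The only place where characters are actually copied.
    @Override
    public String toString() {
        return backing.substring(start, end);
    }

    public static void main(String[] args) {
        String line = "\tMono.map\n";
        LineViewSketch trimmed = new LineViewSketch(line, 0, line.length()).trim();
        System.out.println(trimmed.length());
        System.out.println(trimmed);
    }
}
```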


// XXX: Explain.
private static final class Substring {
private final String str;
Member

Consider renaming str to underlying, actual, or backingString to indicate its nature.

@chemicL
Member

chemicL commented Dec 2, 2024

@Stephan202 the numbers you show look excellent. I think it's worth integrating the current proposal, since in the (1, 1) case you get an almost 7.5x speedup and 12x less memory pressure!

Regarding avoiding the trimming - I think we can stop here. Given the above result, I don't believe changing the behaviour is justified. I suspect the tabs are useful in some form when printing the stack trace and are only trimmed when finding the "trace-back", right? Anyways, if you want to spend more time on it and show we're not breaking anything, we can discuss that in another issue/PR and aim to close this one so we can release it soon :)

Please add a unit test, potentially apply some of the refactoring suggestions, and replace the // XXX placeholder you left with a proper comment; then we can ship this. I'm excited to hear about your production savings with this change :)

Member

@chemicL chemicL left a comment

I'll merge and follow up by applying the most recent feedback. I'd like this optimization to be part of the upcoming release.
Thanks a lot @Stephan202, it has been a true pleasure to work with your contributions. Please do come back with more ideas!

@chemicL chemicL merged commit 0a87988 into reactor:main Dec 4, 2024
7 checks passed
chemicL added a commit that referenced this pull request Dec 4, 2024
@chemicL chemicL added the type/enhancement A general enhancement label Dec 4, 2024
@chemicL chemicL added this to the 3.7.1 milestone Dec 4, 2024
@Stephan202 Stephan202 deleted the sschroevers/traces-performance-improvement branch December 4, 2024 12:59
@Stephan202
Contributor Author

Thanks for the thoughtful code review and thanks for filing #3949! I too am curious to see how this fares in production. If I have details to share, I'll report back here :)

(As for avoiding the trimming: if I have spare cycles in the future I don't mind having a closer look at that; TBD!)

@chemicL chemicL changed the title from "Optimize Scannable#name() and related logic" to "Optimize observability and debugging experience" Dec 4, 2024
chemicL added a commit that referenced this pull request Dec 4, 2024
Follow-up to #3901:
* Renamed `Substring` to `StackLineView`
* Implemented tests for `StackLineView`
* Corrected `contains` and `startsWith` implementations
@chemicL
Member

chemicL commented Dec 18, 2024

@Stephan202 I was curious if you had a chance to deploy the newest version?

@Stephan202
Contributor Author

Hey @chemicL! Sorry for not following up. A colleague did test this change in isolation in production, but unfortunately no clear impact was measured. TBH, that's slightly surprising and disappointing. We're using Datadog in production, but due to Reactor's deep call stacks, DD's JFR-based profiling functionality can't correlate CPU and memory usage with some of our hottest reactive code. This makes it hard to do a more fine-grained before-and-after comparison. (We currently run with -XX:FlightRecorderOptions=stackdepth=512, but last I tested this, even the maximum value of -XX:FlightRecorderOptions=stackdepth=2048 didn't change this.)

One caveat is that I didn't find time to do a deeper analysis myself, and now with Christmas coming up there's no appetite to run more experiments (e.g. by doing a temporary downgrade) in production 😬.

@chemicL
Member

chemicL commented Jan 13, 2025

@Stephan202 thanks for the heads up. I also took a longer break now, hence the delayed response :-) Sorry to hear there was no clear improvement. Perhaps Amdahl's Law strikes again. Anyways, thanks for the effort. I am certain some workloads will see an improvement from this!
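
For reference, Amdahl's Law bounds the overall speedup $S$ when only a fraction $p$ of the total run time is accelerated by a factor $s$:

$$
S = \frac{1}{(1 - p) + \dfrac{p}{s}}
$$

As a purely hypothetical illustration (no value of $p$ was measured in this thread): taking $s = 7.5$ from the benchmark figure quoted above and assuming the optimized code accounts for $p = 0.02$ of total CPU time gives

$$
S = \frac{1}{0.98 + \dfrac{0.02}{7.5}} \approx 1.018,
$$

i.e. an overall gain of under 2%, which would be hard to distinguish from noise in production metrics.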

Labels
area/observability; area/performance (This belongs to the performance theme); type/enhancement (A general enhancement)

2 participants