Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce String/StringBuilder allocations #984

Merged
merged 5 commits into from
Aug 30, 2023
Merged

Conversation

franz1981
Copy link
Contributor

This is addressing quarkusio/quarkus#35406 is the most brutal way: reducing immutability and ensuring to reduce hot-paths useless allocations as much as possible

@franz1981
Copy link
Contributor Author

Speedwise for checking contains of char sequence in batch a nice/better approach can be https://github.com/facebook/folly/blob/main/folly/String.cpp#L523-L613 which I've used on Netty as well, and works pretty well.

@franz1981
Copy link
Contributor Author

@radcortez fingers crossed it didn't break anything

@franz1981
Copy link
Contributor Author

franz1981 commented Aug 27, 2023

And indeed It seems I have to fix something I have done to work with

Which means that I should check the case where the name is one char bigger than sb too, but sb last char should be "" and the last 2 chars of property name must be _
In such case I can just keep the same comparison loop but stopping at sb.length - 1.

The ignore case, instead, should be verified against Turkish, but tbh it should be always Latin, so...no problems.

@franz1981
Copy link
Contributor Author

franz1981 commented Aug 27, 2023

ok, I've fixed the case of #984 (comment), but now the concern is related the speed of equalsIgnoreCaseReplacingNonAlphanumericByUnderscore which is performing
many comparisons to check if a letter is an Ascii letter or a digit.

This could harm performance in this case

image

where, in order to avoid creating a new String, we skip performing replaceNonAlphanumericByUnderscores just once: it means that we would performing several times the checks to replace non alphanumeric character.

One common fact is to have similar prefixes and lengths eg MY_APP_REST_CONFIG_MY_CLIENT_KEYSTORE_PASSWORDvsmy-app.rest-config.my-client.endpoints[*].pathwhich would fail the comparison after29` comparisons!

Anyway, running the case of quarkusio/quarkus#35406 will help reveal if startup performance is impacted, but IMO is still worthy writing a microbenchmark to improve it.

In the worst case, just for the case mentioned above, we can just revert and sanitize the property just once, in order to speedup the subsequent equalsIgnoreCase checks.

@@ -67,14 +68,16 @@ final class ConfigMappingProvider implements Serializable {
this.validateUnknown = builder.validateUnknown;

final ArrayDeque<String> currentPath = new ArrayDeque<>();
final NameIterator nameIterator = NameIterator.empty();
for (Map.Entry<String, List<Class<?>>> entry : roots.entrySet()) {
Copy link
Contributor

@geoand geoand Aug 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@franz1981 IIRC, this kind of iteration is not great. Would using Maps.forEach be better or does this map contain so few entries that it doesn't really matter?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't focus on the collections here, tbh, but will take a look! Usually forEach is the best but when the code is hot, and not cold (at interpreted/C1) compilation level... lambda metafactory is terribly slow if the compilation Is not C2!
Indeed for interpreted/C1 compilation usually worth avoid lambdas and operations, in general: the less is better (which goes a bit against what I have said at #984 (comment) sigh!)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about if MethodHandle was being used?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mmmm same thing likely, these methods usually have some not negligible bootstrap cost...which is the reason why in cold cases, I prefer Atomic updaters vs VarHandle (which requires method handles)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha, thanks

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not easy but if you are curious (I am), I can distill some microbench on it, wdyt?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not if you have better things to do :)

Copy link
Contributor Author

@franz1981 franz1981 Aug 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We always have, but this one could be a good direction to improve out start-up time; it was some time I wanted to write some "do/don't" guide to help devs to write good optimized code for startup, which sadly need to have slightly different style in the "core" parts we care

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👌🏼

@@ -88,14 +88,62 @@ public static String[] split(String text) {
return list.toArray(NO_STRINGS);
}

private static boolean isAsciiLetterOrDigit(char c) {
return 'a' <= c && c <= 'z' ||
'A' <= c && c <= 'Z' ||

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
'A' <= c && c <= 'Z' ||
isAsciiUpperCase(c) ||

Or would a method call here make things too slow? Though, it might get inlined, no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am in favour of reusing what we got; in theory the size (in term of bytecode size) of both methods is so small that will likely being inlined, if hot, unless we are unlucky and the depth of the call fall on the max inlining depth (which is 15 from JDK>=11 IIRC).
In theory these methods will be called for each character of each property names, which means they will likely (to be verified, really!) C2 compiled, and inlining, as intrinsics, won't kick in with previous compilation levels AFAK, so, If lucky, everything will work as expected.
To play safe, we should manually inline everything if this is supposed to happen at startup and we won't expect C2 compilation to kick-in.

TBH in a later iteration of this code, due to #984 (comment) I will likely turn this code into some branchless variant, which can work on 4 chars at time, which will give 3 important benefits:

  • most comparisons are on common prefixes, hence batching will grant to speedup moving to the different part of strings to compare
  • most names have a mix of alphabetical and not chars, very unpredictable; a branchless approach will pay less in such cases
  • the batchy and branchless variant will amortize the cost of unlucky inlining decisions or lower compilation level than C2 by performing more ops in a row

@radcortez
Copy link
Member

Thanks for the PR. Let me do a few runs to check the baseline, and I think I have some other ideas to improve.

@franz1981
Copy link
Contributor Author

franz1981 commented Aug 28, 2023

Yep thanks @radcortez sorry I am with an old laptop here (from my wife) and couldn't try it (it's using windows too and if install other dev stuff she won't be super happy, let's say :P )...
As said earlier, it contains some clear tradeoff (that can be addressed on my back), but if that work, some of the utils could be used in other places.
Right now I have limited them in the method we know was affected by the higher allocation rate, to simplify the analysis.

If the troubleshooting method I have reported in the original issue work as expected, it should be reliable indipedently from the hardware and allow an easy match with the flamegraphs reported by John, but to get proper estimation of performance loss, in term of throughout, requires some more analysis, especially if the startup cost happen with low resources (in term of CPU available) containers: in such cases, the lack of CPU cycles often make the last tier compilation to kick in very late, and having efficient code at lower compilation tier requires a different "code style" let's say, similar to what early Android developers (in the Davilkit days) were used to, for comparison.

@radcortez
Copy link
Member

Aside from the allocation issues, we are making unnecessary comparisons. I've just added a filter to the properties we need to match (only look at properties in the namespaces being mapped and not every env property), and this drastically cuts all the iterations and allocations.

Of course, your code will help when we need to do the matching, so I think the combination of both should be good.

I have some other ideas on how to improve this, but it will take a considerable amount of time.

@@ -7,7 +7,7 @@
/**
* An iterator for property name strings.
*/
public final class NameIterator {
public final class NameIterator implements AutoCloseable {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really like the stylistic change here of making the NI be closeable; does the lifespan of the NI really leak out enough that the string needs to be cleared?

Copy link
Contributor Author

@franz1981 franz1981 Aug 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason why I made it closeable is to leverage the try with resource and reset the underlying state, to not mess up with subsequent reuse...but any reset would work the same to me...just sad that I couldn't leave it immutable

The lifespan now should be clear in order to allow reusing it (and its tmp builder, which was one of the major reason of heavy allocations)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anyway, the change I did to make getNextSegment able to use an external builder, could be enough to make it immutable again...but I have the strong feeling that other core path cannot leverage the same new API and would benefit from reusingnthe underlying builder...I am open to any other idea to improve it/make it right.

Pooling usually suck, API wise, due to introduced mutability

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dmlloyd are you ok with this?

}

// Ignore Env properties that don't belong to a root
Set<String> envPropertiesUnmapped = new HashSet<>();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need a set here? Or we can just an array list?
We already iterated from a set of env properties, so why using a set here again? There shouldn't be any duplicate

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this can be a list.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, its because we remove the unmapped ones from the main set

Copy link
Contributor Author

@franz1981 franz1981 Aug 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, but https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/Set.html#removeAll(java.util.Collection) requires just a collection

And thinking about it twice, wouldn't be nice to use the original set and avoid creating a bigger one just removing the elements later?
We could process directly the original set and add it to the new one if it doesn't contains names which match the unmapped one, saving an additional allocation

@radcortez
Copy link
Member

I've added some additional improvements.

@radcortez radcortez changed the title Reduce String/StringBuilder allocations on ConfigMappings::mapConfiguration Reduce String/StringBuilder allocations Aug 30, 2023
@radcortez radcortez merged commit ae29c40 into smallrye:main Aug 30, 2023
6 checks passed
@github-actions github-actions bot added this to the 3.3.4 milestone Aug 30, 2023
radcortez added a commit that referenced this pull request Aug 30, 2023
* Reduce String/StringBuilder allocations on ConfigMappings::mapConfiguration

* Filter environment variables to be mapped with mappings roots

* Cache property names during mapping

* Avoid exception allocation when checking for indexed segment

---------

Co-authored-by: Roberto Cortez <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants