-
Notifications
You must be signed in to change notification settings - Fork 119
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reduce String/StringBuilder allocations #984
Conversation
Speedwise for checking contains of char sequence in batch a nice/better approach can be https://github.com/facebook/folly/blob/main/folly/String.cpp#L523-L613 which I've used on Netty as well, and works pretty well. |
@radcortez fingers crossed it didn't break anything |
And indeed It seems I have to fix something I have done to work with smallrye-config/common/src/main/java/io/smallrye/config/common/utils/StringUtil.java Line 102 in 6bd3669
Which means that I should check the case where the name is one char bigger than sb too, but sb last char should be "" and the last 2 chars of property name must be _ The ignore case, instead, should be verified against Turkish, but tbh it should be always Latin, so...no problems. |
82c5bfa
to
5f5367f
Compare
ok, I've fixed the case of #984 (comment), but now the concern is related the speed of This could harm performance in this case where, in order to avoid creating a new One common fact is to have similar prefixes and lengths eg MY_APP_REST_CONFIG_MY_CLIENT_KEYSTORE_PASSWORD Anyway, running the case of quarkusio/quarkus#35406 will help reveal if startup performance is impacted, but IMO is still worthy writing a microbenchmark to improve it. In the worst case, just for the case mentioned above, we can just revert and sanitize the property just once, in order to speedup the subsequent equalsIgnoreCase checks. |
@@ -67,14 +68,16 @@ final class ConfigMappingProvider implements Serializable { | |||
this.validateUnknown = builder.validateUnknown; | |||
|
|||
final ArrayDeque<String> currentPath = new ArrayDeque<>(); | |||
final NameIterator nameIterator = NameIterator.empty(); | |||
for (Map.Entry<String, List<Class<?>>> entry : roots.entrySet()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@franz1981 IIRC, this kind of iteration is not great. Would using Maps.forEach
be better or does this map contain so few entries that it doesn't really matter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't focus on the collections here, tbh, but will take a look! Usually forEach is the best but when the code is hot, and not cold (at interpreted/C1) compilation level... lambda metafactory is terribly slow if the compilation Is not C2!
Indeed for interpreted/C1 compilation usually worth avoid lambdas and operations, in general: the less is better (which goes a bit against what I have said at #984 (comment) sigh!)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about if MethodHandle was being used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mmmm same thing likely, these methods usually have some not negligible bootstrap cost...which is the reason why in cold cases, I prefer Atomic updaters vs VarHandle (which requires method handles)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gotcha, thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not easy but if you are curious (I am), I can distill some microbench on it, wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not if you have better things to do :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We always have, but this one could be a good direction to improve out start-up time; it was some time I wanted to write some "do/don't" guide to help devs to write good optimized code for startup, which sadly need to have slightly different style in the "core" parts we care
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👌🏼
@@ -88,14 +88,62 @@ public static String[] split(String text) { | |||
return list.toArray(NO_STRINGS); | |||
} | |||
|
|||
private static boolean isAsciiLetterOrDigit(char c) { | |||
return 'a' <= c && c <= 'z' || | |||
'A' <= c && c <= 'Z' || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
'A' <= c && c <= 'Z' || | |
isAsciiUpperCase(c) || |
Or would a method call here make things too slow? Though, it might get inlined, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am in favour of reusing what we got; in theory the size (in term of bytecode size) of both methods is so small that will likely being inlined, if hot, unless we are unlucky and the depth of the call fall on the max inlining depth (which is 15 from JDK>=11 IIRC).
In theory these methods will be called for each character of each property names, which means they will likely (to be verified, really!) C2 compiled, and inlining, as intrinsics, won't kick in with previous compilation levels AFAK, so, If lucky, everything will work as expected.
To play safe, we should manually inline everything if this is supposed to happen at startup and we won't expect C2 compilation to kick-in.
TBH in a later iteration of this code, due to #984 (comment) I will likely turn this code into some branchless variant, which can work on 4 chars at time, which will give 3 important benefits:
- most comparisons are on common prefixes, hence batching will grant to speedup moving to the different part of strings to compare
- most names have a mix of alphabetical and not chars, very unpredictable; a branchless approach will pay less in such cases
- the batchy and branchless variant will amortize the cost of unlucky inlining decisions or lower compilation level than C2 by performing more ops in a row
Thanks for the PR. Let me do a few runs to check the baseline, and I think I have some other ideas to improve. |
Yep thanks @radcortez sorry I am with an old laptop here (from my wife) and couldn't try it (it's using windows too and if install other dev stuff she won't be super happy, let's say :P )... If the troubleshooting method I have reported in the original issue work as expected, it should be reliable indipedently from the hardware and allow an easy match with the flamegraphs reported by John, but to get proper estimation of performance loss, in term of throughout, requires some more analysis, especially if the startup cost happen with low resources (in term of CPU available) containers: in such cases, the lack of CPU cycles often make the last tier compilation to kick in very late, and having efficient code at lower compilation tier requires a different "code style" let's say, similar to what early Android developers (in the Davilkit days) were used to, for comparison. |
Aside from the allocation issues, we are making unnecessary comparisons. I've just added a filter to the properties we need to match (only look at properties in the namespaces being mapped and not every env property), and this drastically cuts all the iterations and allocations. Of course, your code will help when we need to do the matching, so I think the combination of both should be good. I have some other ideas on how to improve this, but it will take a considerable amount of time. |
@@ -7,7 +7,7 @@ | |||
/** | |||
* An iterator for property name strings. | |||
*/ | |||
public final class NameIterator { | |||
public final class NameIterator implements AutoCloseable { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really like the stylistic change here of making the NI be closeable; does the lifespan of the NI really leak out enough that the string needs to be cleared?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason why I made it closeable is to leverage the try with resource and reset the underlying state, to not mess up with subsequent reuse...but any reset would work the same to me...just sad that I couldn't leave it immutable
The lifespan now should be clear in order to allow reusing it (and its tmp builder, which was one of the major reason of heavy allocations)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Anyway, the change I did to make getNextSegment able to use an external builder, could be enough to make it immutable again...but I have the strong feeling that other core path cannot leverage the same new API and would benefit from reusingnthe underlying builder...I am open to any other idea to improve it/make it right.
Pooling usually suck, API wise, due to introduced mutability
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dmlloyd are you ok with this?
implementation/src/main/java/io/smallrye/config/ConfigMappingProvider.java
Show resolved
Hide resolved
implementation/src/main/java/io/smallrye/config/ConfigMappingProvider.java
Outdated
Show resolved
Hide resolved
} | ||
|
||
// Ignore Env properties that don't belong to a root | ||
Set<String> envPropertiesUnmapped = new HashSet<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need a set here? Or we can just an array list?
We already iterated from a set of env properties, so why using a set here again? There shouldn't be any duplicate
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this can be a list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, its because we remove the unmapped ones from the main set
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, but https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/Set.html#removeAll(java.util.Collection) requires just a collection
And thinking about it twice, wouldn't be nice to use the original set and avoid creating a bigger one just removing the elements later?
We could process directly the original set and add it to the new one if it doesn't contains names which match the unmapped one, saving an additional allocation
I've added some additional improvements. |
common/src/main/java/io/smallrye/config/common/utils/StringUtil.java
Outdated
Show resolved
Hide resolved
common/src/main/java/io/smallrye/config/common/utils/StringUtil.java
Outdated
Show resolved
Hide resolved
* Reduce String/StringBuilder allocations on ConfigMappings::mapConfiguration * Filter environment variables to be mapped with mappings roots * Cache property names during mapping * Avoid exception allocation when checking for indexed segment --------- Co-authored-by: Roberto Cortez <[email protected]>
This is addressing quarkusio/quarkus#35406 is the most brutal way: reducing immutability and ensuring to reduce hot-paths useless allocations as much as possible