Startup RSS regression introduced by #35246 #35406
Comments
@johnaohara were you able to get memory dumps of before/after?
@gsmet not yet, I have only reported the change that was detected by Horreum. I will get some more detail now.
Given we are moving most of the config to this new infrastructure, it's indeed important to know if it introduces a regression. I'm surprised this didn't show up in our previous work, though, as we had already moved quite a lot of config.
I have compared this commit (3875d03) with the previous commit (6f55d65). Heap dumps show the live set size is approximately the same after startup.

In JVM mode without -Xmx set:
- There are ~13.4MB more unreachable objects for 3875d03 compared to 6f55d65 (prev commit); 3875d03 - Unreachable objects: 40.10MB; No. of GCs: 2
- If you limit the heap size (i.e. -Xmx64m), the RSS is comparable after startup
- Looking at JFR data, total allocation during startup: 3875d03 - 73.887MB

In Native mode with -Xmx64m:
- The number of GCs changes from 3 -> 6
- More objects are allocated during startup, so a limited heap slows application startup: for config-quickstart (-Xmx64m) the startup times reported are 3875d03 - 0.023s
- Heap recovered during startup: 3875d03 - 47.104MB; No. of GCs: 6

Quarkus is now allocating more during startup. If you limit the heap size, the RSS will likely stay fairly constant, but startup times will be affected as there is more GC activity. Conversely, if -Xmx is not set, startup times remain consistent, but there is an increase in RSS.

I have attached some allocation profiles taken with async-profiler in JVM mode. I can look at them in more detail next week.
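For context, a minimal sketch of how such a before/after comparison can be captured; the exact commands were not posted in the thread, and the fast-jar path and file names below are just placeholders:

```bash
# Start the quickstart with a fixed heap and a JFR recording of startup
java -Xmx64m -XX:StartFlightRecording=filename=startup.jfr \
     -jar target/quarkus-app/quarkus-run.jar &
APP_PID=$!

# After startup, sample RSS and take a heap dump to compare the live set
ps -o rss= -p $APP_PID
jcmd $APP_PID GC.heap_dump live-set.hprof
```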
I really, really think that we should be looking into what @gsmet and I have proposed in the past - that
From what I can see,
But it's not the only culprit AFAICS: the clinit of
Also, the Netty/Vert.x startup allocation profile looks a bit different; I'm not sure we don't have a regression there too.
@radcortez I think we will really need you to have a look at the config issues.
@johnaohara btw, we are still very interested in your insights and what we can do to fix these issues (and if there are others). I just did a very quick analysis.
I'm out until next week. Please revert the commit if this is blocking the release until I can look into it. Thanks... and sorry.
@radcortez yeah, no worries. I assigned two issues to you so that you can find them easily when you're back. Have fun!
Hi all, I've already spoken about it with @johnaohara on gchat, but let me put some hints here to help troubleshoot this... TLAB profiling is not a good idea with so few samples, and it risks being very inaccurate. When we allocate so little, we are more interested in the full spectrum of data, possibly unbiased, even at the cost of not distinguishing between live/temp allocations and tenured ones (maybe still temporary but with an extended lifetime, due to the heap capacity). What to do then? For (more accurate) startup measurements I suggest two experiments. First, using EpsilonGC:
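The exact command was lost when the thread was captured; a sketch of what an EpsilonGC startup run could look like, with placeholder heap sizes and paths:

```bash
# Epsilon is a no-op GC: nothing is ever reclaimed, so heap usage at the end
# of startup equals the total amount allocated during startup.
java -XX:+UnlockExperimentalVMOptions -XX:+UseEpsilonGC \
     -Xms1g -Xmx1g -XX:+AlwaysPreTouch \
     -jar target/quarkus-app/quarkus-run.jar

# Optionally pair it with async-profiler in allocation mode to get the
# full, unbiased allocation profile of startup.
java -agentpath:/path/to/libasyncProfiler.so=start,event=alloc,file=alloc.jfr \
     -XX:+UnlockExperimentalVMOptions -XX:+UseEpsilonGC -Xms1g -Xmx1g \
     -jar target/quarkus-app/quarkus-run.jar
```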
The second experiment requires a bit more involvement and is meant to detect which of the allocations are alive after startup, contributing to the overall footprint.
The second experiment is very similar to collecting a heap dump, but it will bring the stack trace along with the allocations, helping to spot where each allocation happened. Hope this can help (I'll be on PTO from today EOD, and am still clearing up my backlog).

VERY IMPORTANT NOTE: an easy way to find, for the allocated types, what has changed (and by how much) is to let the converter produce reverse flame graphs, by adding
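The concrete options were not preserved here; the following sketch assumes async-profiler's `live` allocation option and its JFR converter's `--reverse`/`--total` flags:

```bash
# Keep only allocations that are still live when the recording ends:
# roughly a heap dump, but with the allocation stack traces attached.
java -agentpath:/path/to/libasyncProfiler.so=start,event=alloc,live,file=live.jfr \
     -jar target/quarkus-app/quarkus-run.jar

# Reverse flame graph: merge from the leaf frames (the allocated types),
# which makes it easy to compare the magnitude per type before/after.
java -cp converter.jar jfr2flame --alloc --total --reverse live.jfr live-reverse.html
```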
One note related to the live option: https://github.com/async-profiler/async-profiler/blob/dcc3ffd083a64d5a1848e79c1bded141295b6e0a/src/objectSampler.cpp#L45 shows that async-profiler retains by default 1024 different weak refs to detect leaks, meaning that if the live allocations coming from startup exceed that capacity, they won't be reported. I am currently not aware of any other mechanism that can collect such leaks with the stack trace (apart from the JFR oldObject event).
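As a rough illustration of that JFR alternative, using standard JDK Flight Recorder options; the file names below are placeholders:

```bash
# Record jdk.OldObjectSample events with paths to GC roots, so long-lived
# startup allocations keep their allocation stack traces.
java -XX:StartFlightRecording=settings=profile,path-to-gc-roots=true,filename=oldobj.jfr \
     -jar target/quarkus-app/quarkus-run.jar

# Inspect the surviving samples with the JDK's jfr tool
jfr print --events jdk.OldObjectSample oldobj.jfr
```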
@geoand 3.3 might be affected as we started to convert several key areas to config mapping. It might be related (or not, to be verified) to Matt Raible's report that 3.3 is significantly slower.
@franz1981 given it's a major regression, I think it would be very helpful if you could help to pinpoint the problem we have.
Using the methodology @franz1981 described in method 1 above (allocation flame graphs attached), there is a large increase in allocations coming from
At a quick glance, it seems that some of the allocation issues are related to
I don't think it would be any different for this particular case. All generation happens at build time; this is the mapping part. There are probably some pieces from the mapping side that could be moved to build time.
Cool. I'd be glad to help if you need assistance.
@johnaohara can you please attach the allocation files? Thanks!
@johnaohara if you attach the JFR, people can use it to extract the lines too.
@radcortez FYI the async profiler output is available in this comment: #35406 (comment)
@gsmet it wasn't using the mode I explained later, so it is slightly less accurate, but I haven't verified by how much.
Thanks. Sorry, I didn't notice they were attached earlier. I was only looking at the latest screenshots.
Drilling down on the graphs, a lot of the allocations are performed by the method at https://github.com/smallrye/smallrye-config/blob/30be0d1def67783b5bdc977d4aad2ce5b82dc186/implementation/src/main/java/io/smallrye/config/ConfigMappingProvider.java#L1053. This method tries to do some matching between environment variables and the mapped properties. Since we now have more mappings, if there is an issue with this method, it makes sense that we didn't notice it until now, because we didn't have that many mappings before. Also, many things run twice because of the static init config and the runtime init config, which are created separately. For some time I've wanted to reuse some of the static config work to feed the runtime config and reduce the allocations. I guess we need to do that now.
Looking at it quickly, I see a lot of low-hanging fruit there regardless... for instance, many of the Strings and builders allocated are not necessary at all...
Right, this is how things are meant to work :)
Correct. I'll start the work right away when I get back. Meanwhile, feel free to revert the original commit to avoid affecting the performance further. We now have a baseline and we can work on improving it. Thanks!
That's a fair point indeed @geoand, or we would end up writing ugly code for every single line of code!
The zipped flame graphs in #35406 (comment) were allocation samples. We updated the methodology to capture the allocation sizes and give a more accurate picture of what is causing the extra memory pressure; this methodology produced the screenshots in #35406 (comment). Attached are the second allocation size profiles.
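For reference, and assuming async-profiler's bundled JFR converter, the difference between the two methodologies is roughly whether the flame graph is weighted by sample counts or by total bytes allocated:

```bash
# Sample-count weighted allocation flame graph
java -cp converter.jar jfr2flame --alloc alloc.jfr alloc-samples.html

# Size-weighted flame graph: frames weighted by total bytes allocated,
# which reflects memory pressure more accurately than sample counts.
java -cp converter.jar jfr2flame --alloc --total alloc.jfr alloc-bytes.html
```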
I think we can live with this in 3.3 (especially as 3.2 is not affected) and have a better 3.4.
@radcortez let me know if the method @johnaohara used which I have explained in the comment works for you...
Sure. Thanks!
@radcortez I've sent something at smallrye/smallrye-config#984
To add more information: the issue is caused by special rules to match environment variables with regular properties. The matching is applied to all available environment variables against all properties available in mappings. We hardly noticed this before because we only had a few extensions migrated; with the increase in mappings, the issue became more visible. @franz1981's PR in smallrye/smallrye-config#984 addresses the allocation issues, which should help when we need to match things, but adding a filter to the environment variables to consider drastically reduces the number of matches we need to perform (and the allocations). Currently, I'm seeing the following numbers in my box:
The PR seems to bring the number back to the previously expected RSS. Maybe with a very slight increase, but I'll keep looking.
Describe the bug
PR #35246 has introduced a startup RSS regression of ~14-17% (depending on application and environment) in both JVM and Native modes.
It is possible to see the effects in a simple quickstart (getting-started or config-quickstart).
config-quickstart Startup RSS
Expected behavior
No response
Actual behavior
No response
How to Reproduce?
Steps to reproduce:
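The original steps were not captured here; a minimal reproduction could look roughly like the following, assuming the config-quickstart from the quarkus-quickstarts repository and a locally built Quarkus at the commit under test:

```bash
# Build the quickstart against a Quarkus build containing 3875d03,
# then against the previous commit 6f55d65 for comparison.
git clone https://github.com/quarkusio/quarkus-quickstarts
cd quarkus-quickstarts/config-quickstart
./mvnw -q package

# JVM mode: start the application and sample its RSS once startup completes
java -jar target/quarkus-app/quarkus-run.jar &
sleep 2
ps -o rss= -p $!
```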
Output of uname -a or ver
No response
Output of java -version
No response
GraalVM version (if different from Java)
No response
Quarkus version or git rev
No response
Build tool (ie. output of mvnw --version or gradlew --version)
No response
Additional information
Tests were performed in Docker with 16 cores; however, the increase is also measurable with far fewer cores, down to 2 vCPUs.