Memory leak #185
One thing to note: Java generally uses all available memory before initiating garbage collection. This behavior can sometimes be interpreted as a leak, though it is just how the JVM behaves.

- Does memory usage reach a peak and then drop?
- If your server has no queries for a period, does memory usage drop to a low baseline?
- Does the server crash with an out-of-memory error?

Can you provide additional information about your setup?

- What datasets (or at least dataset types) are you using?
- Are there queries or feature usages that seem to trigger your issues?
- Does it seem to happen when the server is under high load?
- Do you have a lot of datasets that refresh, and are you using the new sharedWatchService?

Can you upgrade the software? ERDDAP's latest version is 2.24. I'd also recommend using the most recent LTS Java version (21) and ensuring the Tomcat version is up to date. |
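A minimal sketch of how to watch for that peak-and-drop pattern (assumes a JDK with `jstat` on the server and that Tomcat is the only Java process running):

```bash
# Sample GC counters every 5000 ms. The O column is old-generation
# occupancy (%): a healthy JVM shows it climb and then drop after each
# full GC, whereas a true leak keeps the post-GC floor creeping upward.
PID=$(pgrep -f catalina)
jstat -gcutil "$PID" 5000
```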
Currently docker-erddap doesn't have an image for ERDDAP 2.24. I can work on publishing one with Java 21 and an updated Tomcat to facilitate testing here. Will ping when that's ready. |
@srstsavage In looking up how to set up your Docker image, the README shows a `docker run` command. ERDDAP_MIN_MEMORY and ERDDAP_MAX_MEMORY are not default settings in Tomcat, so they must be something you do internally in the Docker image. What do these map to inside the image, and generally what does setenv.sh look like, or any other scripts controlling the startup of Tomcat and Java (things like which garbage collector is being used, any restrictions on metaspace use, etc.)? It may just be a coincidence, but most of the reports of this have been from sites running the Docker image.

We have had a related problem, but it has to do with swapping and running out of swap space, mainly because we need more memory. The Java memory model introduced in Java 16 changes a lot of how things should be set. One of the key differences is that many things that used to be in the heap are no longer put there, so total Java memory usage is much larger than the max heap size (on our heavily used system the heap is set at 10GB and rarely is more than 5GB-7GB used, but total Java memory usage stays around 21GB, so just monitoring heap size isn't sufficient). Basically, you don't want any swapping (a long story).

Also, if memory serves, in this particular instance -Xms and -Xmx are set to different values. For a server setting it is much preferred that these be the same value; you can duckduckgo this and see discussions of it. |
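For concreteness, a sketch of the kind of setenv.sh being asked about; this is not the actual docker-erddap internals, just illustrative flags with -Xms and -Xmx matched as suggested:

```bash
# Hypothetical $CATALINA_BASE/bin/setenv.sh (illustrative values only).
# Matching -Xms and -Xmx avoids heap resize churn on a server.
export CATALINA_OPTS="$CATALINA_OPTS \
  -Xms8g -Xmx8g \
  -XX:+UseG1GC \
  -XX:MaxMetaspaceSize=512m \
  -XX:NativeMemoryTracking=summary"
```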
A Docker image for ERDDAP 2.24 is available. @rmendels At your suggestion, I updated the README to favor setting -Xms and -Xmx to the same value. |
Do we have any concept of how many users are using Docker vs. manually setting up a Tomcat instance? I have no clue, but I've only heard of Docker deployments for many years now (other than the CoastWatch mothership, of course). |
@srstsavage @benjwadams Great with the image! Thanks. This might require some work, but it would help us if, once you have the particular ERDDAP instance mentioned above running, you could monitor the total memory used by the Java process and the number of threads.
A detailed time series isn't as important as the likely maximum value of each and some idea of how much they fluctuate. Particularly for total memory use, you need to have ERDDAP completely loaded and running for a while to get a feel for the total Java memory and the number of threads. One more thing that would be useful is an idea of what types of requests are being made during peak usage; for instance, if the heaviest use is when NDBC accesses the data, what are the URLs requested? These kinds of things may help us narrow down whether there is a leak, where it may be, and whether there are Java settings that can alleviate the problems. |
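One possible way to collect those numbers on the host (a sketch; assumes `pidstat` from the sysstat package, `jcmd` from the JDK, and native memory tracking enabled at JVM startup):

```bash
PID=$(pgrep -f catalina)

# Resident set size: the memory the whole JVM process actually uses,
# sampled every 60 seconds (this is the number that matters for swapping).
pidstat -r -p "$PID" 60

# Current live thread count for the process.
ls "/proc/$PID/task" | wc -l

# Heap vs. off-heap breakdown; requires -XX:NativeMemoryTracking=summary.
jcmd "$PID" VM.native_memory summary
```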
We're still working on instrumentation of this issue, but I did put an .hprof dump into the Eclipse Memory Analyzer Tool from an ERDDAP instance that had been running for four days or so. Memory usage currently reported: [screenshot]
|
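For anyone trying to reproduce this, a sketch of capturing a comparable dump from a containerized ERDDAP (the container name `erddap` and PID 1 are assumptions about the deployment):

```bash
# Dump only live (reachable) objects to keep the file small, then copy it
# out of the container for analysis in Eclipse MAT.
docker exec erddap jmap -dump:live,format=b,file=/tmp/erddap.hprof 1
docker cp erddap:/tmp/erddap.hprof .
```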
Erddap.java holds a number of ConcurrentHashMaps which have data loaded into them shortly after the server starts up, and they hold that data for as long as the server is running; some profilers might flag this as a potential leak because the data will be very old. Knowing which map(s) are large for you, and whether they are growing (particularly after the initial dataset load), would be extremely helpful, since I have not reproduced the problem you are having. |
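A low-overhead way to check whether those maps keep growing after the initial dataset load (a sketch; `jmap` histograms are coarse, but enough to spot steady growth):

```bash
PID=$(pgrep -f catalina)

# Take class-histogram snapshots an hour apart and compare the instance
# counts for ConcurrentHashMap entries. Flat counts point to the expected
# long-lived caches; steady growth points to a genuine leak.
jmap -histo:live "$PID" | grep ConcurrentHashMap > histo_before.txt
sleep 3600
jmap -histo:live "$PID" | grep ConcurrentHashMap > histo_after.txt
diff histo_before.txt histo_after.txt
```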
Having some serious ERDDAP issues today where I/O is getting bogged down and the server load average is extremely high. I did, however, get an interesting stack trace in the logs, along with a log message that explicitly mentions possible memory leak conditions:
|
@benjwadams Based on that stack trace, it looks like there may be expensive queries against NetCDF-based dataset(s). Can you check the ERDDAP logs to see what the expensive queries might be? If there's a specific dataset that is causing a lot of expensive reads, I could try to look into what might be slow, if you share the dataset (and some sample files). |
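The expensive requests should be findable in log.txt; a sketch based on the timing lines ERDDAP writes (the `(>10s!)` marker and per-request numbers are visible in the log excerpt later in this thread):

```bash
# Requests ERDDAP itself flagged as slow (>10s), with their request numbers.
grep 'TIME=.*(>10s!)' log.txt

# Trace a specific request number back to its originating URL.
grep '#2062' log.txt.previous log.txt
```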
I've seen a couple, but I'm not sure if it's due to load averages being high in the first place. |
Highly abridged output:

```
grep '#2062' log.txt{.previous,}
log.txt.previous:{{{{#2062 2024-11-07T18:57:39+00:00 [https] (unknownIPAddress) GET /erddap/tabledap/ce_312-20200705T1232.json?longitude,latitude,qartod_location_test_flag,time&orderBy(%22time%22)
log.txt:#2062 Error: ClientAbortException
log.txt:#2062 FAILURE. TIME=342080ms (>10s!)
log.txt:*** sendError for request #2062 caught ERROR=ClientAbortException
```

Pertinent datasets.xml entry excerpt:

```xml
<!-- defaultDataQuery uses datasetID -->
<!--
<defaultDataQuery>&trajectory=ce_312-20200705T1232</defaultDataQuery>
<defaultGraphQuery>longitude,latitude,time&.draw=markers&.marker=2|5&.color=0xFFFFFF&.colorBar=|||||</defaultGraphQuery>
-->
<reloadEveryNMinutes>720</reloadEveryNMinutes>
<updateEveryNMillis>-1</updateEveryNMillis>
<!-- use datasetID as the directory name -->
<fileDir>/data/data/priv_erddap/OOI-CE/ce_312-20200705T1232</fileDir>
<recursive>false</recursive>
<fileNameRegex>.*\.nc</fileNameRegex>
<metadataFrom>last</metadataFrom>
<sortedColumnSourceName>time</sortedColumnSourceName>
<sortFilesBySourceNames>trajectory time</sortFilesBySourceNames>
<fileTableInMemory>false</fileTableInMemory>
<accessibleViaFiles>true</accessibleViaFiles>
```
|
Any chance you are getting hit by a scan? One of the ones we get can generate 20-40 requests per second for most of the day. Depending on your settings, that can chew up memory like crazy (outside of the heap), leading to swapping, at which point things do bog down. Because of the way Java memory works, when we see scans going on, our heap size is set at 14GB but we often see 32GB allocated to Java using a command like pidstat, even though heap space is not fully utilized. The moral seems to be that a site with a large amount of data and heavy usage needs a lot of RAM.
-Roy
|
Haven't been seeing extreme requests from external sources. I think it might be the subset variables; I tried without them as well. This might be somewhat left field of the memory problems in this GitHub issue, but it does seem to eat up quite a bit of I/O. Perhaps it's worth opening a separate issue for this particular behavior. Of course, spawning a bunch of ERDDAP threads will chew up memory and CPU too while they're waiting for results. |
@benjwadams Please do make a separate issue about the subset variables. It would be helpful if you could provide a full dataset definition, data files, and example queries on that dataset that demonstrate the issue. |
Describe the bug
Memory leak under ERDDAP 2.23.
To Reproduce
Steps to reproduce the behavior:
Uncertain, but this behavior occurs on some production ERDDAP servers with many aggregations.
Expected behavior
Reasonable memory usage that doesn't grow without bound.
Server
ERDDAP 2.23, based on Axiom's docker-erddap image
Additional context
Memory grows without bound in certain ERDDAP configurations, eventually exhausting memory.
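For reference, a sketch of how such an instance is typically launched; the memory values are illustrative, and the env var names are the ones discussed earlier in the thread:

```bash
docker run -d --name erddap \
  -p 8080:8080 \
  -e ERDDAP_MIN_MEMORY=4G -e ERDDAP_MAX_MEMORY=4G \
  axiom/docker-erddap:latest
```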