

How to finetune OG VM size? #3765


Closed
ChristopheBordieu opened this issue Nov 18, 2021 · 5 comments


Comments


ChristopheBordieu commented Nov 18, 2021

Hi team,

For some days now, my OG PRD instance (which was running fine) has been unable to finish the nightly indexing.
I get an OOM error during indexing that happens seemingly at random: I have dug into the OpenGrok and Tomcat logs but found nothing special. In short, it seems the VM has reached its memory limit.

OG is 1.7.19 running on Tomcat 10.0.4 / Java 11.0.9. The VM runs RHEL 8.3 with 20 CPUs and 64 GB RAM. ~2000 projects, for a total of ~33k Git repositories, are indexed.

I had set 26 GB for Tomcat and 26 GB for the indexing job before OG indexing started crashing.
I have tried 20 GB for Tomcat / 40 GB for indexing, but it failed: indexing crashed while trying to update the configuration on Tomcat.
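
For context, those two settings live in different places (a sketch only; paths below are illustrative):

# Tomcat (web app) heap: $CATALINA_BASE/bin/setenv.sh
JAVA_OPTS="$JAVA_OPTS -Xmx26g"

# Indexer heap: the opengrok-indexer wrapper forwards -J options to the
# underlying java process (rest of the command line omitted here)
opengrok-indexer -J=-Xmx26g -a /opt/opengrok/lib/opengrok.jar -- <usual indexer options>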

My questions:
Is there a way to evaluate the minimum and maximum JVM heap sizes needed for Tomcat, based on the contents of the configuration file?
Is there a formula that could be used for a rough estimate using the number of projects, the number of repositories, the number of tags, the number of files indexed, the number of history cache files, ...?

The purpose is for me to fine-tune the min and max JVM heap sizes while OG is up, running and indexing fine, to monitor the VM, and to plan for a VM size upgrade.

vladak (Member) commented Nov 24, 2021

Is this an incremental or a from-scratch reindex? I assume this is not a per-project reindex.

Overall, https://github.com/oracle/opengrok/wiki/Tuning-for-large-code-bases gives some tips on how to deal with JVM heap sizing. For production I'd recommend running the web app with JVM monitoring and alerting based on heap size, see e.g. https://github.com/OpenGrok/opengrok-monitoring-docker - the JVM Grafana dashboard has a preset alert just for that.
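
For reference, one minimal way to expose the web app's JVM heap and GC metrics over JMX so such a monitoring stack can scrape and alert on them (a generic sketch, not necessarily what opengrok-monitoring-docker does; the port and the relaxed security settings are illustrative and should not be used on an exposed host):

# Expose JVM metrics (heap usage, GC) over JMX for external monitoring
JAVA_OPTS="$JAVA_OPTS -Dcom.sun.management.jmxremote"
JAVA_OPTS="$JAVA_OPTS -Dcom.sun.management.jmxremote.port=9010"
JAVA_OPTS="$JAVA_OPTS -Dcom.sun.management.jmxremote.authenticate=false"
JAVA_OPTS="$JAVA_OPTS -Dcom.sun.management.jmxremote.ssl=false"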

vladak (Member) commented Nov 24, 2021

Quoting part of Tomcat's bin/setenv.sh from our production instance for reference on how tricky this is:

# OpenGrok memory boost to cover all-project searches
# (7 MB * 247 projects + 300 MB for cache should be enough)
# 64-bit Java allows for more so let's use 8GB to be on the safe side.
# We might need to allow more for concurrent all-project searches.
# However, with OpenGrok 1.1 the suggester requires more memory for
# each project (in one case the suggester footprint was 4.5 GB) 
# so bump the 8 GB to 16 GB to be on the safe side.
#
# 2020-11-24 got OOM exception even with 16 GB when rebuilding suggester
# of the userland-default-prepped project (larger one), raised further.
#
JAVA_OPTS="$JAVA_OPTS -Xmx48g"
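
For a rough sense of scale only (this is not an official formula), the same naive rule of thumb applied to the ~2000 projects from the report above gives:

# 7 MB * 2000 projects + 300 MB cache ≈ 14.3 GB as a bare baseline,
# before the per-project suggester overhead (several GB for one large
# project above) and concurrent all-project searches are accounted for.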

So no matter how well thought out the formula is, there will always be something that makes the heap jump higher. And I am not even considering #3541 or #1806.

My impression is that this is a never-ending battle that requires robust monitoring (both of heap size and request latencies) and occasional heap size readjustments.

vladak (Member) commented Nov 24, 2021

As for the indexer JVM heap tuning, I'd be interested in an analysis of a heap dump (say with the MAT tool or YourKit).
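
For reference, one common way to capture and pre-process such a dump with stock JDK and Eclipse MAT tooling (PIDs and paths are illustrative):

# Take a heap dump of a running JVM (Tomcat or the indexer) by PID:
jmap -dump:live,format=b,file=/data/jvm/webapp.hprof <pid>

# Parse it with MAT in batch mode and generate the Leak Suspects report,
# so the interactive UI only has to open the much smaller indices/report:
./ParseHeapDump.sh /data/jvm/webapp.hprof org.eclipse.mat.api:suspects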

Another thing from Tomcat's bin/setenv.sh we use in production:

JAVA_OPTS="$JAVA_OPTS -XX:+HeapDumpOnOutOfMemoryError"
JAVA_OPTS="$JAVA_OPTS -XX:HeapDumpPath=/data/jvm/"

This will dump the heap on JVM OOM.

Since the web app runs with a 48 GB heap, there needs to be enough space in /data/jvm. For the heap analysis I transfer the heap dump to my laptop, add a big chunk of swap space there (the machine has only 32 GB of RAM, which is already occupied by things like the web browser, IDEA instances etc., and the heap dump might be bigger than that), then adjust the JVM heap size of the analysis tool itself (since it can run out of heap space itself while analyzing the dump :-D), run it on the heap dump, and hope the constant swapping will not burn the SSD too much until I get the results back. I wish there were a service where I could upload the heap dump (it has confidential data in it, so it would have to be an internal one) and it would do the analysis for me remotely.
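
If MAT is the tool in question, its own heap is adjusted in MemoryAnalyzer.ini next to the executable: everything after -vmargs is passed to MAT's JVM, so raising -Xmx there (the value below is illustrative) lets it hold the parsed dump.

-vmargs
-Xmx48g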

ChristopheBordieu (Author) commented

Hi @vladak

We run incremental reindexing nightly: we mirror all Git repos to be indexed (pull if already present, clone if new, delete if no longer needed), then indexing is run on all repos and projects.
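
The general shape of that mirror step (an illustrative sketch only, not our actual script; the repo list and paths are hypothetical):

# Pull repos that are already mirrored, clone new ones; pruning of removed
# repos is omitted here for brevity.
SRC_ROOT=/var/opengrok/src
while read -r url; do
    name=$(basename "$url" .git)
    if [ -d "$SRC_ROOT/$name/.git" ]; then
        git -C "$SRC_ROOT/$name" pull --ff-only
    else
        git clone "$url" "$SRC_ROOT/$name"
    fi
done < repo-list.txt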

Up to now, we have not been dumping the indexer heap automatically on OOM errors. We will eventually do that from now on. But it consumes disk space and is not so easy to analyze once available...
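
For the indexer that would mean something like the following (a sketch, assuming the opengrok-indexer wrapper's -J pass-through to the JVM; the dump path is illustrative):

# Added to the existing indexer invocation:
opengrok-indexer -J=-XX:+HeapDumpOnOutOfMemoryError -J=-XX:HeapDumpPath=/data/jvm/ \
    -a /opt/opengrok/lib/opengrok.jar -- <usual indexer options>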

Since I opened this ticket, we have upgraded the VM by adding 64 GB of RAM: the VM now has 128 GB available. Tomcat is running with -Xmx26g as before and the indexer with -Xmx64g to have some margin. No more trouble now.

So, in short: there is no way to plan the JVM heap resources needed for Tomcat and the indexer in advance; just keep monitoring both processes and act when alerts are triggered.

vladak (Member) commented Nov 24, 2021

Given this is an incremental reindex, I'd be interested in a heap dump analysis. Possibly there is something that can be optimized to reduce the memory footprint.

@vladak vladak added the indexer label Apr 25, 2022
@oracle oracle locked and limited conversation to collaborators Jun 1, 2022
@vladak vladak converted this issue into discussion #3956 Jun 1, 2022

