
Reduce default memory allocation to the java process #1407

Merged
amahussein merged 2 commits into NVIDIA:dev from rapids-tools-1406 on Oct 31, 2024

Conversation

@amahussein (Collaborator) commented Oct 31, 2024

Fixes #1406

This pull request improves the handling of JVM heap size and thread calculations in the spark_rapids_tools module; the most important change updates how the JVM heap size is calculated.

The goal is to avoid a default memory allocation large enough to trigger the OOM killer (see the sketch after this list):

  • Use available memory instead of total memory.
  • Cap -Xmx at 32 GB.
  • Cap the maximum number of threads at 8.

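For illustration, here is a minimal Python sketch of the sizing policy described above. The function names, the 0.8 sizing fraction, and the use of psutil are assumptions for the sketch and do not reflect the actual spark_rapids_tools implementation.

```python
# Hypothetical sketch of the sizing policy described in this PR; names,
# constants, and the psutil dependency are illustrative assumptions,
# not the actual spark_rapids_tools API.
import os

import psutil  # assumed dependency for querying available memory

MAX_HEAP_GB = 32   # cap -Xmx so large hosts do not get oversized heaps
MAX_THREADS = 8    # cap worker threads regardless of CPU count
GB = 1024 ** 3


def calc_jvm_heap_gb(fraction: float = 0.8) -> int:
    """Size the JVM heap from *available* memory, not total memory."""
    available_gb = psutil.virtual_memory().available // GB
    return max(1, min(int(available_gb * fraction), MAX_HEAP_GB))


def calc_num_threads() -> int:
    """Use the host CPU count, but never more than MAX_THREADS."""
    return min(os.cpu_count() or 1, MAX_THREADS)


if __name__ == "__main__":
    print(f"-Xmx{calc_jvm_heap_gb()}g with {calc_num_threads()} threads")
```

Sizing from available rather than total memory keeps the wrapper from competing with other processes already running on the host, which is what triggered the OOM killer on large machines.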
The changes fall into two areas: enhancements to the JVM heap size and thread calculations, and method renaming for clarity.

@amahussein added the bug and user_tools labels on Oct 31, 2024
@amahussein self-assigned this on Oct 31, 2024
Signed-off-by: Ahmed Hussein <[email protected]>
@parthosa (Collaborator) left a comment

Thanks @amahussein.

@amahussein merged commit e1c4742 into NVIDIA:dev on Oct 31, 2024
14 checks passed
@amahussein deleted the rapids-tools-1406 branch on October 31, 2024, 21:03
Labels
bug: Something isn't working
user_tools: Scope the wrapper module running CSP, QualX, and reports (python)
Development

Successfully merging this pull request may close these issues:

[BUG] User tools is aggressive in reserving memory on large machines
3 participants