Memory for PepQuery2 #30

Open
mira-miracoli opened this issue Jul 11, 2023 · 7 comments
mira-miracoli commented Jul 11, 2023

Not 100% sure where to place this issue, but I thought it might be interesting for all users of the shared database. Otherwise I can of course move it to EU.

I am currently debugging an error in the pepquery2 tool. The job errored because the JVM ran out of memory.
When I tried to run the job locally, I had to stop it at 14 GB because my laptop (16 GB) started to lag.
I noticed that:

  • it uses 8 cores (according to the logs) even though it was allocated only 1 core
  • the index creation step is much slower on the server than on my laptop (this might be storage related)
  • there is no rule for TPV (and there was no rule for sortinghat)

I would like to change that, but I am not sure which values to choose. Their documentation recommends 8 GB of memory and 4 CPUs, which is too little for at least the job I am looking at. When I tried `gxadmin query tool-memory-per-inputs`, I found:

Key | Value
--- | ---
id | `#########`
tool_id | `toolshed.g2.bx.psu.edu/repos/galaxyp/pepquery2/pepquery2/2.0.2+galaxy0`
input_count | 1
total_input_size_mb | 36
mean_input_size_mb | 36
median_input_size_mb | 36
memory_used_mb | 283829
memory_used_per_input_mb | 7948
memory_mean_input_ratio | 7948
memory_median_input_ratio | 7948

Meanwhile, `gxadmin report job-info` returned the following:

## Destination Parameters

Key | Value
--- | ---
+Group | `""`
accounting_group_user | `#####`
description | `pepquery2`
docker_memory | `3.8G`
metadata_strategy | `extended`
request_cpus | `1`
request_memory | `3.8G`
requirements | `(GalaxyGroup  ==  "compute")`
submit_request_gpus | `0`

I am now trying to figure out how to implement a rule here, and whether we have to change something in the wrapper because of the CPU usage. Since I have never used the tool myself, I would be happy about any hints from people who have experience with it.

@mira-miracoli
Collaborator Author

I increased the memory for the failing job to 16 GB and it finished.
Since I do not have enough data to come up with a sensible rule, I would suggest setting the memory to 16 GB for now.
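For reference, a TPV entry pinning the tool to 16 GB might look like the following sketch. The tool-id regex and the field values here are illustrative assumptions, not the actual rule:

```yaml
# Illustrative TPV sketch (assumed schema: tools -> <tool id regex> -> cores/mem).
# The 16 GB figure comes from the successful rerun described above.
tools:
  toolshed.g2.bx.psu.edu/repos/galaxyp/pepquery2/pepquery2/.*:
    cores: 1
    mem: 16
```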

@mira-miracoli
Collaborator Author

mira-miracoli commented Jul 12, 2023

Unfortunately the tool is still not satisfied, and condor complains that it tries to exceed the 16 GB:

007 (44662823.000.000) 07/12 09:25:00 Shadow exception!
        Error from [email protected]: Job has gone over memory limit of 16384 megabytes. Peak usage: 16331 megabytes.
        0  -  Run Bytes Sent By Job
        0  -  Run Bytes Received By Job
...
012 (44662823.000.000) 07/12 09:25:00 Job was held.
        Error from [email protected]: Job has gone over memory limit of 16384 megabytes. Peak usage: 16331 megabytes.
        Code 34 Subcode 0

The `-Xmx16g` was set, so this should not happen. However, since this job seems to need more than 16 GB, we could try increasing it further.

@mira-miracoli
Collaborator Author

mira-miracoli commented Jul 12, 2023

I am not a Java expert, but I would assume that there is some kind of overhead on top of the 16 GB that the JVM uses as heap.

EDIT: This is what I learned so far. Since the wrapper currently defines `-Xmx{mem}g`, I need to change the wrapper accordingly.
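A minimal shell sketch of the idea, assuming Galaxy's standard `GALAXY_MEMORY_MB` job environment variable is available to the wrapper; the 2 GB headroom figure is an illustrative guess, not a measured value:

```shell
# Reserve headroom between the JVM heap and the cgroup limit, so that
# heap + JVM overhead (metaspace, thread stacks, GC structures) stays
# below what condor enforces. The 2048 MB estimate is illustrative.
heap_mb=$(( ${GALAXY_MEMORY_MB:-16384} - 2048 ))
export _JAVA_OPTIONS="-Xmx${heap_mb}m"
echo "$_JAVA_OPTIONS"
```

With a 16 GB allocation this would hand the JVM a 14336 MB heap instead of the full 16384 MB, leaving room for the overhead that apparently pushed the job over the condor limit.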

@nuwang
Member

nuwang commented Jul 13, 2023

Looks like it supports a -cpu parameter: http://www.pepquery.org/document.html#saparameter
and the wrapper will indeed need to be modified.

@mira-miracoli
Collaborator Author

If no one opposes, I would open a PR with that set accordingly in `_JAVA_OPTIONS`.

> Looks like it supports a -cpu parameter: http://www.pepquery.org/document.html#saparameter and the wrapper will indeed need to be modified.

Yes, by default it uses all cores available to it, but it would be cleaner to use the parameter, I guess.
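The `-cpu` flag could then be wired to Galaxy's allocated core count instead of letting the tool autodetect all host cores. A sketch, assuming the standard `GALAXY_SLOTS` variable; the surrounding pepquery arguments are elided and illustrative:

```shell
# GALAXY_SLOTS is the per-job core count Galaxy exports to tool scripts.
# Only the -cpu argument is shown; the rest of the command line is elided.
cpus="${GALAXY_SLOTS:-1}"
cpu_arg="-cpu ${cpus}"
echo "$cpu_arg"
```

This would keep the actual thread count in line with `request_cpus`, avoiding the 8-cores-on-a-1-core-allocation behaviour seen in the logs.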

@mira-miracoli
Collaborator Author

I got this from Galaxy, probably because I ran the job manually and it was still watched by the Galaxy handlers:

Job Metrics (cgroup)

Key | Value
--- | ---
CPU Time | 2 hours and 54 minutes
Failed to allocate memory count | 0E-7
Memory limit on cgroup (MEM) | 48.0 GB
Max memory usage (MEM) | 17.5 GB
Memory limit on cgroup (MEM+SWP) | 8.0 EB
Max memory usage (MEM+SWP) | 17.5 GB
OOM Control enabled | No
Was OOM Killer active? | No
Memory softlimit on cgroup | 0 bytes

...

Destination Parameters

Key | Value
--- | ---
Runner | condor
Runner Job ID | 44667975
Handler | handler_sn06_3
+Group | `""`
accounting_group_user | 55103
description | pepquery2
docker_memory | 16G
metadata_strategy | extended
request_cpus | 1
request_memory | 16G

@mira-miracoli
Collaborator Author

The job was stopped by condor again for exceeding its memory limit:

007 (xxxxxxxxx.000.000) 07/14 11:35:09 Shadow exception!
        Error from slot1_8@vgcnbwc-worker-xxxxxxx: Job has gone over memory limit of 16384 megabytes. Peak usage: 16328 megabytes.
        0  -  Run Bytes Sent By Job
        0  -  Run Bytes Received By Job
...
012 (xxxxxxxxxxxx.000.000) 07/14 11:35:09 Job was held.
        Error from slot1_8@vgcnbwc-worker-xxxxxxxxl: Job has gone over memory limit of 16384 megabytes. Peak usage: 16328 megabytes.
        Code 34 Subcode 0
...
013 (xxxxxxxxxx.000.000) 07/14 11:45:02 Job was released.
        via condor_release (by user galaxy)

Here is a PR to increase it.
