[tpv_db_optimizer] Automatically update resource requests based on u… #64

Open
nuwang wants to merge 2 commits into main

Conversation

@nuwang (Member) commented Jun 24, 2024

…segalaxy historical data.

Needs manual vetting.

This repository contains the code for generating automated updates: https://github.com/nuwang/tpv-db-optimizer
Once we refine the details, we can transfer the repo over to the galaxyproject.

These scripts rely on the existence of some materialized views in the usegalaxy.* databases, which are defined here:
https://github.com/nuwang/tpv-db-optimizer/blob/0cc2ba687b709120295476fd7da1f1500a551bdd/views.py#L2

The script is therefore currently run against a consolidated database that was manually assembled. It would probably be better if we could run it directly against the usegalaxy.* databases, assuming the materialized views exist and are periodically refreshed.

These views could potentially be used for gxy.io/KUI as well.
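For concreteness, a rough sketch of the kind of per-job materialized view this could be (illustrative only: the real definitions are in the views.py link above; the view name here is made up, and it assumes Galaxy's job and job_metric_numeric tables and the cgroups v1 memory.max_usage_in_bytes metric):

-- Hypothetical sketch, not the actual definition from views.py:
-- one row per finished job with its peak cgroup memory in GB.
CREATE MATERIALIZED VIEW job_peak_memory AS
SELECT
    j.id                      AS job_id,
    j.tool_id,
    j.create_time,
    m.metric_value / 1024.0^3 AS peak_mem_gb  -- bytes -> GB
FROM job j
JOIN job_metric_numeric m ON m.job_id = j.id
WHERE j.state = 'ok'
  AND m.plugin = 'cgroup'
  AND m.metric_name = 'memory.max_usage_in_bytes';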

nuwang changed the title from [tpv_auto_optimize] Automatically update resource requests based on u… to [tpv_db_optimizer] Automatically update resource requests based on u… on Jun 29, 2024
nuwang requested a review from bgruening on July 28, 2024 12:11
@sanjaysrikakulam (Collaborator):

At first glance, I felt:

  1. I think the default mem should be at least 3.8G: both the max and the adjusted_mem should be above the default mem (something like 3.8 or 4 GB). Allocating less mem than that might cause unforeseen errors and issues, and having a 1 or 2G buffer is good in all cases in my opinion. Most of the changes seem to set mem to less than 1 or 2G.
  2. The mem changes made to some of the existing tools (ones already in the TPV shared DB) look ok, as most are close to what was already defined. However, there are a few exceptions where the new settings seem very low (3 to 4 times lower), and I am a bit suspicious of those. For example, in signalp3, the change is from 10G to 0.67G (several others are in the same range). Similarly, we must double-check the schicexplorer_* tools. I think by setting the minimum mem to 3.8G, we can give them a try and increase when necessary (a possible floor-plus-buffer calculation is sketched below).
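To make the buffer/floor idea concrete, a minimal sketch of how an adjusted value could be derived (purely illustrative: the 1.2 multiplier and 3.8 GB floor are example numbers, not agreed values, and job_peak_memory refers to the hypothetical per-job view sketched under the PR description above):

-- Illustrative only: per-tool raw maximum, then a multiplicative buffer plus a minimum floor.
SELECT
    tool_id,
    MAX(peak_mem_gb)                                           AS raw_max_gb,
    GREATEST(3.8, ROUND((MAX(peak_mem_gb) * 1.2)::numeric, 2)) AS adjusted_mem_gb
FROM job_peak_memory
GROUP BY tool_id;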

@bgruening (Member) left a comment:

Thanks a lot @nuwang ... I added a few comments and flagged a few tools that I find suspicious. Not sure how we should proceed with those.

testtoolshed.g2.bx.psu.edu/repos/simon-gladman/phyloseq_filter/biom_filter/.*:
- mem: 60.8
+ mem: 58.16
Member:

@nuwang are those the RAW values? Or did you already add some wiggle room?

I would propose rounding it up a bit and maybe adding a comment such as "# raw recom: 58.16", or something similar.

@nuwang (Member Author) Aug 8, 2024:

@@ -157,13 +268,13 @@ tools:
mem: 60
toolshed.g2.bx.psu.edu/repos/bgruening/hicexplorer_chicqualitycontrol/hicexplorer_chicqualitycontrol/.*:
cores: 20
- mem: 60
+ mem: 9.01
Member:

This is interesting ... I'm a bit sceptical about that, but maybe we should try.

@@ -271,24 +400,46 @@ tools:
cores: 16
mem: 36
toolshed.g2.bx.psu.edu/repos/bgruening/rdock_sort_filter/rdock_sort_filter/.*:
- mem: 90
+ mem: 4.67
Member:

I would assume we had a reason for this. Do we take the filesize into account? I'm surprised, but I guess we just try again.

toolshed.g2.bx.psu.edu/repos/bgruening/rxdock_sort_filter/rxdock_sort_filter/.*:
- mem: 90
+ mem: 4.67
Member:

See above.

mem: 2.51
toolshed.g2.bx.psu.edu/repos/devteam/cummerbund_to_tabular/cummerbund_to_cuffdiff/.*:
mem: 0.48
toolshed.g2.bx.psu.edu/repos/devteam/data_manager_bowtie2_index_builder/bowtie2_index_builder_data_manager/.*:
Member:

I would take the max for data_managers, or create rules.

@@ -780,7 +1248,7 @@ tools:
mem: 8
toolshed.g2.bx.psu.edu/repos/galaxyp/openms_idfileconverter/IDFileConverter/.*:
cores: 4
- mem: 8
+ mem: 0.64
Member:

Flagging this one ... looks abnormal.

@@ -898,7 +1372,7 @@ tools:
cores: 4
mem: 8
toolshed.g2.bx.psu.edu/repos/galaxyp/openms_openswathworkflow/OpenSwathWorkflow/.*:
- mem: 156
+ mem: 14.01
Member:

Ouch, I guess this one needs a rule?

@@ -2342,29 +3255,43 @@ tools:
mem: 20
toolshed.g2.bx.psu.edu/repos/iuc/vardict_java/vardict_java/.*:
cores: 2
- mem: 128
+ mem: 36.5
Member:

We had a reason for this ... if only I could remember it. A rule is probably needed here.

@@ -2406,22 +3383,171 @@ tools:
toolshed.g2.bx.psu.edu/repos/peterjc/seq_select_by_id/seq_select_by_id/.*:
mem: 8
toolshed.g2.bx.psu.edu/repos/peterjc/tmhmm_and_signalp/signalp3/.*:
- mem: 10
+ mem: 0.67
Member:

flag

@@ -2441,35 +3569,63 @@ tools:
cores: 10
mem: 20
toolshed.g2.bx.psu.edu/repos/rnateam/paralyzer/paralyzer/.*:
- mem: 8
+ mem: 0.59
Member:

flag

@nuwang (Member Author) commented Aug 8, 2024

Thanks @sanjaysrikakulam and @bgruening for reviewing; this feedback is super useful for identifying cases to zoom into. Currently, the memory usage is calculated based on the max recorded cgroup memory usage across the past year. Only one year was used because older cgroup metrics are quite unreliable: some of them appear to capture memory for the entire node, yielding incorrect results.

The ones that have dramatically lower max values than currently recorded do need further investigation, but in my previous spot checks they were mostly correct, at least as far as the cgroup metrics were concerned. I will check the ones you have flagged in greater detail and report back.

I think there's still a fair bit more work to do to make sure we are recording correct metrics across the entire federation for CPU, memory and, if possible, perhaps even GPU. We should also merge this outstanding PR soon so we can more easily track actual TPV mem allocations: https://github.com/galaxyproject/tpv-shared-database/pull/63/files
If we can get these issues fixed, I think we should be able to enhance the scripts for CPU optimization as well, and rerun, say, mid to late next year with even more data.
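For illustration, the per-tool maximum described above could be pulled with a query along these lines (a sketch under the same assumptions as the hypothetical job_peak_memory view sketched earlier; the job count is included only as a rough signal for rarely-run tools):

-- Illustrative only: peak cgroup memory per tool over the past year,
-- with a job count to spot tools that simply were not run much.
SELECT
    tool_id,
    COUNT(*)         AS jobs_last_year,
    MAX(peak_mem_gb) AS max_mem_gb
FROM job_peak_memory
WHERE create_time >= NOW() - INTERVAL '1 year'
GROUP BY tool_id
ORDER BY max_mem_gb DESC;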

@nuwang (Member Author) commented Aug 8, 2024

> At first glance, I felt:

> 1. I think the default mem should be at least 3.8G: both the max and the adjusted_mem should be above the default mem (something like 3.8 or 4 GB). Allocating less mem than that might cause unforeseen errors and issues, and having a 1 or 2G buffer is good in all cases in my opinion. Most of the changes seem to set mem to less than 1 or 2G.

I agree with your point: we don't want to cause mass failures and a huge admin headache. Perhaps we can take a more incremental approach, experimentally lower just a few tools, and see how well we do. The problem with a minimum allocation of, say, 4GB is that frequently used tools that don't actually require much memory would result in substantial wastage in aggregate.

> 2. The mem changes made to some of the existing tools (ones already in the TPV shared DB) look ok, as most are close to what was already defined. However, there are a few exceptions where the new settings seem very low (3 to 4 times lower), and I am a bit suspicious of those. For example, in signalp3, the change is from 10G to 0.67G (several others are in the same range). Similarly, we must double-check the schicexplorer_* tools. I think by setting the minimum mem to 3.8G, we can give them a try and increase when necessary.

Agree, those are really weird. One possibility that comes to mind is that the captured metrics are somehow wrong. Another is that the tool just wasn't used much during the past year, and hence whatever it was used for didn't need much memory. I'll investigate the ones you've both flagged and see what the older data says across the federation.

@sanjaysrikakulam (Collaborator):

> At first glance, I felt:

> 1. I think the default mem should be at least 3.8G: both the max and the adjusted_mem should be above the default mem (something like 3.8 or 4 GB). Allocating less mem than that might cause unforeseen errors and issues, and having a 1 or 2G buffer is good in all cases in my opinion. Most of the changes seem to set mem to less than 1 or 2G.

> I agree with your point: we don't want to cause mass failures and a huge admin headache. Perhaps we can take a more incremental approach, experimentally lower just a few tools, and see how well we do. The problem with a minimum allocation of, say, 4GB is that frequently used tools that don't actually require much memory would result in substantial wastage in aggregate.

An incremental approach would be a nice way to move forward. I agree about the aggregate wastage that may arise from setting a minimum allocation of, say, 4GB. Maybe we can start with a minimum allocation of 2.5 GB and then increase if and when necessary, case by case.

> 2. The mem changes made to some of the existing tools (ones already in the TPV shared DB) look ok, as most are close to what was already defined. However, there are a few exceptions where the new settings seem very low (3 to 4 times lower), and I am a bit suspicious of those. For example, in signalp3, the change is from 10G to 0.67G (several others are in the same range). Similarly, we must double-check the schicexplorer_* tools. I think by setting the minimum mem to 3.8G, we can give them a try and increase when necessary.

> Agree, those are really weird. One possibility that comes to mind is that the captured metrics are somehow wrong. Another is that the tool just wasn't used much during the past year, and hence whatever it was used for didn't need much memory. I'll investigate the ones you've both flagged and see what the older data says across the federation.

Yup, I had the same thought regarding the metrics. I was unaware that the calculations were limited to last year's data. It could be because of either of these.

Thank you!

@sanjaysrikakulam (Collaborator) commented Aug 13, 2024

@nuwang What do you think of the following idea?

  1. Address all the flagged entries
  2. Merge this into a new branch to test it in production. EU can deploy it live and see what happens, because we feel that any number we set as a "minimum" (2.5GB or 4GB) will still be arbitrary, and rather than adding more delays to this PR, we can try it out for a week. I also think that most of these "new" entries probably used the default mem of 3.8G defined in the tool_defaults (I compared a few new entries to what we have in our TPV and see that we have not defined any explicit mem for these, though for some we do have explicit rules; at the same time, there are so many new entries being added in this PR that we do not have in our TPV, meaning they are using the tool_defaults).

@nuwang (Member Author) commented Aug 13, 2024

@sanjaysrikakulam That sounds good to me. It'll take me a bit of time to address No. 1, because of some other pressing commitments, but I'll ping you when it's done. I like the plan you've outlined in No. 2. One thing - we should record the date we go live so that we can track actual vs expected gains. I'm hoping we can collectively author a paper on this.

@bgruening (Member):

@nuwang another thing that I wanted to mention but always forget: when we last looked at this and figured out that our cgroup values were off, we integrated https://usegalaxy.eu/?tool_id=stress_ng. With this tool you can request memory and CPU exactly. Our idea was to run it with different params and compare what cgroups and Galaxy report.

Maybe that is also useful for you.
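As a sketch of that comparison (illustrative only: the metric names are taken from the job_metric_numeric output shown further down, the tool_id pattern is a guess, and under cgroups v2 the memory metric may simply be missing, which is exactly the kind of mismatch such a query would surface):

-- Illustrative only: memory Galaxy allocated vs. the peak the cgroup recorded, per stress_ng job.
SELECT
    j.id AS job_id,
    MAX(m.metric_value) FILTER (WHERE m.plugin = 'core'
                                  AND m.metric_name = 'galaxy_memory_mb') AS galaxy_memory_mb,
    MAX(m.metric_value) FILTER (WHERE m.plugin = 'cgroup'
                                  AND m.metric_name = 'memory.max_usage_in_bytes')
        / 1024.0^2                                                        AS cgroup_peak_mb
FROM job j
JOIN job_metric_numeric m ON m.job_id = j.id
WHERE j.tool_id LIKE '%stress_ng%'  -- hypothetical pattern for the stress_ng tool id
GROUP BY j.id;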

@sanjaysrikakulam (Collaborator):

> @sanjaysrikakulam That sounds good to me. It'll take me a bit of time to address No. 1, because of some other pressing commitments, but I'll ping you when it's done.

Sure! Let me know.

> I like the plan you've outlined in No. 2. One thing - we should record the date we go live so that we can track actual vs expected gains. I'm hoping we can collectively author a paper on this.

Certainly! We need to track and monitor. A paper will be great, I look forward to it!

@sanjaysrikakulam (Collaborator) commented Aug 14, 2024

> @nuwang another thing that I wanted to mention but always forget: when we last looked at this and figured out that our cgroup values were off, we integrated https://usegalaxy.eu/?tool_id=stress_ng. With this tool you can request memory and CPU exactly. Our idea was to run it with different params and compare what cgroups and Galaxy report.

> Maybe that is also useful for you.

I tried stress_ng; for whatever reason, job_metric_numeric is not recording the mem usage:

Galaxy command_line: stress-ng --timeout '10s' --cpu '4' --cpu-ops '4' --fork '4' --hdd '4' --vm '4' '' '' '' '' '' 2>&1 | tee "/data/jwd02f/main/072/458/72458662/outputs/dataset_f777687d-8a13-4567-acb3-6669c63cc40e.dat

galaxy=> select * from job_metric_numeric where job_id=72458662;
    id     |  job_id  | plugin |     metric_name      |    metric_value
-----------+----------+--------+----------------------+--------------------
 872402152 | 72458662 | core   | galaxy_slots         |          1.0000000
 872402153 | 72458662 | core   | galaxy_memory_mb     |       3891.0000000
 872402154 | 72458662 | core   | start_epoch          | 1723616153.0000000
 872402155 | 72458662 | core   | end_epoch            | 1723616205.0000000
 872402156 | 72458662 | core   | runtime_seconds      |         52.0000000
 872402157 | 72458662 | cgroup | cpu.stat.usage_usec  |   65951046.0000000
 872402158 | 72458662 | cgroup | cpu.stat.user_usec   |   41643975.0000000
 872402159 | 72458662 | cgroup | cpu.stat.system_usec |   24307071.0000000

I am not sure why this could be. Also, I do not see the Job Metrics section in the Galaxy UI. I also think the TPV conf needs to be updated for this tool to allow dynamic allocation of cores and mem based on user input, because it currently uses the TPV tool defaults (1 core and 3.8G mem).

-- Edit --
I ran the job for a longer duration (20 minutes); in that case I can see the Job Metrics section in the Galaxy UI for that job. However, the cgroup memory is still not getting recorded. Could it be a bug? I do see the cgroup snippet in the galaxy_<job_id>.sh script.

Cgroup snippet from the Galaxy job script:

if [ -e "/proc/$$/cgroup" -a -d "/sys/fs/cgroup" -a ! -f "/sys/fs/cgroup/cgroup.controllers" ]; then
    cgroup_path=$(cat "/proc/$$/cgroup" | awk -F':' '($2=="cpuacct,cpu") || ($2=="cpu,cpuacct") {print $3}');
    
    if [ ! -e "/sys/fs/cgroup/cpu$cgroup_path/cpuacct.usage" ]; then
        cgroup_path="";
    fi;
    
    for f in /sys/fs/cgroup/{cpu\,cpuacct,cpuacct\,cpu}$cgroup_path/{cpu,cpuacct}.*; do
        if [ -f "$f" ]; then
            echo "__$(basename $f)__" >> /data/jwd05e/main/072/002/72002437/__instrument_cgroup__metrics;
            cat "$f" >> /data/jwd05e/main/072/002/72002437/__instrument_cgroup__metrics 2>/dev/null;
        fi;
    done;
    
    cgroup_path=$(cat "/proc/$$/cgroup" | awk -F':' '$2=="memory"{print $3}');
    
    if [ ! -e "/sys/fs/cgroup/memory$cgroup_path/memory.max_usage_in_bytes" ]; then
        cgroup_path="";
    fi;
    
    for f in /sys/fs/cgroup/memory$cgroup_path/memory.*; do
        echo "__$(basename $f)__" >> /data/jwd05e/main/072/002/72002437/__instrument_cgroup__metrics;
        cat "$f" >> /data/jwd05e/main/072/002/72002437/__instrument_cgroup__metrics 2>/dev/null;
    done;
fi

if [ -e "/proc/$$/cgroup" -a -f "/sys/fs/cgroup/cgroup.controllers" ]; then
    cgroup_path=$(cat "/proc/$$/cgroup" | awk -F':' '($1=="0") {print $3}');
    
    for f in /sys/fs/cgroup/${cgroup_path}/{cpu,memory}.*; do
        echo "__$(basename $f)__" >> /data/jwd05e/main/072/002/72002437/__instrument_cgroup__metrics;
        cat "$f" >> /data/jwd05e/main/072/002/72002437/__instrument_cgroup__metrics 2>/dev/null;
    done;
fi

We have cgroups v2 in use on the workers. Since we do not store the JWDs of successful jobs on the EU instance, I will test this on the ESG Galaxy instance to see whether the __instrument_cgroup__metrics file gets created. If not, I will create an issue upstream, and we can continue the discussion there.

@nuwang (Member Author) commented Aug 14, 2024

Thanks both for pointing to this. For Slurm, it's also necessary to configure cgroup settings so that a new cgroup is created before executing the job: usegalaxy-au/infrastructure#1374. I imagine something similar is necessary for HTCondor? You're still using that, right?
