
FAQ frequently asked questions

Alan Malta Rodrigues edited this page Apr 6, 2020 · 3 revisions

This wiki collects the most common questions and answers related to Workload Management operations.

Just a reminder about the usual monitoring tools though:

Why are there so many workflows stuck in acquired state?

There is no single answer to this question, but the monitoring information is usually enough to reach a conclusion.

From the monitoring links above, open the Condor pool summary, go to the Site Table and check the last row of the IdleCpus column. At the time of writing the value is 3723, meaning that 3723 CPUs are free in the system. The likely reason they are not being used is that (some) workflows are not properly dimensioned, sometimes taking more memory than the usual 2.5 GB per core.
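To illustrate why oversized memory requirements can leave CPUs idle, here is a minimal sketch of the arithmetic. It assumes a simplified model in which every core comes with 2.5 GB of memory, so a job that needs more memory than its cores provide effectively blocks extra cores. The function names and the model itself are hypothetical and only for illustration; they are not part of WMAgent or HTCondor.

```python
import math

# Assumption: roughly 2.5 GB of memory is provisioned per core, as quoted above.
MEMORY_PER_CORE_GB = 2.5


def effective_cores(request_cores, request_memory_gb,
                    memory_per_core_gb=MEMORY_PER_CORE_GB):
    """Cores a job effectively occupies once its memory need is accounted for."""
    # Cores whose memory share is needed to cover the job's memory.
    cores_for_memory = math.ceil(request_memory_gb / memory_per_core_gb)
    return max(request_cores, cores_for_memory)


def stranded_cores(request_cores, request_memory_gb,
                   memory_per_core_gb=MEMORY_PER_CORE_GB):
    """Cores left without usable memory because of this job."""
    occupied = effective_cores(request_cores, request_memory_gb, memory_per_core_gb)
    return occupied - request_cores


if __name__ == "__main__":
    # A 4-core job with 10 GB fits the 2.5 GB/core ratio exactly.
    print(stranded_cores(4, 10.0))
    # A 4-core job with 16 GB uses the memory share of 7 cores.
    print(stranded_cores(4, 16.0))
```

Under this model, a 4-core job taking 16 GB leaves 3 cores without memory to run anything else, which is one way a large IdleCpus value can coexist with pending work.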

The WMAgent monitoring also has some interesting plots in this respect, especially those for "GQ elements by priority", for instance this one, which shows thousands of GQEs in Available status above the 80k priority. That would also explain why workflows at 80k priority are not going through.
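The same check can be sketched in code. The example below assumes the GQ elements have already been exported as a list of dictionaries with hypothetical `status` and `priority` keys; that record layout is an assumption for illustration and not the actual WMAgent data format.

```python
from collections import Counter


def available_above(elements, priority_threshold):
    """Count elements in 'Available' status with priority strictly above the threshold."""
    return sum(
        1
        for element in elements
        if element["status"] == "Available"
        and element["priority"] > priority_threshold
    )


def available_by_priority(elements):
    """Histogram of 'Available' elements keyed by priority."""
    return Counter(
        element["priority"]
        for element in elements
        if element["status"] == "Available"
    )


if __name__ == "__main__":
    # Hypothetical sample data, not real monitoring output.
    sample = [
        {"status": "Available", "priority": 85000},
        {"status": "Available", "priority": 90000},
        {"status": "Running", "priority": 95000},
        {"status": "Available", "priority": 80000},
    ]
    print(available_above(sample, 80000))
    print(available_by_priority(sample))
```

If the count of Available elements above a given priority is large, work at or below that priority is likely to keep waiting behind them.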
