FAQ: Frequently Asked Questions
This wiki is meant to contain the most common questions and answers related to the Workload Management operations.
As a reminder, the usual monitoring tools are:
- WMAgent monitoring: https://monit-grafana.cern.ch/d/lhVKAhNik/cms-wmagent-monitoring?from=now-2d&orgId=11&refresh=5m&to=now
- WMCore Workflow monitoring: https://cmsweb.cern.ch/wmstats/index.html
- CMS Job monitoring: MonIT_JobMonitoring
- Job/Condor pool monitoring: https://cms-gwmsmon.cern.ch/
- Production Condor pool summary: http://cms-htcondor-monitor.t2.ucsd.edu/letts/production.html
While there is no clear answer to such a question, there is likely enough monitoring information to reach a conclusion.
From the monitoring links above, one can open the Production Condor pool summary, go to the Site Table, and check the last row of the IdleCpus column. At the time of writing, the value was 3723, meaning there were 3723 CPUs free in the system. The likely reason they are not used is that some workflows are not properly dimensioned, sometimes requiring more memory than the usual 2.5GB/core.
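To illustrate why oversized memory requests can leave CPUs idle, here is a minimal sketch of the arithmetic. The slot size (8 cores, 20 GB) and the `idle_cores` helper are hypothetical, chosen only to match the 2.5GB/core ratio mentioned above; real pool behavior depends on how the slots are actually provisioned.

```python
def idle_cores(slot_cores: int, slot_memory_gb: float,
               job_cores: int, job_memory_gb: float) -> int:
    """Return the cores left unused once no more identical jobs fit in the slot.

    Hypothetical helper for illustration only; it is not part of WMCore
    or HTCondor.
    """
    # A job needs both its cores and its memory, so the scarcer
    # resource decides how many jobs fit.
    fit_by_cores = slot_cores // job_cores
    fit_by_memory = int(slot_memory_gb // job_memory_gb)
    jobs = min(fit_by_cores, fit_by_memory)
    return slot_cores - jobs * job_cores


# Assumed slot: 8 cores provisioned at 2.5 GB/core, i.e. 20 GB in total.
print(idle_cores(8, 20.0, 1, 2.5))  # jobs at 2.5 GB/core: 0 idle cores
print(idle_cores(8, 20.0, 1, 5.0))  # jobs at 5 GB/core: 4 idle cores
```

In this toy case, single-core jobs requesting 5 GB exhaust the slot memory after four jobs, so the remaining four cores show up as idle even though work is pending.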
The WMAgent monitoring also has some interesting plots in this respect, especially those for "GQ elements by priority", for instance this one, which shows thousands of global queue elements (GQEs) in Available status above the 80k priority. This would also explain why workflows at 80k priority are not going through.
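The kind of summary shown in that plot can be approximated as follows. This is only a sketch: the element dictionaries, their `Status` and `Priority` keys, and the sample values are assumptions made for illustration, not data taken from the monitoring or a confirmed WMCore schema.

```python
from collections import Counter


def available_by_priority(elements, threshold):
    """Count elements in 'Available' status per priority, above a threshold.

    Hypothetical helper: `elements` is assumed to be a list of dicts
    with 'Status' and 'Priority' keys.
    """
    return Counter(
        e["Priority"]
        for e in elements
        if e["Status"] == "Available" and e["Priority"] > threshold
    )


# Made-up sample data, for illustration only.
sample = [
    {"Status": "Available", "Priority": 110000},
    {"Status": "Available", "Priority": 110000},
    {"Status": "Acquired", "Priority": 110000},
    {"Status": "Available", "Priority": 85000},
    {"Status": "Available", "Priority": 80000},
]

counts = available_by_priority(sample, 80000)
for priority, n in sorted(counts.items(), reverse=True):
    print(priority, n)
```

If many elements above a given priority are still Available, elements at or below that priority are unlikely to be picked up before them.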