Ganga Monitoring Service Guide
Ganga's monitoring service is built on Python's asyncio library, with the aim of minimizing the time needed to monitor jobs and maximizing the benefits of concurrency.
The monitoring service is in essence a background thread which runs an asyncio event loop.
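The general pattern of a background thread that owns an asyncio event loop can be sketched as follows. This is a minimal illustration and not Ganga's actual implementation: the class name `BackgroundLoop`, its methods, and the `add` coroutine are hypothetical names chosen for this example.

```python
import asyncio
import threading


class BackgroundLoop:
    """Hypothetical helper: runs an asyncio event loop in a daemon thread."""

    def __init__(self):
        self.loop = asyncio.new_event_loop()
        self.thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        # Bind the loop to this thread and run it until stop() is called.
        asyncio.set_event_loop(self.loop)
        self.loop.run_forever()

    def start(self):
        self.thread.start()

    def submit(self, coro):
        # Schedule a coroutine from another thread; returns a
        # concurrent.futures.Future that the caller can wait on.
        return asyncio.run_coroutine_threadsafe(coro, self.loop)

    def stop(self):
        # Ask the loop to stop from outside its thread, then clean up.
        self.loop.call_soon_threadsafe(self.loop.stop)
        self.thread.join()
        self.loop.close()


async def add(a, b):
    """Placeholder coroutine standing in for a monitoring task."""
    await asyncio.sleep(0)
    return a + b
```

A caller on the main thread would create the helper, call `start()`, hand coroutines to `submit()`, and call `stop()` when the session ends.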
The service maintains an internal registry of jobs which require monitoring, namely jobs in the submitted or running state. This registry is updated by a task that runs periodically in the event loop. The interval at which this task runs can be configured via the config['PollThread']['base_poll_rate'] option.
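A periodic registry refresh of this kind might look roughly like the sketch below. It assumes, purely for illustration, that jobs are held in a plain dictionary mapping a job id to a status string and that the registry is a set of ids; `refresh_registry`, `MONITORED_STATES`, and the `rounds` parameter (which only exists so the example terminates) are hypothetical and not part of Ganga.

```python
import asyncio

# Statuses that require monitoring, as described in the text above.
MONITORED_STATES = ("submitted", "running")


async def refresh_registry(all_jobs, registry, interval, rounds):
    """Rebuild the registry every `interval` seconds, `rounds` times."""
    for _ in range(rounds):
        registry.clear()
        registry.update(
            job_id
            for job_id, status in all_jobs.items()
            if status in MONITORED_STATES
        )
        # Yield control to the event loop until the next refresh is due.
        await asyncio.sleep(interval)


def run_example():
    # Illustrative job table; the ids and statuses are made up.
    all_jobs = {1: "submitted", 2: "running", 3: "completed", 4: "new"}
    registry = set()
    asyncio.run(refresh_registry(all_jobs, registry, interval=0.01, rounds=2))
    return registry
```

In a long-running service the loop would run indefinitely, with `interval` taken from the configured base poll rate.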
Jobs are grouped and checked per backend. The poll rate for each backend can be individually configured under config['PollThread'][BACKEND_NAME]. If a backend-specific option does not exist in Ganga's configuration, the default poll rate, found under config['PollThread']['default_backend_poll_rate'], will be used.
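The fallback rule can be illustrated with a short sketch. It uses a plain nested dictionary as a stand-in for Ganga's configuration object, and the function name `backend_poll_rate`, the backend names, and the numeric values are all assumptions made for this example.

```python
def backend_poll_rate(config, backend_name):
    """Return the backend-specific poll rate, or the default if unset."""
    poll_cfg = config['PollThread']
    return poll_cfg.get(backend_name, poll_cfg['default_backend_poll_rate'])


# Stand-in configuration; the values are illustrative only.
example_config = {
    'PollThread': {
        'base_poll_rate': 2,
        'default_backend_poll_rate': 30,
        'Local': 10,
    }
}
```

With this configuration, `backend_poll_rate(example_config, 'Local')` picks up the backend-specific entry, while a backend with no entry of its own falls back to `default_backend_poll_rate`.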
The monitoring service works under the hood by calling each backend's monitoring method with the slice of jobs which are currently executing in that backend and require status retrieval.
This method needs to use the following signature:
async def updateMonitoringInformation(jobs)
The above is the only requirement for a valid backend monitoring method.
Even though the service is based on asyncio, it supports both async and sync monitoring methods. At runtime, it checks whether the called method is an asynchronous coroutine, and if it is not, the method is executed inside a thread pool.
The async keyword in the method definition above can therefore be omitted. However, it is recommended to use asynchronous monitoring when available to achieve the best performance.
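The async-or-sync dispatch described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the service's real code: `call_monitoring`, `async_update`, `sync_update`, and `run_example` are hypothetical names, and the two update functions merely return labelled strings instead of querying a backend.

```python
import asyncio
import inspect
from concurrent.futures import ThreadPoolExecutor


async def call_monitoring(method, jobs, executor):
    """Await coroutine functions directly; run plain functions in a thread pool."""
    if inspect.iscoroutinefunction(method):
        return await method(jobs)
    loop = asyncio.get_running_loop()
    # run_in_executor keeps the blocking call off the event loop thread.
    return await loop.run_in_executor(executor, method, jobs)


async def async_update(jobs):
    """Stand-in for an asynchronous backend monitoring method."""
    await asyncio.sleep(0)
    return [f"{job}:async" for job in jobs]


def sync_update(jobs):
    """Stand-in for a synchronous backend monitoring method."""
    return [f"{job}:sync" for job in jobs]


async def main():
    with ThreadPoolExecutor(max_workers=2) as executor:
        # Both kinds of method are awaited through the same entry point.
        return await asyncio.gather(
            call_monitoring(async_update, [1, 2], executor),
            call_monitoring(sync_update, [3], executor),
        )


def run_example():
    return asyncio.run(main())
```

The benefit of this design is that a backend author can start with a synchronous method and migrate to a coroutine later without changing how the service invokes it.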
The monitoring thread is activated by default in interactive mode and keeps running until the session ends or until it is explicitly stopped via disableMonitoring().
On the other hand, the monitoring thread is disabled when running Ganga in script mode. It can be enabled and run in the background using enableMonitoring(). Alternatively, on-demand monitoring of a job slice can be requested via runMonitoring(jobs).
DIRAC monitoring is a special case because, when working within the LHCb experiment's Python suite, it must execute in a different environment.
For this reason, all DIRAC monitoring happens within an isolated subprocess which uses the appropriate environment. In order for DIRAC monitoring tasks to run concurrently, this subprocess uses a thread pool to execute them.
The subprocess receives tasks from the monitoring service using a shared task queue and outputs the results to a shared dictionary. All of the above are described in the schematic below.
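The queue-and-dictionary arrangement can also be sketched in code. The example below is a rough illustration only and makes several assumptions: it uses the standard `multiprocessing` module with the POSIX-only "fork" start method, a manager-backed queue and dictionary as the shared structures, and a made-up `fake_status_query` function in place of a real DIRAC call. None of the names are taken from Ganga.

```python
import multiprocessing as mp
from concurrent.futures import ThreadPoolExecutor


def fake_status_query(job_id):
    """Hypothetical stand-in for a status lookup in the isolated environment."""
    return "completed" if job_id % 2 == 0 else "running"


def worker(task_queue, results):
    """Subprocess body: pull tasks from the queue, run them in a thread pool."""
    def handle(job_id):
        results[job_id] = fake_status_query(job_id)

    with ThreadPoolExecutor(max_workers=4) as pool:
        while True:
            job_id = task_queue.get()
            if job_id is None:  # sentinel: no more tasks
                break
            pool.submit(handle, job_id)
    # Leaving the with-block waits for all submitted tasks to finish.


def run_demo(job_ids):
    # Assumption: "fork" is available (Linux/macOS); it is not on Windows.
    ctx = mp.get_context("fork")
    with ctx.Manager() as manager:
        task_queue = manager.Queue()  # shared task queue
        results = manager.dict()      # shared result dictionary
        proc = ctx.Process(target=worker, args=(task_queue, results))
        proc.start()
        for job_id in job_ids:
            task_queue.put(job_id)
        task_queue.put(None)
        proc.join()
        # Copy the results out before the manager shuts down.
        return dict(results)
```

In the real service the subprocess stays alive for the whole session rather than exiting after one batch; the sentinel-based shutdown here exists only so the example terminates.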