CMS Monitoring validation proposal

Valentin Kuznetsov edited this page Feb 5, 2019 · 3 revisions

CMS Monitoring validation

This document proposes uniform validation of all CMS documents sent to the CERN MONIT system and describes how to enforce that validation across all CMS sub-systems.

Current setup

At the moment CMS has no common validation procedure for documents sent to the CERN MONIT infrastructure. We do, however, rely on a common interface for sending documents, which is based on the Stomp AMQ mechanism. Several StompAMQ modules are scattered across the CMS code base and are being consolidated, see the discussion in [1] and the corresponding PR [2]. All documents we send to the CERN MONIT system are in JSON format. Most of them have a flat structure, i.e. a set of key:value pairs, while others are nested, e.g. a dictionary holding other dictionaries. Given this, we can place a single validation layer in the StompAMQ module, applied at the time we send documents to CERN MONIT and driven by schema definitions provided and maintained by the data providers.
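
To make the distinction between flat and nested documents concrete, here is a small illustration. The field names (site, jobs, stats) and the is_flat helper are made up for this example and are not taken from any real CMS document or package.

```python
import json

# Hypothetical flat document: every value is a scalar.
flat_doc = {"site": "T2_CH_CERN", "jobs": 10}

# Hypothetical nested document: one value is itself a dictionary.
nested_doc = {"site": "T2_CH_CERN", "stats": {"running": 7, "pending": 3}}


def is_flat(doc):
    """Return True if no value of the document is a dictionary."""
    return not any(isinstance(value, dict) for value in doc.values())


# Both kinds of document serialize to JSON in the same way.
payload = json.dumps(nested_doc)
```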

Proposal

  • we should enforce adoption of the common StompAMQ layer [2] across the CMS infrastructure, i.e. all tools which communicate with the CERN MONIT system should use this layer.
  • we introduce a common abstract interface [3] which individual sub-systems can extend based on their schema definitions.
  • we consolidate all sub-system schemas in a common repository [4] as JSON documents, each representing a document of that sub-system. A schema should contain the keys, with values of the types we expect to send to MONIT for that sub-system, e.g. {"attribute1": "string", "attribute2": 1}. Here attribute1 has a string type, while attribute2 has an int type.
  • we add validation calls from [3] to the StompAMQ module, so that every document is validated before it is processed and sent to the CERN MONIT system. Checking each attribute should take O(1) time, since it is based on a dictionary (hash table) look-up and a data-type comparison, so the cost per document grows only with its number of attributes. If a document does not pass validation, an appropriate error will be raised (in Python, an exception). If necessary, the validation API will be written as a Python C-extension module to speed up the process.
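
The type check described in the proposal can be sketched as follows. This is a minimal illustration, not the code in [3]: the function name validate_types is hypothetical, and the sketch assumes that the document and the schema must have exactly the same keys and that types must match exactly.

```python
def validate_types(doc, schema):
    """Hypothetical helper: check a document against an example-based schema.

    The schema is itself a sample document; only the types of its values matter.
    """
    # Assumption: the document must carry exactly the keys listed in the schema.
    if set(doc) != set(schema):
        return False
    for key, expected in schema.items():
        value = doc[key]  # dictionary look-up, O(1) on average
        # Compare exact types, so that e.g. a bool is not accepted as an int.
        if type(value) is not type(expected):
            return False
        # Nested dictionaries are validated recursively.
        if isinstance(expected, dict) and not validate_types(value, expected):
            return False
    return True


schema = {"attribute1": "string", "attribute2": 1}
good_doc = {"attribute1": "abc", "attribute2": 42}
bad_doc = {"attribute1": "abc", "attribute2": "42"}  # attribute2 is a string
```

With these inputs validate_types(good_doc, schema) returns True and validate_types(bad_doc, schema) returns False. A real validator may choose a looser policy, e.g. allowing extra keys in the document.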

To implement this proposal, the following steps should be completed:

  • consolidate the StompAMQ codebase
  • enforce usage of the common StompAMQ module in all sub-systems
  • request every sub-system to provide a schema for its documents and submit it to the CMSMonitoring repository [4]
  • write a general-purpose validator for a generic dictionary document
  • if a sub-system requires additional validation steps, it can inherit from the generic validation API and implement its own logic
  • add the validation API to the StompAMQ codebase
  • inform clients of the StompAMQ module that they must handle exceptions which may be raised by the validation API
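
The last steps above (a generic validator, a sub-system extension, and exception handling on the client side) could look roughly like the sketch below. All names here (ValidationError, GenericValidator, SiteValidator, send_all) are hypothetical and are not part of StompAMQ or CMSMonitoring; the send argument stands in for whatever call actually ships a document to MONIT.

```python
class ValidationError(Exception):
    """Hypothetical exception raised when a document fails validation."""


class GenericValidator(object):
    """Hypothetical generic validator: checks keys and value types."""

    def __init__(self, schema):
        self.schema = schema

    def validate(self, doc):
        for key, expected in self.schema.items():
            if key not in doc:
                raise ValidationError("missing key: {}".format(key))
            if type(doc[key]) is not type(expected):
                raise ValidationError("wrong type for key: {}".format(key))


class SiteValidator(GenericValidator):
    """Hypothetical sub-system validator adding one rule of its own."""

    def validate(self, doc):
        # Run the generic checks first, then the sub-system specific one.
        super(SiteValidator, self).validate(doc)
        if not doc["site"].startswith("T"):
            raise ValidationError("unexpected site name: {}".format(doc["site"]))


def send_all(docs, validator, send):
    """Validate each document, send the valid ones, return the failures."""
    failed = []
    for doc in docs:
        try:
            validator.validate(doc)
        except ValidationError as exc:
            # The client decides what to do with a rejected document.
            failed.append((doc, str(exc)))
            continue
        send(doc)
    return failed


sent = []
validator = SiteValidator({"site": "string", "jobs": 1})
docs = [
    {"site": "T2_CH_CERN", "jobs": 10},   # valid
    {"site": "CERN", "jobs": 10},         # fails the sub-system rule
    {"site": "T1_US_FNAL", "jobs": "x"},  # fails the generic type check
]
failed = send_all(docs, validator, sent.append)
```

Here only the first document reaches sent, and the other two are returned in failed together with the error message. Collecting failures instead of aborting on the first one is a design choice of this sketch; a client could equally let the exception propagate.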

Examples

This section shows a basic example of how to validate a document against its schema. To do that we'll use the dmwm/CMSMonitoring package:

from CMSMonitoring.Validator import validate_schema
...
doc = {}    # the input document you want to validate, a Python dictionary
schema = {} # the sub-system schema document, also a Python dictionary
# perform validation and report both the document and the schema on failure
if not validate_schema(doc, schema):
    raise Exception('document {} does not match its schema {}'.format(doc, schema))

You can find examples of sub-system schemas in dmwm/CMSMonitoring/schemas [4].

Each schema represents a typical document that we already store in CERN MONIT; all of them were taken from the CERN MONIT ES instance.

References

  1. https://its.cern.ch/jira/browse/CMSMONIT-65
  2. https://github.com/dmwm/WMCore/pull/8974
  3. https://github.com/dmwm/CMSMonitoring/blob/master/src/python/CMSMonitoring/Validator.py
  4. https://github.com/dmwm/CMSMonitoring/tree/master/schemas