larsbutler edited this page Sep 3, 2012 · 3 revisions
  • engine

    The term "engine" refers to the OpenQuake Engine, the piece of software which is responsible for reading inputs, distributing calculations, and collecting outputs.

  • calculation

    A unique set of input files for which we attempt to produce results. Input files include an INI-style config file (containing various calculation parameters) and various XML files. The structure of these XML files is defined by our XML specification: NRML. The quantity, type, and content of these XML files will vary depending on the calculation mode.

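    The terms calculation and job are often used synonymously, but it is important to understand the distinction: a calculation is the set of inputs, while a job is a single runtime attempt to produce results from them.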
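    As a rough illustration of the INI-style config file mentioned above, the sketch below reads calculation parameters with Python's standard configparser module. The section name, the parameter names, and the load_params helper are assumptions made for this example; they are not taken from the engine's actual config schema.

```python
import configparser

# Hypothetical config content; the section and keys are illustrative only.
EXAMPLE_CONFIG = """
[general]
calculation_mode = classical
description = Example calculation
"""


def load_params(text):
    """Flatten every section of an INI-style config into one dict."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    params = {}
    for section in parser.sections():
        # Later sections overwrite earlier ones if a key is repeated.
        params.update(parser.items(section))
    return params


params = load_params(EXAMPLE_CONFIG)
```

    Flattening the sections into a single dictionary is a simplification chosen for the example; a real reader might keep sections separate and validate each value.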

  • job

    The runtime “thing” which represents an attempt to complete a calculation. The job contains various pieces of information which are relevant while a calculation is in progress, including:

    • status (i.e., “Is it running?”, as well as the calculation phase)
    • logging level
    • the user who initiated/owns this job
    • engine and supervisor process IDs

    If calculation resumability is supported in the future, more than one job could be associated with a single calculation.
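    The calculation/job distinction can be sketched with two small data classes. This is only a model of the description above, not the engine's real schema: the class names, field names, and status strings are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Calculation:
    """A fixed set of input files (hypothetical model)."""
    config_file: str
    xml_inputs: Tuple[str, ...]


@dataclass
class Job:
    """One runtime attempt to complete a calculation (hypothetical model)."""
    calculation: Calculation
    owner: str
    status: str = "pending"        # assumed status names
    log_level: str = "warn"
    engine_pid: Optional[int] = None
    supervisor_pid: Optional[int] = None

    def is_running(self) -> bool:
        return self.status == "running"


# If resumability existed, two jobs could share one calculation.
calc = Calculation("job.ini", ("source_model.xml",))
first_attempt = Job(calc, owner="alice", status="failed")
second_attempt = Job(calc, owner="alice", status="running")
```

    Making Calculation immutable reflects the idea that the inputs do not change between attempts, while a Job carries the state that changes while the work is in progress.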

  • calculator

    A Python class which can implement various methods associated with each calculation phase. In order of execution, the phases are:

    • pre_execute
    • execute (this is the only required method)
    • post_execute
    • post_process
    • export
    • clean_up

    Each calculation mode has its own calculator class defined, although many common functionalities are shared between calculators. This common functionality is abstracted into common base classes and plain functions.
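    The phase ordering can be sketched as a base class whose optional phases default to no-ops, plus a driver that calls each phase in turn. BaseCalculator, run_calculation, and ToyCalculator are hypothetical names for this example and do not reproduce the engine's actual class hierarchy.

```python
class BaseCalculator:
    """Hypothetical base class; only execute must be overridden."""

    # Phases in the order of execution listed above.
    PHASES = ("pre_execute", "execute", "post_execute",
              "post_process", "export", "clean_up")

    def pre_execute(self):
        pass

    def execute(self):
        raise NotImplementedError("execute is the only required method")

    def post_execute(self):
        pass

    def post_process(self):
        pass

    def export(self):
        pass

    def clean_up(self):
        pass


def run_calculation(calculator):
    """Call each phase in order and return the names of the phases run."""
    completed = []
    for phase in BaseCalculator.PHASES:
        getattr(calculator, phase)()
        completed.append(phase)
    return completed


class ToyCalculator(BaseCalculator):
    """Overrides execute and export; inherits the other no-op phases."""

    def __init__(self):
        self.result = None
        self.exported = None

    def execute(self):
        self.result = sum(range(5))

    def export(self):
        self.exported = "result=%d" % self.result


toy = ToyCalculator()
phases_run = run_calculation(toy)
```

    Shared behaviour lives in the base class, so a calculator for a new calculation mode only needs to override the phases it actually uses.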

  • task

    A task is a function which is passed a subset of the overall work to be completed for a calculation. Tasks are intended to be executed in an asynchronous/parallel fashion and are independent of each other.

    Breaking a calculation into tasks in this manner allows the OpenQuake Engine to distribute a calculation workload among many worker machines.
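    The idea of splitting a workload into independent tasks can be sketched with the standard library alone. The engine distributes tasks to Celery workers; here a local thread pool stands in for them, and split_into_blocks, square_block, and run_in_parallel are hypothetical helpers written for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_blocks(items, block_size):
    """Yield consecutive slices of at most block_size items."""
    for start in range(0, len(items), block_size):
        yield items[start:start + block_size]


def square_block(block):
    """A toy task: it only needs its own block, so tasks stay independent."""
    return [value * value for value in block]


def run_in_parallel(task, items, block_size, max_workers=4):
    """Run one task per block and merge the partial results in order."""
    blocks = list(split_into_blocks(items, block_size))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in the same order as the input blocks.
        partial_results = list(pool.map(task, blocks))
    return [value for block in partial_results for value in block]


squares = run_in_parallel(square_block, list(range(6)), block_size=4)
```

    Because each task depends only on the block it receives, the blocks could just as well be sent to processes on other machines, which is what makes distribution across workers possible.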

  • control node

    The machine from which the user initiates a calculation. The control node is responsible for running code which processes user input, initializes a calculation, distributes work, and manages/monitors the calculation until completion.

    By convention, the machine used as the control node does not process tasks. The control node is also typically the machine which runs vital server processes, such as RabbitMQ, Redis, and PostgreSQL/PostGIS.

  • worker process

    A process which is dedicated to task execution. Worker processes are simply celeryd processes which are configured to execute OpenQuake Engine task code.

    For more information about Celery, see http://celeryproject.org/.

  • worker machine

    A machine dedicated to running worker processes.

    The concepts of “worker machines” and “worker processes” are often referred to synonymously (as “workers”, for short). Technically, there can be many worker processes running on a single worker machine. The number of worker processes per machine is typically equal to the number of CPU cores available.

  • supervisor

    A process which is launched when a calculation is started for the purpose of monitoring and controlling the engine process.

    The supervisor is responsible for recording job statistics (such as execution time), garbage collection of KVS data, collecting and consolidating log messages, monitoring the engine process, and responding to critical errors.

    The most critical roles of the supervisor are logging and error response. Because calculations can be carried out by the combined effort of many machines, it is important to be able to collect all log messages from all machines (control node and workers) and store them in a single place. To do this, workers transmit log messages through RabbitMQ (using AMQP). The supervisor collects all log messages, then either a) prints the messages to standard output or b) saves them to a file.

    Additionally, the supervisor sifts through incoming log messages and looks for ‘error’ or ‘critical’ level messages. If such a message is detected, the supervisor will kill the engine process, abort the calculation, record the failed status in the database, and clean up. At this time, the supervisor is not capable of halting tasks which are currently in progress, but mechanisms exist to abort in-queue tasks in the case of a failure. (See openquake.utils.tasks.oqtask.)
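    The error-response behaviour described above can be sketched as a loop over incoming log records. This is a simplified model, not the supervisor's real code: transport over RabbitMQ is omitted, records are plain (level, message) pairs, and the supervise function, the on_failure callback, and the status strings are assumptions.

```python
import logging


def supervise(records, on_failure):
    """Collect log records and stop at the first error or critical message.

    records is an iterable of (level, message) pairs using the numeric
    levels from the logging module. on_failure is called once with the
    offending message, standing in for "kill the engine process, abort
    the calculation, record the failed status, and clean up".
    """
    collected = []
    for level, message in records:
        collected.append("%s: %s" % (logging.getLevelName(level), message))
        if level >= logging.ERROR:  # covers both ERROR and CRITICAL
            on_failure(message)
            return "failed", collected
    return "succeeded", collected


failures = []
status, log_lines = supervise(
    [(logging.INFO, "calculation started"),
     (logging.ERROR, "worker lost"),
     (logging.INFO, "never reached")],
    failures.append,
)
```

    Comparing against logging.ERROR works because the standard numeric levels are ordered, so a single check catches both ‘error’ and ‘critical’ messages.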

  • NRML

    Natural hazards’ Risk Markup Language (NRML) is the officially supported input/output format for calculation artifacts. NRML includes a custom set of XML schema definitions, as well as a number of XML parsers and serializers for reading and writing NRML artifacts.

    See https://github.com/gem/nrml for more information.

  • nhlib

    New Hazard Library (nhlib) is the Python library the OpenQuake team has developed to function as the core scientific library behind the OpenQuake Engine.

    See https://github.com/gem/nhlib for more information.
