
Resource Configuration

fernanqv edited this page Mar 21, 2022 · 2 revisions

The configuration file resources.conf describes the computing resources. When you start WRF4G, resources.conf is copied to the ~/.wrf4g/etc directory if it does not already exist there. The file can be edited directly or by running the wrf4g resource edit command.

WRF4G relies on DRM4G to manage computing resources. Further information about how to configure computing resources can be found in the DRM4G Documentation.

Configuration format

The resource configuration file consists of sections, each led by a [section] header and followed by key = value entries. Lines beginning with # are ignored. The allowed sections are [DEFAULT] and [resource_name].

The DEFAULT section provides default values for all other resource sections.

The definition of each resource section has to begin with the line [resource_name] followed by key = value entries.
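The way [DEFAULT] values flow into each resource section can be illustrated with a minimal sketch based on Python's standard configparser module, which follows the same section and key = value conventions. Whether WRF4G or DRM4G reads resources.conf exactly this way is an assumption; the sample text and the load_resources helper are hypothetical and only reuse key names and hostnames from this page.

```python
import configparser

# Hypothetical sample in the resources.conf format described above.
SAMPLE = """
# Values shared by every resource section
[DEFAULT]
enable       = true
communicator = ssh
username     = user

[meteo]
frontend = mar.meteo.unican.es
lrms     = pbs

[localmachine]
communicator = local
frontend     = localhost
lrms         = fork
"""


def load_resources(text):
    """Return {resource_name: {key: value}} with [DEFAULT] values merged in."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    # sections() skips DEFAULT; items(name) includes the inherited defaults.
    return {name: dict(parser.items(name)) for name in parser.sections()}


resources = load_resources(SAMPLE)
print(resources["meteo"]["communicator"])         # inherited from [DEFAULT]
print(resources["localmachine"]["communicator"])  # overridden in the section
```

In this sketch, a key set in a resource section takes precedence over the same key in [DEFAULT], so shared settings such as username only need to be written once.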

Configuration keys common to all resources:

  • enable: true or false, to enable or disable a resource.
  • communicator, or authentication type:
    • local: The resource will be accessed directly.
    • ssh: By default, the resource will be accessed through ssh's protocol via Paramiko's API.
    • pk_ssh: The resource will be accessed through ssh's protocol via Paramiko's API.
    • op_ssh: The resource will be accessed through OpenSSH's CLI.
  • username: Name of the user that will be used to log on to the front-end.
  • frontend: Hostname or IP address of either the cluster or grid user interface you'll be connected to. The syntax is "host:port" and by default the port used is 22.
  • private_key: Path to the identity file needed to log on to the front-end.
  • public_key: Path to the public identity file needed to log on to the front-end.
    • OPTIONAL: by default, the value of private_key is taken and .pub is appended to it.
  • scratch: Directory used to store temporary files for jobs during their execution; by default, it is $HOME/.drm4g/jobs. When the communicator used is ssh, this key is mandatory unless the $HOME path is exactly the same on the submitting machine and on the remote resource.
  • local_scratch: Job's working directory on the worker nodes; by default, it is $HOME/.wrf4g/jobs.
  • lrms, or Local Resource Management System:
    • pbs: TORQUE/PBS cluster.
    • sge: Grid Engine cluster.
    • loadleveler: LoadLeveler cluster.
    • lsf: LSF cluster.
    • fork: SHELL.
    • slurm: SLURM cluster.
    • slurm_res: RES (Red Española de Supercomputación) resources.
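The defaults described for frontend and public_key can be made concrete with a short sketch. The helpers split_frontend and resolve_public_key are hypothetical names introduced for illustration; they are not part of WRF4G or DRM4G, and the sketch assumes a plain hostname or IPv4 address (an IPv6 literal would need different handling).

```python
def split_frontend(frontend, default_port=22):
    """Split a 'host:port' value into (host, port); the port defaults to 22."""
    host, _, port = frontend.strip().partition(":")
    return host, int(port) if port else default_port


def resolve_public_key(private_key, public_key=None):
    """Return public_key if given, otherwise private_key with '.pub' appended."""
    return public_key if public_key else private_key + ".pub"


print(split_frontend("mar.meteo.unican.es"))
print(split_frontend("mar.meteo.unican.es:2222"))
print(resolve_public_key("~/.ssh/id_rsa"))
```

Under these assumptions, a resource that only sets private_key = ~/.ssh/id_rsa and a bare hostname would end up using ~/.ssh/id_rsa.pub and port 22.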

Keys for HPC resources

  • queue: Queue available on the resource. If there are several queues, separate them with commas, for example "queue = short,medium,long".
  • max_jobs_in_queue: Maximum number of jobs in the queue.
  • max_jobs_running: Maximum number of running jobs in the queue.
  • parallel_env: Parallel environments available on a Grid Engine cluster.
  • project: Project variable to use on TORQUE/PBS, Grid Engine and LSF clusters.

Examples

By default, WRF4G uses the local machine as a resource, with fork as its lrms:

[localmachine]
enable            = true
communicator      = local
frontend          = localhost
lrms              = fork
max_jobs_running  = 1

TORQUE/PBS cluster, accessed through the ssh protocol:

[meteo]
enable            = true
communicator      = ssh
username          = user
frontend          = mar.meteo.unican.es
private_key       = ~/.ssh/id_rsa
lrms              = pbs
queue             = short, medium, long
max_jobs_running  = 2, 10, 20
max_jobs_in_queue = 6, 20, 40
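In the example above, queue, max_jobs_running and max_jobs_in_queue each list three comma-separated values. The sketch below assumes the values are paired by position (the first limit applies to the first queue, and so on); this reading is inferred from the example rather than stated on this page, and queue_limits is a hypothetical helper, not a WRF4G function.

```python
def queue_limits(section):
    """Pair each queue with its limits, assuming positional correspondence."""
    queues = [name.strip() for name in section["queue"].split(",")]

    def int_list(key):
        # int() tolerates the surrounding spaces left by "2, 10, 20".
        return [int(value) for value in section[key].split(",")]

    running = int_list("max_jobs_running")
    queued = int_list("max_jobs_in_queue")
    if not (len(queues) == len(running) == len(queued)):
        raise ValueError("queue and limit lists must have the same length")
    return {
        name: {"max_jobs_running": r, "max_jobs_in_queue": q}
        for name, r, q in zip(queues, running, queued)
    }


# Values copied from the [meteo] example.
meteo = {
    "queue": "short, medium, long",
    "max_jobs_running": "2, 10, 20",
    "max_jobs_in_queue": "6, 20, 40",
}
limits = queue_limits(meteo)
print(limits["medium"])
```

Raising an error on mismatched list lengths is a design choice of this sketch; it makes a forgotten value visible instead of silently dropping a queue.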

SGE cluster, accessed through the ssh protocol via OpenSSH's CLI (op_ssh):

[blizzard]
enable            = true
communicator      = op_ssh
username          = user
frontend          = blizzard.meteo.unican.es
private_key       = ~/.ssh/id_rsa
parallel_env      = mpi
lrms              = sge
queue             = long
max_jobs_running  = 20
max_jobs_in_queue = 40