Tutorial
DRM4G is an open platform, based on GridWay, used to define, submit, and manage computational jobs. DRM4G is a Python (3.5+) implementation that provides a single point of control for computing resources without installing any intermediate middleware. As a result, a user is able to run any job on laptops, desktops, workstations, clusters, supercomputers, and grids.
To install DRM4G, follow these instructions.
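For example, since DRM4G is a Python package, a typical installation (assuming the package is published on PyPI under the name drm4g and that pip for Python 3 is available) would be:
[user@mycomputer~]$ pip3 install --user drm4g   # package name assumed; adjust if installing from source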
- Start up DRM4G:
[user@mycomputer~]$ drm4g start
Checking DRM4G local configuration ...
Creating a DRM4G local configuration in '/home/user/.drm4g'
Copying from '/home/user/drm4g/etc' to '/home/user/.drm4g/etc'
Starting DRM4G ....
OK
Starting ssh-agent ...
OK
- Show information about all available resources, their hosts, and their queues:
[user@mycomputer~]$ drm4g resource list
RESOURCE STATE
localmachine enabled
[user@mycomputer~]$ drm4g host list
HID ARCH JOBS(R/T) LRMS HOST
0 x86_64 0/0 fork localmachine
[user@mycomputer~]$ drm4g host list 0
HID ARCH JOBS(R/T) LRMS HOST
0 x86_64 0/0 fork localmachine
QUEUENAME JOBS(R/T) WALLT CPUT MAXR MAXQ
default 0/0 0 0 1 1
Create a job template:
[user@mycomputer~]$ echo "EXECUTABLE=/bin/date" > date.job
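Job templates are plain key = value files. Besides EXECUTABLE, they accept keywords such as ARGUMENTS, STDOUT_FILE, STDERR_FILE and INPUT_FILES, all of which are used later in this tutorial. A slightly richer template could look like the illustrative sketch below; the rest of this section keeps using the minimal date.job above:
EXECUTABLE  = /bin/date
ARGUMENTS   = --utc
STDOUT_FILE = stdout.${JOB_ID}
STDERR_FILE = stderr.${JOB_ID}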
Submit the job:
[user@mycomputer~]$ drm4g job submit date.job
ID: 0
Check the evolution of the job:
[user@mycomputer~]$ drm4g job list 0
JID DM EM START END EXEC XFER EXIT NAME HOST
0 pend ---- 19:39:09 --:--:-- 0:00:00 0:00:00 -- date.job --
If you run drm4g job list 0 repeatedly, you will see the job go through the following states (a convenient way to poll is shown after the list):
- pend: The job is waiting for a host to run on.
JID DM EM START END EXEC XFER EXIT NAME HOST
0 pend ---- 19:39:09 --:--:-- 0:00:00 0:00:00 -- date.job --
- prol: The frontend is being prepared for execution.
JID DM EM START END EXEC XFER EXIT NAME HOST
0 prol ---- 19:39:09 --:--:-- 0:00:00 0:00:00 -- date.job --
- wrap pend: The job has been successfully submitted to the frontend and is pending in the queue.
JID DM EM START END EXEC XFER EXIT NAME HOST
0 wrap pend 19:39:09 --:--:-- 0:00:00 0:00:00 -- date.job localhost/fork
- wrap actv: The job is running in the remote queue.
JID DM EM START END EXEC XFER EXIT NAME HOST
0 wrap actv 19:39:09 --:--:-- 0:00:05 0:00:00 -- date.job localhost/fork
- epil: The job has finished in the remote queue and its results are being fetched.
JID DM EM START END EXEC XFER EXIT NAME HOST
0 epil ---- 19:39:09 --:--:-- 0:00:10 0:00:00 -- date.job localhost/fork
- done: The job is done.
JID DM EM START END EXEC XFER EXIT NAME HOST
0 done ---- 19:39:09 19:39:27 0:00:10 0:00:01 0 date.job localhost/fork
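Instead of running the command by hand each time, you can poll it with the standard watch utility (the 5-second interval is arbitrary):
[user@mycomputer~]$ watch -n 5 drm4g job list 0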
With this job template, the results are the job's standard output (stdout) and standard error (stderr); both files are placed in the directory from which the job was submitted:
[user@mycomputer~]$ cat stdout.0
Mon Jul 28 12:29:43 CEST 2014
[user@mycomputer~]$ cat stderr.0
Before starting, configure a public/private key pair for your SSH connection:
Generate a public/private key pair without a passphrase:
[user@mycomputer~]$ ssh-keygen -t rsa -b 2048 -f $HOME/.ssh/meteo_rsa -N ""
Copy the new public key to the TORQUE/PBS resource:
[user@mycomputer~]$ ssh-copy-id -i $HOME/.ssh/meteo_rsa.pub user@ui.macc.unican.es
DRM4G uses the EDITOR environment variable to select which editor is used for configuring resources; by default it is nano.
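For example, to edit the configuration with vim instead:
[user@mycomputer~]$ export EDITOR=vim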
To configure a TORQUE/PBS cluster accessed through the SSH protocol, follow these steps:
Configure the meteo resource:
[user@mycomputer~]$ drm4g resource edit
[DEFAULT]
enable = true
communicator = local
frontend = localhost
lrms = fork
[localmachine]
max_jobs_running = 1
[meteo]
enable = true
communicator = ssh
username = user
frontend = ui.macc.unican.es
private_key = ~/.ssh/meteo_rsa
lrms = pbs
queue = grid
max_jobs_running = 1
max_jobs_in_queue = 2
List the resources again and check that meteo has been created successfully:
[user@mycomputer~]$ drm4g resource list
RESOURCE STATE
localmachine enabled
meteo enabled
[user@mycomputer~]$ drm4g host list
HID ARCH JOBS(R/T) LRMS HOST
0 x86_64 0/0 fork localmachine
1 x86_64 0/0 pbs meteo
That's it! Now you can submit jobs to both resources.
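If you want to force a job onto one particular resource, DRM4G templates follow the GridWay template format, which provides a REQUIREMENTS expression. The snippet below is only a sketch based on that format; check the DRM4G documentation for the exact keyword and matching rules it supports:
EXECUTABLE   = /bin/date
REQUIREMENTS = HOSTNAME = "meteo"   # GridWay-style expression; assumed to be supported by DRM4G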
This section describes how to take advantage of DRM4G to calculate the number Pi. To do that, three types of jobs will be used: single, array, and MPI.
Single job
- C binary: pi_serial
- DRM4G job template (a submission example follows this list):
EXECUTABLE = pi.sh
ARGUMENTS = 0 1 100000000
STDOUT_FILE = stdout_file.${JOB_ID}
STDERR_FILE = stderr_file.${JOB_ID}
INPUT_FILES = pi_serial, pi.sh
- pi.sh script:
#!/bin/bash
# Make the staged-in binary executable and run it with the template arguments
chmod +x ./pi_serial
./pi_serial "$@"
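Save the template above (for instance as pi_single.job, an illustrative name) together with pi.sh and pi_serial, and submit it as in the first example:
[user@mycomputer~]$ drm4g job submit pi_single.job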
Array job
- C binary: pi_serial
- DRM4G job template (a submission sketch follows this list):
EXECUTABLE = pi.sh
ARGUMENTS = ${TASK_ID} ${TOTAL_TASKS} 100000000
STDOUT_FILE = stdout_file.${TASK_ID}
STDERR_FILE = stderr_file.${TASK_ID}
INPUT_FILES = pi_serial, pi.sh
- pi.sh script:
#!/bin/bash
# Make the staged-in binary executable and run it with the per-task arguments
chmod +x ./pi_serial
./pi_serial "$@"
- Sum the partial results from all the output files:
awk 'BEGIN {sum=0} {sum+=$1} END {printf "Pi is %0.12g\n", sum}' stdout_file.*
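To produce the stdout_file.* outputs summed above, the array template is submitted once and expanded into several tasks, each receiving its own ${TASK_ID} and the ${TOTAL_TASKS} count. A sketch, assuming the template is saved as pi_array.job and that the submit command takes a task-count option (verify the exact option name with drm4g job submit --help):
[user@mycomputer~]$ drm4g job submit --ntasks 5 pi_array.job   # --ntasks assumed; check the CLI help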
MPI job
- MPI C binary: pi_parallel
- DRM4G job template (a submission example follows this list):
EXECUTABLE = pi_mpi.sh
STDOUT_FILE = stdout.${JOB_ID}
STDERR_FILE = stderr.${JOB_ID}
INPUT_FILES = pi_mpi.sh, pi_parallel
NP = 2
- pi_mpi.sh script:
#!/bin/bash
# Load the MPI environment available on the remote resource
source /software/meteo/use/load_use
use openmpi14intel
# Make the staged-in binary executable and launch it on 2 processes (must match NP in the template)
chmod +x ./pi_parallel
mpirun -np 2 ./pi_parallel
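As before, save the template (for instance as pi_mpi.job, an illustrative name) together with pi_mpi.sh and pi_parallel, and submit it to the meteo resource configured earlier:
[user@mycomputer~]$ drm4g job submit pi_mpi.job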