
# MOOSE on Raijin@NCI


Raijin is a very large Linux cluster owned by NCI and scheduled with PBS. The hardware is mostly Sandy Bridge, with some Broadwell-E, plus some specialized nodes with GPU acceleration or very large memory. It's much bigger and faster than Leonardi.

## Getting started

If you just want a working Moose installation (rather than compiling your own), simply copy /short/ws55/.profile into your home folder; you can do this with Filezilla or from the console. There is also a job template at /short/ws55/template.pbs.
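
For example, from a console on the login node (a minimal sketch; backing up your existing .profile first is just a precaution):

```bash
# Back up any existing profile, then copy the shared one into your home directory
[ -f ~/.profile ] && cp ~/.profile ~/.profile.bak
cp /short/ws55/.profile ~/.profile

# Copy the job template as a starting point for your own PBS scripts
cp /short/ws55/template.pbs ~/
```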

## Folders and Permissions

### Home directory

You can find your home directory with

cd ~
pwd

### Shared folder

The home directory has a tiny 2GB quota. However, there is a shared drive under /short/ws55 with a 72GB quota, shared amongst everyone in our project.

Under /short/ws55/ you will find a folder with each username, including yours. You can read and write to your folder, and to the base /short/ws55 folder. You have read (but not write) access to the folders of other users.

There is also a 1TB massdata quota, which requires special commands and is only meant for archiving large data files, so it's not very useful day to day; see https://opus.nci.org.au/display/Help/MASSDATA+User+Guide.
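
If you ever do use it, access goes through the mdss utility described in that guide; the commands below are only a rough sketch and the filename is just an example, so check the guide for the exact syntax and project options:

```bash
mdss ls                    # list what's in our massdata area
mdss put results.tar.gz    # archive a large file to massdata
mdss get results.tar.gz    # pull it back later
```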

## Queues, Commands

Each Sandy Bridge node gives us 2 CPUs × 8 cores = 16 cores (hyperthreading is disabled by default, so that's 16 usable threads per node). The following are the most common queues:

normal: largest number of requestable CPUs.  Processor-hours cost 1 credit each.
express: shorter wait time.  Processor-hours cost 3 credits each.
copyq: the only queue that has internet access.  1 CPU only and walltime <= 10 hours.  
       1 credit/processor-hour.  Probably easier to just use the interactive session.

There are other specialized queues; see https://opus.nci.org.au/display/Help/Raijin+User+Guide
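
For reference, a minimal PBS job script for the normal queue might look roughly like the sketch below; the memory, walltime and input file are placeholders, and /short/ws55/template.pbs is the authoritative template to start from:

```bash
#!/bin/bash
#PBS -P ws55                  # project to charge
#PBS -q normal                # queue: normal, express, copyq, ...
#PBS -l ncpus=16              # one full Sandy Bridge node
#PBS -l mem=32GB              # check nf_limits for what you may request
#PBS -l walltime=02:00:00
#PBS -l wd                    # start in the directory the job was submitted from

# Assumes your .profile (see Getting started) sets up the Moose/Redback environment
module load intel-mpi/2017.0.098
mpirun -np 16 redback_opt -i my_input.i
```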

Use "nf_limits -P ws55 -n (number_of_cpus) -q (queue)" to determine how much memory you can request.

Handy commands:

qsub <job_file> : submit job
qstat -u <username> : show job stats.
qdel <job_number> : delete job from queue
lquota : see disk quotas
nci_account : see group hours usage
nf_limits -P ws55 -n 512 -q normal: see memory and walltime limits for our project on the normal queue, if we request 512 CPUs.

## Petsc

Raijin's Petsc module lacks Hypre; Libmesh complains about this and doesn't compile. I've built a working installation in /short/ws55/jl1719/opt/moose/petsc-3.7.4. Note that the entire toolchain (Petsc -> Libmesh -> Moose -> Redback) needs to be built using the same compilers (i.e. with the same modules loaded) or you'll get many strange bugs.

The modules are:

module load gcc/5.2.0
module load llvm/3.9.1
module load vtk/6.3.0
module load cmake/3.6.2
module load intel-mpi/2017.0.098

Petsc was configured with the following options:

./configure \
--prefix=$PACKAGES_DIR/petsc-3.7.4 \
--download-hypre=1 \
--with-ssl=0 \
--with-debugging=no \
--with-pic=1 \
--with-shared-libraries=1 \
--with-cc=mpicc \
--with-cxx=mpicxx \
--with-fc=mpif90 \
--download-fblaslapack=1 \
--download-metis=1 \
--download-parmetis=1 \
--download-superlu_dist=1 \
--download-mumps=1 \
--download-scalapack=1 \
--CC=mpicc --CXX=mpicxx --FC=mpif90 --F77=mpif77 --F90=mpif90 \
--CFLAGS='-fPIC -fopenmp' \
--CXXFLAGS='-fPIC -fopenmp' \
--FFLAGS='-fPIC -fopenmp' \
--FCFLAGS='-fPIC -fopenmp' \
--F90FLAGS='-fPIC -fopenmp' \
--F77FLAGS='-fPIC -fopenmp' \
--LDFLAGS='-L/apps/gcc/5.2.0/lib64'

Include this version of Petsc in your scripts with:

export PACKAGES_DIR="/short/ws55/jl1719/opt/moose"
export PETSC_DIR="$PACKAGES_DIR/petsc-3.7.4"
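
Putting the pieces together, the top of a build script or .profile that uses this Petsc (so that Libmesh, Moose and Redback are built with the same compilers) would look roughly like this:

```bash
# Load the same modules the shared Petsc was built with
module load gcc/5.2.0
module load llvm/3.9.1
module load vtk/6.3.0
module load cmake/3.6.2
module load intel-mpi/2017.0.098

# Point the rest of the toolchain at the shared Petsc install
export PACKAGES_DIR="/short/ws55/jl1719/opt/moose"
export PETSC_DIR="$PACKAGES_DIR/petsc-3.7.4"
```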

## Anaconda

I've also built a shared Anaconda (Python 2.7) installation. Add this to your .profile (not .bashrc):

export PATH="/short/ws55/jl1719/opt/anaconda2/bin:$PATH"

Remember to run the following afterwards, in case the file picked up Windows line endings:

dos2unix .profile

Anaconda is ~2GB. Considering our limited space, a single shared install beats everyone keeping their own copy.

## Python, MPI and multi-node jobs

If you use Python to launch multiple MPI tasks inside a PBS job that requests multiple nodes, by default all the tasks end up on one node while the other nodes sit idle.

According to NCI tech support, the easiest way to spread the tasks out evenly is to tell mpirun which host each one should launch on. You can get the list of hosts assigned to your job from the file named in the PBS_NODEFILE environment variable, then pass each host in turn via mpirun --hosts.

Something like this:

import os
import subprocess

# PBS_NODEFILE contains one line per core, not per node, so dedupe using a set
hosts = set()
with open(os.environ["PBS_NODEFILE"]) as f:
  for line in f:
    hosts.add(line.strip())

# Launch one mpirun per node, pinned to that host, and let them run concurrently;
# jobfilenames is a list of Redback input files built earlier in the script.
procs = []
for host in hosts:
  cmd = "mpirun --hosts " + host + " --np 16 redback_opt -i " + jobfilenames.pop()
  procs.append(subprocess.Popen(cmd, shell=True))
for p in procs:
  p.wait()
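
For completeness, the PBS side of such a job might look roughly like this; the driver script name multi_node_driver.py and the resource numbers are made up for illustration:

```bash
#!/bin/bash
#PBS -P ws55
#PBS -q normal
#PBS -l ncpus=64              # e.g. four Sandy Bridge nodes of 16 cores each
#PBS -l mem=128GB
#PBS -l walltime=04:00:00
#PBS -l wd

module load intel-mpi/2017.0.098
python multi_node_driver.py   # the Python driver shown above
```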