# MOOSE on Raijin@NCI
Raijin is a very large Linux cluster owned by NCI, running PBS. The hardware is mostly Sandy Bridge, with some Broadwell-E; it's much bigger and faster than Leonardi. There are some specialized nodes with GPU acceleration or very large memory.
If you just want a working Moose installation (rather than compiling your own), simply copy `/short/ws55/.profile` into your home folder. You can do this using FileZilla or console commands. There is also a job template at `/short/ws55/template.pbs`.
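For orientation, a minimal Raijin job script looks roughly like the sketch below. This is only a sketch: the executable name, input file, and resource numbers are placeholders, and `/short/ws55/template.pbs` remains the authoritative version.

```bash
#!/bin/bash
# Minimal PBS job sketch for Raijin. Executable, input file and resource
# numbers are placeholders; see /short/ws55/template.pbs for the real template.
#PBS -P ws55                 # charge compute time to our project
#PBS -q normal               # queue (see the queue list below)
#PBS -l ncpus=16             # one full Sandy Bridge node
#PBS -l mem=32GB             # check nf_limits for the allowed maximum
#PBS -l walltime=02:00:00
#PBS -l wd                   # start in the directory the job was submitted from

module load intel-mpi/2017.0.098
mpirun -np 16 ./redback-opt -i my_input.i
```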
### Home directory
You can find your home directory with:
```bash
cd ~
pwd
```
### Shared folder
The home directory has a tiny 2GB quota. However, there is a shared drive under `/short/ws55` with a 72GB quota, shared amongst everyone in our project.
Under `/short/ws55/` you will find a folder for each username, including yours. You can read and write to your own folder and to the base `/short/ws55` folder, and you have read (but not write) access to the folders of other users.
There is also a 1TB massdata quota, which requires special commands and should only be used to archive large data files (so not very useful day to day); see the MASSDATA User Guide: https://opus.nci.org.au/display/Help/MASSDATA+User+Guide.
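Archiving goes through the `mdss` utility described in that guide; a rough sketch (the paths are hypothetical, and tape storage works best with a few large archives rather than many small files):

```bash
# Hypothetical example of archiving results to massdata with mdss.
tar czf results.tar.gz results/       # bundle small files into one large archive
mdss put results.tar.gz archive/      # copy the archive to tape storage
mdss ls archive/                      # check that it arrived
mdss get archive/results.tar.gz .     # retrieve it again later
```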
### Queues
Each Sandy Bridge node has 2 CPUs x 8 cores = 16 threads (hyperthreading is disabled by default). The most common queues are:
- `normal`: largest number of requestable CPUs. Processor-hours cost 1 credit each.
- `express`: shorter wait time. Processor-hours cost 3 credits each.
- `copyq`: the only queue that has internet access. 1 CPU only, walltime <= 10 hours, 1 credit per processor-hour. Probably easier to just use an interactive session (see the sketch below).
There are other specialized queues; see https://opus.nci.org.au/display/Help/Raijin+User+Guide
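One way to get the interactive session mentioned above is a PBS interactive job; a sketch, with illustrative resource numbers:

```bash
# Request an interactive shell via PBS (resource numbers are illustrative).
qsub -I -P ws55 -q express -l ncpus=1 -l mem=4GB -l walltime=01:00:00
```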
Use "nf_limits -P ws55 -n <number_of_cpus> -q " to determine how much memory you can request.
Handy commands:
- `qsub <job_file>`: submit a job
- `qstat -u <username>`: show job status
- `qdel <job_number>`: delete a job from the queue
- `lquota`: see disk quotas
- `nci_account`: see the group's hours usage
- `nf_limits -P ws55 -n 512 -q normal`: see memory and walltime limits for our project on the normal queue when requesting 512 CPUs
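A typical submit-and-monitor cycle with these commands (the job file name and job ID are illustrative):

```bash
qsub my_job.pbs        # prints a job ID such as 1234567.r-man2
qstat -u $USER         # watch the job go from Q (queued) to R (running)
qdel 1234567           # cancel it if something went wrong
```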
### Compiling Petsc
Raijin's Petsc module lacks Hypre; Libmesh complains about this and doesn't compile, so I've built a working installation in `/short/ws55/jl1719/opt/moose/petsc-3.7.4`. Note that the entire toolchain (Petsc -> Libmesh -> Moose -> Redback) needs to be built with the same compilers (i.e. with the same modules loaded), or you'll hit many strange bugs.
The modules are:
```bash
module load gcc/5.2.0
module load llvm/3.9.1
module load vtk/6.3.0
module load cmake/3.6.2
module load intel-mpi/2017.0.098
```
Petsc was configured with the following options:
```bash
./configure \
--prefix=$PACKAGES_DIR/petsc-3.7.4 \
--download-hypre=1 \
--with-ssl=0 \
--with-debugging=no \
--with-pic=1 \
--with-shared-libraries=1 \
--with-cc=mpicc \
--with-cxx=mpicxx \
--with-fc=mpif90 \
--download-fblaslapack=1 \
--download-metis=1 \
--download-parmetis=1 \
--download-superlu_dist=1 \
--download-mumps=1 \
--download-scalapack=1 \
--CC=mpicc --CXX=mpicxx --FC=mpif90 --F77=mpif77 --F90=mpif90 \
--CFLAGS='-fPIC -fopenmp' \
--CXXFLAGS='-fPIC -fopenmp' \
--FFLAGS='-fPIC -fopenmp' \
--FCFLAGS='-fPIC -fopenmp' \
--F90FLAGS='-fPIC -fopenmp' \
--F77FLAGS='-fPIC -fopenmp' \
--LDFLAGS='-L/apps/gcc/5.2.0/lib64'
```
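Configure prints the exact build commands to run next; they are normally of the following form (use whatever `PETSC_ARCH` name configure reports):

```bash
# Build, then install into the --prefix given above.
make PETSC_DIR=$PWD PETSC_ARCH=arch-linux2-c-opt all
make PETSC_DIR=$PWD PETSC_ARCH=arch-linux2-c-opt install
```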
Include this version of Petsc in your scripts with:
```bash
export PACKAGES_DIR="/short/ws55/jl1719/opt/moose"
export PETSC_DIR="$PACKAGES_DIR/petsc-3.7.4"
```
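With those variables set and the same modules loaded, the rest of the toolchain can be rebuilt the usual MOOSE way; a sketch, assuming a moose checkout in your `/short` folder:

```bash
# Sketch: rebuild Libmesh and Moose against the shared Petsc
# (the moose checkout location is an assumption).
cd /short/ws55/$USER/moose/scripts
./update_and_rebuild_libmesh.sh      # builds Libmesh against $PETSC_DIR
cd ../test && make -j 16             # then build the Moose test app
```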
### Anaconda
I've also built a shared Anaconda 2.7 installation. Add this to your `.profile` (not `.bashrc`):
```bash
export PATH="/short/ws55/jl1719/opt/anaconda2/bin:$PATH"
```
Remember to run `dos2unix .profile` afterwards to strip any Windows line endings.
Anaconda takes up ~2GB; given our limited space, one shared installation is better than separate per-user installs.