Using NCAR's Derecho #3669
Replies: 4 comments
-
@loganpknudsen has already compiled a bit of useful information over at #3655. I think that @tomchor and @simone-silvestri are also using Derecho.
-
Want to put this here:
-
It could make sense to build a package that codifies and even automates the process of setting up julia and Oceananigans on Derecho. What do others think about that?
-
Thanks, @glwagner, this is super useful.
-
Overview
NCAR's Derecho supercomputer is housed at the NCAR-Wyoming Supercomputing Center.
Derecho has 82 GPU nodes that each have 64 AMD Milan cores and 4 NVIDIA A100 GPUs (plus 2,488 CPU-only nodes with 128 AMD Milan cores each). Derecho uses a PBS queuing system.
Note: this post is subject to change. Let's try to keep it up to date --- please comment below if something does not work.
Scope
This discussion can cover anything to do with trying to get results from running Oceananigans on Derecho --- including installing Julia, setting up CUDA and MPI, configuring PBS scripts, and using other Julia packages in conjunction with Oceananigans.
Links
Getting started on Derecho with CUDA-Aware MPI
The first task is to download Julia. I opted to manually install a binary in `~/software`. Note that I have not tested the following workflow with `juliaup` or with Julia 1.11.1 --- this is a work in progress, so stay tuned. For Julia 1.10.6, a binary for Derecho can be downloaded by typing:
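The original download command was not preserved above; a minimal sketch, assuming the generic Linux x86_64 binary and the standard julialang-s3 download URL, would be:

```shell
# Hypothetical sketch: fetch the generic Linux x86_64 Julia 1.10.6 binary
# into ~/software (the julialang-s3 URL pattern is an assumption to verify).
JULIA_VERSION=1.10.6
TARBALL="julia-${JULIA_VERSION}-linux-x86_64.tar.gz"
JULIA_URL="https://julialang-s3.julialang.org/bin/linux/x64/1.10/${TARBALL}"

mkdir -p "$HOME/software"
cd "$HOME/software"
# Requires network access (e.g. from a login node):
wget -q "$JULIA_URL" && tar -xzf "$TARBALL" || echo "download skipped"
```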
`julia` can now be launched by typing `~/software/julia-1.10.6/bin/julia`. I added the directory containing `julia` to my path by putting the following in `~/.bash_profile`:
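The exact line was not preserved here; a minimal sketch, assuming the `~/software` install location above, is:

```shell
# Prepend the Julia 1.10.6 binary directory to PATH
# (path assumes the ~/software install layout described above).
export PATH="$HOME/software/julia-1.10.6/bin:$PATH"
```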
I also changed my depot path over to `/glade/work/$USER/.julia` (by default the depot would reside in `$HOME/.julia`, which doesn't have as much storage capacity):
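The depot relocation can be done with a single environment variable, e.g. added to `~/.bash_profile` (a sketch using the path stated above):

```shell
# Point the Julia depot at work storage instead of $HOME/.julia
# (add this to ~/.bash_profile so it applies to every session).
export JULIA_DEPOT_PATH="/glade/work/$USER/.julia"
```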
Moving the depot into `work` helps when software downloads big data sets into the depot (like ClimaOcean does).

An example program run with PBS
Next let's test that things work by creating a test project:
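A sketch of the project setup, assuming the `~/TestInterpolate` location used below (the package list is an assumption):

```shell
# Hypothetical sketch: make a project directory, then (on a node with
# network access) add the packages the test will need.
mkdir -p "$HOME/TestInterpolate"
# julia --project="$HOME/TestInterpolate" -e 'using Pkg; Pkg.add(["MPI", "CUDA", "Oceananigans"])'
```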
Next we write some test code that will exercise CUDA-aware MPI communication (the hard thing to get set up):
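The original file is not reproduced here; a minimal stand-in (not the author's `test_interpolate.jl`) that exercises CUDA-aware MPI by passing `CuArray`s directly to MPI calls could look like:

```julia
# Hypothetical sketch: verify CUDA-aware MPI by sending GPU arrays between
# ranks without staging through host memory.
using MPI
using CUDA

MPI.Init()
comm = MPI.COMM_WORLD
rank = MPI.Comm_rank(comm)
nranks = MPI.Comm_size(comm)

# Each rank fills a GPU array with its rank and passes it to the next rank.
send = CUDA.fill(Float64(rank), 4)
recv = CUDA.zeros(Float64, 4)
dst = mod(rank + 1, nranks)
src = mod(rank - 1, nranks)
MPI.Sendrecv!(send, recv, comm; dest=dst, source=src)

@show rank Array(recv)  # each rank should see its neighbor's rank
MPI.Finalize()
```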
I put this into a file called `~/TestInterpolate/test_interpolate.jl`. Now we're ready to write a bash script that can be submitted to the queue using PBS's `qsub`. Here's one possible incarnation of such a script:
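A sketch of such a job script, in which the queue name, resource selection, and module names are assumptions to check against Derecho's documentation:

```bash
#!/bin/bash
#PBS -N test
#PBS -A <YOUR ACCOUNT ID>
#PBS -q main
#PBS -l select=1:ncpus=64:ngpus=4
#PBS -l walltime=00:10:00
#PBS -j oe

# Load a CUDA-capable MPI stack (module names are assumptions;
# check `module avail` on Derecho).
module load cuda cray-mpich

# Ask cray-mpich to accept GPU pointers (CUDA-aware MPI).
export MPICH_GPU_SUPPORT_ENABLED=1

cd $HOME/TestInterpolate
mpiexec -n 4 julia --project=. test_interpolate.jl
```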
Note that you have to put the ID for YOUR Derecho allocation in the script where it says `<YOUR ACCOUNT ID>`. (You probably have an email where this is given. I'm trying to figure out how to get it using a PBS command. If you put in a wrong ID, you might get an error message listing your available accounts, prefixed by `qsub: Invalid account for GPU usage, available accounts:`.) I copied the script into a file called `run_derecho_job.sh`. Then I submitted it:

```
qsub run_derecho_job.sh  # output should be something like 6486127.desched1
```
The current status of the job can be found by typing `qstat -u $USER`. You can also monitor its progress using the command `watch -n 0.1 qstat -u $USER`.
Press `ctrl-c` to exit `watch`. The output will be piped into a file called `test.o*******`, where the `*` are replaced by numbers representing the job ID.

The first time you launch the job, `test.o*******` will contain a lot of information about precompilation. It may also contain CUDA errors regarding `LD_LIBRARY_PATH` (we are trying to figure out whether these are an issue or not). The essential part of the output should be at the end of the file and should look something like: