ssh

ssh <USER>@paramshakti.iitkgp.ac.in
ssh -X <USER>@paramshakti.iitkgp.ac.in # for gui (X11 forwarding)
ssh -p 4422 <USER>@paramshakti.iitkgp.ac.in # connecting from outside iitkgp

Directory Structure

/home/username
- 40GB storage and 50GB hard limit
- has backup
- use it to store important files, outputs, logs, etc.
- don't store datasets here. don't submit jobs from here.
/scratch/username
- 2TB storage
- no backup
- use it to store datasets, code, etc.
- submit jobs from here
- export job outputs to /home/username if needed
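
Since jobs should be submitted from /scratch and not from /home, a small guard can catch the mistake before you call sbatch. This is only a minimal sketch: the function name in_scratch is hypothetical, and it assumes the /scratch/username layout described above.

```shell
# Hypothetical helper: succeed only if the given directory (default: the
# current one) lives under /scratch, where jobs should be submitted from.
in_scratch() {
    dir="${1:-$PWD}"
    case "$dir" in
        /scratch/*) return 0 ;;
        *)          return 1 ;;
    esac
}

# Example use before submitting:
#   in_scratch || echo "warning: submit jobs from /scratch/$USER, not $PWD" >&2
```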

Installing conda

Easy: follow the official Miniconda installation instructions, which boil down to the commands below.

mkdir -p ~/miniconda3
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda3/miniconda.sh
bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
rm -rf ~/miniconda3/miniconda.sh

~/miniconda3/bin/conda init bash
~/miniconda3/bin/conda init zsh

Installing linux packages

Since we require sudo to use the default package manager, yum, we will install packages to our home directory and add the binaries to our path.

Create a directory to store the packages and downloaded .rpm files

mkdir -p ~/centos # for installed packages
mkdir -p ~/rpm  # for downloading .rpm files

Add the following to your .bashrc or .zshrc

export PATH="$HOME/centos/usr/sbin:$HOME/centos/usr/bin:$HOME/centos/bin:$PATH"
export MANPATH="$HOME/centos/usr/share/man:$MANPATH"
L='/lib:/lib64:/usr/lib:/usr/lib64'
export LD_LIBRARY_PATH="$L:$HOME/centos/usr/lib:$HOME/centos/usr/lib64"

Now download the .rpm (and its dependencies) using

yumdownloader --destdir ~/rpm --resolve <package_name>

and install it into ~/centos using

rpm2cpio <package_name>.rpm | cpio -D ~/centos -idmv

You can use the script install_all.sh to install all the packages in the rpm directory, or you can run python3 install_rpm.py <package_name> to install a single package. Both scripts are in this repository.
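
To make the extract step concrete, here is a rough sketch of what a loop over the rpm directory could look like. It is not the repository's install_all.sh, just an illustration under the same assumptions (~/rpm for downloads, ~/centos as the prefix); the function name extract_rpms and the DRY_RUN switch are made up for this example.

```shell
# Sketch: extract every .rpm in a directory into a prefix (no sudo needed).
# Set DRY_RUN=1 to only print the commands instead of running them.
extract_rpms() {
    rpm_dir="${1:-$HOME/rpm}"
    prefix="${2:-$HOME/centos}"
    for rpm in "$rpm_dir"/*.rpm; do
        [ -e "$rpm" ] || continue          # the glob matched nothing
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "rpm2cpio $rpm | cpio -D $prefix -idmv"
        else
            mkdir -p "$prefix"
            rpm2cpio "$rpm" | cpio -D "$prefix" -idmv
        fi
    done
}
```

Running DRY_RUN=1 extract_rpms first shows which packages would be unpacked before anything is written to ~/centos.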

Modules

Use module avail to see all available modules
Use module load <module_name> to load a module
The latest CUDA version installed is 11.7, so don't just pip install torch. You need a torch build that was compiled against the matching CUDA version. Use module load compiler/cuda/11.7 in your job script before submitting the job on gpu nodes. See: https://pytorch.org/get-started/previous-versions/

eg for cuda 11.7 :

conda install pytorch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 pytorch-cuda=11.7 -c pytorch -c nvidia
# or use pip
pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2

The available modules (as of Jan 2024) are listed in module_avail_jan_2024.txt in this repository.
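
As an illustration of loading the module inside a job script, the sketch below writes a minimal GPU job file. The partition name gpu and the --gres syntax follow the srun example in the jupyter section; the function name write_job_script, the job name, the time limit, and train.py are placeholders, and this is not the repository's sample_job.sh.

```shell
# Sketch: write a minimal GPU job script that loads the CUDA module first.
# Job name, time limit, and train.py are placeholders; adjust to your job.
write_job_script() {
    cat > "$1" <<'EOF'
#!/bin/bash
#SBATCH --job-name=torch-test
#SBATCH --partition=gpu
#SBATCH --gres=gpu:1
#SBATCH --time=01:00:00
#SBATCH --output=%x-%j.out

module load compiler/cuda/11.7   # match the CUDA build of the installed torch
python train.py
EOF
}
```

On the cluster this would be used as write_job_script job.sh followed by sbatch job.sh, run from /scratch/username.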

Attaching another terminal to a running job

You can use tmux with sessions or

srun --overlap --pty --jobid <jobid> /bin/bash

Getting output from already running shell session in a job

sattach <jobnum>.<num> # change the num to 0, 1, ...
# example: sattach 1167246.0

Installing Latex

First download and extract the required package

wget https://mirror.ctan.org/systems/texlive/tlnet/install-tl-unx.tar.gz --no-check-certificate
zcat < install-tl-unx.tar.gz | tar xf -
cd install-tl-*

Create appropriate directories for the installation

mkdir -p ~/centos/usr/texlive/2024

Run the installer, but change the installation directories

perl ./install-tl
# follow the instructions on the terminal to first change the installation directory from /usr/.. to ~/centos/usr...
# Then return to main menu and continue installation

Add the installed binary location (~/centos/usr/texlive/2024/bin/x86_64-linux/) to your PATH
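
One way to do this in your .bashrc or .zshrc without piling up duplicate entries each time the file is sourced is sketched below. The helper name path_prepend is made up for this example, and the directory assumes you picked ~/centos/usr/texlive/2024 in the installer.

```shell
# Prepend a directory to PATH unless it is already there.
path_prepend() {
    case ":$PATH:" in
        *":$1:"*) ;;                 # already present, do nothing
        *) PATH="$1:$PATH" ;;
    esac
}

path_prepend "$HOME/centos/usr/texlive/2024/bin/x86_64-linux"
export PATH
```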

Installing MuJoCo

pip3 install -U 'mujoco-py<2.2,>=2.1' numpy scipy quaternion numpy-quaternion mujoco

mkdir -p ~/.mujoco && cd ~/.mujoco
wget https://mujoco.org/download/mujoco210-linux-x86_64.tar.gz
tar -xf mujoco210-linux-x86_64.tar.gz
rm mujoco210-linux-x86_64.tar.gz

add the following to your .bashrc or .zshrc

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HOME/.mujoco/mujoco210/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/nvidia

Install any missing dependencies using the script install_rpm.py

some basic slurm commands

sbatch <job_script> to submit a job
squeue to see jobs
scancel <job_id> to cancel a job
sinfo to see nodes
sinfo -s to see a one-line summary per partition
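
These commands chain together naturally once you have the job id. sbatch normally prints a line of the form "Submitted batch job <id>", so a small helper can pull the id out for use with squeue and scancel. This is a sketch; the function name job_id_from_sbatch and the file name job.sh are placeholders.

```shell
# Extract the numeric job id from sbatch's "Submitted batch job <id>" line.
job_id_from_sbatch() {
    printf '%s\n' "$1" | awk '/^Submitted batch job/ {print $4}'
}

# Typical use on the cluster (job.sh is a placeholder):
#   jid=$(job_id_from_sbatch "$(sbatch job.sh)")
#   squeue -j "$jid"      # show only this job
#   scancel "$jid"        # cancel it if needed
```

sbatch --parsable should also print just the job id, which avoids the parsing step.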

jupyter

1. make sure your environment has jupyter
2. submit an interactive bash job by running srun -p gpu --time=<H>:<MM>:<SS> --gres=gpu:<num_gpus> --pty bash
3. activate your environment
4. (optional) use screen to (detachably) multiplex the shell
5. run hostname -i and note down your gpu node's IP, say as <ip> (if you don't know it already)
6. run jupyter notebook --port XXXX --no-browser
7. copy one of the full links (after Jupyter Server <VER> is running at:), e.g. http://localhost:<PORT>/tree?token=<TOKEN>
8. many ports are blocked, so note down which port (<PORT> above) the jupyter server is actually listening on; it can differ from the XXXX you requested
9. on your local machine, in a new shell, make a tunnel by running ssh -t -t <USER>@paramshakti.iitkgp.ac.in -L localhost:<PORT>:localhost:<PORT> ssh <USER>@<ip> -L localhost:<PORT>:localhost:<PORT>
10. open the link you copied in step 7 in a browser on your local machine
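
The two-hop tunnel command is long and easy to mistype, so it can help to generate it from the three values you noted down. The sketch below only prints the command; the function name jupyter_tunnel_cmd is hypothetical, and the user, IP, and port in the usage line are example values.

```shell
# Print the two-hop tunnel command for a given user, gpu-node IP, and port.
# Run the printed command on your local machine, not on the cluster.
jupyter_tunnel_cmd() {
    user="$1"; ip="$2"; port="$3"
    printf 'ssh -t -t %s@paramshakti.iitkgp.ac.in -L localhost:%s:localhost:%s ssh %s@%s -L localhost:%s:localhost:%s\n' \
        "$user" "$port" "$port" "$user" "$ip" "$port" "$port"
}
```

For example, jupyter_tunnel_cmd alice 10.0.0.5 8899 prints the command for user alice, a node at 10.0.0.5, and port 8899.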

wandb

GPU nodes do not have access to the internet
Set wandb to offline mode using

    export WANDB_MODE=offline # on shell

    import os
    os.environ["WANDB_MODE"] = "offline" # in jupyter or inside a script, before wandb.init
    # or pass the mode directly
    wandb.init( ...,  mode="offline")
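
Offline runs are written to local offline-run-* directories (by default under ./wandb) and can be uploaded later with wandb sync from a machine that has internet access. The sketch below assumes the login node is such a machine and that the default directory layout is in use; the function name sync_offline_runs and the DRY_RUN switch are made up for this example.

```shell
# Sketch: sync all offline runs under a wandb directory (default ./wandb).
# Run this where internet access is available, not on a gpu node.
# Set DRY_RUN=1 to only print the commands.
sync_offline_runs() {
    wandb_dir="${1:-./wandb}"
    for run in "$wandb_dir"/offline-run-*; do
        [ -d "$run" ] || continue          # no offline runs found
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "wandb sync $run"
        else
            wandb sync "$run"
        fi
    done
}
```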

kinda incomplete i'll update it as i learn more :p

Reference: http://www.hpc.iitkgp.ac.in/HPCF/paramShakti
