Running Megatron-DeepSpeed with Slurm

The original Megatron-DeepSpeed is implemented at https://github.com/microsoft/Megatron-DeepSpeed, which the sample Slurm scripts in this repository are based on.

3D Parallelism Enabled

  • Data Parallelism (DP)
    • DeepSpeed ZeRO-DP stage 1
  • Pipeline Parallelism (PP)
    • DeepSpeed pipeline parallelism
  • Tensor Parallelism (TP)
    • Megatron-LM tensor slicing
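
The data-parallel degree falls out of the other two: DP size = total GPUs / (TP size × PP size). As a hedged sketch of how these degrees are usually passed on the command line (the flags are Megatron-LM's; the values and surrounding script are illustrative, not taken from this repository's scripts):

# Illustrative: 16 GPUs with TP=2 and PP=4 leave DP = 16 / (2 * 4) = 2.
TP_SIZE=2
PP_SIZE=4
python pretrain_gpt.py \
    --tensor-model-parallel-size $TP_SIZE \
    --pipeline-model-parallel-size $PP_SIZE \
    --deepspeed \
    --deepspeed_config ds_config.json  # plus the usual model/data arguments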

When ZeRO-DP is combined with PP and TP, it typically runs at ZeRO stage 1. Although it is technically possible to use ZeRO-DP stage 2 with PP (and optionally TP), it would degrade performance: an additional reduce-scatter collective is required on every micro-batch to aggregate the gradients before sharding them. The same reasoning applies to ZeRO-DP stage 3, which is presumably why the original implementation repos disallow combining ZeRO-DP stage 2 or 3 with PP by default.
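
For reference, pinning ZeRO-DP to stage 1 is a single field in the DeepSpeed config. A minimal sketch (the file name ds_config.json and the batch-size values are assumptions for illustration, not this repository's actual config):

{
  "train_micro_batch_size_per_gpu": 4,
  "gradient_accumulation_steps": 8,
  "zero_optimization": {
    "stage": 1
  },
  "fp16": {
    "enabled": true
  }
}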

Prerequisite

  1. Prepare a Docker image with DeepSpeed installed. The simplest form of the Dockerfile would be:
FROM nvcr.io/nvidia/pytorch:22.02-py3
RUN apt-get update
RUN pip install deepspeed
  2. Build the Docker image from the Dockerfile with your own tag name.
docker build --tag $tagname .
  3. Clone the Megatron-DeepSpeed implementation repository.
git clone https://github.com/microsoft/Megatron-DeepSpeed.git
cd Megatron-DeepSpeed
  4. Prepare the sample dataset.
wget https://huggingface.co/bigscience/misc-test-data/resolve/main/stas/oscar-1GB.jsonl.xz
wget https://s3.amazonaws.com/models.huggingface.co/bert/gpt2-vocab.json
wget https://s3.amazonaws.com/models.huggingface.co/bert/gpt2-merges.txt
xz -d oscar-1GB.jsonl.xz
python tools/preprocess_data.py \
    --input oscar-1GB.jsonl \
    --output-prefix my-gpt2 \
    --vocab gpt2-vocab.json \
    --dataset-impl mmap \
    --tokenizer-type GPT2BPETokenizer \
    --merge-file gpt2-merges.txt \
    --append-eod \
    --workers 8

Running this step inside the PyTorch container with DeepSpeed installed (from Prerequisite steps 1-2) should avoid dependency errors.
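
If preprocessing succeeds, the step-4 command should produce a binary data file and an index file named after the --output-prefix. A quick check (the exact suffix depends on the tooling version; _text_document is what I would expect for a JSONL "text" field):

ls my-gpt2*
# expected: my-gpt2_text_document.bin  my-gpt2_text_document.idx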

Run Megatron-DeepSpeed with Slurm

The Slurm scheduler is used to dispatch jobs to the GPU computing cluster. Note that the variable CONTAINER_IMAGE in the Slurm scripts should be set to the tag name of your own container image where DeepSpeed is properly installed (see Prerequisite steps 1-2). In addition, most of the configuration parameters in the scripts are hard-coded for simplicity; you can modify them to your preference (e.g., the TP/PP sizes, hidden size, batch size, DeepSpeed configs, etc.).
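
For orientation, these knobs typically sit near the top of each script. A hypothetical fragment (CONTAINER_IMAGE is the variable named above; the other names and values are purely illustrative):

#!/bin/bash
#SBATCH --job-name=megatron-ds
#SBATCH --gres=gpu:8

CONTAINER_IMAGE="megatron-ds:latest"  # your tag from Prerequisite steps 1-2
TP_SIZE=2                             # tensor-parallel size (illustrative)
PP_SIZE=2                             # pipeline-parallel size (illustrative)
GLOBAL_BATCH_SIZE=256                 # illustrative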

Megatron-DeepSpeed on a single node

The default number of GPUs is set to 8 in the script by the variable N_GPUS; modify this value if you need. Execute the script with the sbatch command.

sbatch megatron_ds_snmg.slurm

The value of N_GPUS should not exceed the number of physical GPU devices in a single node.
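
Before raising N_GPUS, you can confirm how many GPUs each node actually exposes with a standard Slurm query:

# List every node with its generic resources (GRES), i.e., its GPUs.
sinfo -N -o "%N %G"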

Megatron-DeepSpeed on multiple nodes

The default is 2 nodes with 8 GPUs each. The number of nodes can be modified with an sbatch argument at submission time. For example, to train on 4 nodes (32 GPUs):

sbatch --nodes 4 megatron_ds_mnmg.slurm
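
Once submitted, the job can be monitored with ordinary Slurm tooling (assuming the script writes to Slurm's default output file, slurm-<jobid>.out):

squeue -u $USER             # check the queue state of your jobs
tail -f slurm-<jobid>.out   # follow the training log once the job starts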
