
distributed-torch-horovod-gcp

A small example of using Horovod and PyTorch on GCP for distributed training

This repository accompanies the following articles:

  • High Performance Distributed Deep Learning with multiple GPUs on Google Cloud Platform — Part 1 link
  • High Performance Distributed Deep Learning with multiple GPUs on Google Cloud Platform — Part 2 link

Usage

To run the training script:

Step 1: install the requirements

pip3 install -r requirements.txt

Step 2: run the script. This runs on a single GPU; you will need at least one GPU available to run this experiment.

python3 app/torch_train.py

To run with Horovod (4 GPUs), follow step 1 above and then run:

horovodrun -np 4 -H localhost:4 python3 app/torch_train.py 
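With `horovodrun -np 4`, four copies of the script each compute gradients on their own batch, and Horovod averages those gradients with an allreduce before every optimizer step so all workers apply the same update. A toy pure-Python sketch of that averaging (illustrative only; real Horovod does this inside `hvd.DistributedOptimizer` over NCCL or MPI, and the gradient values below are made up):

```python
# Toy illustration of the gradient averaging Horovod performs each step.
# Real Horovod wraps the optimizer (hvd.DistributedOptimizer) and runs the
# allreduce on GPU tensors; here we just average plain lists of floats.

def allreduce_average(per_worker_grads):
    """Average gradients element-wise across workers, as allreduce does."""
    num_workers = len(per_worker_grads)
    grad_len = len(per_worker_grads[0])
    return [
        sum(worker[i] for worker in per_worker_grads) / num_workers
        for i in range(grad_len)
    ]

# Four workers (-np 4), each holding its own gradient for the same
# two model parameters after a forward/backward pass on its batch.
grads = [
    [0.5, 0.25],
    [0.5, 0.25],
    [1.0, 0.5],
    [2.0, 1.0],
]
averaged = allreduce_average(grads)
print(averaged)  # every worker receives [1.0, 0.5] and applies the same step
```

Because every worker ends each step with identical averaged gradients, the model replicas stay in sync without a parameter server.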

Building the Image

To build the Docker image, cd into the repository root and run:

docker build .
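The repository's Dockerfile is the source of truth for the build; as a rough sketch, an image for this kind of setup typically starts from a Horovod GPU base image, installs the requirements, and copies in the training script. The base image tag and layout below are assumptions, not taken from the repo:

```dockerfile
# Assumed base image; check the repo's Dockerfile for the actual one.
FROM horovod/horovod:latest

WORKDIR /workspace

# Install the example's Python dependencies.
COPY requirements.txt .
RUN pip3 install -r requirements.txt

# Copy in the training code.
COPY app/ app/

# Default to single-process training; override the command with
# horovodrun to use all 4 GPUs inside the container.
CMD ["python3", "app/torch_train.py"]
```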
