This repo contains a variety of tutorials for using the PiPPy pipeline parallelism library with accelerate. You will find examples covering:
- How to trace the model using `accelerate.prepare_pippy`
- How to specify inputs based on what the model expects (when to use `kwargs`, `args`, and such)
- How to gather the results at the end
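A minimal sketch of that workflow, assuming a `transformers` BERT checkpoint and at least two GPUs (launched with `accelerate launch` or `torchrun`); treat it as an outline of what the examples do rather than a drop-in script:

```python
import torch
from accelerate import PartialState, prepare_pippy
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# The checkpoint here is only an illustrative choice.
model = AutoModelForSequenceClassification.from_pretrained("bert-base-cased")
model.eval()

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
inputs = tokenizer("Pipeline parallelism example", return_tensors="pt")

# Trace and split the model across the available GPUs. `example_kwargs`
# shows PiPPy the keyword inputs the forward pass expects; use
# `example_args` instead for purely positional inputs.
model = prepare_pippy(model, split_points="auto", example_kwargs=dict(inputs))

with torch.no_grad():
    output = model(**inputs)

# Results come out on the last pipeline stage; gather them there.
if PartialState().is_last_process:
    print(output.logits.shape)
```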
This requires the `main` branch of accelerate (or a version of at least 0.27.0) and `torchpippy` version 0.2.0 or greater. Please install using `pip install .` to pull from the `setup.py` in this repo, or run manually:
```bash
pip install 'accelerate>=0.27.0' 'torchpippy>=0.2.0'
```
One can expect that PiPPy will outperform native model parallelism by a multiplicative factor, since all GPUs are processing inputs at all times, rather than one input being passed through a single GPU at a time while the others sit idle waiting for the prior stage to finish.
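The intuition can be sketched with a toy timing model (pure Python, not PiPPy's actual schedule), under the simplifying assumptions that every stage takes the same time per microbatch and communication is free:

```python
def sequential_time(num_batches, num_stages, stage_time):
    # Naive model parallelism: each batch traverses every stage while
    # the other GPUs sit idle, so costs add up multiplicatively.
    return num_batches * num_stages * stage_time

def pipelined_time(num_batches, num_stages, stage_time):
    # Pipeline parallelism: after a fill phase of (num_stages - 1)
    # steps, one batch completes on every step.
    return (num_stages + num_batches - 1) * stage_time

# With 2 GPUs and 5 batches, the pipeline finishes in 6 steps vs 10.
print(sequential_time(5, 2, 1.0))  # 10.0
print(pipelined_time(5, 2, 1.0))   # 6.0
```

The fill phase is also why the first batch is *slower* under PiPPy in the benchmarks below, while the steady-state average per batch is faster.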
Below are some benchmarks we have found when using the accelerate-pippy integration for a few models when running on 2x 4090s:
|                      | Accelerate/Sequential | PiPPy + Accelerate |
|----------------------|-----------------------|--------------------|
| First batch          | 0.2137s               | 0.3119s            |
| Average of 5 batches | 0.0099s               | 0.0062s            |

|                      | Accelerate/Sequential | PiPPy + Accelerate |
|----------------------|-----------------------|--------------------|
| First batch          | 0.1959s               | 0.4189s            |
| Average of 5 batches | 0.0205s               | 0.0126s            |

|                      | Accelerate/Sequential | PiPPy + Accelerate |
|----------------------|-----------------------|--------------------|
| First batch          | 0.2789s               | 0.3809s            |
| Average of 5 batches | 0.0198s               | 0.0166s            |