Parallelization with torch.distributed #57
-
First of all, thanks for the great work! I'm looking into integrating it into our ML training pipeline. Regarding parallelization, you hinted that integration with torch.distributed is straightforward. In my setup, we run distributed training with multiple dataloaders, each feeding a different part of the dataset to a copy of the same model (standard DDP). The parallelization in your examples, however, looks like it is over multiple checkpoints/models instead. Do I understand correctly that this is the only supported way to parallelize? Much appreciated!
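For reference, a minimal sketch of the kind of standard DDP setup described above, where each rank holds a copy of the model and its dataloader only sees a shard of the dataset (`MyModel` and `MyDataset` are placeholders, not part of any library):

```python
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler

# Standard DDP: every rank holds a copy of the same model, and
# DistributedSampler gives each rank's dataloader a disjoint shard of the data.
dist.init_process_group("nccl")
rank = dist.get_rank()
device = torch.device(f"cuda:{rank}")

model = MyModel().to(device)           # MyModel: placeholder for your model
model = DDP(model, device_ids=[rank])

dataset = MyDataset()                  # MyDataset: placeholder for your dataset
sampler = DistributedSampler(dataset)  # shards the dataset across ranks
loader = DataLoader(dataset, batch_size=32, sampler=sampler)

for x, y in loader:
    ...                                # usual training loop on this rank's shard
```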
Replies: 2 comments
-
I see, so the intended approach is to parallelize over checkpoints. Using multiple dataloaders should still be possible by passing the global indices during featurizing and scoring.
-
Hi @JD-ETH. Indeed, we've mostly been doing parallelization over checkpoints, and this is what we include in all examples. It should be fairly straightforward to parallelize over dataloaders, though. Using DDP, I would just make the data indices part of the batch, and then you can run `featurize` and `score` with the optional `inds` argument instead of `num_samples` (see, e.g., https://trak.readthedocs.io/en/latest/trak.html#trak.traker.TRAKer.featurize). I believe this is the same approach as the "global indices" you suggest :)

The above solution should work out of the box. If it does work, an example outlining how to use DDP would be greatly appreciated :)
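For anyone landing here later, a rough sketch of what this could look like, assuming a small dataset wrapper that returns the global index alongside each example so the indices travel with the batch. The wrapper, `base_train_set`, `model`, and `checkpoint` names are illustrative, not part of the TRAK API, and the finalization step across ranks may need adjusting to your storage setup:

```python
import torch
import torch.distributed as dist
from torch.utils.data import DataLoader, Dataset, DistributedSampler
from trak import TRAKer


class IndexedDataset(Dataset):
    """Hypothetical wrapper: returns the global index with each example."""

    def __init__(self, base):
        self.base = base

    def __len__(self):
        return len(self.base)

    def __getitem__(self, i):
        x, y = self.base[i]
        return x, y, i


dist.init_process_group("nccl")
rank = dist.get_rank()
device = torch.device(f"cuda:{rank}")

train_set = IndexedDataset(base_train_set)              # base_train_set: your dataset
sampler = DistributedSampler(train_set, shuffle=False)  # shards the data across ranks
loader = DataLoader(train_set, batch_size=32, sampler=sampler)

traker = TRAKer(model=model,                            # model: your (unwrapped) model
                task='image_classification',
                train_set_size=len(train_set))
traker.load_checkpoint(checkpoint, model_id=0)          # checkpoint: a model state_dict

for x, y, idx in loader:
    batch = (x.to(device), y.to(device))
    # Pass the global indices instead of num_samples, so each rank writes
    # its shard of features into the correct rows of the feature store.
    traker.featurize(batch=batch, inds=idx.tolist())

dist.barrier()               # make sure every rank has finished featurizing
traker.finalize_features()   # depending on your setup, you may want to finalize on one rank only
```

Scoring would follow the same pattern: shard the targets across ranks and pass their global indices to `score` instead of `num_samples`.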