Parallelization with torch.distributed #57
-
First of all, thanks for the great work! I'm looking into integrating it into our ML training pipeline. Regarding parallelization, you hinted that integration with torch.distributed is straightforward. In my setup, we run distributed training with multiple dataloaders, each feeding a different part of the dataset to a copy of the same model (standard DDP). The parallelization in your examples, however, looks like it is over multiple checkpoints/models instead. Do I understand correctly that this is the only supported way to parallelize? Much appreciated!
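For reference, a minimal sketch of the kind of standard DDP setup described above, where each rank holds a copy of the model and its dataloader only sees a shard of the dataset (`MyModel` and `MyDataset` are placeholders, not part of any library):

```python
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler

# Standard DDP: every rank holds a copy of the same model, and
# DistributedSampler gives each rank's dataloader a disjoint shard of the data.
dist.init_process_group("nccl")
rank = dist.get_rank()
device = torch.device(f"cuda:{rank}")

model = MyModel().to(device)           # MyModel: placeholder for your model
model = DDP(model, device_ids=[rank])

dataset = MyDataset()                  # MyDataset: placeholder for your dataset
sampler = DistributedSampler(dataset)  # shards the dataset across ranks
loader = DataLoader(dataset, batch_size=32, sampler=sampler)

for x, y in loader:
    ...                                # usual training loop on this rank's shard
```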
Replies: 2 comments
-
I see, so the intended approach is to parallelize over checkpoints. Using multiple dataloaders should still be possible by passing the global indices during featurizing and scoring.
-
Hi @JD-ETH. Indeed, we've mostly been doing parallelization over checkpoints, and this is what we include in all examples. It should be fairly straightforward to parallelize over dataloaders, though. Using DDP, I would just make the data indices part of the batch, and then you can run `featurize` and `score` with the optional `inds` argument instead of `num_samples` (see, e.g., https://trak.readthedocs.io/en/latest/trak.html#trak.traker.TRAKer.featurize). I believe this is the same approach as the "global indices" you suggest :)

The above solution should work out of the box. If it does work, an example outlining how to use DDP would be greatly appreciated :)
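For anyone landing here later, a rough sketch of what this could look like, assuming a small dataset wrapper that returns the global index alongside each example so the indices travel with the batch. The wrapper, `base_train_set`, `model`, and `checkpoint` names are illustrative, not part of the TRAK API, and the finalization step across ranks may need adjusting to your storage setup:

```python
import torch
import torch.distributed as dist
from torch.utils.data import DataLoader, Dataset, DistributedSampler
from trak import TRAKer


class IndexedDataset(Dataset):
    """Hypothetical wrapper: returns the global index with each example."""

    def __init__(self, base):
        self.base = base

    def __len__(self):
        return len(self.base)

    def __getitem__(self, i):
        x, y = self.base[i]
        return x, y, i


dist.init_process_group("nccl")
rank = dist.get_rank()
device = torch.device(f"cuda:{rank}")

train_set = IndexedDataset(base_train_set)              # base_train_set: your dataset
sampler = DistributedSampler(train_set, shuffle=False)  # shards the data across ranks
loader = DataLoader(train_set, batch_size=32, sampler=sampler)

traker = TRAKer(model=model,                            # model: your (unwrapped) model
                task='image_classification',
                train_set_size=len(train_set))
traker.load_checkpoint(checkpoint, model_id=0)          # checkpoint: a model state_dict

for x, y, idx in loader:
    batch = (x.to(device), y.to(device))
    # Pass the global indices instead of num_samples, so each rank writes
    # its shard of features into the correct rows of the feature store.
    traker.featurize(batch=batch, inds=idx.tolist())

dist.barrier()               # make sure every rank has finished featurizing
traker.finalize_features()   # depending on your setup, you may want to finalize on one rank only
```

Scoring would follow the same pattern: shard the targets across ranks and pass their global indices to `score` instead of `num_samples`.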