Update comments in data parallel example to use sampler #7914
base: master
Conversation
if xr.world_size() > 1:
    train_sampler = torch.utils.data.distributed.DistributedSampler(
        train_dataset,
        num_replicas=xr.world_size(),
nit: dist.world_size
train_sampler = torch.utils.data.distributed.DistributedSampler(
    train_dataset,
    num_replicas=xr.world_size(),
    rank=xr.global_ordinal(),
nit: dist.get_rank
# want each process to handle different parts of the data.
'''
train_sampler = None
if xr.world_size() > 1:
also dist.world_size
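Taken together, the three nits above suggest querying torch.distributed for the replica count and rank rather than the torch_xla runtime helpers. A minimal sketch of the reviewed block after that change, assuming the process group has already been initialized (e.g. with dist.init_process_group):

import torch
import torch.distributed as dist

# Sketch of the suggested change: source the replica count and rank from
# torch.distributed instead of the torch_xla runtime (xr) helpers.
train_sampler = None
if dist.get_world_size() > 1:
    train_sampler = torch.utils.data.distributed.DistributedSampler(
        train_dataset,                       # the example's dataset
        num_replicas=dist.get_world_size(),  # was xr.world_size()
        rank=dist.get_rank())                # was xr.global_ordinal()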
# below code is commented out because in this example we used a fake data
# loader that does not take sampler. However this logic is needed if you
# want each process to handle different parts of the data.
'''
Is it clearer to just apply this sampler to the fake dataset anyway?
The fake dataset is an XLA util (https://github.com/pytorch/xla/blob/master/examples/train_resnet_base.py#L25-L29); it does not take a sampler.
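For context, this is roughly how the linked example builds its fake loader with torch_xla's SampleGenerator; the batch shape and sample count below are illustrative assumptions, not the repo's exact values:

import torch
import torch_xla.utils.utils as xu
import torch_xla.runtime as xr

batch_size = 128  # illustrative value

# SampleGenerator yields the same fake (input, label) batch over and over.
# It stands in for both the dataset and the DataLoader, so there is no
# hook where a DistributedSampler could be attached.
train_loader = xu.SampleGenerator(
    data=(torch.zeros(batch_size, 3, 224, 224),
          torch.zeros(batch_size, dtype=torch.int64)),
    sample_count=1200000 // batch_size // xr.world_size())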
Oh, I just kind of assumed that SampleGenerator was an idiomatic Dataset. Can you try just making it inherit from IterableDataset, since it has __iter__ and __len__ already? You should then be able to wrap it in a standard sampler and data loader.
If that doesn't work, then one of us can follow up. Our examples should be as close to PyTorch as possible.
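A minimal, untested sketch of the inheritance change being suggested; note that PyTorch's DataLoader only accepts a sampler for map-style datasets, so wiring in DistributedSampler may still need the follow-up mentioned above:

import torch
from torch.utils.data import DataLoader, IterableDataset

class SampleGenerator(IterableDataset):
    # Fake-data generator, now declared as an iterable-style dataset.

    def __init__(self, data, sample_count):
        self._data = data
        self._sample_count = sample_count

    def __len__(self):
        return self._sample_count

    def __iter__(self):
        for _ in range(self._sample_count):
            yield self._data

# It now wraps in a standard DataLoader. batch_size=None keeps the
# pre-batched fake samples as-is instead of re-collating them.
fake_batch = (torch.zeros(8, 3, 224, 224),
              torch.zeros(8, dtype=torch.int64))
loader = DataLoader(SampleGenerator(fake_batch, sample_count=100),
                    batch_size=None)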
Branch force-pushed from 852eb2b to 669e59f
fix #7904