support dist.broadcast #7956
base: master
Conversation
    XLATensorPtr xmask = bridge::GetXlaTensor(mask);
    auto masked_input = tensor_methods::mul(xinput, xmask);
    auto result = tensor_methods::all_reduce(masked_input, AllReduceType::kSum,
                                             1.0, {}, true);
nit: name the non-obvious arguments at the end here. Assuming these two are scale and replica groups, /*scale=*/1, /*groups=*/{}
(double check the names).
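A minimal sketch of the suggested annotation, assuming the trailing arguments really are the scale, the replica groups, and a pin-layout flag (names guessed from the reviewer's comment, not checked against the tensor_methods::all_reduce signature):

    // Illustrative only: argument names are assumptions, verify against the header.
    auto result = tensor_methods::all_reduce(masked_input, AllReduceType::kSum,
                                             /*scale=*/1.0, /*groups=*/{},
                                             /*pin_layout=*/true);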
@@ -139,7 +140,7 @@ def test_all_to_all(self, pin_layout):
                          list(range(world_size))]])

-  @absltest.skipIf(lambda: tpu.num_logical_cores_per_chip() >= 2,
+  @absltest.skipIf(tpu.num_logical_cores_per_chip() >= 2,
🤦 thanks
# "broadcast(Tensor self, int src, str tag, int[] ranks, int group_size) -> Tensor", | ||
@torch.library.impl("_c10d_functional::broadcast", "XLA") |
@JackCaoG FYI
  at::Tensor mask;
  const torch::lazy::BackendDevice& device = xinput->GetDevice();
  if (device.ordinal() == src) {
    mask = at::ones_like(input);
Is there an equivalent to torch.no_grad() in C++? That's the only difference I see between the original Python version and this one.
Searched the docs; we can use the following scope for tensor operations without grad:
{
at::NoGradGuard no_grad;
// tensor operations
}
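Applied to the snippet under review, the mask construction could sit inside such a scope. A minimal sketch, assuming the non-source branch fills the mask with zeros (only the ones_like branch is visible in the quoted diff):

    at::Tensor mask;
    {
      at::NoGradGuard no_grad;  // mirrors torch.no_grad() in the original Python version
      if (device.ordinal() == src) {
        mask = at::ones_like(input);
      } else {
        mask = at::zeros_like(input);  // assumption: non-source ranks contribute zeros
      }
    }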
Support torch.distributed.broadcast for both dynamo and non-dynamo.
This PR needs pytorch/pytorch#135171 to be merged first.