[PyTorch] Missing intra-domain ranks list when initializing Userbuffers with data parallelism (#1305)

added missing list of intra-domain ranks when num_domains > 1 in initialize_ub

Signed-off-by: Alp Dener <[email protected]>
denera authored Nov 2, 2024
1 parent 4b8ffef commit a6a9141
Showing 1 changed file with 1 addition and 0 deletions.
1 change: 1 addition & 0 deletions transformer_engine/pytorch/module/base.py
@@ -234,6 +234,7 @@ def initialize_ub(
         ranks_per_domain_list, backend=bootstrap_backend
     )
     local_rank = torch.distributed.get_rank(intra_domain_group)
+    intra_domain_ranks = torch.distributed.get_process_group_ranks(intra_domain_group)
 
     inter_domain_group, _ = torch.distributed.new_subgroups_by_enumeration(
         [list(ranks) for ranks in zip(*ranks_per_domain_list)],
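
To make the rank bookkeeping in this hunk concrete, here is a minimal pure-Python sketch of how intra-domain and inter-domain rank lists relate. It does not call `torch.distributed`. The helper names `build_domain_groups` and `intra_domain_ranks_of` are hypothetical, and the contiguous layout of `ranks_per_domain_list` is an assumption, since the diff does not show how that list is built. Only the `zip(*ranks_per_domain_list)` transpose is taken from the code above.

```python
def build_domain_groups(world_size, ranks_per_domain):
    """Return (intra-domain rank lists, inter-domain rank lists).

    Assumption: each domain holds a contiguous block of global ranks.
    """
    if world_size % ranks_per_domain != 0:
        raise ValueError("world_size must be divisible by ranks_per_domain")
    ranks_per_domain_list = [
        list(range(start, start + ranks_per_domain))
        for start in range(0, world_size, ranks_per_domain)
    ]
    # Same transpose as in the diff: the i-th rank of every domain forms
    # one inter-domain group.
    inter_domain_list = [list(ranks) for ranks in zip(*ranks_per_domain_list)]
    return ranks_per_domain_list, inter_domain_list


def intra_domain_ranks_of(global_rank, ranks_per_domain_list):
    """Stand-in for get_process_group_ranks(intra_domain_group):
    the global ranks that share a domain with global_rank."""
    for ranks in ranks_per_domain_list:
        if global_rank in ranks:
            return ranks
    raise ValueError(f"rank {global_rank} is not in any domain")


intra, inter = build_domain_groups(world_size=8, ranks_per_domain=4)
my_domain = intra_domain_ranks_of(5, intra)
local_rank = my_domain.index(5)  # position of global rank 5 within its domain
```

With 8 ranks split into 2 domains of 4, global rank 5 belongs to the domain `[4, 5, 6, 7]` and has local rank 1. The added line in the commit records that list of global ranks for the multi-domain case, alongside the local rank that was already computed.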