gloo/cuda: use torch dtype bf16 #441

d4l3k · 2025-05-13T21:27:28Z

This adds support for using torch dtypes (such as bf16) in Gloo's CUDA kernels when Gloo is built as part of PyTorch.
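For reviewers less familiar with bf16: bfloat16 keeps float32's sign bit and 8-bit exponent but only 7 explicit mantissa bits, so a bf16 value is the upper 16 bits of a float32 after rounding. The sketch below shows round-to-nearest-even float32 to bf16 conversion in pure Python. It is background only; the function names are hypothetical and this is not the torch or Gloo implementation.

```python
import math
import struct

def float_to_bf16_bits(x: float) -> int:
    """Round a value to bfloat16 (round-to-nearest-even) and return its 16 raw bits."""
    # Reinterpret the value as IEEE-754 float32 bits.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    if math.isnan(x):
        # Keep sign and exponent, force a quiet-NaN mantissa bit so it stays NaN.
        return (bits >> 16) | 0x0040
    # Add 0x7FFF plus the LSB of the kept half: ties round to the even bf16 value.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF

def bf16_bits_to_float(b: int) -> float:
    """Widen bf16 bits back to a float by zero-filling the low 16 mantissa bits."""
    return struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))[0]
```

Because bf16 has only 8 significant bits, a value such as 257.0 is not representable and rounds to 256.0, which is why accumulating many values directly in bf16 loses precision much faster than in float32.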

Test plan:

```python
import os
import time

# Gloo transport under test; uncomment the second line to test InfiniBand.
transport = "TCP"
#transport = "IBVERBS"

os.environ["GLOO_DEVICE_TRANSPORT"] = transport
# RANK is set by the launcher (e.g. torchrun); expose one GPU per rank.
rank = int(os.environ["RANK"])
os.environ["CUDA_VISIBLE_DEVICES"] = str(rank)

# One IB device:port per rank (8 entries, so at most 8 ranks).
ibv = "mlx5_0:1,mlx5_3:1,mlx5_4:1,mlx5_5:1,mlx5_6:1,mlx5_9:1,mlx5_10:1,mlx5_11:1".split(",")[rank]
ibv_name, ibv_port = ibv.split(":")
os.environ["TORCH_GLOO_IBV_NAME"] = ibv_name
os.environ["TORCH_GLOO_IBV_PORT"] = ibv_port
os.environ["TORCH_GLOO_IBV_INDEX"] = "3"

# Import torch only after the environment variables above are set.
import torch
import torch.distributed as dist

dist.init_process_group("gloo")

rank = dist.get_rank()

# initial sanity check
#device = "cpu"
#t = torch.zeros(10, device=device)
#dist.all_reduce(t)
#print("sanity complete")

device = "cpu"  # set to "cuda" to benchmark CUDA tensors

iters = 10
warmup_iters = 2

for nelem in [10, 100, 1000, 10000, 100000, 1000000, 10000000, 100000000]:
    t = torch.zeros(nelem, device=device)

    # Warm up before timing.
    torch.cuda.current_stream().synchronize()
    for i in range(warmup_iters):
        dist.all_reduce(t)

    torch.cuda.current_stream().synchronize()

    start = time.perf_counter()

    for i in range(iters):
        dist.all_reduce(t)

    # Wait for any queued CUDA work before stopping the timer.
    torch.cuda.current_stream().synchronize()

    dur = time.perf_counter() - start
    qps = iters / dur

    bandwidth_gb = t.nbytes * iters / dur / 1e9

    gb = t.nbytes / 1e9

    if rank == 0:
        print(f"{transport=} {device=} {iters=} {nelem=} {qps=} {gb=} {bandwidth_gb=}\n", end="")
```
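The metrics printed by the test plan can be recomputed offline from `t.nbytes`, `iters` and the measured duration. The sketch below does that and also adds a bus-bandwidth figure using the nccl-tests convention for all_reduce (`busbw = algbw * 2(n-1)/n`); that convention, the helper names and the world size of 8 in the example are assumptions, not part of this PR. Note that `torch.zeros` defaults to float32 (4 bytes per element), so the loop above reports float32 sizes; a bf16 tensor would be 2 bytes per element.

```python
def nbytes_for(nelem: int, itemsize: int) -> int:
    """Hypothetical helper: payload bytes (float32 itemsize is 4, bf16 is 2)."""
    return nelem * itemsize

def allreduce_metrics(nbytes: int, iters: int, dur: float, world_size: int) -> dict:
    """Recompute the test plan's qps/gb/bandwidth_gb, plus an assumed bus bandwidth."""
    qps = iters / dur                             # all_reduce calls per second
    gb = nbytes / 1e9                             # payload per call, in GB
    bandwidth_gb = nbytes * iters / dur / 1e9     # same formula as the test plan
    # Assumption: nccl-tests all_reduce convention, scales algbw by 2(n-1)/n.
    busbw_gb = bandwidth_gb * 2 * (world_size - 1) / world_size
    return {"qps": qps, "gb": gb, "bandwidth_gb": bandwidth_gb, "busbw_gb": busbw_gb}

# Hypothetical numbers: 100M float32 elements, 10 iterations in 2 s on 8 ranks.
m = allreduce_metrics(nbytes_for(100_000_000, 4), iters=10, dur=2.0, world_size=8)
print(m)
```

Reporting bus bandwidth alongside the payload rate makes results comparable across world sizes, since a ring all_reduce moves roughly `2(n-1)/n` times the buffer size per rank.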

d4l3k requested a review from fduwjj May 13, 2025 21:27

facebook-github-bot added the CLA Signed label May 13, 2025

d4l3k mentioned this pull request May 13, 2025 in pytorch/pytorch#153406 (gloo: cuda, Open)

Commit 80cc076: gloo/cuda: use torch dtype bf16

d4l3k force-pushed the d4l3k/torch_dtypes branch from fe15276 to 80cc076 Compare May 13, 2025 23:04
