Investigate tf32 support for AMD GPUs #18
Labels: backend (Updates related to backend), enhancement (New feature or request), on hold (If there is a blocker for solving an issue but there will be a solution for the blocker)
See whether TF32 support can be extended to AMD GPUs. This can be set up via
torch.set_float32_matmul_precision("high")
and is currently enabled for NVIDIA GPUs in ts.torch_handler.base_handler.

As of 5.11.2024, TF32 support for AMD GPUs is not yet integrated in torch, even though some AMD GPUs support it in hardware (at least the MI300):
“Compared to MI250X accelerators, CDNA 3 Matrix Cores triple the performance for FP16 and BF16, while providing a performance gain of 6.8 times for INT8. FP8 has a performance gain of 16 times compared to FP32, while TF32 has a gain of 4 times compared to FP32.”
AMD Instinct™ MI300 series microarchitecture — ROCm Documentation
Some notes from Jack:
“Compute capability here will map to gcnArch, so gfx942 will return (9, 4); this may need a separate check for us. Upstream uses this a lot and most of the time it doesn't cause issues for us, but it may need a separate check in your case.”
“AMD GPUs support TF32, but there is no library support at AMD. The only thing possible is to implicitly convert TF32 to bfloat16 and cast results back to FP32.”
“it looks like setting this to high will either use TF32 (not an option for us) or treat each FP32 input as the sum of two bf16 numbers. We must be going down the second path here.”
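A small probe for checking what "high" resolves to on a given build; the gcnArchName property is assumed to be available on recent ROCm builds of torch:

```python
import torch

# Probe what "high" maps to on the current device.
torch.set_float32_matmul_precision("high")
print("matmul precision:", torch.get_float32_matmul_precision())  # "high"
print("allow_tf32:", torch.backends.cuda.matmul.allow_tf32)       # True once "high" is set

if torch.cuda.is_available():
    # Per the note above, gfx942 reports (9, 4) through the CUDA-style API.
    print("capability:", torch.cuda.get_device_capability(0))
    if torch.version.hip is not None:
        # gcnArchName (e.g. "gfx942") is only meaningful on ROCm builds.
        print("gcnArchName:", torch.cuda.get_device_properties(0).gcnArchName)
```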