Investigate tf32 support for AMD GPUs #18
Labels: backend (Updates related to backend), enhancement (New feature or request), on hold (If there is a blocker for solving an issue but there will be a solution for the blocker)
See whether TF32 support can be extended to AMD GPUs. This can be set up via
torch.set_float32_matmul_precision("high")
and is currently enabled for NVIDIA GPUs in ts.torch_handler.base_handler.

As of 5.11.2024, TF32 support for AMD GPUs is not yet integrated in torch, even though some AMD GPUs support it in hardware (at least the MI300):
“Compared to MI250X accelerators, CDNA 3 Matrix Cores triple the performance for FP16 and BF16, while providing a performance gain of 6.8 times for INT8. FP8 has a performance gain of 16 times compared to FP32, while TF32 has a gain of 4 times compared to FP32.”
AMD Instinct™ MI300 series microarchitecture — ROCm Documentation
Some notes from Jack:
“Compute capability here will map to gcnArch, so gfx942 will return (9, 4); this may need a separate check for us. Upstream uses this a lot and most of the time it doesn't cause issues for us, but it may need a separate check in your case.”
“AMD GPUs support TF32, but there is no library support at AMD. The only thing possible is to implicitly convert TF32 to bfloat16 and cast results back to FP32.”
“it looks like setting this to high will either use TF32 (not an option for us) or treat each FP32 input as the sum of two bf16 numbers. We must be going down the second path here.”
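A small probe for checking what "high" resolves to on a given build; the gcnArchName property is assumed to be available on recent ROCm builds of torch:

```python
import torch

# Probe what "high" maps to on the current device.
torch.set_float32_matmul_precision("high")
print("matmul precision:", torch.get_float32_matmul_precision())  # "high"
print("allow_tf32:", torch.backends.cuda.matmul.allow_tf32)       # True once "high" is set

if torch.cuda.is_available():
    # Per the note above, gfx942 reports (9, 4) through the CUDA-style API.
    print("capability:", torch.cuda.get_device_capability(0))
    if torch.version.hip is not None:
        # gcnArchName (e.g. "gfx942") is only meaningful on ROCm builds.
        print("gcnArchName:", torch.cuda.get_device_properties(0).gcnArchName)
```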