Fix RCCL install, linear.py logic, CMake custom extension, update requirement for FP8 compute #42

mawong-amd · 2024-06-07T10:06:31Z

This small PR does the following:

Fix the broken RCCL installation in the Dockerfile: when RCCL is installed as a package, it has to be installed twice.
Update linear.py to remove the duplicated custom kernel invocation logic, which now lives in tuned_gemm.py. The change also generalizes the use of tuned_gemm.py (and hence tuning and custom kernel invocation) in place of a direct call to torch.nn.functional.linear when the bias is not fused.
Correct the CMake logic for the custom extension _custom_C so it is not built on CUDA.
Add pandas as a ROCm-specific requirement, since the FP8 linear methods use it, specifically in their tuning-related paths.
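
For reviewers who want a picture of the linear.py change, here is a minimal sketch of the dispatch pattern described above. It is not the code in this PR: `TunedGemmSketch`, `apply_linear`, `fuse_bias`, and the shape-keyed lookup table are hypothetical names, and plain Python lists stand in for tensors so the example runs without PyTorch. It assumes that "bias is not fused" means the bias is either absent or added as a separate step after the matrix multiply.

```python
from typing import Callable, Dict, List, Optional, Tuple

Matrix = List[List[float]]
Kernel = Callable[[Matrix, Matrix], Matrix]


def reference_linear(x: Matrix, weight: Matrix) -> Matrix:
    """Compute x @ weight.T, standing in for a bias-free F.linear call."""
    return [[sum(a * b for a, b in zip(row, w_row)) for w_row in weight]
            for row in x]


def add_bias(out: Matrix, bias: List[float]) -> Matrix:
    """Add a per-output-feature bias to every row."""
    return [[v + b for v, b in zip(row, bias)] for row in out]


class TunedGemmSketch:
    """Hypothetical stand-in for the dispatcher that lives in tuned_gemm.py."""

    def __init__(self) -> None:
        # Assumption: tuned or custom kernels are keyed by problem shape
        # (rows of x, output features, input features).
        self.tuned: Dict[Tuple[int, int, int], Kernel] = {}

    def mm(self, x: Matrix, weight: Matrix) -> Matrix:
        shape = (len(x), len(weight), len(weight[0]))
        # Use a tuned kernel if one is registered, else the default path.
        kernel = self.tuned.get(shape, reference_linear)
        return kernel(x, weight)


tgemm = TunedGemmSketch()


def apply_linear(x: Matrix, weight: Matrix,
                 bias: Optional[List[float]] = None,
                 fuse_bias: bool = False) -> Matrix:
    """Route every non-fused-bias call through the tuned GEMM dispatcher."""
    if fuse_bias and bias is not None:
        # Fused path: one direct call that handles the bias itself
        # (emulated here with the reference implementation).
        return add_bias(reference_linear(x, weight), bias)
    # Non-fused path: the kernel selection logic is kept in one place
    # (the dispatcher) instead of being duplicated in the caller.
    out = tgemm.mm(x, weight)
    return add_bias(out, bias) if bias is not None else out
```

With this shape, the caller no longer needs its own copy of the "which kernel do I use" logic. Registering a kernel in `tgemm.tuned` changes the behavior of every non-fused call without touching `apply_linear`.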


Fix RCCL pkg broken install, update linear.py custom logic, update re…

819ad9a

…quirements, disable custom_C for CUDA

mawong-amd merged commit 9d2f093 into main Jun 7, 2024
0 of 13 checks passed

mawong-amd deleted the 531_merge_small_fixes branch June 7, 2024 18:26
