Fix RCCL install, linear.py logic, CMake custom extension, update requirement for FP8 compute #42

Merged: 1 commit merged into main from 531_merge_small_fixes on Jun 7, 2024

mawong-amd

This small PR does the following:

  1. Fix the buggy RCCL installation in the Dockerfile: when RCCL is installed as a package, it has to be installed twice.
  2. Update linear.py to remove the duplicated custom kernel invocation logic, which now lives in tuned_gemm.py. This also generalizes the use of tuned_gemm.py (and hence of tuning and custom kernel invocation): it replaces the direct call to torch.nn.functional.linear when bias is not fused.
  3. Correct the CMake logic for the custom extension _custom_C so that it is not built on CUDA.
  4. Add pandas as a ROCm-specific requirement, since the FP8 linear methods use it, specifically in their tuning-related code.
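
As a rough illustration of item 2, the routing can be sketched in plain Python. This is not the code from linear.py or tuned_gemm.py: `GemmDispatcher`, `apply_linear`, `reference_linear`, the `separate_bias_add` flag, and the shape-keyed lookup table are all hypothetical names and assumptions made for this sketch, and nested lists stand in for tensors. The idea shown is only that the matrix multiply goes through one dispatcher, which picks a tuned or custom kernel when one is registered, and that a non-fused bias is added afterwards.

```python
from typing import Callable, Dict, List, Optional, Tuple

Matrix = List[List[float]]
Kernel = Callable[[Matrix, Matrix], Matrix]


def reference_linear(x: Matrix, weight: Matrix,
                     bias: Optional[List[float]] = None) -> Matrix:
    """Plain y = x @ weight.T (+ bias); stands in for torch.nn.functional.linear."""
    out = [[sum(a * w for a, w in zip(row, w_row)) for w_row in weight]
           for row in x]
    if bias is not None:
        out = [[v + b for v, b in zip(row, bias)] for row in out]
    return out


class GemmDispatcher:
    """Hypothetical stand-in for the dispatching role of tuned_gemm.py."""

    def __init__(self) -> None:
        # Assumption: kernels are looked up by problem shape (m, n, k).
        self.tuned: Dict[Tuple[int, int, int], Kernel] = {}
        self.last_path = ""  # records which path the last call took

    def mm(self, x: Matrix, weight: Matrix) -> Matrix:
        shape = (len(x), len(weight), len(weight[0]))
        kernel = self.tuned.get(shape)
        if kernel is not None:
            self.last_path = "tuned"
            return kernel(x, weight)
        # No tuned entry for this shape: fall back to the default GEMM.
        self.last_path = "default"
        return reference_linear(x, weight)


def apply_linear(gemm: GemmDispatcher, x: Matrix, weight: Matrix,
                 bias: Optional[List[float]] = None,
                 separate_bias_add: bool = True) -> Matrix:
    """Route the matmul through the dispatcher when bias is not fused."""
    if separate_bias_add:
        out = gemm.mm(x, weight)
        if bias is not None:
            out = [[v + b for v, b in zip(row, bias)] for row in out]
        return out
    # Fused-bias path: a single call computes the matmul and adds the bias.
    return reference_linear(x, weight, bias)
```

Keeping the kernel selection in one place is what lets linear.py drop its own copy of that logic; the caller only decides whether the bias is fused.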

mawong-amd merged commit 9d2f093 into main on Jun 7, 2024
0 of 13 checks passed
mawong-amd deleted the 531_merge_small_fixes branch on June 7, 2024 18:26