Release Plan of BitBLAS 0.0.1 #150
Looking ahead, our plan for v0.0.2 should include at least support for the Marlin template, quantized Flash Attention, and Group MoE :)
PR #153 serializes the kernel name together with the operator config and hint.
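As a rough illustration (not the actual implementation in PR #153), a kernel name could be derived deterministically from the operator config plus a scheduling hint, so cached kernels can be looked up across runs. The `MatmulConfig`, `ScheduleHint`, and `serialize_kernel_name` names below are hypothetical:

```python
# Hypothetical sketch: deriving a stable kernel name from an operator config
# and a scheduling hint. All names here are illustrative, not BitBLAS API.
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class MatmulConfig:
    M: int
    N: int
    K: int
    in_dtype: str = "float16"
    out_dtype: str = "float16"


@dataclass(frozen=True)
class ScheduleHint:
    block_m: int = 128
    block_n: int = 128
    num_stages: int = 2


def serialize_kernel_name(op: str, config: MatmulConfig, hint: ScheduleHint) -> str:
    # Serialize config and hint deterministically (sorted keys), then hash so the
    # identifier stays short enough to use as a symbol or file name.
    payload = json.dumps(
        {"op": op, "config": asdict(config), "hint": asdict(hint)}, sort_keys=True
    )
    digest = hashlib.sha256(payload.encode()).hexdigest()[:16]
    return f"{op}_{config.M}x{config.N}x{config.K}_{config.in_dtype}_{digest}"


if __name__ == "__main__":
    name = serialize_kernel_name("matmul", MatmulConfig(1024, 1024, 1024), ScheduleHint())
    print(name)  # e.g. matmul_1024x1024x1024_float16_<hash>
```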
From a policy perspective, I think we should currently use LOP3 only for weight propagation. This approach is compatible not only with A100 devices but also with other common targets, such as SM70 or AMD GPUs (weight propagation is not yet implemented for AMD, but it could be). For Stage3 performance, we can provide an option to enable it. Moreover, the incoming stream_k template should share the same weight transformation function with Stage3.
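A minimal sketch of that policy, using purely illustrative names (`WeightPropagation`, `select_weight_transform`, `stage3_weight_transform` are assumptions, not BitBLAS code): LOP3 as the portable default, Stage3 as an opt-in, and stream_k reusing the same Stage3 weight transform rather than defining its own:

```python
# Hypothetical sketch of the propagation policy; names are illustrative only.
from enum import Enum, auto
from typing import Callable

import numpy as np


class WeightPropagation(Enum):
    LOP3 = auto()    # portable default: works on A100, SM70, and could be ported to AMD
    STAGE3 = auto()  # optional, enabled only when the user opts in


def stage3_weight_transform(weight: np.ndarray) -> np.ndarray:
    # Placeholder for the offline layout transformation a Stage3 kernel would
    # expect; the real transform would repack the weight tensor accordingly.
    return np.ascontiguousarray(weight)


def select_weight_transform(
    mode: WeightPropagation, enable_stage3: bool = False
) -> Callable[[np.ndarray], np.ndarray]:
    # The stream_k template would call this same selector, so both stream_k and
    # Stage3 consume identically transformed weights.
    if mode is WeightPropagation.STAGE3 and enable_stage3:
        return stage3_weight_transform
    return lambda w: w  # LOP3 path: no extra offline transformation in this sketch


if __name__ == "__main__":
    w = np.random.randn(256, 256).astype(np.float16)
    transform = select_weight_transform(WeightPropagation.LOP3)
    print(transform(w).shape)
```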
I think vLLM PR vllm-project/vllm#6036 requires no further modifications to BitBLAS, so we should consider publishing the formal release.
PR #249 has successfully passed all test cases; we should now proceed to review the benchmark scripts.
Hi all, it's time for us to consider the official release of BitBLAS v0.0.1. Here are some todo items before this release: