Release Plan of BitBLAS 0.0.1 #150

Open
4 of 5 tasks
LeiWang1999 opened this issue Aug 23, 2024 · 5 comments
Labels
enhancement New feature or request

Comments


LeiWang1999 commented Aug 23, 2024

Hi all, it's time for us to consider the official release of BitBLAS v0.0.1. Here are some to-do items before this release:

  • Finalize comprehensive test cases and benchmarking scripts.
  • Decide on the default policy for the release.
  • Collect performance benchmark results for this release.
  • Ensure the vLLM PR is either merged or confirmed to require no further modifications related to BitBLAS.
  • Implement kernel name serialization based on hardware hints and configurations.

LeiWang1999 commented Aug 23, 2024

Looking ahead, our plan for v0.0.2 should include at least support for the Marlin template, quantized Flash Attention, and Group MOE :)

LeiWang1999 added the enhancement label Aug 24, 2024

LeiWang1999 commented

PR #153 serializes the kernel name using the operator config and hint.
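
As a rough illustration of the idea (not the code in PR #153), a kernel name can be derived deterministically from an operator config plus a tuning hint. Everything in this sketch is an assumption made for illustration: the function name `serialize_kernel_name`, the dict-shaped `config` and `hint`, and the naming scheme itself are hypothetical and are not BitBLAS's actual API.

```python
import hashlib
import json
import re


def serialize_kernel_name(op_name, config, hint):
    """Build a deterministic kernel name from an operator config and a hint.

    Hypothetical helper: the argument shapes and the name layout are
    assumptions for illustration, not BitBLAS's real types or format.
    """
    # Readable part: config fields in sorted key order, so dict ordering
    # never changes the resulting name.
    config_part = "_".join(f"{key}{config[key]}" for key in sorted(config))
    # Compact part: a short digest of the canonical JSON form of the hint,
    # so two different hints for the same config get different names.
    hint_blob = json.dumps(hint, sort_keys=True, separators=(",", ":"))
    hint_part = hashlib.sha256(hint_blob.encode("utf-8")).hexdigest()[:8]
    raw = f"{op_name}_{config_part}_h{hint_part}"
    # Keep only characters that are valid in a C/CUDA identifier.
    return re.sub(r"[^0-9A-Za-z_]", "_", raw)


# Hypothetical operator config and hardware hint.
config = {"M": 16, "N": 1024, "K": 1024, "W_dtype": "int4"}
hint = {"block": [16, 128], "warp": [16, 32], "use_tc": True}
name = serialize_kernel_name("matmul", config, hint)
print(name)
```

Hashing the hint keeps the name short while still distinguishing tuning variants; a fully readable name would need every hint field spelled out.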

xysmlx pinned this issue Aug 26, 2024

LeiWang1999 commented

From a policy perspective, I think we should currently use LOP.3 only for weight propagation. This approach is compatible not only with A100 devices but also with other common devices, such as SM 70 or AMD (it is not currently implemented for AMD, but it could be).

For Stage3 performance, we can provide an option to enable it.

Moreover, the upcoming stream_k template should share the same weight transformation function as Stage3.

LeiWang1999 unpinned this issue Sep 1, 2024
LeiWang1999 pinned this issue Sep 1, 2024

LeiWang1999 commented

I think the vLLM PR vllm-project/vllm#6036 requires no further modifications to BitBLAS, so we should consider publishing the formal release.

LeiWang1999 commented

PR #249 has successfully passed all test cases; we should now proceed to review the benchmark scripts.
