v0.5.2, v0.5.3, v0.6.0 Release Tracker #6434
Comments
July 23rd is Tuesday. Do you mean July 24th? |
v0.5.2 has been released: https://github.com/vllm-project/vllm/releases/tag/v0.5.2 |
Hello. Can #4409 be included in any of the next releases? Or at least, can I get an explanation of why it can't be included (maybe I can help in some way)? Once the wheel size limit increase is approved and #6394 is merged, wheel size should not be an issue. I am currently waiting for PyPI staff to approve the wheel size limit increase request to publish the patched. It would be nice to see support for Pascal GPUs in vLLM. Many people use them because they are cheap. |
Hi @sasha0552, thank you for bringing this up. For now, would you mind maintaining this in your fork? There are a few reasons why we are hesitant to include support for Pascal GPUs:
|
Can this PR be added in v0.5.3? |
@AlphaINF Unlikely, given the current state of the PR (still being reviewed), but I'm very much looking forward to this PR as well! |
@simon-mo thanks! |
@simon-mo Do we have a plan for the "async scheduling to overlap scheduling" work? |
hello, when will v0.6.0 release? I'm looking forward to #5036 and MiniCPM-Llama3-V-2_5 |
#6463 is not available in v0.5.5 docker image. Can you please have it available for 0.6.0? |
Anything you want to discuss about vllm.
We will make a triplet of releases in the following 3 weeks.
Blockers
- `/metrics` endpoint #6463
- `num_kv_heads=8` instead of 16

The reason for such a pace is that we want to remove beam search (#6226), which unlocks a suite of scheduler refactorings to enhance performance (for example, async scheduling to overlap scheduling and the forward pass). We want to release v0.5.2 ASAP to issue warnings and uncover new signals. Then we will decide on the removal in v0.6.0. Normally we would deprecate slowly, stretching it over a month or two. However, (1) the RFC has been open for a while, and (2) it is unfortunately on the critical path of refactoring and performance enhancements.

Please also feel free to add release blockers, but do keep in mind that I will not slow down the v0.5.* releases unless there is a critical bug.
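The staged deprecation described here (ship a warning first, remove in a later release) can be sketched with Python's standard `warnings` machinery. This is a minimal illustration only; the function name and version strings are hypothetical, not vLLM's actual API.

```python
import warnings

def use_beam_search_stub():
    # Hypothetical stand-in for a deprecated code path. Emitting a
    # DeprecationWarning in one release gives users a cycle to migrate
    # before the feature is actually removed.
    warnings.warn(
        "beam search is deprecated and is slated for removal in v0.6.0 "
        "(see RFC #6226)",
        DeprecationWarning,
        stacklevel=2,  # point the warning at the caller, not this helper
    )

# Capture the warning to show it fires as expected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    use_beam_search_stub()

print(caught[0].category.__name__)  # DeprecationWarning
```

Issuing the warning in v0.5.2 and deciding on removal in v0.6.0 matches this pattern: the warning release "uncovers new signals" (who still depends on the feature) before the breaking change lands.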