v0.5.2, v0.5.3, v0.6.0 Release Tracker #6434

Closed
5 of 7 tasks
simon-mo opened this issue Jul 15, 2024 · 11 comments · Fixed by #7139
Labels
release Related to new version release

Comments

@simon-mo
Collaborator

simon-mo commented Jul 15, 2024


We will make three releases over the following three weeks.

  • v0.5.2 on Monday July 15th.
  • v0.5.3 by Tuesday July 23rd.
  • v0.6.0 after Monday July 29th.

Blockers

The reason for this pace is that we want to remove beam search (#6226), which unlocks a suite of scheduler refactorings to enhance performance (for example, async scheduling to overlap scheduling with the forward pass). We want to release v0.5.2 ASAP to issue deprecation warnings and uncover new signals, and then decide on the removal in v0.6.0. Normally we would deprecate slowly, stretching the process over a month or two. However, (1) the RFC has been open for a while, and (2) beam search is unfortunately on the critical path of the refactoring and performance enhancements.
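
For illustration, a minimal sketch of the kind of deprecation warning v0.5.2 could emit when beam search is requested. This is a hypothetical example, not the actual vLLM change; validate_sampling_params and use_beam_search are assumed names:

    import warnings

    def validate_sampling_params(use_beam_search: bool) -> None:
        # Warn loudly now so users can surface their use cases before the
        # planned removal in v0.6.0.
        if use_beam_search:
            warnings.warn(
                "Beam search is deprecated and scheduled for removal in "
                "v0.6.0; see #6226 for the RFC and suggested alternatives.",
                DeprecationWarning,
                stacklevel=2,
            )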

Please also feel free to add release blockers, but do keep in mind that I will not slow the v0.5.* releases unless there is a critical bug.

@simon-mo simon-mo added the misc label Jul 15, 2024
@simon-mo simon-mo added release Related to new version release and removed misc labels Jul 15, 2024
@WoosukKwon
Collaborator

July 23rd is Tuesday. Do you mean July 24th?

@simon-mo
Collaborator Author

v0.5.2 has been released: https://github.com/vllm-project/vllm/releases/tag/v0.5.2

@sasha0552
Contributor

Hello. Can #4409 be included in one of the next releases? Or at least, can I get an explanation of why it can't be included (maybe I can help in some way)?

Once the wheel size limit increase is approved (and with #6394 already merged), wheel size should not be an issue.

I am currently waiting for PyPI staff to approve the wheel size limit increase request so that the patched Triton can be published to PyPI (pypi/support#4295).

It would be nice to see support for Pascal GPUs in vLLM. Many people use them because they are cheap.

@simon-mo
Collaborator Author

Hi @sasha0552,

Thank you for bringing this up. For now, would you mind maintaining this in your fork? There are a few reasons we are hesitant to include support for Pascal:

  • Aside from Triton, we continue to rely on CUTLASS, FlashAttention, and FlashInfer, all of which seem to have dropped Pascal.
  • It is sufficiently easy to build vLLM from source with Pascal support.
  • As we add more features and performance optimizations, we are afraid we will no longer be able to test and maintain Pascal support due to the added complexity (see the sketch after this list).
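
As an illustration of why maintaining Pascal support gets harder, here is a minimal sketch of a guard that warns on pre-Volta GPUs, assuming PyTorch with CUDA is installed; check_compute_capability is a hypothetical helper, not part of vLLM:

    import warnings

    import torch

    def check_compute_capability(min_major: int = 7) -> None:
        # Pascal is sm_60/sm_61; Volta (sm_70) and newer are assumed supported.
        if not torch.cuda.is_available():
            return
        major, minor = torch.cuda.get_device_capability()
        if major < min_major:
            warnings.warn(
                f"Detected compute capability {major}.{minor} "
                f"(below sm_{min_major}0); kernels from CUTLASS, "
                "FlashAttention, and FlashInfer may be unavailable, so "
                "support on this GPU would be best-effort only.",
                stacklevel=2,
            )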

@AlphaINF

Can this PR be added to v0.5.3?
#5036

@simon-mo
Copy link
Collaborator Author

@AlphaINF Unlikely, given the current state of the PR (it is still being reviewed), but I'm very much looking forward to this PR as well!

@AlphaINF

@simon-mo thanks!

@bohr

bohr commented Jul 18, 2024

@simon-mo do we have a plan for the "async scheduling to overlap scheduling and the forward pass" work?
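
For context, here is a minimal sketch of the overlap idea being asked about: while the GPU runs the forward pass for step N, a background thread prepares the schedule for step N+1. This is a hypothetical illustration, not vLLM's actual design; scheduler and model are assumed objects exposing next_batch() and forward():

    import queue
    import threading

    def run_engine(scheduler, model, num_steps: int) -> None:
        # A bounded queue keeps the scheduler at most one step ahead of the GPU.
        batches: queue.Queue = queue.Queue(maxsize=1)

        def schedule_loop() -> None:
            for _ in range(num_steps):
                batches.put(scheduler.next_batch())  # CPU-side scheduling work

        producer = threading.Thread(target=schedule_loop, daemon=True)
        producer.start()
        for _ in range(num_steps):
            batch = batches.get()  # prepared while the previous forward ran
            model.forward(batch)   # GPU work overlaps the next scheduling step
        producer.join()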

@AlphaINF

AlphaINF commented Aug 5, 2024

Hello, when will v0.6.0 be released? I'm looking forward to #5036 and MiniCPM-Llama3-V-2_5.

@vrdn-23
Contributor

vrdn-23 commented Aug 5, 2024

Would it be possible to get #6594 merged in before the next release is due? @joerunde @Yard1

@lionheartbeat12

#6463 is not available in the v0.5.5 Docker image. Can you please make it available in v0.6.0?
