v0.5.2, v0.5.3, v0.6.0 Release Tracker #6434
Comments
July 23rd is Tuesday. Do you mean July 24th? |
v0.5.2 has been released: https://github.com/vllm-project/vllm/releases/tag/v0.5.2 |
Hello. Can #4409 be included in any of the next releases? Or at least, can I get an explanation of why it can't be included (maybe I can help in some way)? Once the wheel size limit increase is approved and #6394 is merged, wheel size should not be an issue. I am currently waiting for PyPI staff to approve the wheel size limit increase request to publish the patched. It would be nice to see support for Pascal GPUs in vLLM. Many people use them because they are cheap. |
Hi @sasha0552, thank you for bringing this up. For now, would you mind maintaining this in your fork? There are a few reasons why we are hesitant to include support for Pascal GPUs:
|
Can this PR be added in v0.5.3? |
@AlphaINF Unlikely, given the current state of the PR (still being reviewed), but I'm very much looking forward to this PR as well! |
@simon-mo thanks! |
@simon-mo Do we have a plan for the "async scheduling to overlap scheduling" work? |
hello, when will v0.6.0 release? I'm looking forward to #5036 and MiniCPM-Llama3-V-2_5 |
#6463 is not available in v0.5.5 docker image. Can you please have it available for 0.6.0? |
Anything you want to discuss about vllm.
We will make a triplet of releases in the following 3 weeks.
Blockers
- `/metrics` endpoint #6463
- `num_kv_heads=8` instead of 16

The reason for such a pace is that we want to remove beam search (#6226), which unlocks a suite of scheduler refactorings to enhance performance (for example, async scheduling to overlap scheduling and the forward pass). We want to release v0.5.2 ASAP to issue warnings and uncover new signals. Then we will decide on the removal in v0.6.0. Normally we would deprecate slowly, stretching it over a month or two. However, (1) the RFC has been open for a while, and (2) it is unfortunately on the critical path of refactoring and performance enhancements.

Please also feel free to add release blockers, but do keep in mind that I will not slow down the v0.5.* releases unless there is a critical bug.
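The staged deprecation described here (ship a warning first, remove in a later release) can be sketched with Python's standard `warnings` machinery. This is a minimal illustration only; the function name and version strings are hypothetical, not vLLM's actual API.

```python
import warnings

def use_beam_search_stub():
    # Hypothetical stand-in for a deprecated code path. Emitting a
    # DeprecationWarning in one release gives users a cycle to migrate
    # before the feature is actually removed.
    warnings.warn(
        "beam search is deprecated and is slated for removal in v0.6.0 "
        "(see RFC #6226)",
        DeprecationWarning,
        stacklevel=2,  # point the warning at the caller, not this helper
    )

# Capture the warning to show it fires as expected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    use_beam_search_stub()

print(caught[0].category.__name__)  # DeprecationWarning
```

Issuing the warning in v0.5.2 and deciding on removal in v0.6.0 matches this pattern: the warning release "uncovers new signals" (who still depends on the feature) before the breaking change lands.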