
[WIP] Ray Backend V1 #10006

Closed
wants to merge 2 commits into from

Conversation

rkooo567
Collaborator

@rkooo567 rkooo567 commented Nov 4, 2024

Only working for TP 1 now.


github-actions bot commented Nov 4, 2024

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs will not trigger a full CI run by default. Instead, only fastcheck CI runs, a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

  • Add ready label to the PR
  • Enable auto-merge.

🚀

@robertgshaw2-neuralmagic
Collaborator

robertgshaw2-neuralmagic commented Nov 5, 2024

@rkooo567 - QQ, what does Ray give us for V1? I thought mp for the workers ended up being more performant in V0

@rkooo567
Collaborator Author

rkooo567 commented Nov 5, 2024

@rkooo567 - QQ, what does Ray give us for V1? I thought mp for the workers ended up being more performant in V0

Good question! We are thinking of supporting only the compiled-graph version of the Ray backend. I think compiled graphs can address the performance issue we used to have for TP, and I will verify that. For the shorter term, I think it can provide 3 benefits:

  • remote debugger
  • pipeline parallelism performance
  • native multi node support/performance

For the second point, it can implement PP in a cleaner way and provide automatic compute/communication overlap and InfiniBand support, which can improve performance greatly.
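
The compute/communication overlap idea can be emulated in plain Python. This is a hypothetical sketch, not the Ray Compiled Graph API: while a consumer thread runs stage 2 on microbatch i, the main thread already runs stage 1 on microbatch i+1. The `stage1`/`stage2` names are illustrative only.

```python
# Hypothetical sketch of pipeline-parallel overlap: stage 2 of microbatch i
# overlaps with stage 1 of microbatch i+1. Ray Compiled Graph is expected to
# schedule this overlap automatically; here we emulate it with a thread and
# a FIFO queue, so output order is preserved.
import queue
import threading

def stage1(x: int) -> int:          # stands in for the first half of a model
    return x * 2

def stage2(x: int) -> int:          # stands in for the second half of a model
    return x + 1

def run_pipeline(inputs: list[int]) -> list[int]:
    q: queue.Queue = queue.Queue()
    results: list[int] = []

    def consumer() -> None:
        while True:
            item = q.get()
            if item is None:        # sentinel: no more microbatches
                break
            results.append(stage2(item))

    t = threading.Thread(target=consumer)
    t.start()
    for x in inputs:                # stage 1 keeps running while stage 2 consumes
        q.put(stage1(x))
    q.put(None)
    t.join()
    return results

print(run_pipeline([1, 2, 3]))      # [3, 5, 7]
```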

@njhill
Member

njhill commented Nov 5, 2024

Irrespective of the mechanism used, for V1 IMO we should rethink the executor abstraction/hierarchy rather than transferring the same structure from V0, and start with something minimal. IIRC we aren't planning to include PP in the first iteration of V1?

@rkooo567
Collaborator Author

rkooo567 commented Nov 5, 2024

@njhill can you tell me more about what "minimal" means here? Are you saying you want to support only 1 backend?

@rkooo567
Collaborator Author

rkooo567 commented Nov 5, 2024

Here are some of my thoughts! I'd love to discuss more details @njhill!

  • Although PP is not considered in the very initial design, we will need it soon anyway, so considering the backend abstraction for different backends now doesn't seem like a bad idea.
  • IMO, the executor backend abstraction was okay in the original design and did not introduce much complexity.
  • Even for TP, it can provide additional features like a remote debugger out of the box, which is helpful for debugging.
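
One way to read "minimal" in this discussion is an executor interface reduced to only the calls the engine needs, with Ray, mp, and uniprocess backends as interchangeable subclasses. A hypothetical sketch under that assumption; the class and method names are illustrative, not vLLM's actual V1 API:

```python
# Hypothetical minimal executor abstraction: just what the engine core needs.
# A Ray backend, an mp backend, or a uniprocess backend would each subclass
# ExecutorBase. Names are illustrative, not vLLM's real interface.
from abc import ABC, abstractmethod
from typing import Any

class ExecutorBase(ABC):
    @abstractmethod
    def initialize(self, num_gpu_blocks: int) -> None:
        """Allocate KV cache and warm up the workers."""

    @abstractmethod
    def execute_model(self, scheduler_output: Any) -> Any:
        """Run one forward pass across all workers."""

    @abstractmethod
    def shutdown(self) -> None:
        """Tear down workers and release resources."""

class UniprocExecutor(ExecutorBase):
    """Trivial single-process backend, just to exercise the interface."""

    def initialize(self, num_gpu_blocks: int) -> None:
        self.num_gpu_blocks = num_gpu_blocks

    def execute_model(self, scheduler_output: Any) -> Any:
        return {"echo": scheduler_output}

    def shutdown(self) -> None:
        pass

ex = UniprocExecutor()
ex.initialize(num_gpu_blocks=128)
print(ex.execute_model("batch-0"))  # {'echo': 'batch-0'}
```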

@njhill
Member

njhill commented Nov 5, 2024

Thanks @rkooo567, it might just be my personal feeling, but I've felt for a while that the current executor/worker abstraction could be simplified, and needs some more thought especially w.r.t. how we are supporting different backends and accelerators.

@rkooo567
Collaborator Author

rkooo567 commented Nov 5, 2024

Thanks @rkooo567, it might just be my personal feeling, but I've felt for a while that the current executor/worker abstraction could be simplified, and needs some more thought especially w.r.t. how we are supporting different backends and accelerators.

I'd love to talk if you feel this way! Let me make TP > 1 work by today or tomorrow, and then we can discuss which parts of the interface can be trimmed down. Do you have some time at the end of this week?


mergify bot commented Nov 11, 2024

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @rkooo567.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Nov 11, 2024
@andoorve
Collaborator

@njhill @rkooo567 Have you had a chance to discuss this yet? Would be great to hear any conclusions.

@njhill
Member

njhill commented Nov 14, 2024

@andoorve @rkooo567 apologies for not getting back to this, I won't have much chance for the next week since I'll be traveling on vacation :-/

But I added some of the thoughts in @tlrmchlsmth's existing V1 TP PR here, and have been looking more into how we can further hide/overlap the IPC overheads.

@rkooo567
Collaborator Author

rkooo567 commented Nov 25, 2024

Hi, I am catching up on this now.

Ray Compiled Graph comes with shared memory automatically, so with serialization optimization it should already provide the best performance (that's what we did in V0 with msgspec).

@njhill @andoorve how is the TP work with mp going? Do you think talking in person would help here?

@andoorve
Collaborator

@rkooo567 We didn't have a chance to discuss yet, but would be interested in case @njhill has new ideas!

@njhill
Member

njhill commented Nov 27, 2024

@rkooo567 yes, we can discuss in person too... it's mainly that we're still iterating on the V1 architecture, including simplifying the executor abstraction, concurrency, etc.

@rkooo567
Collaborator Author

The PR is moved to #10725

@rkooo567 rkooo567 closed this Nov 27, 2024