[WIP] Ray Backend V1 #10006
@rkooo567 - QQ, what does Ray give us for V1? I thought
Good question! We are thinking of supporting only the compiled graph version of the Ray backend. I think compiled graphs can address the performance issues we used to have with TP, and I will verify that. For the shorter term, I think it can provide 3 benefits:
For the second part, it can implement PP in a cleaner way and provide automatic compute/comm overlap and InfiniBand support, which can improve performance greatly.
Irrespective of the mechanism used, for V1 IMO we should rethink the executor abstraction/hierarchy and start with something minimal, rather than transferring the same structure from V0. IIRC we aren't planning to include PP in the first iteration of V1?
@njhill can you tell me more about what "minimal" means here? Are you saying you want to support only 1 backend?
Here are some of my thoughts! I'd love to discuss more details @njhill!
Thanks @rkooo567, it might just be my personal feeling, but I've felt for a while that the current executor/worker abstraction could be simplified, and needs some more thought especially w.r.t. how we are supporting different backends and accelerators.
I'd love to talk if you feel this way! Let me make TP > 1 work by today or tomorrow, and we can discuss which parts of the interface can be trimmed down. Do you have some time at the end of this week?
This pull request has merge conflicts that must be resolved before it can be
@andoorve @rkooo567 apologies for not getting back to this, I won't have much chance for the next week since I'll be traveling on vacation :-/ But I added some of the thoughts in @tlrmchlsmth's existing V1 TP PR here, and have been looking more into how we can further hide/overlap the IPC overheads.
Hi, I am catching up on this now. Ray Compiled Graph comes with shared memory automatically, so with serialization optimization it should already provide the best performance (that's what we already did in V0 with msgspec). @njhill @andoorve how's the TP work with mp going? Do you think talking in person would help here?
@rkooo567 yes we can discuss in person too... it was mainly that we're still iterating on the architecture for V1, including simplifying the executor abstraction and concurrency etc.
The PR has moved to #10725
Only working for TP 1 now.