
[WIP] Ray Backend V1 #10006

Closed
wants to merge 2 commits into from

Conversation

rkooo567
Collaborator

@rkooo567 rkooo567 commented Nov 4, 2024

Only working for TP 1 now.


github-actions bot commented Nov 4, 2024

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs will not trigger a full CI run by default. Instead, only fastcheck CI runs, a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

  • Add ready label to the PR
  • Enable auto-merge.

🚀

@robertgshaw2-neuralmagic
Collaborator

robertgshaw2-neuralmagic commented Nov 5, 2024

@rkooo567 - QQ, what does Ray give us for V1? I thought mp for the workers ended up being more performant in V0

@rkooo567
Collaborator Author

rkooo567 commented Nov 5, 2024

@rkooo567 - QQ, what does Ray give us for V1? I thought mp for the workers ended up being more performant in V0

Good question! We are thinking of supporting only the compiled-graph version of the Ray backend. I think compiled graphs can address the performance issue we used to have for TP, and I will verify that. For the shorter term, I think it can provide 3 benefits:

  • remote debugger
  • pipeline parallelism performance
  • native multi node support/performance

For the second point, it can implement PP in a cleaner way and provide automatic compute/communication overlap and InfiniBand support, which can improve performance greatly.
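
The compute/communication overlap idea can be emulated in plain Python. This is a hypothetical sketch, not the Ray Compiled Graph API: while a consumer thread runs stage 2 on microbatch i, the main thread already runs stage 1 on microbatch i+1. The `stage1`/`stage2` names are illustrative only.

```python
# Hypothetical sketch of pipeline-parallel overlap: stage 2 of microbatch i
# overlaps with stage 1 of microbatch i+1. Ray Compiled Graph is expected to
# schedule this overlap automatically; here we emulate it with a thread and
# a FIFO queue, so output order is preserved.
import queue
import threading

def stage1(x: int) -> int:          # stands in for the first half of a model
    return x * 2

def stage2(x: int) -> int:          # stands in for the second half of a model
    return x + 1

def run_pipeline(inputs: list[int]) -> list[int]:
    q: queue.Queue = queue.Queue()
    results: list[int] = []

    def consumer() -> None:
        while True:
            item = q.get()
            if item is None:        # sentinel: no more microbatches
                break
            results.append(stage2(item))

    t = threading.Thread(target=consumer)
    t.start()
    for x in inputs:                # stage 1 keeps running while stage 2 consumes
        q.put(stage1(x))
    q.put(None)
    t.join()
    return results

print(run_pipeline([1, 2, 3]))      # [3, 5, 7]
```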

@njhill
Member

njhill commented Nov 5, 2024

Irrespective of the mechanism used, for V1 IMO we should rethink the executor abstraction/hierarchy rather than transferring the same structure from V0, and start with something minimal. IIRC we aren't planning to include PP in the first iteration of V1?

@rkooo567
Collaborator Author

rkooo567 commented Nov 5, 2024

@njhill can you tell me more about what "minimal" means here? Are you saying you want to support only 1 backend?

@rkooo567
Collaborator Author

rkooo567 commented Nov 5, 2024

Here are some of my thoughts! I'd love to discuss more details @njhill!

  • Although PP is not considered in the very initial design, we will need it soon anyway, so considering the backend abstraction for different backends now doesn't seem like a bad idea.
  • IMO, the executor backend abstraction was okay in the original design and did not introduce much complexity.
  • Even for TP, it can provide additional features like a remote debugger out of the box, which is helpful for debugging.
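
One way to read "minimal" in this discussion is an executor interface reduced to only the calls the engine needs, with Ray, mp, and uniprocess backends as interchangeable subclasses. A hypothetical sketch under that assumption; the class and method names are illustrative, not vLLM's actual V1 API:

```python
# Hypothetical minimal executor abstraction: just what the engine core needs.
# A Ray backend, an mp backend, or a uniprocess backend would each subclass
# ExecutorBase. Names are illustrative, not vLLM's real interface.
from abc import ABC, abstractmethod
from typing import Any

class ExecutorBase(ABC):
    @abstractmethod
    def initialize(self, num_gpu_blocks: int) -> None:
        """Allocate KV cache and warm up the workers."""

    @abstractmethod
    def execute_model(self, scheduler_output: Any) -> Any:
        """Run one forward pass across all workers."""

    @abstractmethod
    def shutdown(self) -> None:
        """Tear down workers and release resources."""

class UniprocExecutor(ExecutorBase):
    """Trivial single-process backend, just to exercise the interface."""

    def initialize(self, num_gpu_blocks: int) -> None:
        self.num_gpu_blocks = num_gpu_blocks

    def execute_model(self, scheduler_output: Any) -> Any:
        return {"echo": scheduler_output}

    def shutdown(self) -> None:
        pass

ex = UniprocExecutor()
ex.initialize(num_gpu_blocks=128)
print(ex.execute_model("batch-0"))  # {'echo': 'batch-0'}
```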

@njhill
Member

njhill commented Nov 5, 2024

Thanks @rkooo567, it might just be my personal feeling, but I've felt for a while that the current executor/worker abstraction could be simplified, and needs some more thought especially w.r.t. how we are supporting different backends and accelerators.

@rkooo567
Collaborator Author

rkooo567 commented Nov 5, 2024

Thanks @rkooo567, it might just be my personal feeling, but I've felt for a while that the current executor/worker abstraction could be simplified, and needs some more thought especially w.r.t. how we are supporting different backends and accelerators.

I'd love to talk if you feel this way! Let me make TP > 1 work by today or tomorrow, and then we can discuss which parts of the interface can be trimmed down. Do you have some time at the end of this week?


mergify bot commented Nov 11, 2024

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @rkooo567.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Nov 11, 2024
@andoorve
Collaborator

@njhill @rkooo567 Have you had a chance to discuss this yet? Would be great to hear any conclusions.

@njhill
Member

njhill commented Nov 14, 2024

@andoorve @rkooo567 apologies for not getting back to this, I won't have much chance for the next week since I'll be traveling on vacation :-/

But I added some of the thoughts in @tlrmchlsmth's existing V1 TP PR here, and have been looking more into how we can further hide/overlap the IPC overheads.

@rkooo567
Collaborator Author

rkooo567 commented Nov 25, 2024

Hi, I am catching up on this now.

Ray Compiled Graph comes with shared memory automatically, so with serialization optimization it should already provide the best performance (that's what we did in V0 with msgspec).

@njhill @andoorve how is the TP work with mp going? Do you think talking in person would help here?

@andoorve
Collaborator

@rkooo567 We didn't have a chance to discuss yet, but would be interested in case @njhill has new ideas!

@njhill
Member

njhill commented Nov 27, 2024

@rkooo567 yes, we can discuss in person too... it's mainly that we're still iterating on the V1 architecture, including simplifying the executor abstraction, concurrency, etc.

@rkooo567
Collaborator Author

The PR is moved to #10725

@rkooo567 rkooo567 closed this Nov 27, 2024