
[Serve] Multi-node support #3357

Closed
wants to merge 3 commits into from

Conversation

Collaborator

@MaoZiming MaoZiming commented Mar 22, 2024

Tested (run the relevant ones):

  • Code formatting: bash format.sh
  • Any manual or new tests for this PR (please specify below)
  • All smoke tests: pytest tests/test_smoke.py
  • Relevant individual smoke tests: pytest tests/test_smoke.py::test_fill_in_the_name
  • Backward compatibility tests: bash tests/backward_compatibility_tests.sh

@MaoZiming MaoZiming requested review from cblmemo and Michaelvll March 23, 2024 05:20
Collaborator

@cblmemo cblmemo left a comment


Hmm, I'm a little confused about this PR. I suppose "multi-node" refers to multiple VMs collectively hosting one single LLM/model? In that configuration, there is only one endpoint per cluster. This PR seems more like packing several replicas into one replica, where every replica still has its own endpoint.

Could we try something like PyTorch Distributed or Ray to host a model on more than one node?

Collaborator Author

MaoZiming commented Mar 23, 2024

Yeah. Can we assume that when a node fails, the main readiness probe for the replica will fail? If so, the current code can probably work.
I am thinking the replica manager could send health probes to all the nodes of the replica. There would still be only one service endpoint for the whole replica.
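For illustration, here is a minimal sketch of the probe-every-node idea. All names here are hypothetical and not part of the actual SkyPilot implementation; it just assumes each node exposes an HTTP readiness endpoint and treats the replica as ready only if every node answers:

```python
import urllib.request
import urllib.error


def probe_node(url: str, timeout: float = 2.0) -> bool:
    """Return True if the node's readiness endpoint answers HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


def replica_is_ready(node_urls: list) -> bool:
    """A multi-node replica is ready only if every node passes its probe."""
    return all(probe_node(url) for url in node_urls)
```

The downside discussed below is that this requires every node, not just the head, to run an HTTP server solely for probing.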

Collaborator

cblmemo commented Mar 23, 2024

Yeah. Can we assume that when a node fails, the main readiness probe for the replica will fail? If so, the current code can probably work. I am thinking the replica manager could send health probes to all the nodes of the replica. There would still be only one service endpoint for the whole replica.

I think that is a fair assumption if we are using the actual workload as the readiness probe. If we are using some health-check API, it might depend on the implementation, but I assume most frameworks will report not-ready when one of the nodes has failed (which would cause errors in end-to-end inference, IIUC).

One of my biggest concerns about sending health probes to all nodes is: would current frameworks launch an HTTP server on each node solely for readiness-probe purposes? That seems unnecessary for a model inference engine, so I assume not many frameworks implement it. Ideally, a readiness probe to the head node (or whichever node hosts the HTTP server) indicates the health of the whole system.

(We need to investigate more.)
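The head-node-only alternative can be sketched as follows. This is a hypothetical illustration (not the SkyPilot implementation): the head node tracks worker heartbeats, and its single readiness endpoint reports not-ready whenever any expected worker is missing, so one probe to the head covers the whole replica:

```python
class HeadNodeReadiness:
    """Head node aggregates worker liveness; one probe covers the replica."""

    def __init__(self, expected_workers: int):
        self.expected_workers = expected_workers
        self.alive_workers = set()

    def heartbeat(self, worker_id: str) -> None:
        """Record that a worker node is alive."""
        self.alive_workers.add(worker_id)

    def worker_lost(self, worker_id: str) -> None:
        """Record that a worker node has failed or disconnected."""
        self.alive_workers.discard(worker_id)

    def is_ready(self) -> bool:
        # The replica is ready only when all expected workers are alive;
        # the head node's HTTP readiness handler would return this.
        return len(self.alive_workers) == self.expected_workers
```

Under this design, the replica manager only probes the head node, which matches the assumption adopted in the next comment.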

@MaoZiming
Collaborator Author

@cblmemo Let's assume that when one of the nodes fails, the readiness probe for the head node will fail, until we encounter a case where this is not true.
Under that assumption, the current master should work.

@MaoZiming MaoZiming closed this Mar 23, 2024