
[Serve] Multi-node support #3357

Closed
wants to merge 3 commits into from

Conversation

Collaborator

@MaoZiming MaoZiming commented Mar 22, 2024

Tested (run the relevant ones):

  • Code formatting: bash format.sh
  • Any manual or new tests for this PR (please specify below)
  • All smoke tests: pytest tests/test_smoke.py
  • Relevant individual smoke tests: pytest tests/test_smoke.py::test_fill_in_the_name
  • Backward compatibility tests: bash tests/backward_compatibility_tests.sh

@MaoZiming MaoZiming requested review from cblmemo and Michaelvll March 23, 2024 05:20
Collaborator

@cblmemo cblmemo left a comment


Hmm, I'm a little confused about this PR. I suppose "multi-node" refers to multiple VMs collectively hosting one single LLM/model? In that configuration, there is only one endpoint per cluster. This PR seems more like packing several replicas into one replica, where every replica still has its own endpoint.

Could we try something like PyTorch Distributed or Ray to host a model on more than one node?

Collaborator Author

MaoZiming commented Mar 23, 2024

Yeah. Can we assume that when a node fails, the main readiness probe for the replica will fail? If so, the current code can probably work.
I am thinking the replica manager could send health probes to all the nodes of the replica. There would still be only one service endpoint for the whole replica.
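For illustration, here is a minimal sketch of the probe-every-node idea. All names here are hypothetical and not part of the actual SkyPilot implementation; it just assumes each node exposes an HTTP readiness endpoint and treats the replica as ready only if every node answers:

```python
import urllib.request
import urllib.error


def probe_node(url: str, timeout: float = 2.0) -> bool:
    """Return True if the node's readiness endpoint answers HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


def replica_is_ready(node_urls: list) -> bool:
    """A multi-node replica is ready only if every node passes its probe."""
    return all(probe_node(url) for url in node_urls)
```

The downside discussed below is that this requires every node, not just the head, to run an HTTP server solely for probing.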

Collaborator

cblmemo commented Mar 23, 2024

Yeah. Can we assume that when a node fails, the main readiness probe for the replica will fail? If so, the current code can probably work. I am thinking the replica manager could send health probes to all the nodes of the replica. There would still be only one service endpoint for the whole replica.

I think that is a fair assumption if we are using the actual workload as the readiness probe. If we are using some health-check API, it might depend on the implementation, but I assume most frameworks will report not-ready when one of the nodes has failed (which would cause errors in end-to-end inference, IIUC).

One of my biggest concerns about sending health probes to all nodes is: would current frameworks launch an HTTP server on each node solely for readiness-probe purposes? That seems unnecessary for a model inference engine, so I assume not many frameworks implement it. Ideally, a readiness probe to the head node (or whichever node hosts the HTTP server) indicates the health of the whole system.

(We need to investigate more.)
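The head-node-only alternative can be sketched as follows. This is a hypothetical illustration (not the SkyPilot implementation): the head node tracks worker heartbeats, and its single readiness endpoint reports not-ready whenever any expected worker is missing, so one probe to the head covers the whole replica:

```python
class HeadNodeReadiness:
    """Head node aggregates worker liveness; one probe covers the replica."""

    def __init__(self, expected_workers: int):
        self.expected_workers = expected_workers
        self.alive_workers = set()

    def heartbeat(self, worker_id: str) -> None:
        """Record that a worker node is alive."""
        self.alive_workers.add(worker_id)

    def worker_lost(self, worker_id: str) -> None:
        """Record that a worker node has failed or disconnected."""
        self.alive_workers.discard(worker_id)

    def is_ready(self) -> bool:
        # The replica is ready only when all expected workers are alive;
        # the head node's HTTP readiness handler would return this.
        return len(self.alive_workers) == self.expected_workers
```

Under this design, the replica manager only probes the head node, which matches the assumption adopted in the next comment.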

@MaoZiming
Collaborator Author

@cblmemo Let's assume that when one of the nodes fails, the readiness probe for the head node will fail, until we encounter a case where this is not true.
Under that assumption, the current master should work.

@MaoZiming MaoZiming closed this Mar 23, 2024