-
Notifications
You must be signed in to change notification settings - Fork 531
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introducing SkyServe #2458
Introducing SkyServe #2458
Conversation
add http server example
* Add service schema * use new serve YAML * change to qpm * change to fix node * refactor init of SkyServiceSpec * change http example to new yaml format * update default value of from_yaml_config and handle service in task * Launching successfully * use argument in controller & redirector * resolve comments * use qps instead * raise when multiple task found * change to qps * introduce constants * introduce constants & fix bugs * add sky down * add Services No existing services. without STATUS (but with #healthy replica * format * add llama2 example * add fields to service db * status with replica information * fix policy parsing bug * add auth todo * add replica status todo * change cluster name prefix and order of the column * minor fixes * reorder status * change name: controller --> control plane * change name: middleware --> controller * clean code * rename default service name * env vars * add purge and skip identity check on serve controller * upload filemounts and workdir to storage & enhance --purge
… `sky serve logs` prototype (#2311) * introducing multiprocessing prototype * add run env to controller & redirector * reefactor and format * add control-plane and redirector logs * minor * minor * Refactor: move to infra provider * Refactor: move load balancer to redirector * refactor, add more logging * add replica status * resolve some TODOs * add post data feature * rename, format * add error message handling * bug fix & logging * fix a bug in continuous unhealthy * add error when user port is same with control plane * fix None post_data bug * add stable diffusion example * remove response body when code == 200 * add some TODOs and change RUNNING to READY * add failed status * add TODO for return failed replica info * fix sky serve status --help error * add console help messages * remove redundant stable diffusion setup files * rename healthy_replica --> ready_replica * adopt advice from code review * rename to service_name * adopt advice from comment
… logs` for replica info (#2353) * introducing multiprocessing prototype * add run env to controller & redirector * reefactor and format * add control-plane and redirector logs * minor * minor * Refactor: move to infra provider * Refactor: move load balancer to redirector * refactor, add more logging * add replica status * resolve some TODOs * add post data feature * rename, format * add error message handling * bug fix & logging * fix a bug in continuous unhealthy * add error when user port is same with control plane * fix None post_data bug * add stable diffusion example * remove response body when code == 200 * add some TODOs and change RUNNING to READY * add failed status * add TODO for return failed replica info * fix sky serve status --help error * add console help messages * remove redundant stable diffusion setup files * rename healthy_replica --> ready_replica * finish replica info & num * finish * adopt advice from code review * rename to service_name * finish state machine; TODO property based implementation * adopt advice from comment * adopt comments in #2311 * finish new replica status * modify http example more resonable * UX details & set default controller resources to VCPU=4 * add spinner for launching contorl plane & redirector process * add sky serve logs CLI for replicas * add uptime section for service table * relaunch replicas which terminated by exceeding consecutive failure threshold * UX details * code style * move serve dependency to controller yaml setup section * add launch log for replica * add resources preview * stop jupyter service to avoid port conflict * Apply suggestions from code review Co-authored-by: Wei-Lin Chiang <[email protected]> * fix userjob failed and launch failed not terminate replica; replica status FAILED --> CLEANUP_FAILED since we terminate all FAILED replica immediately now; remove --purge in termination * ux nits * 0.0.0.0 -> localhost * new log logic: use cluster status == UP instead of waiting 10s; early quit for replica not exist; skip all detailed file sync log * ux nits * change readiness timeout to initial delay seconds * disable some logging when SKYPILOT_DEBUG is not set * restore debug yaml * remove debug message * sync down log before teardown * rename failed status name (replica) * change controller resources vcpu to 4+ to avoid no 4 vcpu cloud * disable -c, -r, -i in sky serve logs CLI * add REPLICA column in service status * add CONTROLLER_FAILED status; wait until control plane & redirector job to be running. * add color for CONTROLLER_FAILED and a prompt to cleanup first if re-up a failed service * change uptime to first time ready * format * add comment for replica/service status in sky serve status -h * simplify yaml design * remove controller resources cloud=gcp * remove controller resources cloud=gcpsome comment * redirect setup logs to devnull * redirector listen on 0.0.0.0 & add app_port to controller resources * ux * fix readiness suffix * fix * fix * remove cloud=gcp * ux: remove reduncant str * disable launch & down & stop with reserved prefix controller- * support sky serve down service-* * ux * cleanup cloud storage when terminate * enable customized controller resources * abort if ports specified in resources * reorder service status column * new sky serve status: show replica all the time; refresh in parallel; check network first * remove name since we have service name column * at least one replica is ready -> service ready * Update sky/cli.py Co-authored-by: Wei-Lin Chiang <[email protected]> * Update sky/backends/backend_utils.py Co-authored-by: Wei-Lin Chiang <[email protected]> * Update sky/status_lib.py Co-authored-by: Wei-Lin Chiang <[email protected]> * Update sky/backends/backend_utils.py Co-authored-by: Wei-Lin Chiang <[email protected]> * Update sky/backends/backend_utils.py Co-authored-by: Wei-Lin Chiang <[email protected]> * Update sky/serve/redirector.py Co-authored-by: Wei-Lin Chiang <[email protected]> * Update sky/serve/redirector.py Co-authored-by: Wei-Lin Chiang <[email protected]> * add vllm example * upd http example * change uptime to None and merge get_uptime and get_replica_info * restore debug comment out code * add comment for DEFAULT_INITIAL_DELAY_SECONDS * min_replica -> min_replcias * format * Apply suggestions from code review Co-authored-by: Wei-Lin Chiang <[email protected]> * upd tgi example * upd examples * format, remove unnecessary refresh in sky serve logs, raise valueerror instead of click.secho red * add minimal http example * Apply suggestions from code review Co-authored-by: Wei-Lin Chiang <[email protected]> * fix typo * Apply suggestions from code review --------- Co-authored-by: Wei-Lin Chiang <[email protected]>
* add vicuna v1.5 example * add replica ip in table; rename some vars * warning if sky launch a service yaml * format * start progress after error log * fix type name * log format * logger with skylogging format * dump user app fail to control plane log * ux * add launched_at and service_yaml to local DB; delete cloud storage locally * rapid bootstraping * format * move skyserve controller to separate section in sky status * add hint to see detailed sky serve status * restore example * rename control plane to controller * rename to hello_skyserve * rename to hello_skyserve * change port to align doc * inline controller failed checking * override user resources parameter * format * add some todos * remove redundant return * use handle to store information * fix error const name * simplify resources representation * check cluster status earlier * minor * minor * add back service section since we still need it in controller * restore vicuna example * print all info when use sky serve status -a * better handling of unknown status * add warning for status that cannot be sky.down * minor comment fixes * remove Tip: to reuse an existing cluster * enable extra port on controller * more detailed info when acc is None * Apply suggestions from code review Co-authored-by: Wei-Lin Chiang <[email protected]> * add doc string --------- Co-authored-by: Wei-Lin Chiang <[email protected]>
* add msg * shorten * fix * add msg
* add gcp tests * add azure and aws test * fix cloud dependencies * use larger disk size to enable azure controller * mixed cloud test & install gcloud cli * format * fix * add prehook * minor & add smoke test function
* add cancel and gorilla example * update yaml & add readme * add CLI request cancel * Update sky/serve/examples/gorilla/gorilla.yaml Co-authored-by: Wei-Lin Chiang <[email protected]> * Update sky/serve/examples/misc/cancel/service.yaml Co-authored-by: Wei-Lin Chiang <[email protected]> * advice from code review * upd fschat installation --------- Co-authored-by: Wei-Lin Chiang <[email protected]>
* fix * format * update skyserve prompt * resolve comments * fix vllm
This is because we made an API change. |
getting the same issue stil |
Oh, probably try roll back to previous commit, run sky serve down and then switch to latest commit? |
hi, i went to a previous version and did sky server down -a -y and it worked. but when i again switch to latest commits i get the same errors again? |
Sorry for not making it clear 🤧 You can use |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the update @cblmemo! LGTM. Just tried the branch in a new docker container and it works nicely.
nit: a UX question: is it possible for our load balancer to notice that the user is using a |
i got stuck due the -L issue a lot. i was using https://github.com/huggingface/text-generation-inference to deploy by models for inference and using their official client https://github.com/huggingface/text-generation-inference/tree/main/clients/python there client doesn't support redirects, took few hours for me to figure this out :) this should be mentioned in documentation as a note |
Thanks! We'll definitely mentioned it in the doc 🫡 |
@manishiitg Do you have a preference for the default behavior of
|
No preference |
This PR introduces SkyServe, a lightweight multi-cloud serving system.
User doc: https://docs.google.com/document/d/1vVmzLF-EkG3Moj-q47DQBGvFipK4PNfkz0V6LyaPstE/edit#heading=h.m6tcx7d50j23
Tested (run the relevant ones):
bash format.sh
pytest tests/test_smoke.py
pytest tests/test_smoke.py::test_fill_in_the_name
bash tests/backward_comaptibility_tests.sh