
[Serve] sky serve up doesn't fetch existing clusters #3122

Closed
mrPsycox opened this issue Feb 8, 2024 · 4 comments


mrPsycox commented Feb 8, 2024

Running this command: `sky serve up skypilot-dev.yaml`

With this YAML file:

```yaml
service:
  readiness_probe:
    path: /v1/models
    initial_delay_seconds: 1200

envs:
  SKYPILOT_NUM_GPUS_PER_NODE: 4

num_nodes: 1

resources:
  cloud: kubernetes
  accelerators: V100:4
  ports:
    - 9999

  cpus: 4+
  memory: 8+

setup: |
  conda create -n vllm python=3.9 -y
  conda activate vllm
  pip install vllm

run: |
  conda activate vllm
  python -u -m vllm.entrypoints.openai.api_server \
    --tensor-parallel-size $SKYPILOT_NUM_GPUS_PER_NODE \
    --dtype half \
    --host 0.0.0.0 \
    --model mistralai/Mixtral-8x7B-Instruct-v0.1
```
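In this config, `accelerators: V100:4` and `--tensor-parallel-size $SKYPILOT_NUM_GPUS_PER_NODE` are meant to agree on a single node. The following is a hypothetical pre-flight check, not part of SkyPilot, assuming the `NAME:COUNT` accelerator syntax shown above and that a bare name means one device:

```python
# Hypothetical helper, not part of SkyPilot: parse an accelerator spec such as
# "V100:4" and check it against the tensor-parallel size passed to vLLM.
def parse_accelerators(spec: str) -> tuple[str, int]:
    """Split 'NAME:COUNT' into its parts; a bare 'NAME' is assumed to mean 1."""
    name, sep, count = spec.partition(":")
    if not name:
        raise ValueError(f"empty accelerator name in {spec!r}")
    return name, (int(count) if sep else 1)


def check_tensor_parallel(spec: str, num_gpus_per_node: int) -> None:
    """Raise if the GPUs requested per node differ from the TP size."""
    _, count = parse_accelerators(spec)
    if count != num_gpus_per_node:
        raise ValueError(
            f"accelerators {spec!r} give {count} GPU(s) per node, "
            f"but --tensor-parallel-size is {num_gpus_per_node}"
        )


# Values taken from the YAML above.
print(parse_accelerators("V100:4"))  # ('V100', 4)
check_tensor_parallel("V100:4", 4)   # passes silently
```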

`sky` behaves unexpectedly. After correctly selecting the cloud (my on-prem Kubernetes cluster, as shown below), `sky serve up` goes on to launch a new instance on AWS without asking or telling me.

Here is the command output:

```
Service from YAML spec: skypilot-dev.yaml
Service Spec:
Readiness probe method:           GET /v1/models
Readiness initial delay seconds:  1200
Replica autoscaling policy:       Fixed 1 replica
Each replica will use the following resources (estimated):
I 02-08 13:59:49 optimizer.py:694] == Optimizer ==
I 02-08 13:59:49 optimizer.py:717] Estimated cost: $0.0 / hour
I 02-08 13:59:49 optimizer.py:717]
I 02-08 13:59:49 optimizer.py:840] Considered resources (1 node):
I 02-08 13:59:49 optimizer.py:910] ----------------------------------------------------------------------------------------------------
I 02-08 13:59:49 optimizer.py:910]  CLOUD        INSTANCE           vCPUs   Mem(GB)   ACCELERATORS   REGION/ZONE   COST ($)   CHOSEN
I 02-08 13:59:49 optimizer.py:910] ----------------------------------------------------------------------------------------------------
I 02-08 13:59:49 optimizer.py:910]  Kubernetes   4CPU--8GB--4V100   4       8         V100:4         kubernetes    0.00          ✔
I 02-08 13:59:49 optimizer.py:910] ----------------------------------------------------------------------------------------------------
I 02-08 13:59:49 optimizer.py:910]
Launching a new service 'sky-service-12e9'. Proceed? [Y/n]: Y
Launching controller for 'sky-service-12e9'...
W 02-08 13:59:55 instance.py:641] Expected security group sky-sg-sky-serve-controller-fcd54c3d-fcd5 not found.
W 02-08 13:59:55 instance.py:764] Find security group failed. Skip cleanup security group.
I 02-08 13:59:55 cloud_vm_ray_backend.py:4370] The cluster 'sky-serve-controller-fcd54c3d' (status: INIT) was not found on the cloud: it may be autodowned, manually terminated, or its launch never succeeded. Provisioning a new cluster by using the same resources as its original launch.
I 02-08 13:59:56 cloud_vm_ray_backend.py:4389] Creating a new cluster: 'sky-serve-controller-fcd54c3d' [1x AWS(m6i.xlarge, disk_size=200, ports=['30001-30100'])].
I 02-08 13:59:56 cloud_vm_ray_backend.py:4389] Tip: to reuse an existing cluster, specify --cluster (-c). Run sky status to see existing clusters.
I 02-08 13:59:56 cloud_vm_ray_backend.py:1386] To view detailed progress: tail -n100 -f /Users/mrpsycox/sky_logs/sky-2024-02-08-13-59-53-215853/provision.log
I 02-08 13:59:57 provisioner.py:79] Launching on AWS us-east-1 (us-east-1a,us-east-1b,us-east-1c,us-east-1d,us-east-1f)
```
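The log contains two separate placement decisions: the optimizer row marked ✔ is for the service replicas (Kubernetes), while the `Creating a new cluster` line is for the serve controller (AWS). A minimal sketch for pulling both out of a saved log, assuming the exact line formats quoted above; the regexes are written against this output only and are not a SkyPilot tool:

```python
import re

# Hypothetical diagnostic, not a SkyPilot tool. Patterns match the log lines
# quoted in this issue and may not fit other SkyPilot versions.
CHOSEN_ROW = re.compile(r"optimizer\.py:\d+\]\s+(\w+)\s+\S+.*✔")
NEW_CLUSTER = re.compile(r"Creating a new cluster: '([^']+)' \[\d+x (\w+)\(")


def summarize(log: str) -> dict:
    """Return the cloud chosen for replicas and the cloud of each new cluster."""
    chosen = CHOSEN_ROW.search(log)
    return {
        "replica_cloud": chosen.group(1) if chosen else None,
        "new_clusters": dict(NEW_CLUSTER.findall(log)),
    }


LOG = """\
I 02-08 13:59:49 optimizer.py:910] Kubernetes 4CPU--8GB--4V100 4 8 V100:4 kubernetes 0.00 ✔
I 02-08 13:59:56 cloud_vm_ray_backend.py:4389] Creating a new cluster: 'sky-serve-controller-fcd54c3d' [1x AWS(m6i.xlarge, disk_size=200, ports=['30001-30100'])].
"""

info = summarize(LOG)
# Flag any newly created cluster that lands on a different cloud than replicas.
for name, cloud in info["new_clusters"].items():
    if cloud != info["replica_cloud"]:
        print(f"{name} is provisioned on {cloud}, not {info['replica_cloud']}")
```

On this excerpt it reports the controller cluster as the one placed on AWS.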

mrPsycox (Author) commented Feb 8, 2024

Also, the output suggests specifying an existing cluster with `--cluster (-c)`, but I couldn't find any valid option to do that. Does anyone know how to fix this?
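Read together, the `(status: INIT) was not found on the cloud` message and the tip describe a reuse-or-reprovision decision for the controller: a leftover local record from an earlier, incomplete launch was re-provisioned with the resources of that original launch, which, judging by the next log line, were on AWS. The sketch below models only that reading of the log; `ClusterRecord`, `Action`, and `decide` are hypothetical names, not SkyPilot's implementation:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Action(Enum):
    CREATE = "create a new cluster"
    REPROVISION = "re-provision with the originally recorded resources"
    REUSE = "reuse the existing cluster"


@dataclass
class ClusterRecord:
    """Hypothetical local record of a cluster, as `sky status` might list it."""
    name: str
    status: str            # e.g. "INIT", "UP", "STOPPED"
    resources: str         # resources recorded at the original launch
    exists_on_cloud: bool  # whether the cloud still reports the instance


def decide(record: Optional[ClusterRecord]) -> Action:
    """Choose what to do with a controller cluster of a given name."""
    if record is None:
        # No local record: pick fresh resources and create the cluster.
        return Action.CREATE
    if not record.exists_on_cloud:
        # Local record (e.g. INIT) but nothing on the cloud: relaunch with the
        # resources from the original launch, as the log message describes.
        return Action.REPROVISION
    return Action.REUSE


stale = ClusterRecord(
    name="sky-serve-controller-fcd54c3d",
    status="INIT",
    resources="1x AWS(m6i.xlarge, disk_size=200)",
    exists_on_cloud=False,
)
print(decide(stale).value)  # re-provision with the originally recorded resources
```

Under this reading, the AWS placement comes from the stale controller record rather than from the Kubernetes choice made for the replicas.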

concretevitamin (Member) commented:

Thanks for the report @mrPsycox.


@cblmemo: Can we create an issue to fix the UX problem of not displaying the spot/serve controller in the confirmation prompts of `sky spot launch` / `sky serve up`? Cc @Michaelvll.
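The proposed UX fix amounts to listing the controller's placement alongside the replicas' before asking for confirmation. A hypothetical sketch of such a prompt, using the values from the log in this issue; `Placement` and `confirmation_prompt` are illustrative names, not SkyPilot code:

```python
from dataclasses import dataclass


@dataclass
class Placement:
    role: str      # "controller" or "replica"
    cloud: str
    instance: str


def confirmation_prompt(service: str, placements: list[Placement]) -> str:
    """Build a prompt that lists every cluster the command will launch."""
    lines = [f"Launching a new service '{service}'. The following will be launched:"]
    for p in placements:
        lines.append(f"  {p.role:<10}  {p.cloud:<10}  {p.instance}")
    lines.append("Proceed? [Y/n]: ")
    return "\n".join(lines)


prompt = confirmation_prompt(
    "sky-service-12e9",
    [
        Placement("controller", "AWS", "m6i.xlarge"),
        Placement("replica", "Kubernetes", "4CPU--8GB--4V100"),
    ],
)
print(prompt)
```

With the controller row shown, the AWS launch would have been visible before answering `Y`.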

mrPsycox (Author) commented Feb 8, 2024

> Thanks for the report @mrPsycox.
>
> @cblmemo: Can we create an issue to fix the UX problem of not displaying the spot/serve controller in the confirmation prompts of `sky spot launch` / `sky serve up`? Cc @Michaelvll.

Thanks for the help. For now, I worked around the issue by using `sky launch`. I will look further into the branch!

mrPsycox closed this as completed Feb 8, 2024
cblmemo (Collaborator) commented Feb 9, 2024

> Thanks for the report @mrPsycox.
>
> @cblmemo: Can we create an issue to fix the UX problem of not displaying the spot/serve controller in the confirmation prompts of `sky spot launch` / `sky serve up`? Cc @Michaelvll.

Good point! Just filed issue #3138.
