Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[k8s] Prototype - SkyServe on K8s #3109

Closed
wants to merge 36 commits into from
Closed

Conversation

romilbhardwaj
Copy link
Collaborator

@romilbhardwaj romilbhardwaj commented Feb 7, 2024

Working prototype of SkyServe on k8s. Both the controller and the replicas are run on k8s cluster.

TODOs

  • Clean up code
  • Fix TODOs in code
  • Automatically invoke generate_static_kubeconfig.sh for exec based kubeconfigs - no longer required with SA support
  • Disallow file_mounts and workdir when no storage cloud is available
  • Reconsider uploading sky-key and instead use the solution from [k8s] Share SSH jump pod #2826 to allow multiple keys to auth to ssh jump pod
  • Check azure query_endpoints implementation.

Tested (run the relevant ones):

  • Code formatting: bash format.sh
  • HTTP Serve Example on GKE with ingress
  • HTTP Serve Example on GKE with LoadBalancer
  • HTTP Serve Example on EKS with ingress
  • HTTP Serve Example on EKS with LoadBalancer
  • HTTP Serve Example on OnPrem (RKE) with ingress
  • HTTP Serve Example on OnPrem (RKE) with LoadBalancer
  • All smoke tests: pytest tests/test_smoke.py

Copy link
Collaborator

@Michaelvll Michaelvll left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding the support sky serve on kubernetes @romilbhardwaj! Just did a quick pass and left several comments. : )

docs/source/reference/config.rst Show resolved Hide resolved
sky/backends/backend_utils.py Show resolved Hide resolved
sky/backends/backend_utils.py Show resolved Hide resolved
Comment on lines +1963 to +1976
if endpoint:
cluster_endpoint = backend_utils.get_endpoints(
cluster_record['name'], endpoint)
click.echo(cluster_endpoint)
else:
cluster_endpoints = backend_utils.get_endpoints(
cluster_record['name'])
assert isinstance(cluster_endpoints, dict)
for port, port_endpoint in cluster_endpoints.items():
click.echo(
f'{colorama.Fore.BLUE}{colorama.Style.BRIGHT}{port}'
f'{colorama.Style.RESET_ALL}: '
f'{colorama.Fore.CYAN}{colorama.Style.BRIGHT}'
f'{port_endpoint}{colorama.Style.RESET_ALL}')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
if endpoint:
cluster_endpoint = backend_utils.get_endpoints(
cluster_record['name'], endpoint)
click.echo(cluster_endpoint)
else:
cluster_endpoints = backend_utils.get_endpoints(
cluster_record['name'])
assert isinstance(cluster_endpoints, dict)
for port, port_endpoint in cluster_endpoints.items():
click.echo(
f'{colorama.Fore.BLUE}{colorama.Style.BRIGHT}{port}'
f'{colorama.Style.RESET_ALL}: '
f'{colorama.Fore.CYAN}{colorama.Style.BRIGHT}'
f'{port_endpoint}{colorama.Style.RESET_ALL}')
cluster_endpoint = backend_utils.get_endpoints(cluster_record['name'], endpoint)
if endpoint:
click.echo(cluster_endpoint)
return
for port, port_endpoint in cluster_endpoints.items():
click.echo(
f'{colorama.Fore.BLUE}{colorama.Style.BRIGHT}{port}'
f'{colorama.Style.RESET_ALL}: '
f'{colorama.Fore.CYAN}{colorama.Style.BRIGHT}'
f'{port_endpoint}{colorama.Style.RESET_ALL}')

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

backend_utils.get_endpoints() is invoked differently in the two calls - one is with a specific endpoint, which returns a str, and the other is endpoint=None, which returns a dict.

sky/serve/replica_managers.py Show resolved Hide resolved
@@ -23,6 +23,7 @@
from sky import exceptions
from sky import global_user_state
from sky import status_lib
from sky.backends import backend_utils
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Importing backend_utils here may caues circular import. We should be careful about this.

sky/templates/sky-serve-controller.yaml.j2 Show resolved Hide resolved
sky/utils/controller_utils.py Show resolved Hide resolved
sky/utils/schemas.py Show resolved Hide resolved
@romilbhardwaj romilbhardwaj changed the title [k8s] SkyServe on K8s [k8s] Prototype - SkyServe on K8s Mar 28, 2024
@romilbhardwaj romilbhardwaj mentioned this pull request Mar 28, 2024
8 tasks
@romilbhardwaj
Copy link
Collaborator Author

Moved to #3377. This branch will be kept around till #3377 is merged.

@Michaelvll
Copy link
Collaborator

Since #3377 was merged, closing this PR for now. : )

@Michaelvll Michaelvll closed this May 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[k8s][spot] Support spot controller on Kubernetes or local clusters
2 participants