Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[k8s] Robust service account and namespace support #3632

Merged
merged 4 commits into from
Jun 6, 2024

Conversation

romilbhardwaj
Copy link
Collaborator

@romilbhardwaj romilbhardwaj commented Jun 4, 2024

This PR has a few updates for k8s SA and namespace support:

Code:

  1. Moves the skypilot-system namespace creation to happen only if the default SERVICE_ACCOUNT remote_identity is used. This is to reduce the scope of permissions required if a SA already exists.
  2. Update namespace fetching logic to also account for the case when the cluster is running in a in-cluster auth config but in a namespace different than 'default'. Previously we had been testing on default namespace so this bug was not caught.

Docs:

  1. Updated the minimum permissions for Kubernetes docs and confirms that the permissions listed in the docs work.
  2. Added permissions required if storage mounting is used
  3. Add comments if the user needs to specify their own namespace.

Ran the following tests for two scenarios on a GKE cluster:

  1. User does not have a special service account and uses the default namespace. This is a base case to verify existing functionality does not break.
  2. User creates a service account in a special myns namespace following the create-sky-sa.yaml example in the docs, creates a kubeconfig that uses this namespace and service account to authenticate then uses this kubeconfig to run SkyPilot in the myns namespace with remote_identity: sky-sa.
  • Manual sky launch test both from laptop and from pod launched by SkyPilot
  • Managed Job controller smoke test on Kubernetes using storage mounting: pytest -v tests/test_smoke.py::test_managed_jobs_storage --kubernetes

@romilbhardwaj romilbhardwaj requested a review from Michaelvll June 5, 2024 18:44
Copy link
Collaborator

@Michaelvll Michaelvll left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing this @romilbhardwaj! Left several comments : )

docs/source/cloud-setup/cloud-permissions/kubernetes.rst Outdated Show resolved Hide resolved
docs/source/cloud-setup/cloud-permissions/kubernetes.rst Outdated Show resolved Hide resolved
docs/source/cloud-setup/cloud-permissions/kubernetes.rst Outdated Show resolved Hide resolved
docs/source/cloud-setup/cloud-permissions/kubernetes.rst Outdated Show resolved Hide resolved
Comment on lines +320 to +322
After creating the service account, the cluster admin may distribute kubeconfigs with the ``sky-sa`` service account to users who need to access the cluster.

Users should also configure SkyPilot to use the ``sky-sa`` service account through ``~/.sky/config.yaml``:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we change the sky-sa name above the same as the default name in our codebase, so after creation, the user does not need to specify it in a ~/.sky/config.yaml (we can still mention that a user can change the name of the service account and specify the config yaml, but we should keep the service account creation with the same name as the hardcoded one)?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using the default service account skypilot-service-account requires additional permissions because our code is self-correcting - it automatically fixes any service account misconfigurations by creating/patching resources (e.g., when user is upgrading versions or has accidentally deleted k8s RBAC resources created by SkyPilot).

  1. Our code will inspect and create/update the necessary roles and rolebindings. This requires additional "create", "patch" permissions on "clusterroles", "clusterrolebindings".
  2. Our code will inspect and create the skypilot-system namespace if it doesn't exist. This requires additional "list", "create" permissions on namespaces.

These extra permissions on "clusterroles", "clusterrolebindings", "namespaces" may be considered too permissive in some environments (e.g., shared clusters). Perhaps we should keep the permissions here limited to minimal permissions required?

@@ -18,7 +18,7 @@ SkyPilot's Kubernetes support is designed to work with most Kubernetes distribut
To connect to a Kubernetes cluster, SkyPilot needs:

* An existing Kubernetes cluster running Kubernetes v1.20 or later.
* A `Kubeconfig <kubeconfig>`_ file containing access credentials and namespace to be used.
* A `Kubeconfig <kubeconfig>`_ file containing access credentials and namespace to be used. Refer to :ref:`required permissions <cloud-permissions-kubernetes>` guide for details.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about:

Suggested change
* A `Kubeconfig <kubeconfig>`_ file containing access credentials and namespace to be used. Refer to :ref:`required permissions <cloud-permissions-kubernetes>` guide for details.
* A `Kubeconfig <kubeconfig>`_ file containing access credentials and namespace to be used. To reduce the permissions for a user, check :ref:`required permissions <cloud-permissions-kubernetes>` for details.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Updated. I'm also reworking this page in a separate PR.

sky/utils/kubernetes/generate_static_kubeconfig.sh Outdated Show resolved Hide resolved
@romilbhardwaj
Copy link
Collaborator Author

Thanks @Michaelvll. Addressed comments and updated generate_static_kubeconfig.sh to be a lot more friendlier to use.

Copy link
Collaborator

@Michaelvll Michaelvll left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing those permissions @romilbhardwaj! This is very important for our k8s users. : )

Comment on lines +12 to +24
# * Specify SKYPILOT_NAMESPACE env var to override the default namespace
# * Specify SKYPILOT_SA_NAME env var to override the default service account name
# * Specify SKIP_SA_CREATION=1 to skip creating the service account and use an existing one
#
# Usage:
# # Create "sky-sa" service account with minimal permissions in "default" namespace and generate kubeconfig
# $ ./generate_static_kubeconfig.sh
#
# # Create "my-sa" account with minimal permissions in "my-namespace" namespace and generate kubeconfig
# $ SKYPILOT_SA_NAME=my-sa SKYPILOT_NAMESPACE=my-namespace ./generate_static_kubeconfig.sh
#
# # Use an existing service account "my-sa" in "my-namespace" namespace and generate kubeconfig
# $ SKIP_SA_CREATION=1 SKYPILOT_SA_NAME=my-sa SKYPILOT_NAMESPACE=my-namespace ./generate_static_kubeconfig.sh
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we include the usage of this usage in the doc as well? So as to make it easier for the user to create the service account?

The doc for minimal permission above could serve as a deep dive into what this shell script do.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

absolutely - I'm working on it in a separate for revamping k8s cluster admin docs :)

@romilbhardwaj romilbhardwaj merged commit 26d902d into master Jun 6, 2024
20 checks passed
@romilbhardwaj romilbhardwaj deleted the k8s_serviceaccount_docs branch June 6, 2024 23:16
Michaelvll pushed a commit that referenced this pull request Aug 23, 2024
* WIP

* Working permissions

* lint

* comments and update generate_static_kubeconfig.sh
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants