[SECURITY ISSUE] A potential risk in spark-operator that can be leveraged for cluster-level privilege escalation #2070

Open
sparkEchooo opened this issue Jun 23, 2024 · 1 comment · May be fixed by #2049

Comments

@sparkEchooo

sparkEchooo commented Jun 23, 2024

Summary

In ACK, the service account mounted by the spark-operator component has the create mutatingwebhookconfigurations permission. This component runs as a Deployment in the cluster. If an attacker gains control of the worker node running spark-operator in the ACK cluster, they can easily obtain the spark-operator account token. They can then use the account's create mutatingwebhookconfigurations permission to add a malicious webhook to the cluster, which lets them monitor and tamper with cluster resources and eventually take control of the entire cluster.

Detailed Analysis

In Kubernetes, we refer to the permissions that RBAC grants directly to application components for specified operations on resources as explicit permissions; the create mutatingwebhookconfigurations permission held by the spark-operator component is one example. We observe that, because of the intricate interdependencies among resources and operations within a Kubernetes cluster, these permissions can indirectly affect the state of other resources. This influence, which we refer to as implicit permissions, allows operations on resources without authorization. For example, the permission definition for spark-operator explicitly grants create mutatingwebhookconfigurations so that the component can create mutatingwebhookconfiguration resources. However, this permission affects not only the mutatingwebhookconfiguration resources themselves but also all other resources: by creating a mutatingwebhookconfiguration, the holder can register a custom webhook server that monitors resource requests in the cluster and modifies them without authorization.

Therefore, if an attacker gains control of the worker node running spark-operator in the ACK cluster, they can use the spark-operator component's token to create a mutatingwebhookconfiguration that registers a malicious webhook in order to steal a high-privileged token (e.g., by monitoring the 'create pods' requests in the cluster and modifying the original request so that a custom malicious pod is created for privilege escalation). Ultimately, the attacker can use that token to escalate privileges and take control of the entire cluster.
For reference, below are some similar issues that we previously reported and that were confirmed:
https://nvd.nist.gov/vuln/detail/CVE-2023-22645
https://nvd.nist.gov/vuln/detail/CVE-2023-30512
open-cluster-management-io/ocm#325
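
To make the implicit-permission concern in the analysis above more concrete from a defender's point of view, here is a minimal Python sketch that inspects a MutatingWebhookConfiguration (already loaded as a plain dict, e.g. from YAML or JSON) and reports which of its webhooks would be invoked on pod creation. The function names and the sample configuration are hypothetical and for illustration only; the sketch looks only at the rules field and ignores scope, namespaceSelector, objectSelector, and subresource patterns, so treat it as a starting point rather than a complete check.

```python
def _covers(values, wanted):
    """True if a rule list contains the wanted value or the '*' wildcard."""
    return "*" in values or wanted in values


def webhooks_matching_pod_create(config):
    """Return the names of webhooks whose rules match CREATE on core-group pods.

    `config` is a dict shaped like a MutatingWebhookConfiguration object.
    Pods live in the core API group, which is written as "" in webhook rules.
    """
    names = []
    for webhook in config.get("webhooks", []):
        for rule in webhook.get("rules", []):
            if (
                _covers(rule.get("operations", []), "CREATE")
                and _covers(rule.get("apiGroups", []), "")
                and _covers(rule.get("resources", []), "pods")
            ):
                names.append(webhook.get("name", "<unnamed>"))
                break  # one matching rule is enough for this webhook
    return names


# Hypothetical sample configuration; not taken from spark-operator.
SAMPLE_CONFIG = {
    "kind": "MutatingWebhookConfiguration",
    "metadata": {"name": "example"},
    "webhooks": [
        {
            "name": "pods.example.test",
            "rules": [{"operations": ["CREATE"], "apiGroups": [""],
                       "apiVersions": ["v1"], "resources": ["pods"]}],
        },
        {
            "name": "configmaps.example.test",
            "rules": [{"operations": ["UPDATE"], "apiGroups": [""],
                       "apiVersions": ["v1"], "resources": ["configmaps"]}],
        },
        {
            "name": "everything.example.test",
            "rules": [{"operations": ["*"], "apiGroups": ["*"],
                       "apiVersions": ["*"], "resources": ["*"]}],
        },
    ],
}
```

Running webhooks_matching_pod_create over each configuration in a cluster would surface webhooks that sit in the pod-creation path, which is where an unexpected entry would deserve a closer look.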

Mitigation Discussion

The developers could remove the create mutatingwebhookconfigurations permission from the spark-operator account.
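
As a rough aid for checking whether this mitigation has been applied, the following Python sketch scans a list of RBAC rules (the rules field of a Role or ClusterRole, loaded as plain dicts) for write access to mutatingwebhookconfigurations. The function names and the sample rules are hypothetical, and flagging update and patch alongside create is our own assumption about what is worth reviewing; only create is the permission discussed in this issue.

```python
RISKY_GROUP = "admissionregistration.k8s.io"
RISKY_RESOURCE = "mutatingwebhookconfigurations"
# Assumption: update/patch are flagged too, since they also change webhook configs.
RISKY_VERBS = {"create", "update", "patch"}


def _matches(values, wanted):
    """True if an RBAC rule list contains the wanted value or the '*' wildcard."""
    return "*" in values or wanted in values


def find_risky_rules(rules):
    """Return the RBAC rules that grant write access to mutatingwebhookconfigurations."""
    risky = []
    for rule in rules:
        verbs = set(rule.get("verbs", []))
        if not _matches(rule.get("apiGroups", []), RISKY_GROUP):
            continue
        if not _matches(rule.get("resources", []), RISKY_RESOURCE):
            continue
        if "*" in verbs or verbs & RISKY_VERBS:
            risky.append(rule)
    return risky


# Hypothetical sample rules; not copied from the spark-operator chart.
SAMPLE_RULES = [
    {"apiGroups": [""], "resources": ["pods"], "verbs": ["get", "list"]},
    {"apiGroups": ["admissionregistration.k8s.io"],
     "resources": ["mutatingwebhookconfigurations"],
     "verbs": ["create", "get"]},
    {"apiGroups": ["admissionregistration.k8s.io"],
     "resources": ["mutatingwebhookconfigurations"],
     "verbs": ["get", "list", "watch"]},
]
```

With the sample above, find_risky_rules(SAMPLE_RULES) returns only the second rule; an empty result for the operator's roles would suggest the permission has been removed.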

A few questions

Is it a real issue in spark-operator?
If it's a real issue, can spark-operator mitigate the risks following my suggestions discussed in the "Mitigation Discussion"?
If it's a real issue, does spark-operator plan to fix this issue?

@yuchaoran2011
Contributor

Should be addressed in #2049
