Clusters with cluster-external control planes cannot start the multicluster gateway, readiness probes are blocked #7560
Comments
Related to linkerd#7560, this modifies the proxy injector to use port 4192 and updates the multicluster manifest to match. See: linkerd/linkerd2-proxy#1428 Signed-off-by: Aaron Friel <[email protected]>
related: #7050
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.
It seems simpler to configure, and consistent with more common firewall deployments, to allow the readiness probe to be on a separate port. It looks like the work @olix0r refers to on #7050 would address this, but I do think that, from an operational perspective, separate ports are just much simpler to manage. And for the sake of compatibility, this PR preserves the existing /live and /ready routes. I don't see a downside in advertising /live and /ready on two ports. Advanced operators who feel comfortable using a single port for all authorizations can do so, and most operators with L3/L4 firewalls can easily add defense in depth via port-based firewall rules.
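As a rough illustration of the port-based, defense-in-depth rules mentioned above, a plain Kubernetes NetworkPolicy could admit probe traffic broadly while keeping the admin port cluster-internal. This is only a sketch: the pod label, the namespace, and the 4192 probe port are assumptions based on the linked PRs, not something the charts ship today.

```yaml
# Sketch only: assumes the gateway pod carries the label below and that the
# probe endpoints have moved to port 4192 as proposed in the linked PRs.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: linkerd-gateway-admin
  namespace: linkerd-multicluster
spec:
  podSelector:
    matchLabels:
      app: linkerd-gateway
  policyTypes: ["Ingress"]
  ingress:
    # Probe-only port: reachable from anywhere the CNI allows.
    - ports:
        - port: 4192
          protocol: TCP
    # Full admin port: restricted to in-cluster pods.
    - from:
        - namespaceSelector: {}
      ports:
        - port: 4191
          protocol: TCP
```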
Multicluster probes are authorized by default, even when the default policy is deny.
What problem are you trying to solve?
On some Kubernetes distributions, requests from the control plane may not come from a private address range, or even from a consistent IP address. This poses a problem, because the admin server used in a multicluster mesh needs to serve its /live and /ready routes to liveness and readiness probes, wherever those probes originate.
In order to avoid exposing the other admin routes, the multicluster gateway uses an authorization policy forbidding unauthorized and out-of-cluster requests. This causes the gateway to fail readiness and liveness probes.
Example: on Linode Kubernetes Engine (LKE), probes originate from outside the cluster (e.g. from 45.79.0.0/21), but the default ServerAuthorization policy on the linkerd-gateway only allows localhost.
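For illustration, an authorization of roughly the following shape admits only loopback clients, so a probe arriving from 45.79.0.0/21 is rejected. This is a hedged sketch: resource names, selectors, and the exact client stanza are assumptions, and the real resource shipped by the multicluster chart may differ.

```yaml
# Illustrative sketch of a localhost-only authorization for the gateway's
# admin port; names and selectors are assumed, not copied from the chart.
apiVersion: policy.linkerd.io/v1beta1
kind: Server
metadata:
  name: linkerd-gateway-admin
  namespace: linkerd-multicluster
spec:
  podSelector:
    matchLabels:
      app: linkerd-gateway
  port: linkerd-admin
  proxyProtocol: HTTP/1
---
apiVersion: policy.linkerd.io/v1beta1
kind: ServerAuthorization
metadata:
  name: linkerd-gateway-admin-localhost
  namespace: linkerd-multicluster
spec:
  server:
    name: linkerd-gateway-admin
  client:
    unauthenticated: true
    networks:
      - cidr: 127.0.0.1/32
```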
See these trace logs:
How should the problem be solved?
I would suggest adding[1] a separate server to the proxy on a distinct port. The implementation could occur in a series of steps:

1. Serve `/ready` and `/live` on a new port while maintaining the existing routes on the admin port (see the probe sketch after this list).
2. Deprecate the `/ready` and `/live` routes on the admin server.
3. Remove `/ready` and `/live` from the admin server.

[1] I have done so in these two pull requests:
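As a sketch of what step 1 could look like from an injected workload's point of view, the injected proxy's probes would simply target the new port. The 4192 number comes from the linked pull requests; the port name and everything else below are illustrative assumptions, not the actual injector output.

```yaml
# Illustrative fragment of an injected pod spec (spec.containers) under the
# proposal: probes move to a dedicated port while 4191 keeps the full admin API.
containers:
  - name: linkerd-proxy
    ports:
      - name: linkerd-admin
        containerPort: 4191
      - name: linkerd-probe      # hypothetical name for the new probe-only port
        containerPort: 4192
    livenessProbe:
      httpGet:
        path: /live
        port: 4192
    readinessProbe:
      httpGet:
        path: /ready
        port: 4192
```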
Any alternatives you've considered?
In the Linkerd community Discord, @olix0r has suggested that route-based authorizations, being worked on for a future Linkerd release, would be able to allow this dual role.
My arguments in favor of the separate health server are below. In short, that approach, combined with a `deny` default cluster authorization, results in a cumbersome and significant additional amount of work for the proxy injector to maintain.
1. Best practice with apps on Kubernetes, and in general, is one of least privilege: a port that only exposes an HTTP server serving `/ready` is easier to secure than one that also exposes `/fireTheMissiles` (hyperbole... but only a little). Separate ports with separate concerns are easily handled using existing tooling, and safely exposed (if the user wishes) using the L3 routing Kubernetes provides by default to containers and via load balancers.
2. In a default cluster install, the absence of a server authorization is fail-open (`all-unauthenticated`), which means that any mistake removing the server authorization from a gateway will expose privileged routes to the internet. Infrastructure as code could cause a ServerAuthorization to be briefly deleted (replaced), which would make those routes open to the internet. As long as the default authorization policy remains `all-unauthenticated`, the multicluster gateway exposing the admin port to the internet is a large and risky footgun. Consider the proposed solution versus a route-based authorization: which is simpler to maintain?

One may note, regarding my second argument, that perhaps the issue is the `all-unauthenticated` aspect. One could (and I certainly would!) argue that if a cluster operator is running untrusted workloads, running a multi-tenant cluster, and so on, they should change the default authorization policy. No question there. The risk profile, however, is very different for most cluster operators, and ease of use (for now) dictates that the installation default to an open policy which is simpler for users to deploy and operate.
3. Suppose that an operator does deploy with a `deny` default policy, and very carefully manages ServerAuthorizations for all of their workloads. The proxy injector would then have to become not just an injector, but also an operator managing additional authorizations for each workload it injects. Why? Because, going back to the original issue, on clusters such as the one described there, readiness probes arrive as plain HTTP requests from unpredictable IP addresses.

The proxy injector, in this scenario, would therefore have to add a ServerAuthorization for each workload it injects, authorizing `/ready` and `/live`. Either the default `deny` policy would have to carry an asterisk (it is a default deny, except for two routes on port 4191), or cluster operators would have to add those authorizations themselves.
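To make that burden concrete, here is a hedged sketch of the kind of extra resource the injector (or the operator) would have to manage per workload just so probes succeed under a `deny` default. Names and selectors are illustrative, and note that with the ServerAuthorization API as it stands, the authorization applies to the whole admin port, not only to `/ready` and `/live`.

```yaml
# Sketch: per-workload authorization letting unauthenticated probes reach the
# proxy admin port under a default-deny policy. Everything below is
# illustrative; it cannot be narrowed to just /ready and /live with this API.
apiVersion: policy.linkerd.io/v1beta1
kind: ServerAuthorization
metadata:
  name: my-app-proxy-admin-probes   # hypothetical, one per injected workload
  namespace: my-namespace
spec:
  server:
    name: my-app-proxy-admin        # a Server covering the pod's port 4191
  client:
    unauthenticated: true
    networks:
      - cidr: 0.0.0.0/0             # probe source addresses are unpredictable
```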
How would users interact with this feature?
This feature and/or resolution of this issue should be transparent to any user.
Would you like to work on this feature?
yes