
multicluster gateway: explicitly allow out-of-cluster probes #7548

Closed

Conversation

AaronFriel

@AaronFriel AaronFriel commented Jan 2, 2022

Problem: On Linode Kubernetes Engine (LKE), probes originate from outside the cluster (e.g. from 45.79.0.0/21), but the default
ServerAuthorization policy on the linkerd-gateway only allows localhost.

See these trace logs:

# This line edited for readability:
[    29.766629s] TRACE ThreadId(02) daemon:admin{listen.addr=0.0.0.0:4191}:accept{client.addr=45.79.3.202:60606}: linkerd_app_inbound::policy::authorize::http: Authorizing request policy=AllowPolicy { dst: OrigDstAddr(0.0.0.0:4191), 
  server: Receiver { shared: Shared { value: RwLock(RwLock { data: ServerPolicy { protocol: Http1, 
    authorizations: [
      Authorization { networks: [Network { net: 0.0.0.0/0, except: [] }, Network { net: ::/0, except: [] }], authentication: TlsAuthenticated { identities: {}, suffixes: [Suffix { ends_with: "" }] }, name: "linkerd-gateway-probe" }, 
      Authorization { networks: [Network { net: 10.0.0.0/8, except: [] }, Network { net: 100.64.0.0/10, except: [] }, Network { net: 172.16.0.0/12, except: [] }, Network { net: 192.168.0.0/16, except: [] }], authentication: Unauthenticated, name: "proxy-admin" }, 
      Authorization { networks: [Network { net: 127.0.0.1/32, except: [] }, Network { net: ::1/128, except: [] }], authentication: Unauthenticated, name: "default:localhost" }
    ], name: "gateway-proxy-admin" } }), state: AtomicState(2), ref_count_rx: 8, notify_rx: Notify { state: 4, waiters: Mutex(Mutex { data: LinkedList { head: None, tail: None } }) }, notify_tx: Notify { state: 1, waiters: Mutex(Mutex { data: LinkedList { head: Some(0x7fd619cb8d78), tail: Some(0x7fd619cb8d78) } }) } }, version: Version(0) } }
[    29.766730s]  INFO ThreadId(02) daemon:admin{listen.addr=0.0.0.0:4191}:accept{client.addr=45.79.3.202:60606}: linkerd_app_inbound::policy::authorize::http: Request denied server=gateway-proxy-admin tls=None(NoClientHello) client=45.79.3.202:60606
[    29.766757s]  INFO ThreadId(02) daemon:admin{listen.addr=0.0.0.0:4191}:accept{client.addr=45.79.3.202:60606}:rescue{client.addr=45.79.3.202:60606}: linkerd_app_core::errors::respond: Request failed error=unauthorized connection on server gateway-proxy-admin
[    29.766776s] DEBUG ThreadId(02) daemon:admin{listen.addr=0.0.0.0:4191}:accept{client.addr=45.79.3.202:60606}: linkerd_app_core::errors::respond: Handling error on HTTP connection status=403 Forbidden version=HTTP/1.1 close=false
[    29.766794s] TRACE ThreadId(02) daemon:admin{listen.addr=0.0.0.0:4191}:accept{client.addr=45.79.3.202:60606}:encode_headers: hyper::proto::h1::role: Server::encode status=403, body=None, req_method=Some(GET)

Solution: Explicitly add a catch-all network authorization.
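A minimal sketch of what such a catch-all could look like as a standalone ServerAuthorization (the server name gateway-proxy-admin is taken from the trace logs above; the namespace and resource name are illustrative, and the PR's actual change is made in the multicluster Helm chart):

apiVersion: policy.linkerd.io/v1beta1
kind: ServerAuthorization
metadata:
  namespace: linkerd-multicluster
  name: gateway-probe-catch-all
spec:
  server:
    name: gateway-proxy-admin
  client:
    # Catch-all: any IPv4 or IPv6 source, no client authentication required.
    networks:
    - cidr: 0.0.0.0/0
    - cidr: ::/0
    unauthenticated: true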

Validation: This change was deployed on an LKE cluster on 2022-01-02 with the CNI plugin, via the Helm chart.

Signed-off-by: Aaron Friel <[email protected]>

@AaronFriel AaronFriel requested a review from a team as a code owner January 2, 2022 21:16
@olix0r
Member

olix0r commented Jan 3, 2022

Unfortunately, this will also expose proxy metrics, etc., externally to all clients, which probably shouldn't be part of Linkerd's default installation. It would probably be better to add specific authorizations outside of Linkerd's default install.
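For example, a cluster operator could grant only the probe source range their provider actually uses. A hypothetical sketch for the LKE range seen in the logs above (resource name and namespace are illustrative; the server name is from the logs):

apiVersion: policy.linkerd.io/v1beta1
kind: ServerAuthorization
metadata:
  namespace: linkerd-multicluster
  name: gateway-probe-lke
spec:
  server:
    name: gateway-proxy-admin
  client:
    # Only LKE's out-of-cluster probe range, rather than 0.0.0.0/0.
    networks:
    - cidr: 45.79.0.0/21
    unauthenticated: true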

@AaronFriel
Author

@olix0r Is the same port used for internal metrics and for external probes?

Is there a way to decouple those two?

AaronFriel added a commit to AaronFriel/linkerd2-proxy that referenced this pull request Jan 3, 2022
A separate server provides /live and /ready routes that behave
identically to the admin server's. The existing admin server's routes
are not removed.

Background:

On some Kubernetes distributions, requests from the control plane may
not come from an IP address in a private range, or even from a
consistent IP address. This poses a problem, because the admin server
used in a multicluster mesh must simultaneously serve /live and /ready
routes to:

* The Kubernetes control plane, for liveness and readiness probes
  respectively
* Remote clusters, as part of probing the remote gateway

To avoid exposing the other admin routes, the multicluster gateway uses
an authorization policy that forbids unauthenticated and out-of-cluster
requests. This causes the gateway to fail readiness and liveness
probes.

Resolution:

Implement a separate server in the proxy app that can securely serve
/live and /ready routes. The port that server listens on can be used for
health check probes internally, without an authorization policy.

See: linkerd/linkerd2#7548
AaronFriel added a commit to AaronFriel/linkerd2-proxy that referenced this pull request Jan 3, 2022
@AaronFriel AaronFriel force-pushed the linode-multicluster-gateway-fix branch from 28f9d5e to 3007989 on January 4, 2022 01:51
AaronFriel added a commit to AaronFriel/linkerd2-proxy that referenced this pull request Jan 4, 2022
@AaronFriel AaronFriel force-pushed the linode-multicluster-gateway-fix branch from 3007989 to 58944ef on January 4, 2022 02:52
AaronFriel added a commit to AaronFriel/linkerd2-proxy that referenced this pull request Jan 4, 2022
AaronFriel added a commit to AaronFriel/linkerd2-proxy that referenced this pull request Jan 4, 2022
Related to linkerd#7560, this
modifies the proxy injector to use port 4192 and updates the
multicluster manifest to match.

See: linkerd/linkerd2-proxy#1428

Signed-off-by: Aaron Friel <[email protected]>
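With the injector using port 4192, the gateway's health checks would target the dedicated probe server instead of the admin port. A hypothetical sketch of the resulting container probe stanzas (field values illustrative, assuming the probe server from linkerd/linkerd2-proxy#1428):

livenessProbe:
  httpGet:
    path: /live
    port: 4192
readinessProbe:
  httpGet:
    path: /ready
    port: 4192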
@AaronFriel AaronFriel force-pushed the linode-multicluster-gateway-fix branch from 58944ef to dc97d85 on January 4, 2022 05:29
AaronFriel added a commit to AaronFriel/linkerd2-proxy that referenced this pull request Jan 4, 2022
@AaronFriel
Author

@olix0r I've updated this branch with the same changes I packaged and successfully deployed on Linode, DigitalOcean, Google Cloud, and Azure.

There are likely other places where 4191 should change, but since the admin server still serves /ready and /live, there should be no breaking changes. Of course, this PR is conditional on the RFC/issue #7560.

@adleong
Member

adleong commented Jan 11, 2022

Hi @AaronFriel, I'm going to mark this PR as a draft until #7560 is addressed since we can't merge this PR until then.

@adleong adleong marked this pull request as draft January 11, 2022 19:32
@stale

stale bot commented Apr 22, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Apr 22, 2022
@olix0r olix0r removed the wontfix label Apr 22, 2022
@adleong
Member

adleong commented May 10, 2022

I'm going to close this PR. This work can continue once #7560 is addressed.

@adleong adleong closed this May 10, 2022