Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

6.0.1 sidecar fails readiness probe with TLS enabled. #2239

Closed
chancez opened this issue Jul 24, 2024 · 10 comments · Fixed by #2257 or minio/pkg#126
Closed

6.0.1 sidecar fails readiness probe with TLS enabled. #2239

chancez opened this issue Jul 24, 2024 · 10 comments · Fixed by #2257 or minio/pkg#126
Assignees

Comments

@chancez
Copy link

chancez commented Jul 24, 2024

It's fairly similar to #2198 but in 6.0.1. Basically I'm seeing the sidecar fail it's readiness probes in a fresh install of 6.0.1 in our CI.

I'm hitting this issue where the sidecar starts but never becomes ready on a fresh install of 6.0.1 in our CI as well.

If I describe the pod I can see the readiness check is failing with HTTP 500:

  Warning  Unhealthy  4m14s (x14 over 4m27s)  kubelet            Readiness probe failed: HTTP probe failed with statuscode: 500

So I can confirm it's at least starting it's probeServer here: https://github.com/minio/operator/blob/master/sidecar/pkg/sidecar/sidecar_utils.go#L86-L116

Based on this, it must be erroring on one of these calls: https://github.com/minio/operator/blob/master/sidecar/pkg/sidecar/probes_handlers.go#L64-L76 but there's no logging so it's hard to tell since k8s doesn't log the response body for probe failures.

However, I think I figured it out: It's probably because we're using TLS:

https://github.com/minio/operator/blob/master/sidecar/pkg/sidecar/probes_handlers.go#L64 hardcodes the port to 9000 but the minio server listens on port 9443 when TLS is enabled. The sidecar needs to be configured to use the correct port when TLS is configured.

Expected Behavior

minio sidecar should be ready with TLS enabled.

Current Behavior

minio sidecar fails readiness probes with TLS enabled.

Possible Solution

Update the readiness probe logic of the sidecar to use the correct minio port when TLS is enabled.

Steps to Reproduce (for bugs)

Install minio-operator and tenant 6.0.1 with the following helm values:

secrets:
  enabled: true
  name: minio-secret
  accessKey: minio
  secretKey: quickstart
tenant:
  name: quickstart
  configuration:
    name: minio-secret
  pools:
    - name: pool-1
      servers: 1
      volumesPerServer: 4
      size: 2Gi
      storageClassName: standard
  buckets:
    - name: "clickhouse"
  certificate:
    externalCertSecret:
      - name: tls-minio
        type: kubernetes.io/tls
  kes:
    configuration: false
  log:
    disabled: true
  prometheus:
    disabled: true

Context

Cannot install minio via operator 6.0.1

Regression

yes

Worked: v5.0.16
Broken: v6.0.1

Your Environment

  • Version used (minio-operator): v6.0.1
  • Environment name and version (e.g. kubernetes v1.17.2): v1.30.0
  • Server type and version:
  • Operating System and version (uname -a): KIND 0.23.0
  • Link to your deployment file: values.yaml provided above
@chancez
Copy link
Author

chancez commented Jul 24, 2024

I'd also like to note this is at least the 2nd time the operator had a regression related to usage of TLS. I previously hit a regression with how the operator configure minio with TLS in #1839. Perhaps there should be a e2e test that installs minio configured with TLS via minio-operator with cert-manager to help catch these regressions.

@harshavardhana
Copy link
Member

@dvaldivia @pjuarezd ^^

@cesnietor
Copy link
Contributor

we'll check, we might have already identified the issue.

ramondeklein added a commit to ramondeklein/minio-operator that referenced this issue Aug 6, 2024
@ramondeklein
Copy link
Contributor

ramondeklein commented Aug 6, 2024

When TLS is enabled on the MinIO endpoint, then the sidecar runs a health check on https://localhost:9000. The client accepts any certificate (because of InsecureSkipVerify, because it trusts localhost).

When there is only a single certificate, then MinIO will always respond with that certificate (even if the SNI server name doesn't match).

When there are multiple certificates, then MinIO will try to match the SNI server name to any over the registered certificates and returns the first certificate that matches the SNI server name. If none of the certificates match, then it doesn't return any certificate and the TLS handshake is aborted.

That is exactly what is happening here. There is an auto generated certificate (signed by the cluster's CA) for minio.quickstart.svc.cluster.local and another certificate for minio.example.com, so when a request is received with SNI server name set to localhost it doesn't know what certificate to choose and it aborts.

@ramondeklein
Copy link
Contributor

ramondeklein commented Aug 6, 2024

I have issued minio/pkg#126 to return the default certificate in case none of the certificates match. That would solve this issue. I also submitted a #2257 for the operator, so it will provide the proper server-name in the request and prevent failure.

Either an update of MinIO or MinIO operator should fix the problem.

@thoro
Copy link

thoro commented Aug 20, 2024

I just installed operator 6.0.2 and RELEASE.2024-08-17T01-24-54Z - which both should have the fix, but I'm still getting the error "HTTP request failed: Get "https://localhost:9000": remote error: tls: internal error"

@ramondeklein
Copy link
Contributor

ramondeklein commented Aug 20, 2024

RELEASE.2024-08-17T01-24-54Z doesn't contain the fix, because github.com/minio/pkg/v3 is still using v3.0.11 and that module was patched afterwards. It looks like operator v6.0.2 still uses sidecar v6.0.0 (source), so it doesn't contain the fix.

@harshavardhana Can I tag a v3.0.12 release for pkg/v3 and submit a PR for MinIO to use it?

@ramondeklein ramondeklein reopened this Aug 20, 2024
@thoro
Copy link

thoro commented Aug 20, 2024

yep, running without auto-cert.

Adding localhost to the cert worked.

@ramondeklein
Copy link
Contributor

  • Operator v6.0.3 includes the operator that sends the proper server-name, so MinIO can respond.
  • MinIO RELEASE.2024-08-26T15-33-07Z falls back to the first TLS certificate if none of the certificate match.

@fritz-net
Copy link

I'd also like to note this is at least the 2nd time the operator had a regression related to usage of TLS. I previously hit a regression with how the operator configure minio with TLS in #1839. Perhaps there should be a e2e test that installs minio configured with TLS via minio-operator with cert-manager to help catch these regressions.

its not the only issue with TLS I had with minio, also UI did not work on non 9000 port/with tls
thanks for opening this issue helped me solve my reboot loop (I did a update from 5.0.11 to 6.0.4)

at least 2 happy path e2e tests with TLS would be perfect (one for API and one for UI)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
6 participants