COS apps unreachable after setting tls-* options #330

Open
przemeklal opened this issue Apr 23, 2024 · 0 comments

Bug Description

I tried to replace the self-signed-certificates :certificates relations with the newly introduced tls-* config options, but it didn't work (an approximate command sketch is included after the list below).

  1. I set only tls-cert and tls-key, since the cert is signed by a well-trusted 3rd-party CA. I ended up with the status message "Please set tls-cert, tls-key, and tls-ca".
  2. I set the tls-ca anyway. I ended up with:
  • catalogue working fine with the new cert
  • "Bad Gateway" on /cos-prometheus-0
  • alertmanager working but with the old cert
  • "Internal Server Error" on /cos-grafana
  3. I removed the :certificates relations, which resulted in:
alertmanager/0*  error     idle   10.1.233.221         hook failed: "certificates-relation-broken" for ca:certificates
grafana/0*       error     idle   10.1.233.223         hook failed: "certificates-relation-broken" for ca:certificates
loki/0*          error     idle   10.1.233.222         hook failed: "certificates-relation-broken" for ca:certificates
prometheus/0*    error     idle   10.1.233.224         hook failed: "certificates-relation-broken" for ca:certificates
traefik/0*       error     idle   10.1.233.230         hook failed: "certificates-relation-broken" for ca:certificates
  4. After repeatedly running juju resolve --no-retry and recreating the COS app pods a few times (prometheus, grafana, traefik, catalogue, alertmanager), I ended up with:
  • catalogue, alertmanager, prometheus reachable and serving the new cert
  • grafana endpoint throwing "Bad Gateway" and printing this in its logs:
2024-04-23T14:26:26.539Z [grafana] Error: ✗ *api.HTTPServer run error: cert_file cannot be empty when using HTTPS
  5. I tried recreating the pods and unsetting and re-setting the tls-* options on traefik, but I was not able to restore it.
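
Roughly, the sequence of commands was along these lines. The certificate file names and relation endpoints below are placeholders/assumptions; the tls-* option names and juju resolve --no-retry are as used above:

# 1-2. Set the TLS options on traefik (file names are placeholders)
juju config traefik tls-cert="$(cat server.crt)" tls-key="$(cat server.key)" tls-ca="$(cat ca.crt)"
# 3. Remove the self-signed-certificates relations (repeated for each app; endpoint names assumed)
juju remove-relation ca:certificates traefik:certificates
# 4. Clear the failed certificates-relation-broken hooks (repeated for each unit in error)
juju resolve --no-retry traefik/0
# 5. Reset the tls-* options on traefik and set them again
juju config traefik --reset tls-cert,tls-key,tls-ca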

To Reproduce

Listed above.

Environment

juju 3.4.0

Versions:

App           Version  Status  Scale  Charm                     Channel        Rev  Address         Exposed  Message
alertmanager  0.26.0   active      1  alertmanager-k8s          latest/edge    105  10.152.183.185  no       
ca                     active      1  self-signed-certificates  stable          72  10.152.183.220  no       
catalogue              active      1  catalogue-k8s             latest/stable   33  10.152.183.206  no       
grafana       9.5.3    active      1  grafana-k8s               latest/stable  105  10.152.183.219  no       
loki          2.9.4    active      1  loki-k8s                  latest/stable  118  10.152.183.233  no       
prometheus    2.49.1   active      1  prometheus-k8s            latest/stable  170  10.152.183.49   no       
traefik       v2.11.0  active      1  traefik-k8s               latest/edge    180  10.5.1.15       no       

Relevant log output

Traefik when trying to open /cos-grafana:

2024-04-23T14:25:21.427Z [traefik] time="2024-04-23T14:25:21Z" level=debug msg="'502 Bad Gateway' caused by: dial tcp 10.1.233.234:3000: connect: connection refused"

2024-04-23T14:26:26.531Z [grafana] logger=provisioning.dashboard type=file name=Default t=2024-04-23T14:26:26.531084435Z level=error msg="failed to save dashboard" file=/etc/grafana/provisioning/dashboards/juju_alertmanager-k8s_e9224b0.json error="SQL query for existing dashboard by UID failed: context canceled"
2024-04-23T14:26:26.535Z [grafana] logger=provisioning.dashboard type=file name=Default t=2024-04-23T14:26:26.535152504Z level=error msg="failed to save dashboard" file=/etc/grafana/provisioning/dashboards/juju_loki-k8s_0804127.json error="SQL query for existing dashboard by UID failed: context canceled"
2024-04-23T14:26:26.538Z [grafana] logger=provisioning.dashboard type=file name=Default t=2024-04-23T14:26:26.538180317Z level=error msg="failed to save dashboard" file=/etc/grafana/provisioning/dashboards/juju_prometheus-k8s_35dd368.json error="SQL query for existing dashboard by UID failed: context canceled"
2024-04-23T14:26:26.539Z [grafana] Error: ✗ *api.HTTPServer run error: cert_file cannot be empty when using HTTPS
2024-04-23T14:26:26.550Z [pebble] Service "grafana" stopped unexpectedly with code 1
2024-04-23T14:26:26.550Z [pebble] Service "grafana" on-failure action is "restart", waiting ~30s before restart (backoff 30)
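
The last two Grafana lines suggest the workload is being restarted with HTTPS enabled but no certificate file configured, which would also explain the "connection refused"/502 seen by Traefik. Some checks that may help triage (the workload container name, the Grafana config path, and the external port below are assumptions on my part):

# Follow the grafana unit's logs
juju debug-log --include grafana/0
# Inspect the [server] section of the rendered Grafana config (path is a guess)
juju ssh --container grafana grafana/0 -- grep -A5 "\[server\]" /etc/grafana/grafana.ini
# Check which certificate traefik is actually serving on its external address
openssl s_client -connect 10.5.1.15:443 </dev/null 2>/dev/null | openssl x509 -noout -issuer -dates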

Additional context

No response
