
Traefik blocks when get svc only returns hostname, not ip address #378

Closed
asbalderson opened this issue Jun 27, 2024 · 2 comments · Fixed by #385

Comments


asbalderson commented Jun 27, 2024

Bug Description

In #354, the way the hostname/address for a load balancer is resolved was refactored. The initial code could handle both a hostname and an IP address being returned. While it is true that Juju may manage the hostname for the service, in some cases, like AWS, kubectl get svc will only return a hostname for the service instead of an IP address, resulting in Traefik being blocked with the status "gateway address unavailable".

I think returning the IP address if it exists, and falling back to the hostname if it doesn't, makes sense for the cases where the load balancer provides a different kind of address.

In the attached log output you can see the difference between the svc output for AWS and for metallb. SQA could also provide the equivalent output from Octavia when running Traefik on o7k with a load balancer.
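A minimal sketch of the fallback being proposed, assuming the status is read as plain dicts (the external_address helper is hypothetical, not the charm's actual code):

from typing import List, Optional

def external_address(ingress: List[dict]) -> Optional[str]:
    # `ingress` is the list under status.loadBalancer.ingress in the
    # Service JSON: metallb entries carry an "ip" key, while AWS ELB
    # entries only carry a "hostname" key.
    for entry in ingress:
        address = entry.get("ip") or entry.get("hostname")
        if address:
            return address
    return None  # the load balancer has not published an address yet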

To Reproduce

  1. Deploy Charmed Kubernetes on AWS following these instructions: https://ubuntu.com/kubernetes/docs/aws-integration
  2. juju add-k8s the Charmed Kubernetes cluster into the AWS controller
  3. juju deploy cos-lite
  4. Wait for the deployment to settle, and for Traefik to be blocked with "gateway address unavailable"
  5. Run kubectl --kubeconfig kube.conf get svc -n cos traefik --output json to observe the lack of an IP address in the load balancer status (a quick check is sketched below)
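For step 5, a quick way to print just the ingress entries (a throwaway snippet, assuming kubectl and the kube.conf from the steps above are on hand):

import json
import subprocess

out = subprocess.check_output(
    ["kubectl", "--kubeconfig", "kube.conf", "get", "svc",
     "-n", "cos", "traefik", "--output", "json"]
)
svc = json.loads(out)
# With metallb this prints entries carrying an "ip" key; on AWS with
# ELB the entries only carry a "hostname" key.
print(svc["status"]["loadBalancer"].get("ingress", []))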

Environment

In the testing environments, SQA was running Charmed Kubernetes 1.28/stable on AWS with the aws-integrator charm, and on bare metal was running MicroK8s 1.28/stable with metallb. We have seen this issue in all stable versions of Juju 3.x and candidate versions of Juju 3.5. In all cases we were using latest/stable for Traefik.

Relevant log output

On MicroK8s with metallb the svc output looks like this:

$ kubectl --kubeconfig kube.conf get svc -n cos traefik --output json
{
    "apiVersion": "v1",
    "kind": "Service",
    "metadata": {
        "annotations": {
            "controller.juju.is/id": "f1160f08-aaf7-41d0-8b49-0fa01b7f699b",
            "juju.is/version": "3.3.5",
            "metallb.universe.tf/ip-allocated-from-pool": "juju-system-microk8s-metallb",
            "model.juju.is/id": "8a98550d-b705-44b6-8ca4-06d7cc1181c9"
        },
        "creationTimestamp": "2024-06-27T17:18:03Z",
        "labels": {
            "app.juju.is/created-by": "traefik",
            "app.kubernetes.io/managed-by": "juju",
            "app.kubernetes.io/name": "traefik"
        },
        "name": "traefik",
        "namespace": "cos",
        "resourceVersion": "4290",
        "uid": "999b4ecf-37a2-4ec4-a76d-837cc421d1e1"
    },
    "spec": {
        "allocateLoadBalancerNodePorts": true,
        "clusterIP": "10.152.183.180",
        "clusterIPs": [
            "10.152.183.180"
        ],
        "externalTrafficPolicy": "Cluster",
        "internalTrafficPolicy": "Cluster",
        "ipFamilies": [
            "IPv4"
        ],
        "ipFamilyPolicy": "SingleStack",
        "ports": [
            {
                "name": "traefik",
                "nodePort": 31859,
                "port": 80,
                "protocol": "TCP",
                "targetPort": 80
            },
            {
                "name": "traefik-tls",
                "nodePort": 31320,
                "port": 443,
                "protocol": "TCP",
                "targetPort": 443
            }
        ],
        "selector": {
            "app.kubernetes.io/name": "traefik"
        },
        "sessionAffinity": "None",
        "type": "LoadBalancer"
    },
    "status": {
        "loadBalancer": {
            "ingress": [
                {
                    "ip": "10.246.167.196"
                }
            ]
        }
    }
}

On AWS using ELB the output is:

$ kubectl --kubeconfig kube.conf get svc -n cos traefik --output json
{
    "apiVersion": "v1",
    "kind": "Service",
    "metadata": {
        "annotations": {
            "controller.juju.is/id": "c2a63cbe-4c3b-484d-8b61-c289f2260d5e",
            "juju.is/version": "3.3.5",
            "model.juju.is/id": "228b8510-e021-45b5-8b09-e4916256b514"
        },
        "creationTimestamp": "2024-06-27T17:45:42Z",
        "finalizers": [
            "service.kubernetes.io/load-balancer-cleanup"
        ],
        "labels": {
            "app.juju.is/created-by": "traefik",
            "app.kubernetes.io/managed-by": "juju",
            "app.kubernetes.io/name": "traefik"
        },
        "name": "traefik",
        "namespace": "cos",
        "resourceVersion": "7773",
        "uid": "b23f47a6-3538-4318-9715-5b2f26ca8a4a"
    },
    "spec": {
        "allocateLoadBalancerNodePorts": true,
        "clusterIP": "10.152.183.194",
        "clusterIPs": [
            "10.152.183.194"
        ],
        "externalTrafficPolicy": "Cluster",
        "internalTrafficPolicy": "Cluster",
        "ipFamilies": [
            "IPv4"
        ],
        "ipFamilyPolicy": "SingleStack",
        "ports": [
            {
                "name": "traefik",
                "nodePort": 32626,
                "port": 80,
                "protocol": "TCP",
                "targetPort": 80
            },
            {
                "name": "traefik-tls",
                "nodePort": 31108,
                "port": 443,
                "protocol": "TCP",
                "targetPort": 443
            }
        ],
        "selector": {
            "app.kubernetes.io/name": "traefik"
        },
        "sessionAffinity": "None",
        "type": "LoadBalancer"
    },
    "status": {
        "loadBalancer": {
            "ingress": [
                {
                    "hostname": "ab23f47a63538431897155b2f26ca8a4-1618134996.us-east-1.elb.amazonaws.com"
                }
            ]
        }
    }
}
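Feeding both status sections through the external_address sketch from the bug description shows each resolving to a usable address (values copied verbatim from the two dumps above):

metallb_ingress = [{"ip": "10.246.167.196"}]
elb_ingress = [{"hostname": "ab23f47a63538431897155b2f26ca8a4-1618134996.us-east-1.elb.amazonaws.com"}]

assert external_address(metallb_ingress) == "10.246.167.196"
assert external_address(elb_ingress) == elb_ingress[0]["hostname"]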

Additional context

While COS isn't usually run on Charmed Kubernetes, and doesn't have many uses on AWS at the moment, it is a good testing workload for Juju and Charmed Kubernetes releases. Since Traefik is used more and more as a standard ingress operator at Canonical, it makes sense for it to work in most (all) environments.

@Abuelodelanada
Contributor

Hello @asbalderson!

I have published a PR for this.

Are you able to test it using latest/edge/fix378?

@jeffreychang911

It works in SolQA AWS env.

Model  Controller       Cloud/Region                Version  SLA          Timestamp
cos    foundations-k8s  kubernetes_cloud/us-east-1  3.5.2    unsupported  20:51:25Z

App           Version  Status  Scale  Charm             Channel             Rev  Address         Exposed  Message
alertmanager  0.27.0   active  1      alertmanager-k8s  latest/candidate    124  10.152.183.101  no
catalogue              active  1      catalogue-k8s     latest/candidate    59   10.152.183.208  no
grafana       9.5.3    active  1      grafana-k8s       latest/candidate    117  10.152.183.219  no
loki          2.9.6    active  1      loki-k8s          latest/candidate    158  10.152.183.220  no
prometheus    2.52.0   active  1      prometheus-k8s    latest/candidate    209  10.152.183.49   no
traefik       2.11.0   active  1      traefik-k8s       latest/edge/fix378  202  10.152.183.40   no       Serving at a1fe861d3f36d4590b4b683d2424bc72-20394484.us-east-1.elb.amazonaws.com

Unit             Workload  Agent  Address         Ports  Message
alertmanager/0*  active    idle   192.168.20.205
catalogue/0*     active    idle   192.168.59.140
grafana/0*       active    idle   192.168.59.143
loki/0*          active    idle   192.168.20.206
prometheus/0*    active    idle   192.168.59.144
traefik/0*       active    idle   192.168.20.207         Serving at a1fe861d3f36d4590b4b683d2424bc72-20394484.us-east-1.elb.amazonaws.com
