
Sidecar injection with CNI enabled on AKS - failed to setup network for sandbox plugin type="consul-cni" name="consul-cni" failed (add): error retrieving pod: Unauthorized #4442

Open
tspearconquest opened this issue Dec 3, 2024 · 0 comments
Labels
type/bug Something isn't working

Comments


tspearconquest commented Dec 3, 2024

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request. Searching for pre-existing feature requests helps us consolidate datapoints for identical requirements into a single place, thank you!
  • Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request.
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment.

Overview of the Issue

Consul 1.3.1 with the CNI and Connect Injector installed; AKS Kubernetes 1.30.6. With the CNI enabled, we are unable to pull images from our private registry; with the CNI switched off, things work fine. Switching it on results in the error below for our application pods that get the sidecar injected:

Events:
  Type     Reason                  Age   From               Message
  ----     ------                  ----  ----               -------
  Warning  FailedCreatePodSandBox  11s   kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "85f01cd3e8e653e7ec7a371d0f010b4a5aacd386fe7ae3c4c95f12e69c68703f": plugin type="consul-cni" name="consul-cni" failed (add): error retrieving pod: Unauthorized

The same error appears whether or not our application deployment is configured for sidecar injection. If I switch off the injection annotation, I still see the same error above even after redeploying and new pods coming up (and to confirm: the initContainer and sidecar container are not added when the annotation is off, so that part works properly, but the error remains). Only when I switch off the CNI altogether and run helm upgrade can my application workload pods start up (and if the deployment annotation is turned on, injection then works properly). I want to use the CNI so that I can lower the privilege level of my application workload pods.
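For reference, injection is toggled per workload with the standard Consul annotation on the pod template. A minimal sketch with placeholder names (not our real manifests, which I can't share in full):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app                # placeholder name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
      annotations:
        # Toggling this annotation on/off is what we tested; the sandbox error
        # appears either way while connectInject.cni.enabled is true.
        consul.hashicorp.com/connect-inject: "true"
    spec:
      containers:
        - name: app
          image: app-proxied-through-our-registry   # dummy value, as in values.yaml below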

values.yaml

global:
  # Disable all services by default and only enable those we choose
  enabled: false

  # Name used for helm install
  name: consul
  datacenter: consul-hub-sbx-dev
  image: "consul-proxied-through-our-registry" ## dummy value for the github issue
  imageK8S: "consul-proxied-through-our-registry" ## dummy value for the github issue

  # The image used for the sidecars
  imageConsulDataplane: "consul-proxied-through-our-registry" ## dummy value for the github issue

  # The repo pull secret in the cluster is managed by a service in the cluster which
  # reconciles the secrets in the cluster against secrets in Azure Keyvault managed
  # by the platform team. This secret name must already exist in the namespace.
  imagePullSecrets:
    - name: "our-registry-image-pull-secret"
  acls:
    manageSystemACLs: true
    bootstrapToken:
      secretName: consul-bootstrap-token
      secretKey: token
  tls:
    enabled: true
  enableConsulNamespaces: true
externalServers:
  enabled: true
  hosts: ["our-consul-hosts"]
  httpsPort: 443
  useSystemRoots: true
  k8sAuthMethodHost: "our-api-server-url"
server:
  enabled: false
connectInject:
  enabled: true
  # Enables the CNI DaemonSet which mitigates the need for every pod to have root and network admin privileges
  cni:
    enabled: true
    updateStrategy: |-
      rollingUpdate:
        maxUnavailable: 1
      type: RollingUpdate
  # When true, this forces Consul injection for all pods in all namespaces that match the namespace selector below
  default: false
  disruptionBudget:
    enabled: true
    maxUnavailable: 2
    minAvailable: 2
  # Target all namespaces except for the ones named in the list below. When default, above, is false, this has no effect.
  # When default is false, all namespaces are eligible for injection, and only pods with the correct annotation will be injected.
  namespaceSelector: |-
    matchExpressions:
      - key: kubernetes.io/metadata.name
        operator: NotIn
        values:
          - "consul"
          - "default"
          - "external-secrets"
          - "falcon-kpa"
          - "falcon-sensor"
          - "flux-system"
          - "gatekeeper-system"
          - "kube-node-lease"
          - "kube-public"
          - "kube-system"
  replicas: 4
  affinity: |-
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              app: consul
              chart: consul-helm
              component: connect-injector
              release: consul-enterprise
          matchLabelKeys:
            - pod-template-hash
          topologyKey: kubernetes.io/hostname
  # Configuration settings for the Consul API Gateway integration.
  apiGateway:
    managedGatewayClass:
      # This value defines the type of Service created for gateways (e.g. LoadBalancer, ClusterIP)
      # serviceType: NodePort
      copyAnnotations:
        service:
          annotations: |-
            - service.beta.kubernetes.io/azure-dns-label-name
            - service.beta.kubernetes.io/azure-load-balancer-internal
            - service.beta.kubernetes.io/azure-load-balancer-internal-subnet
            - service.beta.kubernetes.io/azure-load-balancer-ipv4
            - service.beta.kubernetes.io/azure-load-balancer-resource-group
            - service.beta.kubernetes.io/azure-pls-create
            - service.beta.kubernetes.io/azure-pls-fqdns
            - service.beta.kubernetes.io/azure-pls-name
            - service.beta.kubernetes.io/azure-pls-resource-group
            - service.beta.kubernetes.io/azure-pls-visibility
      deployment:
        defaultInstances: 1
        maxInstances: 1
        minInstances: 1
      serviceType: LoadBalancer
ingressGateways:
  enabled: false

Reproduction Steps

We see this when we deploy our app. I can't share the app images or manifests in full but can answer questions. The application and sidecar images both come from our private registry; however, they are stored in separate repos with different credentials.

Therefore, our application pods list two secret names in imagePullSecrets so that the kubelet can pull both images using their respective kind: Secret, type: kubernetes.io/dockerconfigjson resources. This works fine when the CNI is disabled. Given that the error occurs when the CNI is enabled even with injection disabled, and that our application pods pull correctly with two image pull secrets when the CNI is disabled, this seems to indicate that the consul-cni service account is missing something related to accessing the image pull secret.
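To illustrate, a minimal sketch of how the application pods reference both registries (the secret names here are placeholders, not our real values):

apiVersion: v1
kind: Pod
metadata:
  name: example-app                             # placeholder name
spec:
  # Two dockerconfigjson secrets: one for the application repo, one for the
  # sidecar repo, since the repos use different credentials.
  imagePullSecrets:
    - name: our-app-registry-pull-secret        # placeholder name
    - name: our-sidecar-registry-pull-secret    # placeholder name
  containers:
    - name: app
      image: app-proxied-through-our-registry   # dummy value, as in values.yaml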

Logs

Because this happens during pod sandbox creation by the kubelet, there are no container logs in our app pod that I can share; only the event output provided above. The consul-cni pod logs on the same node as a failing pod contain no output related to this failure, nor any output at all at the timestamps when the failures occur, so there is nothing to indicate what is happening there either.

Expected behavior

I expected our app pods to come up properly with the sidecar having reduced privileges and no initContainer being present.

Environment details

We use the Azure CNI plugin; the rest of the requested details are given above.

Additional Context

tspearconquest added the type/bug label on Dec 3, 2024