
Issue with Deploying Otel Operator and Instrumentation CR instance via Helm #3014

Open
ZiedChekir opened this issue Jun 5, 2024 · 4 comments · May be fixed by #3074
Labels
area:auto-instrumentation, bug

Comments

@ZiedChekir

ZiedChekir commented Jun 5, 2024

Component(s)

Instrumentation CR

What happened?

Description

When deploying the Otel operator and an Instrumentation CR instance via a custom Helm chart, there is a labeling conflict. The CR is labeled app.kubernetes.io/managed-by=Helm, whereas, according to the OpenTelemetry documentation, it needs to be managed by the OpenTelemetry operator and carry the label app.kubernetes.io/managed-by=opentelemetry-operator. This conflict causes the auto-instrumentation process to fail, as the operator cannot add the annotations to the Instrumentation CR that are required for injecting the correct image into the auto-instrumentation initContainer.
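
As a quick check (assuming an Instrumentation CR named otel in the default namespace, as in the describe output further down), the conflicting label can be read directly:

kubectl get instrumentation otel -n default \
  -o jsonpath='{.metadata.labels.app\.kubernetes\.io/managed-by}'
# Prints "Helm" when the CR was rendered by a Helm chart,
# "opentelemetry-operator" when the operator manages it.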

Steps to Reproduce

Deploy the Otel operator and Instrumentation CR using a custom Helm chart.
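
For illustration, a minimal, hypothetical chart template of the kind that triggers the conflict (Helm's conventional labels set app.kubernetes.io/managed-by from .Release.Service, which always renders as "Helm"):

# templates/instrumentation.yaml (hypothetical template; endpoint taken from the CR below)
apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  name: otel
  labels:
    # Helm's standard label set includes the conflicting entry:
    app.kubernetes.io/managed-by: {{ .Release.Service }}  # renders as "Helm"
    helm.sh/chart: {{ .Chart.Name }}-{{ .Chart.Version }}
spec:
  exporter:
    endpoint: http://otel:4318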

Expected Result

The Otel operator should manage the Instrumentation CR, add the necessary annotations, and successfully inject the appropriate image into the auto-instrumentation initContainer.

Actual Result

I get an error in the Deployment resource saying spec.initContainers[0].image is a required value, which means the operator looked at the Instrumentation CR instance but did not find the auto-instrumentation images in its annotations, since the CR is managed by Helm.

Kubernetes Version

1.29

Operator version

0.98.0

Collector version

0.98.0

Environment information

No response

Log output

No response

Additional context

No response

ZiedChekir added the bug and needs triage labels on Jun 5, 2024
jaronoff97 added the area:auto-instrumentation label and removed the needs triage label on Jun 5, 2024
@jaronoff97 (Contributor)

Discussions here

@zied-chekir

I conducted a small diagnostic for this issue and hope this clarifies the problem.
These are the Instrumentation CR details:

Name:         otel
Namespace:    default
Labels:       app.kubernetes.io/managed-by=Helm     
              app.kubernetes.io/version=0.56.0      
              helm.sh/chart=opentelemetry-0.3.5     
Annotations:  meta.helm.sh/release-name: my-release
              meta.helm.sh/release-namespace: default  
API Version:  opentelemetry.io/v1alpha1
Kind:         Instrumentation
Metadata:
  Creation Timestamp:  2024-06-06T12:58:47Z
  Generation:          1
  Resource Version:    515297
Spec:
  Dotnet:
    Env:
      Name:   OTEL_DOTNET_AUTO_LOGS_CONSOLE_EXPORTER_ENABLED
      Value:  false
      Name:   OTEL_DOTNET_AUTO_METRICS_CONSOLE_EXPORTER_ENABLED
      Value:  false
  Exporter:
    Endpoint:  http://otel:4318
  Java:
    Env:
      Name:   OTEL_JAVA_AUTO_LOGS_CONSOLE_EXPORTER_ENABLED
      Value:  false
      Name:   OTEL_JAVA_AUTO_METRICS_CONSOLE_EXPORTER_ENABLED
      Value:  false
  Propagators:
    tracecontext
    baggage
    b3
  Resource:
    addK8sUIDAttributes:  true
    Resource Attributes:
  Sampler:
    Argument:  1
    Type:      parentbased_traceidratio
Events:        <none>

This is from the OTel documentation, for comparison:

Name:         python-instrumentation
Namespace:    application
Labels:       app.kubernetes.io/managed-by=opentelemetry-operator
Annotations:  instrumentation.opentelemetry.io/default-auto-instrumentation-apache-httpd-image:
               ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-apache-httpd:1.0.3
             instrumentation.opentelemetry.io/default-auto-instrumentation-dotnet-image:
               ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-dotnet:0.7.0
             instrumentation.opentelemetry.io/default-auto-instrumentation-go-image:
               ghcr.io/open-telemetry/opentelemetry-go-instrumentation/autoinstrumentation-go:v0.2.1-alpha
             instrumentation.opentelemetry.io/default-auto-instrumentation-java-image:
               ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-java:1.26.0
             instrumentation.opentelemetry.io/default-auto-instrumentation-nodejs-image:
               ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-nodejs:0.40.0
             instrumentation.opentelemetry.io/default-auto-instrumentation-python-image:
               ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-python:0.39b0
API Version:  opentelemetry.io/v1alpha1
Kind:         Instrumentation
Metadata:
 Creation Timestamp:  2023-07-28T03:42:12Z
 Generation:          1
 Resource Version:    3385
 UID:                 646661d5-a8fc-4b64-80b7-8587c9865f53
Spec:
...
 Exporter:
   Endpoint:  http://demo-collector.opentelemetry.svc.cluster.local:4318
...
 Propagators:
   tracecontext
   baggage
 Python:
   Image:  ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-python:0.39b0
   Resource Requirements:
     Limits:
       Cpu:     500m
       Memory:  32Mi
     Requests:
       Cpu:     50m
       Memory:  32Mi
 Resource:
 Sampler:
Events:  <none>

When managed by the operator, the Instrumentation CR gets additional annotations that ensure everything works correctly.
It's also worth noting that when the Instrumentation CR is deployed by Helm, manually changing the 'managed-by' label to 'opentelemetry-operator' makes the instrumentation work again (a sketch of that command follows below).
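
A minimal sketch of that manual workaround (CR name and namespace taken from the describe output above); note that a later helm upgrade will likely revert the label:

kubectl label instrumentation otel -n default \
  app.kubernetes.io/managed-by=opentelemetry-operator --overwrite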

Below is the error I get when the instrumentation does not work, meaning the initContainers do not find the image for auto-instrumentation.

[screenshot: initContainer error]

The problem only occurs when I specify the auto-instrumentation images in the operator configuration as follows:

    manager:
      image:
        repository: ghcr.io/open-telemetry/opentelemetry-operator/opentelemetry-operator
        tag: "0.98.0"
      collectorImage:
        repository: "otel/opentelemetry-collector-contrib"
        tag: "0.98.0"
      autoInstrumentationImage:
        dotnet:
          repository: "ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-dotnet"
          tag: "1.2.0"
        java:
          repository: "ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-java"
          tag: "2.3.0"

But if I specify the auto-instrumentation images in the Instrumentation CR configuration, like the following, then it works fine:

apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  name: my-instrumentation
spec:
  java:
    image: your-customized-auto-instrumentation-image:java
  nodejs:
    image: your-customized-auto-instrumentation-image:nodejs
  python:
    image: your-customized-auto-instrumentation-image:python
  dotnet:
    image: your-customized-auto-instrumentation-image:dotnet
  go:
    image: your-customized-auto-instrumentation-image:go
  apacheHttpd:
    image: your-customized-auto-instrumentation-image:apache-httpd
  nginx:
    image: your-customized-auto-instrumentation-image:nginx
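
For completeness, and not part of the original chart: even with a working Instrumentation CR, a workload only gets the initContainer when it opts in via a pod annotation, e.g. (hypothetical Deployment):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app  # hypothetical workload
spec:
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
      annotations:
        # Tells the operator's webhook to inject the Java auto-instrumentation initContainer
        instrumentation.opentelemetry.io/inject-java: "true"
    spec:
      containers:
        - name: my-app
          image: my-app:latest  # hypothetical image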

I hope I explained it correctly. Defining the images in the Otel operator configuration might be misleading, as, unfortunately, the instrumentation will not work if the CR is deployed with Helm.

@jaronoff97 (Contributor)

Thanks for the added context!! I think this is all because of how we upgrade things, since that logic only runs when the managed-by label is set. We filter on this label for upgrades of both the collector and instrumentations. This means that despite setting the right images on the operator, we never set them correctly on the instrumentation.
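
To illustrate (this is not the literal operator code): the upgrade pass effectively selects CRs with a label query like the one below, so a Helm-labeled CR is never visited and never receives the image annotations:

# Only CRs carrying the operator's managed-by label are picked up for upgrade;
# a CR labeled app.kubernetes.io/managed-by=Helm is excluded from this list.
kubectl get instrumentations.opentelemetry.io -A \
  -l app.kubernetes.io/managed-by=opentelemetry-operator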

@jaronoff97 (Contributor)

this will have to be a breaking change, but I think that's acceptable given the current state isn't great. cc @open-telemetry/operator-approvers thoughts?
