Cannot deploy OpenTelemetryCollector to operator #3452

Open
frmrm opened this issue Nov 12, 2024 · 6 comments
Labels
bug Something isn't working needs triage

Comments

@frmrm

frmrm commented Nov 12, 2024

Component(s)

No response

What happened?

Description

We are attempting to upgrade our OpenTelemetry Operator deployment from 0.92 to 0.113 and are having trouble getting an OpenTelemetryCollector that works fine under the older version to work under the newer version of the operator. The error message that comes back is quite opaque, and so far looking through the source hasn't given me much insight into what's going on.

I did update the collector definition slightly to be compatible with the v1beta1 API syntax, but otherwise left it unchanged from the version that deploys just fine in 0.92.

Steps to Reproduce

  1. Deploy the OpenTelemetry operator 0.113
  2. Attempt to deploy the collector below:
apiVersion: opentelemetry.io/v1beta1
kind: OpenTelemetryCollector
metadata:
  name: cluster
  namespace: otel
spec:
  config:
    receivers:
      otlp:
        protocols:
          grpc:
          http:
      zipkin:
      jaeger:
        protocols:
          grpc:
          thrift_binary:
          thrift_compact:
          thrift_http:
    processors:
      filter/spans:
        spans:
          exclude:
            match_type: regexp
            services:
              - "[redacted]"
    exporters:
      logging:
      otlp/tempo:
        endpoint: "[redacted]"
        tls:
          insecure_skip_verify: true
    service:
      telemetry:
        logs:
          level: "debug"
      pipelines:
        traces/internal:
          receivers: [otlp, zipkin, jaeger]
          processors: []
          exporters: [logging]
        traces/tempo:
          receivers: [otlp, zipkin, jaeger]
          processors: []
          exporters: [otlp/tempo]

Expected Result

It's expected that this would work because it works in 0.92.

Actual Result

We see the following error while attempting to push the collector:

admission webhook "mopentelemetrycollectorbeta.kb.io" denied the request: src and dst must not be nil

Kubernetes Version

1.30.5

Operator version

0.113

Collector version

0.113

Environment information

Environment

Deployed from Helm charts.

Log output

2024/11/12 19:43:39 http: TLS handshake error from X.X.8.21:59048: EOF
2024/11/12 19:40:00 http: TLS handshake error from X.X.8.21:47244: EOF
2024/11/12 19:30:00 http: TLS handshake error from X.X.8.21:58650: EOF
2024/11/12 19:30:00 http: TLS handshake error from X.X.8.21:58648: EOF
{"level":"ERROR","timestamp":"2024-11-12T19:19:55Z","message":"Reconciler error","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector","OpenTelemetryCollector":{"name":"cluster","namespace":"otel"},"namespace":"otel","name":"cluster","reconcileID":"9f59c733-a65c-4934-a42b-be87f4a26896","error":"admission webhook \"mopentelemetrycollectorbeta.kb.io\" denied the request: src and dst must not be nil","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).reconcileHandler\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:316\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).processNextWorkItem\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:263\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func2.2\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:224"}
2024/11/12 19:10:00 http: TLS handshake error from X.X.12.16:40416: EOF
{"level":"ERROR","timestamp":"2024-11-12T19:03:15Z","message":"Reconciler error","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector","OpenTelemetryCollector":{"name":"cluster","namespace":"otel"},"namespace":"otel","name":"cluster","reconcileID":"8be80529-01a7-4228-ac07-ac05514b63c6","error":"admission webhook \"mopentelemetrycollectorbeta.kb.io\" denied the request: src and dst must not be nil","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).reconcileHandler\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:316\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).processNextWorkItem\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:263\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func2.2\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:224"}
2024/11/12 19:00:00 http: TLS handshake error from X.X.5.10:36190: EOF
2024/11/12 19:00:00 http: TLS handshake error from X.X.8.18:33402: EOF
2024/11/12 18:50:00 http: TLS handshake error from X.X.11.8:36554: EOF
2024/11/12 18:50:00 http: TLS handshake error from X.X.13.21:32836: EOF

Additional context

No response

frmrm added the "bug" (Something isn't working) and "needs triage" labels on Nov 12, 2024
@swiatekm
Contributor

@iblancasa could this be related to #3281?

@iblancasa
Contributor

I don't think so. @frmrm is this happening with all your collectors or just with a set of them?

@frmrm
Author

frmrm commented Nov 14, 2024

is this happening with all your collectors or just with a set of them?

This is the only collector we deploy directly with a CRD. The rest are injected as sidecars and ship to this collector, which "fans out" to some different services and does some filtering before things leave the cluster. Because we weren't able to deploy this, we had to roll back, which got us working again, so it's certainly something that was introduced in one of the recent versions. (Or something unique to upgrading between them, I'm unsure.)

@toporek3112

I just encountered the same error message when deploying version 0.113.0:

admission webhook "mopentelemetrycollectorbeta.kb.io" denied the request: src and dst must not be nil

So (just as a test) I rolled back to 0.108.0. It seems like something broke in between. Did anything change in the CRD?

@iblancasa
Contributor

Can you share the manifest?

@swiatekm
Contributor

At least for the manifest in the issue, I can easily reproduce the problem. It looks like the cause is the zipkin receiver definition, and the operator doesn't like that the receiver body is empty. Changing that to zipkin: {} fixes the problem. It doesn't complain about the same thing with the logging exporter, so it does sound at least vaguely related to #3281.
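
For reference, a minimal sketch of that workaround applied to the receivers section of the manifest from the issue description. Only the zipkin entry changes; the rest is copied as-is, and whether the other body-less keys need the same treatment is not established in this thread:

```yaml
receivers:
  otlp:
    protocols:
      grpc:
      http:
  # A bare "zipkin:" parses as null; an explicit empty mapping is what
  # was reported above to get past the admission webhook.
  zipkin: {}
  jaeger:
    protocols:
      grpc:
      thrift_binary:
      thrift_compact:
      thrift_http:
```

In YAML, a key with no value is null, while {} is an empty mapping, so the two spellings are not equivalent to a parser even though they look interchangeable in a collector config.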
