High CPU utilization on Windows server after configuring the otelcol #603

Open
desududev opened this issue Jul 18, 2024 · 2 comments
Labels
bug Something isn't working

Comments

@desududev

High CPU utilization on Windows server after configuring the otelcol

@dosubot dosubot bot added the bug Something isn't working label Jul 18, 2024

dosubot bot commented Jul 18, 2024

Hey there, @desududev! I'm a bot designed to assist you with bugs, questions, and contributions. Feel free to ask for help while waiting for a human maintainer to join the conversation. 🤖

To troubleshoot and resolve high CPU utilization on a Windows server after configuring the OpenTelemetry Collector (otelcol), you can follow these steps:

  1. Enable Internal Telemetry: Configure the Collector’s internal telemetry to monitor its performance. This will help you understand the resource usage and identify any bottlenecks.
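
    A minimal sketch of the internal telemetry settings (the level and address shown here are examples, adjust them as needed) goes under the service section:

    service:
      telemetry:
        logs:
          level: info
        metrics:
          level: detailed
          address: 0.0.0.0:8888

    The Collector then exposes its own process and per-component metrics (CPU, memory, queue sizes) on the configured port in Prometheus format, which helps narrow down which component is busy.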

  2. Use the pprof Extension: Enable the pprof extension to profile the Collector as it runs. This can help you identify which parts of the Collector are consuming the most CPU.

    extensions:
      pprof:
        endpoint: 0.0.0.0:1777
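
    If the Go toolchain is available on the server (or on another machine that can reach port 1777), a 30-second CPU profile can be pulled from the running Collector through the standard pprof endpoints, for example:

    go tool pprof "http://localhost:1777/debug/pprof/profile?seconds=30"

    The resulting profile shows which receivers, processors, or exporters are consuming the CPU time.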
  3. Check for Configuration Issues: Ensure that the Collector is properly configured. Misconfigurations can lead to high CPU usage. For example, make sure that the memory_limiter processor is configured correctly to avoid memory pressure.

    processors:
      memory_limiter:
        check_interval: 1s
        limit_mib: 4000
        spike_limit_mib: 500

    As a rule of thumb, keep limit_mib below the memory actually available to the Collector process; spike_limit_mib is typically around 20% of limit_mib. (The ballast_size_mib option that appears in some older examples is no longer part of the memory_limiter configuration in recent Collector versions.)
  4. Inspect Logs and zPages: Check the Collector’s logs and use the zPages extension to inspect live data from the Collector’s receivers and exporters. This can help you identify any errors or performance issues.

    extensions:
      zpages:
        endpoint: 0.0.0.0:55679
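
    Keep in mind that extensions only run when they are also listed in the service section; a combined block enabling both diagnostic extensions could look like:

    service:
      extensions: [pprof, zpages]

    Once running, the zPages are served over plain HTTP, for example http://localhost:55679/debug/tracez for recent span activity and http://localhost:55679/debug/pipelinez for an overview of the configured pipelines.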
  5. Use the debug Exporter: Configure the debug exporter to confirm that the Collector is receiving, processing, and exporting data correctly. This can help you identify any issues in the data flow that might be causing high CPU usage.

    receivers:
      zipkin:
    exporters:
      debug:
    service:
      pipelines:
        traces:
          receivers: [zipkin]
          processors: []
          exporters: [debug]

    To begin testing, generate a Zipkin payload. For example, you can create a file called trace.json that contains:

    [
      {
        "traceId": "5982fe77008310cc80f1da5e10147519",
        "parentId": "90394f6bcffb5d13",
        "id": "67fae42571535f60",
        "kind": "SERVER",
        "name": "/m/n/2.6.1",
        "timestamp": 1516781775726000,
        "duration": 26000,
        "localEndpoint": {
          "serviceName": "api"
        },
        "remoteEndpoint": {
          "serviceName": "apip"
        },
        "tags": {
          "data.http_response_code": "201"
        }
      }
    ]

    With the Collector running, send this payload to the Collector:

    curl -X POST localhost:9411/api/v2/spans -H 'Content-Type: application/json' -d @trace.json
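
    On Windows, note that in Windows PowerShell 5.x curl is an alias for Invoke-WebRequest and may not accept these arguments as written; a rough equivalent (assuming the Zipkin receiver is listening on its default port 9411) is:

    Invoke-RestMethod -Uri http://localhost:9411/api/v2/spans -Method Post -ContentType 'application/json' -InFile trace.json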

    You should see a log entry like the following:

    2023-09-07T09:57:43.468-0700    info    TracesExporter  {"kind": "exporter", "data_type": "traces", "name": "debug", "resource spans": 1, "spans": 2}

    You can also configure the debug exporter so the entire payload is printed:

    exporters:
      debug:
        verbosity: detailed

    If you re-run a test with the modified configuration, the log output looks similar to the following (this sample output comes from a telemetrygen-generated trace, so the span names and attributes will differ from the Zipkin payload above):

    2023-09-07T09:57:12.820-0700    info    TracesExporter  {"kind": "exporter", "data_type": "traces", "name": "debug", "resource spans": 1, "spans": 2}
    2023-09-07T09:57:12.821-0700    info    ResourceSpans #0
    Resource SchemaURL: https://opentelemetry.io/schemas/1.4.0
    Resource attributes:
         -> service.name: Str(telemetrygen)
    ScopeSpans #0
    ScopeSpans SchemaURL:
    InstrumentationScope telemetrygen
    Span #0
        Trace ID       : 0c636f29e29816ea76e6a5b8cd6601cf
        Parent ID      : 1a08eba9395c5243
        ID             : 10cebe4b63d47cae
        Name           : okey-dokey
        Kind           : Internal
        Start time     : 2023-09-07 16:57:12.045933 +0000 UTC
        End time       : 2023-09-07 16:57:12.046058 +0000 UTC
        Status code    : Unset
        Status message :
    Attributes:
         -> span.kind: Str(server)
         -> net.peer.ip: Str(1.2.3.4)
         -> peer.service: Str(telemetrygen)

By following these steps, you should be able to identify and resolve the cause of high CPU utilization on your Windows server after configuring the OpenTelemetry Collector [1][2].

To continue talking to Dosu, mention @dosu.

@jpkrohling
Member

Good bot, especially the first three steps.
