Remove telegraf and use only fluent-bit for telemetry #1030

Draft
wants to merge 47 commits into base: main

@gracewehner gracewehner commented Dec 16, 2024

PR Description

  • Upgrade fluent-bit
    • Linux: 2.1.10 -> 3.2.2 (latest)
      • >=3.2 is necessary for using metrics_selector and labels processors for filtering Prometheus metrics
    • Windows: 2.1.10 -> 3.0.7 (latest)
      • 3.0 is the latest Windows version
  • Remove telegraf for Linux and Windows
  • Changes to fluent-bit
    • Built-in Plugins:
      • Use prometheus_scrape input plugin to scrape Prometheus metrics previously collected by telegraf
      • Use metrics_selector and labels processors to filter certain metrics and drop unnecessary labels before sending to App Insights
    • Conf:
      • Use new YAML format for config to be able to use metrics_selector and labels processors
    • Custom Output Plugin:
      1. Collect CPU and memory usage for otelcollector and metricsextension, which telegraf previously collected
        • This runs as a goroutine, separate from the fluent-bit pipeline
        • Use the same underlying Go package as telegraf: github.com/shirou/gopsutil/v4/process
        • Collect at the same frequency as telegraf and aggregate to p50 and p95
        • Send extra env vars as customDimensions, as telegraf did
      2. Decode the Prometheus metrics msgpack from fluent-bit and send to App Insights in the format we want
    • Add one line to the fluent-bit proxy_plugin file so that Prometheus metrics are allowed to flow to our Go output plugin:
      • out->event_type = FLB_OUTPUT_LOGS | FLB_OUTPUT_METRICS;
      • fluent-bit provides the proxy_plugin files so that Go output plugins can be built on top of the C code. However, these files do not specify which event types the output plugin accepts (logs, metrics, or traces), so by default only logs are routed to the output plugin.
  • Build Pipeline:
    • Build fluent-bit with the line added above in the exact same way Mariner builds the package.
    • Only build fluent-bit with the plugins that we actually use so that our CVE surface area is very low.
  • Main image bug fixes:
    • Use the daemonset config file for fluent-bit in the daemonset. Previously, the replicaset config file was used for both the replicaset and the daemonset, so the daemonset logs weren't being collected.
    • Fix the telemetry for network-observability and acstor that was previously missed
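
For reviewers, here is a minimal sketch of the p50/p95 aggregation step described for the custom output plugin. It is not the code in this PR. The names `percentile`, `summarize`, and `usageSummary` are hypothetical, the nearest-rank method is an assumed choice, and the samples are hard-coded here, whereas the plugin would read them from gopsutil.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// percentile returns the p-th percentile (0 < p <= 100) of samples using the
// nearest-rank method. It returns 0 for an empty slice.
// Assumption: nearest-rank is used here for simplicity; the plugin may differ.
func percentile(samples []float64, p float64) float64 {
	n := len(samples)
	if n == 0 {
		return 0
	}
	// Copy before sorting so the caller's slice is not reordered.
	sorted := append([]float64(nil), samples...)
	sort.Float64s(sorted)

	rank := int(math.Ceil(p * float64(n) / 100))
	if rank < 1 {
		rank = 1
	}
	if rank > n {
		rank = n
	}
	return sorted[rank-1]
}

// usageSummary is a hypothetical container for one process's aggregated
// usage over a single collection window.
type usageSummary struct {
	P50 float64
	P95 float64
}

// summarize reduces the raw samples of one window to p50 and p95.
func summarize(samples []float64) usageSummary {
	return usageSummary{
		P50: percentile(samples, 50),
		P95: percentile(samples, 95),
	}
}

func main() {
	// Hard-coded CPU percentages standing in for values sampled on a timer.
	cpu := []float64{5, 1, 9, 3, 7, 2, 8, 4, 10, 6}
	s := summarize(cpu)
	fmt.Printf("p50=%.1f p95=%.1f\n", s.P50, s.P95)
}
```

Keeping the aggregation a pure function of the sample slice means it can be unit tested without the sampling goroutine.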

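To illustrate why the one-line `event_type` change matters, here is a small Go model of bitmask-based routing. It is a sketch of the idea only, not fluent-bit's actual routing code. The constant names mirror `FLB_OUTPUT_LOGS`, `FLB_OUTPUT_METRICS`, and the traces type, but the numeric values and the `accepts` helper are assumptions made for illustration.

```go
package main

import "fmt"

// eventType is a bitmask of the event kinds an output plugin accepts.
type eventType uint

// Flags modeled on fluent-bit's FLB_OUTPUT_* constants.
// Assumption: the numeric values are illustrative, not copied from the C headers.
const (
	outputLogs eventType = 1 << iota
	outputMetrics
	outputTraces
)

// accepts reports whether a plugin whose mask is allowed can receive events
// of the given kind. This helper is hypothetical.
func accepts(allowed, kind eventType) bool {
	return allowed&kind != 0
}

func main() {
	defaultMask := outputLogs                // logs-only default described above
	patchedMask := outputLogs | outputMetrics // mask after the added line

	fmt.Println(accepts(defaultMask, outputMetrics)) // false
	fmt.Println(accepts(patchedMask, outputMetrics)) // true
	fmt.Println(accepts(patchedMask, outputTraces))  // false
}
```

With the logs-only default mask, metrics events never reach the plugin. OR-ing in the metrics flag lets them through while traces stay excluded.
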
New Feature Checklist

image

Telemetry Values Comparison

  • ReplicaSet
image
  • DaemonSet
image

@gracewehner gracewehner requested a review from a team as a code owner December 16, 2024 20:04
@vishiy vishiy marked this pull request as draft December 16, 2024 21:50