Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add ECS mode for Helm deployment #33

Merged
merged 14 commits into from
Jul 26, 2024

Conversation

rogercoll
Copy link
Collaborator

@rogercoll rogercoll commented Jul 19, 2024

Changes

  • Adds the hostmetrics and filelog receiver and its corresponding processors to properly decorate the metrics/logs. (internal manifest has been used as reference)
  • Adds the elasticsearch exporter and associated secrets.
  • SDK logs only forwarded to Elastic using the elasticsearch exporter.
  • Modes:
    • Deployment for the OpenTelemetry Demo + Otlp receiver
    • Deaemonset for the hostmetrics + filelog receiver

Merge Requirements

For new features contributions please make sure you have completed the following
essential items:

* [ ] CHANGELOG.md updated to document new feature additions
* [ ] Appropriate documentation updates in the docs
* [ ] Appropriate Helm chart updates in the helm-charts

Maintainers will not merge until the above have been completed. If you're unsure
which docs need to be changed ping the
@open-telemetry/demo-approvers.

@rogercoll rogercoll requested a review from ChrsMark July 19, 2024 13:02
@@ -1,56 +0,0 @@
mode: daemonset
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean that we only use the Deployment?
If so, will that work with a multi-node cluster?
The purpose of the Daemonset is to ensure that node level metrcis+logs are collected from each node by the respective Collector.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great catch! I assumed we were already deploying it as a daemonset. I think that as we are enabling the hostmetrics and filelog receiver the default mode should be daemonset and remove the deployment one.

Mode changed in 80612a5

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But then we will be collecting k8scluster metrics and events from all the nodes, right? Unless we want to remove them for some reason?

In general, the Collector is recommended to be deployed in both modes at the same time and each Resource handels its own part. The Daemonset takes care of the node level collection and the Deployment acts a singleton collecting the cluster level signals (metrics but traces from the SDKs as well).
Or we don't want to achieve this here?

Copy link
Collaborator Author

@rogercoll rogercoll Jul 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But then we will be collecting k8scluster metrics and events from all the nodes, right? Unless we want to remove them for some reason?

k8scluster metrics are disabled when deployed as a deamonset: https://github.com/open-telemetry/opentelemetry-helm-charts/tree/main/charts/opentelemetry-collector#configuration-for-kubernetes-cluster-metrics

In general, the Collector is recommended to be deployed in both modes at the same time and each Resource handels its own part.

Makes sense to me, thanks for the clarification. If it sounds good to you, I will rollback the removal of the daemonset file and document the following two deployment modes:

  • Deployment: Otlp receiver + k8scluster metrics + Otlp exporter
  • Daemonset: Hostmetrics + filelogreceiver (and its processors) + elasticsearch exporter

My main concern would be with the logs, the demo services send Otlp logs that can be either gathered by the Otlp receiver or the filelog receiver. Should we exclude those from being collected with the filelog receiver to prevent duplications?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deployment: Otlp receiver + k8scluster metrics + Otlp exporter
Daemonset: Hostmetrics + filelogreceiver (and its processors) + elasticsearch exporter

That sounds good! For the Daemonset all the receivers that make sense on Node scope and the Deployment everything that is cluster scope (k8scluster metrics, k8s events etc).

For the logs duplication one thing we can do is to apply the filter processor to drop logs collected by the filelog receiver in case some k8s labels apply? Or based on a similar pattern, we can annotate the instrumented apps with a specific annotation to indicate that these should be dropped.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the logs duplication one thing we can do is to apply the filter processor to drop logs collected by the filelog receiver in case some k8s labels apply? Or based on a similar pattern, we can annotate the instrumented apps with a specific annotation to indicate that these should be dropped.

The issue is that not all the Otel demo services have been migrated to Otel logs, instead some output to stdout and smothers to an Otel logs provider. See tracking issue: open-telemetry#1472.

This is an important topic to tackle, but I think it would make sense to move the discussion and implementation to its own issue/pr, wdyt?

Copy link
Member

@ChrsMark ChrsMark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

k8s_api_config:
auth_type: serviceAccount
metrics:
k8s.pod.cpu.node.utilization:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can also enable the k8s.pod.memory.node.utilization

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah! Saw it now, sure I will create a follow-up PR. Thanks!

@rogercoll rogercoll merged commit f93eabe into elastic:main Jul 26, 2024
22 of 23 checks passed
@rogercoll rogercoll deleted the es_exporter_k8s branch July 26, 2024 08:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants