Distributed collector configuration #1906
Based on our products, I feel this would be a much-needed feature. Our setup has 30 Kubernetes clusters as of today, with more than 4000 nodes and 70K pods.
Our teams have a growing need to forward logs to their own destinations for analysis and reporting, and to filter out logs. They need to frequently add and remove destinations from the pipeline, so dynamic configuration is really required to make this work at large scale.
This feature would be very advantageous to us. As we grow as a company, we want to move away from a central team needing to know about the many hundreds of other services running on our clusters. Each team that writes a service is responsible for deploying it and exposing any custom metrics or logs they want to pull off-cluster. We want a central team to manage the pipeline of how those metrics and logs get pushed to our central observability platform, but we do not want the owner of that pipeline to have to know which endpoints, logs, or metrics should be forwarded off cluster and which should not, or even what services exist in the first place. As stated in the initial problem statement of this issue, this is very similar to how the Prometheus Operator works today, and in fact that is what we use today. In order to move to an OTel-based solution and replace Prometheus as a forwarding agent, we really require this decentralization ability.
Thanks everyone for your feedback here. I've come around to this idea and think it would be beneficial to the community. @swiatekm-sumo I'm going to self-assign and work on this after #1876 is complete. Do you want to collaborate on the design?
I totally support this initiative and agree with the use cases already mentioned above. Another use case that I'd like to add is the ability for developers to manage Tail Sampling configuration. We run hundreds of applications in the cluster, with all observability data collected into a centralized platform. We want application developers to be able to configure Tail Sampling policies for their applications without touching the centrally managed collector configuration.
@lsolovey Could you give an example of what kind of configuration you would expect? I am working on a proposal.
In summary, a good first step would be to separate the configuration of exporters from the collector configuration.

```mermaid
graph TD;
    OpenTelemetryKafkaExporter-->OpenTelemetryExporter;
    OpenTelemetryOtlpExporter-->OpenTelemetryExporter;
    OpenTelemetryExporter-->OpenTelemetryGateway;
    OpenTelemetryExporter-->OpenTelemetryAgent;
    OpenTelemetryAgent-->OpenTelemetryCollector;
    OpenTelemetryGateway-->OpenTelemetryCollector;
```
Since all of these CRDs would be based on the OpenTelemetryCollector definition, supporting a native YAML configuration seems like a prerequisite. Once that is done, we can start prototyping the gateway and exporter CRDs.
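None of the CRDs in the diagram exist today, so as a purely hypothetical sketch of the shape such a split could take, an OTLP exporter defined in its own resource might look something like the following; the kind, API group/version, and field names are all assumptions for illustration.

```yaml
# Hypothetical resource, not an existing API: an OTLP exporter split out of
# the collector config so it can be referenced by gateway/agent collectors.
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryOtlpExporter
metadata:
  name: vendor-backend
spec:
  endpoint: otlp.vendor.example.com:4317
  tls:
    insecure: false
```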
My attempts so far at setting up and configuring the OTel Collector Operator have led me to thoughts similar to those mentioned here and in #1477. The Prometheus Operator has the correct idea here, I believe. There are basically two or three concerns here that would be useful to separate.
I would like to restart this thread with a very simple proposal. The foundation for distributed collector configuration is the collector's config merging feature. However, merging currently overrides arrays (see the proposal for an append-merge flag in open-telemetry/opentelemetry-collector#8754), and merging of configuration is order dependent (e.g. the order of processors in a pipeline matters). Therefore the proposal is to introduce a new CRD:

```yaml
apiVersion: opentelemetry.io/v1beta1
kind: CollectorGroup
metadata:
  name: simplest
spec:
  root: platform-collector
  collectors:
    - name: receivers
    - name: pii-remove-users
    - name: pii-remove-credit-cards
    - name: export-to-vendor
---
apiVersion: opentelemetry.io/v1beta1
kind: OpenTelemetryCollector
metadata:
  name: platform-collector
spec:
  collectorGroup: true
  config:
```
The operator could do some validation of the collector configs to make sure each config contains only unique components, so that merging does not silently override anything.
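As a minimal sketch of how this could play out, assuming the CollectorGroup above, one of the referenced fragments (here pii-remove-users) might contribute only a single processor. The concrete processor and attribute key below are illustrative assumptions, not part of the proposal.

```yaml
# Hypothetical fragment owned by a single team; the operator would merge it
# into the root platform-collector config in the order given by the group.
apiVersion: opentelemetry.io/v1beta1
kind: OpenTelemetryCollector
metadata:
  name: pii-remove-users
spec:
  config:
    processors:
      attributes/remove-users:
        actions:
          - key: enduser.id
            action: delete
```

The assembled configuration could then rely on the collector's existing merging behaviour, conceptually equivalent to passing each fragment as a separate --config flag in the order defined by the CollectorGroup.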
I like the idea, but I've a few open questions and thoughts.
Currently, configuration for a single Collector CR is monolithic. I'd like to explore the idea of allowing it to be defined in a distributed way, possibly by different users. It would be the operator's job to collect and assemble the disparate configuration CRs and create an equivalent collector configuration - much like how prometheus-operator creates a Prometheus configuration based on ServiceMonitors.
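To make the prometheus-operator analogy concrete, here is a hedged sketch of how assembly could look: a platform-owned collector selects team-owned configuration fragments by label, much as a Prometheus resource selects ServiceMonitors. The CollectorConfig kind, the label, and the filter processor contents below are hypothetical illustrations, not an existing or proposed API.

```yaml
# Hypothetical team-owned fragment, selected by label rather than listed
# explicitly; analogous to how a ServiceMonitor is picked up by Prometheus.
apiVersion: opentelemetry.io/v1alpha1
kind: CollectorConfig
metadata:
  name: checkout-service-logs
  labels:
    otel.example.com/pipeline: platform
spec:
  processors:
    filter/drop-debug-logs:
      logs:
        log_record:
          - 'severity_text == "DEBUG"'
```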
Prior art for similar solutions includes the prometheus-operator with its Monitor CRs, and the logging-operator.
Broadly speaking, the benefits of doing this could be:

- Application developers would only need to write the piece of configuration relevant to their application, while a platform team would be responsible for running the collector.

Potential problems with doing this that are unique to the otel operator:
Somewhat related issues regarding new CRs for collector configuration: #1477
I'd like to request that anyone who would be interested in this kind of feature post a comment in this issue describing their use case.