Request for comments: service.type and service.distro Resource attributes #554
Comments
@open-telemetry/collector-approvers @open-telemetry/collector-contrib-approvers I would also like your opinion on this proposal. |
While I see the appeal for In any case, I do believe that the collector has to provide the distribution as part of the resource attributes for its own telemetry. |
The main intent of semantic conventions is defining the shape of telemetry data, and I have some mixed feelings when I see them used in modelling parts of protocol specifications.
That's also something I'm wondering about: do we see value in putting those attributes on telemetry data? Or is the primary intent to satisfy requirements stemming from coupling the OpAmp spec with semantic conventions? |
@jpkrohling here's an example from @tigrannajaryan:
This is not to say I agree with this reasoning. Wouldn't all these different databases have a different I'm also worried that If this is needed for OpAMP (as this comment suggests: #396 (comment)), why not introduce an OpAMP-specific attribute like |
Yes. In the same vein, Elasticsearch vs. OpenSearch. As a user, I would likely prefer to have each fork on its own service.type. Collector core vs. contrib are not forks, they really are different distributions. |
To be clear: the only reason OpAMP is concerned with this is to direct the Collector to report useful telemetry about itself and to be identified in a way that is useful for the end user. There is no other use for agent type in any other capabilities that OpAMP supports. |
I think there is value in putting this in telemetry. If I am running a cache that I give a service name "foo" it is likely useful for me to also know that that cache is actually a Redis instance. If we don't see the value of this then we should not add a |
Yes, that's the alternate if we don't believe that |
I was referring to the identification part, where semantic convention attributes seem to be part of the OpAMP spec. From a non-OpAMP point of view, I find
That's definitely a valid use case (for |
Yes, we are in agreement here, that's what I meant by "to be identified in a way that is useful for the end user".
This is probably somewhat unique to Otel Collector. If we look at Postgres, for example, there are a ton of Postgres-based databases, but I am not sure I would call them a "distro"; perhaps they would be their own "service types".
Probably of quite a limited use for the first-party services. Something developed and used internally in most cases has a well known internal name that is recorded in |
Envoy proxy can also be extended and built with custom extensions, essentially being another "distro". For example, the Istio project uses its own distro of Envoy (https://github.com/istio/proxy) |
I see value in having a Speaking from the perspective of an observability vendor, I want to be able to identify the type of telemetry source and provide different observability experiences based on the data that type typically emits. For example, the signals that matter for a web service are not the same as for a kafka broker, or a postgres instance, or redis instance, etc etc. Currently, we have to rely on duck typing: the presence of certain attribute keys or values might suggest that the telemetry is coming from a particular type. This is brittle and often insufficient. I like the idea of FQDN |
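To illustrate the contrast drawn above between duck typing and an explicit type attribute, here is a minimal sketch. It assumes a resource is a plain mapping of attribute keys to values; the hint table, the `org.apache.kafka` value, and the function name `infer_service_type` are hypothetical and only serve the illustration.

```python
from typing import Iterable, Mapping, Optional

# Hypothetical heuristics: attribute-key prefixes that merely *suggest* a source type.
# This is the brittle "duck typing" fallback described in the comment above.
DUCK_TYPING_HINTS = {
    "db.redis.": "io.redis",
    "messaging.kafka.": "org.apache.kafka",  # assumed value, not defined anywhere in this thread
}


def infer_service_type(
    resource: Mapping[str, str], attribute_keys: Iterable[str]
) -> Optional[str]:
    """Prefer an explicit service.type; otherwise guess from attribute keys."""
    explicit = resource.get("service.type")
    if explicit:
        return explicit
    for key in attribute_keys:
        for prefix, guessed_type in DUCK_TYPING_HINTS.items():
            if key.startswith(prefix):
                return guessed_type
    return None  # nothing matched: the source type stays unknown
```

With an explicit attribute the lookup is a single dictionary access; the fallback loop only guesses, and it silently returns nothing for any source the hint table does not anticipate.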
@jaronoff97 @jlegoff @arminru @andykellr any opinions on this? |
I support the two proposed new attributes. In this related work, open-telemetry/oteps#238, I've tried to establish a uniqueness requirement for resource attributes of some kind. The solution proposed in this issue will help. We want to be sure that no two agents with identical attributes handle a pipeline, otherwise they appear to be double-counting the items that pass through them both. If we have semantic conventions that ensure pipelines have unique stage identities, we can do more to automate pipeline monitoring. |
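As a rough sketch of the uniqueness concern raised above, the following checks whether two pipeline stages report the same identity. The choice of identity keys and the function name `duplicate_identities` are assumptions made for illustration; the referenced OTEP may define uniqueness differently.

```python
from collections import Counter
from typing import Iterable, List, Mapping, Optional, Sequence, Tuple

# Assumed identity keys; a real rule might also include service.type or other attributes.
IDENTITY_KEYS = ("service.namespace", "service.name", "service.instance.id")


def duplicate_identities(
    stages: Iterable[Mapping[str, str]],
    keys: Sequence[str] = IDENTITY_KEYS,
) -> List[Tuple[Optional[str], ...]]:
    """Return every identity tuple that is reported by more than one stage."""
    counts = Counter(tuple(stage.get(k) for k in keys) for stage in stages)
    return [identity for identity, n in counts.items() if n > 1]
```

Two stages that share an identity tuple would look like one stage to a backend, which is the double-counting problem the comment describes.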
My quick reaction to one comment is that I think the use case of OpAMP implies the use of Specifically, the OTEL collector being used as an agent vs. gateway seems more like a "role" and not something an FQDN would address. |
@jsuereth I would expect the role to be recorded in |
To make the OpAMP use case clear here, assume we have an OpAMP client that works on the collector and reports the collector's config under the effective_config section. It's then implied that an OpAMP server that is able to read and unmarshal this effective configuration will want to do so by using the collector's unmarshaling logic. This is relatively straightforward: a collector OpAMP client connects to a collector OpAMP server.

As OpAMP is adopted by more agents, however, this is more complicated. Because OpAMP allows for a partial implementation, any agent type that implements the protocol can connect to our server, not just an OpAMP collector client. The next OpAMP client implementation is by the OpAMP bridge in the operator, which reports on the Kubernetes configuration for a collector; this is decidedly different from our OpAMP collector client's effective configuration. In the future, say an agent like Prometheus were to have an OpAMP client, our OpAMP server would want to be able to unmarshal this configuration using Prometheus' unmarshal logic. All of this necessitates a type that an OpAMP server can switch on to know what type of client it's talking to.

Without wading too much into the discussion of this, I would posit that any collector will have the same structure of configuration, the difference being who built the image. To that end, I would recommend that any images that are functionally equivalent and share a configuration structure should have equivalent

I'll tie it up in this example:
sequenceDiagram
participant P as OpAMP-Client (Prometheus, vanilla)
participant OTC as OpAMP-Client (OTC, vanilla)
participant SOTC as OpAMP-Client (OTC, Splunk)
participant OB as OpAMP-Client (Bridge, vanilla)
participant OPS as OpAMP Server
P->>+OPS: sends effective config
OTC->>+OPS: sends effective config
SOTC->>+OPS: sends effective config
OB->>+OPS: sends effective config
OPS-->>-OTC: Responds OK
OPS-->>-SOTC: Responds OK
OPS-->>-P: Responds OK
OPS-->>-OB: Responds OK
We can see that each OpAMP client is able to report its |
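The "type that an OpAMP server can switch on" can be sketched as a small dispatch table. This is only an illustration under stated assumptions: JSON stands in for each agent's real configuration format, and the parser functions, the `io.prometheus` key, and `parse_effective_config` are hypothetical names that are not part of OpAMP or any agent's API.

```python
import json
from typing import Callable, Dict


def parse_collector_config(raw: bytes) -> dict:
    """Stand-in for the Collector's own unmarshaling logic (JSON used for brevity)."""
    return {"kind": "collector", "config": json.loads(raw)}


def parse_prometheus_config(raw: bytes) -> dict:
    """Stand-in for Prometheus' unmarshaling logic (JSON used for brevity)."""
    return {"kind": "prometheus", "config": json.loads(raw)}


# Keyed by agent type only. Distributions that share a configuration structure
# (for example a vendor build of the Collector) would map to the same entry.
PARSERS: Dict[str, Callable[[bytes], dict]] = {
    "io.opentelemetry.collector": parse_collector_config,
    "io.prometheus": parse_prometheus_config,  # hypothetical type value
}


def parse_effective_config(agent_type: str, raw: bytes) -> dict:
    """Pick the unmarshaling logic based on the agent type the client reported."""
    parser = PARSERS.get(agent_type)
    if parser is None:
        raise ValueError(f"no config parser registered for agent type {agent_type!r}")
    return parser(raw)
```

A server built this way needs no knowledge of who built a given image, only of which configuration structure the reported type implies.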
@jaronoff97 I am not sure I understand why we need "role" in the use case you described. Wouldn't "role" be always "client" for all of the connecting agents in the diagram you show? |
yes – I don't think it's terribly useful for us outside of a reporting case where you could imagine the client and server emitting the same metric (connections for example) to an external telemetry vendor. Without the role, the vendor would receive
however with a role it would be more descriptive:
This example is admittedly contrived, and there's a better use case for collectors IMO. |
The http connection metrics already have their own, distinct way of differentiating the client and server metrics. They have different metric names, there is no need for an attribute: https://github.com/open-telemetry/semantic-conventions/blob/main/docs/http/http-metrics.md |
still, i think role would be useful here for the opamp supervisor case. The supervisor acts as a middleman and without the role the telemetry would look like this:
flowchart TD
A[OpAMP Extension] -->|http| B(OpAMP Supervisor)
B -->|http| C{OpAMP Server}
Without role, the relationship between these related metrics is unclear. In this example we have high latency in the opamp server, but it's unclear whether that would be affecting the opamp extension or the opamp supervisor. Adding the role would make the relationship between these metrics visible in the telemetry:
|
I think Supervisor should use a different |
I don't know what I'm talking about, but this looks to me like maybe an example of how metrics have less fidelity than traces. There are three resources in the diagram, but no mention of their identities in the metric attributes/labels.
I don't know what I'm talking about, but this sounds like OpAMP has a concept of what makes two things similar or different. Should the discussion here be about an attribute in an |
That is the alternate approach if we decide that |
Contributes to open-telemetry#554
Contributes to open-telemetry#396
Contributes to open-telemetry/opamp-spec#131

## Problem Description

`service.name` Resource attribute is [currently defined](https://github.com/open-telemetry/semantic-conventions/tree/main/docs/resource#service) as the "Logical name of the service". The expectation is that `service.name` will be set by the operator of the service to a value that describes the role of the service in the overall observable set of entities the operator has (within a `service.namespace`).

Otel Collector [sets](https://github.com/open-telemetry/opentelemetry-collector/blob/7e3e725a2952728560b9f5f71867ad6358b1977f/service/service.go#L276) `service.name` by default to be the name of the executable (e.g. otelcorecol or otelcontribcol).

Collector's `service.name` can be overridden by the operator using `service.telemetry.resource` setting of Collector's config file. This is typically expected in any non-trivial infrastructure where the same Collector executable can be used as a locally running agent on a host, as a standalone gateway that serves as an intermediary between agents and the backends, as part of Kubernetes operator, etc. The roles in these cases are sufficiently different to warrant different logical names.

However, there is currently no semantic convention for an attribute that specifies the type of a service that may have different logical roles when used in different places in the infrastructure, yet be identically produced, i.e. be the exact same executable. The executable file name to some extent can serve that purpose but nothing prevents different service types from having the same executable file name, it has poor uniqueness guarantees.

This [issue](open-telemetry#396) talks a bit more about why we would want to have the type of an agent (Otel Collector in our case) to be a well-defined semantic convention.

This [issue](open-telemetry/opamp-spec#131) shows how the agent type would be useful in the context of agent management. The issue talks about how it is important to tie agent's own telemetry's Resource to the attributes that identify that agent in the context of the OpAMP protocol.

## Proposed Change

This is a request for comments for adding the following Recommended, experimental Resource semantic conventions:

- `service.type` - an FQDN that uniquely identifies the type of the service, e.g. io.opentelemetry.collector, io.redis, etc. Unlike (service.namespace,service.name,service.instance.id) triplet the (service.namespace,service.type,service.instance.id) triplet is not guaranteed to be globally unique.

Note that having a separate `service.type` allows OpAMP if wanted by the operator to manage the same type of agents in a similar way even though their `service.name` values may be different due to different logical roles they have.

Another example with NGINX: `service.type` will be set to com.nginx by NGINX developers, while `service.name` is set to "api-gateway" by the operator, denoting the logical role that the particular NGINX deployment serves in this particular system.
Contributes to open-telemetry#554
Contributes to open-telemetry#396
Contributes to open-telemetry/opamp-spec#131

Problem Description
===================

`service.name` Resource attribute is [currently defined](https://github.com/open-telemetry/semantic-conventions/tree/main/docs/resource#service) as the "Logical name of the service". The expectation is that `service.name` will be set by the operator of the service to a value that describes the role of the service in the overall observable set of entities the operator has (within a `service.namespace`).

Otel Collector [sets](https://github.com/open-telemetry/opentelemetry-collector/blob/7e3e725a2952728560b9f5f71867ad6358b1977f/service/service.go#L276) `service.name` by default to be the name of the executable (e.g. otelcorecol or otelcontribcol).

Collector's `service.name` can be overridden by the operator using `service.telemetry.resource` setting of Collector's config file. This is typically expected in any non-trivial infrastructure where the same Collector executable can be used as a locally running agent on a host, as a standalone gateway that serves as an intermediary between agents and the backends, as part of Kubernetes operator, etc. The roles in these cases are sufficiently different to warrant different logical names.

However, there is currently no semantic convention for an attribute that specifies the type of a service that may have different logical roles when used in different places in the infrastructure, yet be identically produced, i.e. be the exact same executable. The executable file name to some extent can serve that purpose but nothing prevents different service types from having the same executable file name, it has poor uniqueness guarantees.

This [issue](open-telemetry#396) talks a bit more about why we would want to have the type of an agent (Otel Collector in our case) to be a well-defined semantic convention.

This [issue](open-telemetry/opamp-spec#131) shows how the agent type would be useful in the context of agent management. The issue talks about how it is important to tie agent's own telemetry's Resource to the attributes that identify that agent in the context of the OpAMP protocol.

Proposed Change
===============

This change adds `service.type` as Recommended, experimental Resource semantic convention. The value is a string in reverse domain notation that uniquely identifies the type of the service, e.g. io.opentelemetry.collector, io.redis, etc. Unlike (service.namespace,service.name,service.instance.id) triplet the (service.namespace,service.type,service.instance.id) triplet is not guaranteed to be globally unique.

Note that having a separate `service.type` allows OpAMP if wanted by the operator to manage the same type of agents in a similar way even though their `service.name` values may be different due to different logical roles they have.

Another example with NGINX: `service.type` will be set to com.nginx by NGINX developers, while `service.name` is set to "api-gateway" by the operator, denoting the logical role that the particular NGINX deployment serves in this particular system.
Contributes to open-telemetry#554
Contributes to open-telemetry#396
Contributes to open-telemetry/opamp-spec#131

Problem Description
===================

`service.name` Resource attribute is [currently defined](https://github.com/open-telemetry/semantic-conventions/tree/main/docs/resource#service) as the "Logical name of the service". The expectation is that `service.name` will be set by the operator of the service to a value that describes the role of the service in the overall observable set of entities the operator has (within a `service.namespace`).

Otel Collector [sets](https://github.com/open-telemetry/opentelemetry-collector/blob/7e3e725a2952728560b9f5f71867ad6358b1977f/service/service.go#L276) `service.name` by default to be the name of the executable (e.g. otelcorecol or otelcontribcol).

Collector's `service.name` can be overridden by the operator using `service.telemetry.resource` setting of Collector's config file. This is typically expected in any non-trivial infrastructure where the same Collector executable can be used as a locally running agent on a host, as a standalone gateway that serves as an intermediary between agents and the backends, as part of Kubernetes operator, etc. The roles in these cases are sufficiently different to warrant different logical names.

However, there is currently no semantic convention for an attribute that specifies the type of a service that may have different logical roles when used in different places in the infrastructure, yet be identically produced, i.e. be the exact same executable. The executable file name to some extent can serve that purpose but nothing prevents different service types from having the same executable file name, it has poor uniqueness guarantees.

This [issue](open-telemetry#396) talks a bit more about why we would want to have the type of an agent (Otel Collector in our case) to be a well-defined semantic convention.

This [issue](open-telemetry/opamp-spec#131) shows how the agent type would be useful in the context of agent management. The issue talks about how it is important to tie agent's own telemetry's Resource to the attributes that identify that agent in the context of the OpAMP protocol.

Changes
=======

This change adds `service.type` as a Recommended, experimental Resource semantic convention. The value is a string in reverse domain notation that uniquely identifies the type of the service (the type of the product deployed as the service), e.g. io.opentelemetry.collector, io.redis, etc. Unlike (service.namespace,service.name,service.instance.id) triplet the (service.namespace,service.type,service.instance.id) triplet is not guaranteed to be globally unique.

For OpAMP having a separate `service.type` allows OpAMP, if desired by the operator, to manage the same type of agents in a similar way even though their `service.name` values may be different due to different logical roles they have.

An example unrelated to OpAMP, when using NGINX: `service.type` will be set to "com.nginx", while `service.name` is set to "api-gateway", denoting the logical role that the particular NGINX deployment serves in this particular system.
PR for |
All, the PR that adds |
Closing this as "won't do". We will likely instead propose |
Problem Description
The `service.name` Resource attribute is currently defined as the "Logical name of the service". The expectation is that `service.name` will be set by the operator of the service to a value that describes the role of the service in the overall observable set of entities the operator has (within a `service.namespace`).

Otel Collector sets `service.name` by default to be the name of the executable (e.g. otelcorecol or otelcontribcol).

Collector's `service.name` can be overridden by the operator using the `service.telemetry.resource` setting of Collector's config file. This is typically expected in any non-trivial infrastructure where the same Collector executable can be used as a locally running agent on a host, as a standalone gateway that serves as an intermediary between agents and the backends, as part of Kubernetes operator, etc. The roles in these cases are sufficiently different to warrant different logical names.

However, there is currently no semantic convention for an attribute that specifies the type of a service that may have different logical roles when used in different places in the infrastructure, yet be identically produced, i.e. be the exact same executable. The executable file name can serve that purpose to some extent, but nothing prevents different service types from having the same executable file name, so it has poor uniqueness guarantees.

This issue talks a bit more about why we would want to have the type of an agent (Otel Collector in our case) to be a well-defined semantic convention.

This issue shows how the agent type would be useful in the context of agent management. The issue talks about how it is important to tie agent's own telemetry's Resource to the attributes that identify that agent in the context of the OpAMP protocol.
Proposal
This is a request for comments for adding the following Recommended, experimental Resource semantic conventions:

- `service.type` - an FQDN that uniquely identifies the type of the service, e.g. io.opentelemetry.collector, io.redis, etc. Unlike the (service.namespace,service.name,service.instance.id) triplet, the (service.namespace,service.type,service.instance.id) triplet is not guaranteed to be globally unique.
- `service.distro` - an FQDN that uniquely identifies the distribution of the service. A number of distributions can belong to the same `service.type`. For example, OpenTelemetry Collector has multiple known distributions, e.g. io.opentelemetry.collector-contrib, com.splunk.opentelemetry-collector, com.amazon.adot-collector, etc. [As an alternative to the reverse company FQDN, we may also require `service.distro` to point to the source code repo of the distro (e.g. git/github repo) for OSS agents.]

Note that having a separate `service.type` allows OpAMP, if wanted by the operator, to manage the same type of agents in a similar way even though their `service.name` values may be different due to different logical roles they have.

Another example with NGINX: `service.type` will be set to com.nginx by NGINX developers, while `service.name` is set to "api-gateway" by the operator, denoting the logical role that the particular NGINX deployment serves in this particular system.

I am looking for comments and thoughts on this proposal before I submit a PR.
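To make the split between logical name, type, and distribution concrete, here is a small sketch that models resources as plain dictionaries and groups services by `service.type`. The attribute values marked as hypothetical, the `service.instance.id` values, and the function name `names_by_type` are assumptions for illustration; the proposed attributes were never adopted in this form.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping

# Example resources. "api-gateway", "com.nginx" and the distro values come from
# the proposal text; the other service names and all instance ids are hypothetical.
RESOURCES = [
    {"service.name": "api-gateway", "service.type": "com.nginx",
     "service.instance.id": "nginx-1"},
    {"service.name": "static-assets", "service.type": "com.nginx",
     "service.instance.id": "nginx-2"},
    {"service.name": "host-agent", "service.type": "io.opentelemetry.collector",
     "service.distro": "io.opentelemetry.collector-contrib",
     "service.instance.id": "col-1"},
    {"service.name": "gateway", "service.type": "io.opentelemetry.collector",
     "service.distro": "com.splunk.opentelemetry-collector",
     "service.instance.id": "col-2"},
]


def names_by_type(resources: Iterable[Mapping[str, str]]) -> Dict[str, List[str]]:
    """Group logical service names by the type of product that runs them."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for resource in resources:
        service_type = resource.get("service.type", "unknown")
        grouped[service_type].append(resource["service.name"])
    return dict(grouped)
```

Calling `names_by_type(RESOURCES)` puts both Collector deployments under one type even though their names and distributions differ, which is the kind of grouping the proposal wants to enable for agent management.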