From 72ce08e82411196c73a63eaca54b1e0c9943fad8 Mon Sep 17 00:00:00 2001 From: Arve Knudsen Date: Wed, 10 Apr 2024 18:32:27 +0200 Subject: [PATCH 01/16] Add proposal: Simplify joining with info metrics Signed-off-by: Arve Knudsen --- ...ative-support-for-info-metrics-metadata.md | 78 +++++++++++++++++++ 1 file changed, 78 insertions(+) create mode 100644 proposals/2024-04-10-native-support-for-info-metrics-metadata.md diff --git a/proposals/2024-04-10-native-support-for-info-metrics-metadata.md b/proposals/2024-04-10-native-support-for-info-metrics-metadata.md new file mode 100644 index 0000000..1ab9363 --- /dev/null +++ b/proposals/2024-04-10-native-support-for-info-metrics-metadata.md @@ -0,0 +1,78 @@ +## Native Support for Info Metrics Metadata + +* **Owners:** + * Arve Knudsen [@aknuds1](https://github.com/aknuds1) [arve.knudsen@grafana.com](mailto:arve.knudsen@grafana.com) + +* **Implementation Status:** Partially implemented + +* **Related Issues and PRs:** + * [WIP: Info PromQL function prototype](https://github.com/grafana/mimir-prometheus/pull/598) + +* **Other docs or links:** + * [Proper support for OTEL resource attributes](https://docs.google.com/document/d/1FgHxOzCQ1Rom-PjHXsgujK8x5Xx3GTiwyG__U3Gd9Tw/edit#heading=h.unv3m5m27vuc) + * [Special treatment of info metrics in Prometheus](https://docs.google.com/document/d/1ebhGNLs3uhdeprJCullM-ywA9iMRDg_mmnuFAQCloqY/edit#heading=h.2rmzk7oo6tu8) + +> This proposal collects the requirements and implementation proposals for enhancing Prometheus with native support for info metrics metadata. + +## Why + +Currently Prometheus "forgets" which are the identifying labels of info metrics upon ingestion, even though this information is present in at least the OpenMetrics protobuf exposition format (the OpenMetrics text exposition format unfortunately lacks this capability). +The fact that Prometheus lacks a notion of which are info metrics' identifying labels leads to certain problems: + +* Explicit knowledge of each info metric's identifying labels must be embedded in join queries for when you wish to enrich queries with data (non-identifying) labels from info metrics. +* Complex join queries must be written in order to enrich time series with corresponding labels from info metrics. + This is particularly problematic in the OpenTelemetry (AKA OTel) context, since users depend on (joining with) the `target_info` info metric in order to add relevant [resource attributes](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/resource/sdk.md) back to their Prometheus metrics. +* If an info metric's data (non-identifying) labels change (a situation that should become more frequent with OTel in the future, as the model will probably start allowing for non-identifying resource attribute mutations), join queries against the info metric (e.g. `target_info`) will temporarily fail due to resolving the join keys to two different metrics, until the old metric is marked stale (by default after five minutes). + +Especially in order to provide the best possible OTel experience, the info metric (`target_info` in the case of OTel) staleness problem needs to be solved, so users won't experience temporarily failing join queries while trying to include OTel resource attributes. +Also, it would be much better if we could provide a simpler query experience where the user doesn't have to know how to write PromQL joins (a fairly complex matter), in order to include e.g. OTel resource attributes. +Another possible positive outcome might be dedicated support in the Grafana UI for visualizing the resource attributes of each OTel metric. + +### Pitfalls of the current solution + +Prometheus currently persists info metrics as if they were normal float samples. +This means that knowledge of info metrics' identifying labels are lost, and you have to base yourself on convention when querying on them (for example that `target_info` should have `job` and `instance` as identifying labels). +The consequence is that you need relatively expert level PromQL knowledge to include info metric labels in your query results; as OTel grows in popularity, this becomes more and more of a problem as users will want to include certain labels from `target_info` (corresponding to OTel resource attributes). +Without persisted info metric metadata, one can't build more user friendly abstractions (e.g. a PromQL function) for including OTel resource attributes (or other info metric labels) in query results. Neither can you build dedicated UI for OTel resource attributes (or other info metric labels). + +## Goals + +Goals and use cases for the solution as proposed in [How](#how): + +* Persist info metrics with known identifying labels as a new info metric sample type. +* Store for each info metric sample (of the new type) which are the identifying labels. +* Store in the TSDB immediately that the previous version of an info metric is stale, when its data labels change. +* Add TSDB API for, given a certain time series and a certain timestamp, getting data labels, potentially filtered by certain matchers, from info metrics with identifying labels in common with the time series in question. +* Simplify inclusion of info metric labels in PromQL. + +### Audience + +Prometheus maintainers. + +## Non-Goals + +## How + +* A new info metric sample type will be introduced, where the sample value is the info metric's identifying labels. +* The head and block indexes will be augmented with indexes of info metrics. +* A method will be added to the TSDB API for matching info metric data labels to a time series, given a certain timestamp and potentially data label matchers - the method will use the aforementioned head and block info metric indexes. +* Thanks to the head and block info metric indexes, the info metric staleness problem should be solved, since one can pick the latest version of the info metric for overlapping time ranges. +* We propose simplifying the inclusion of info metric labels in PromQL through a new `info` function (TODO: describe). + +* Make it concise and **simple**; put diagrams; be concrete, avoid using “really”, “amazing” and “great” (: +* How you will test and verify? +* How you will migrate users, without downtime. How we solve incompatibilities? +* What open questions are left? (“Known unknowns”) + +## Alternatives + +The section stating potential alternatives. Highlight the objections reader should have towards your proposal as they read it. Tell them why you still think you should take this path [[ref](https://twitter.com/whereistanya/status/1353853753439490049)] + +1. This is why not solution Z... + +## Action Plan + +The tasks to do in order to migrate to the new idea. + +* [ ] Task one +* [ ] Task two ... From 38c2743017e7b6599f0520b6d3514d7b53223bc3 Mon Sep 17 00:00:00 2001 From: Arve Knudsen Date: Wed, 22 May 2024 13:10:18 +0200 Subject: [PATCH 02/16] Edit proposal Signed-off-by: Arve Knudsen --- ...ative-support-for-info-metrics-metadata.md | 76 ++++++++++++++----- 1 file changed, 55 insertions(+), 21 deletions(-) diff --git a/proposals/2024-04-10-native-support-for-info-metrics-metadata.md b/proposals/2024-04-10-native-support-for-info-metrics-metadata.md index 1ab9363..ebcec59 100644 --- a/proposals/2024-04-10-native-support-for-info-metrics-metadata.md +++ b/proposals/2024-04-10-native-support-for-info-metrics-metadata.md @@ -1,4 +1,4 @@ -## Native Support for Info Metrics Metadata +# Add 1st class feature to PromQL for handling info type metrics * **Owners:** * Arve Knudsen [@aknuds1](https://github.com/aknuds1) [arve.knudsen@grafana.com](mailto:arve.knudsen@grafana.com) @@ -11,39 +11,61 @@ * **Other docs or links:** * [Proper support for OTEL resource attributes](https://docs.google.com/document/d/1FgHxOzCQ1Rom-PjHXsgujK8x5Xx3GTiwyG__U3Gd9Tw/edit#heading=h.unv3m5m27vuc) * [Special treatment of info metrics in Prometheus](https://docs.google.com/document/d/1ebhGNLs3uhdeprJCullM-ywA9iMRDg_mmnuFAQCloqY/edit#heading=h.2rmzk7oo6tu8) + * [Scenarios scratch pad](https://docs.google.com/document/d/1nV6N3pDfvZhmG2658huNbFSkz2rsM6SpkHabp9VVpw0/edit#heading=h.luf3yapzr29e) -> This proposal collects the requirements and implementation proposals for enhancing Prometheus with native support for info metrics metadata. +> This proposal collects the requirements and implementation proposals for adding a 1st class feature to PromQL for handling info type metrics. ## Why -Currently Prometheus "forgets" which are the identifying labels of info metrics upon ingestion, even though this information is present in at least the OpenMetrics protobuf exposition format (the OpenMetrics text exposition format unfortunately lacks this capability). -The fact that Prometheus lacks a notion of which are info metrics' identifying labels leads to certain problems: - +Currently, enriching Prometheus query results with corresponding labels from info metrics is challenging. +More specifically, it requires writing advanced PromQL to join with the info metric in question. +Take as an example querying HTTP request rates per K8s cluster and status code, while having to join with the `target_info` metric to obtain the `k8s_cluster_name` label: + +```promql +sum by (k8s_cluster_name, http_status_code) ( + rate(http_server_request_duration_seconds_count[2m]) + * on (job, instance) group_left (k8s_cluster_name) + target_info +) +``` + +The `target_info` metric is in fact the motivation for this proposal, as it's how Prometheus encodes OpenTelemetry (OTel for short) [resource attributes](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/resource/sdk.md). +As a result, it's a very important info metric for those using Prometheus as an OTel backend. +OTel resource attributes model metadata about the environment producing metrics received by the backend (e.g. Prometheus), and Prometheus persists them as labels of `target_info`. +Typically, OTel users want to include some of these attributes (as `target_info` labels) in their query results, to correlate them with entities of theirs (e.g. K8s pods). + +Based on user demand, it would be preferable if Prometheus were to have better UX for enriching query results with info metrics labels, especially with OTel in mind. +There are other problems with Prometheus' current method of including info metric labels in queries, beyond just the technical barrier: * Explicit knowledge of each info metric's identifying labels must be embedded in join queries for when you wish to enrich queries with data (non-identifying) labels from info metrics. -* Complex join queries must be written in order to enrich time series with corresponding labels from info metrics. - This is particularly problematic in the OpenTelemetry (AKA OTel) context, since users depend on (joining with) the `target_info` info metric in order to add relevant [resource attributes](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/resource/sdk.md) back to their Prometheus metrics. + * A certain pair of OTel resource attributes (`service.name` and `service.instance.id`) are currently assumed to be the identifying pair and mapped to `target_info`'s `job` and `instance` labels respectively, but this may become a dynamic property of the OTel model. + * Both attributes are in reality optional, so either of them might be empty (`service.name` is only mandatory for OTel SDK clients). + * If both identifying attributes are empty, `target_info` isn't generated (there being no identifying labels to join against). * If an info metric's data (non-identifying) labels change (a situation that should become more frequent with OTel in the future, as the model will probably start allowing for non-identifying resource attribute mutations), join queries against the info metric (e.g. `target_info`) will temporarily fail due to resolving the join keys to two different metrics, until the old metric is marked stale (by default after five minutes). -Especially in order to provide the best possible OTel experience, the info metric (`target_info` in the case of OTel) staleness problem needs to be solved, so users won't experience temporarily failing join queries while trying to include OTel resource attributes. -Also, it would be much better if we could provide a simpler query experience where the user doesn't have to know how to write PromQL joins (a fairly complex matter), in order to include e.g. OTel resource attributes. -Another possible positive outcome might be dedicated support in the Grafana UI for visualizing the resource attributes of each OTel metric. +If Prometheus could persist info metrics' identifying labels (e.g. `job` and `instance` for `target_info`), human knowledge of the correct identifying labels may become unnecessary when "joining" with info metrics. +Information about info metric identifying labels is present in at least the OpenMetrics protobuf exposition format (the OpenMetrics text exposition format unfortunately lacks this capability). +It can also easily be deduced when ingesting metrics from OTLP (OTel Protocol). +Intrinsic knowledge of info metrics' identifying labels could also help in solving temporary conflicts between old and new versions of info metrics, when data (non-identifying) labels change. +Another possible positive outcome might be dedicated support in UIs (e.g. Grafana) for visualizing the resource attributes of OTel metrics. ### Pitfalls of the current solution Prometheus currently persists info metrics as if they were normal float samples. -This means that knowledge of info metrics' identifying labels are lost, and you have to base yourself on convention when querying on them (for example that `target_info` should have `job` and `instance` as identifying labels). +This means that knowledge of info metrics' identifying labels are lost, and you have to base yourself on convention when querying them (for example that `target_info` should have `job` and `instance` as identifying labels). +There's also no particular support for enriching query results with info metric labels in PromQL. The consequence is that you need relatively expert level PromQL knowledge to include info metric labels in your query results; as OTel grows in popularity, this becomes more and more of a problem as users will want to include certain labels from `target_info` (corresponding to OTel resource attributes). -Without persisted info metric metadata, one can't build more user friendly abstractions (e.g. a PromQL function) for including OTel resource attributes (or other info metric labels) in query results. Neither can you build dedicated UI for OTel resource attributes (or other info metric labels). +Without persisted info metric metadata, one can't build more user friendly abstractions (e.g. a PromQL function) for including OTel resource attributes (or other info metric labels) in query results. +Neither can you build dedicated UI for OTel resource attributes (or other info metric labels). ## Goals Goals and use cases for the solution as proposed in [How](#how): -* Persist info metrics with known identifying labels as a new info metric sample type. -* Store for each info metric sample (of the new type) which are the identifying labels. -* Store in the TSDB immediately that the previous version of an info metric is stale, when its data labels change. +* Persist info metrics with labels categorized as either identifying or non-identifying. +* Track when info metrics' set of identifying labels changes. This shouldn't be a frequent occurrence, but it should be handled. +* Automatically treat the old version of an info metric as stale for query result enriching purposes, when its data labels change (producing a new time series, but with same identity). * Add TSDB API for, given a certain time series and a certain timestamp, getting data labels, potentially filtered by certain matchers, from info metrics with identifying labels in common with the time series in question. -* Simplify inclusion of info metric labels in PromQL. +* Simplify enriching of query results with info metric labels in PromQL, e.g. via a new function. ### Audience @@ -53,12 +75,24 @@ Prometheus maintainers. ## How -* A new info metric sample type will be introduced, where the sample value is the info metric's identifying labels. -* The head and block indexes will be augmented with indexes of info metrics. -* A method will be added to the TSDB API for matching info metric data labels to a time series, given a certain timestamp and potentially data label matchers - the method will use the aforementioned head and block info metric indexes. -* Thanks to the head and block info metric indexes, the info metric staleness problem should be solved, since one can pick the latest version of the info metric for overlapping time ranges. -* We propose simplifying the inclusion of info metric labels in PromQL through a new `info` function (TODO: describe). +* Introduce a new info metric sample type, to track the info metric's identifying label set over time (in case it changes). +* Augment the head and block indexes with indexes of info metrics, for easy finding of info metrics matching time series. +* Add a method to the TSDB API for matching info metric data labels to a time series, given a certain timestamp and potentially data label matchers - the method will use the aforementioned head and block info metric indexes. +* Simplify the inclusion of info metric labels in PromQL through a new `info` function: `info(v instant-vector[, ls label-selector])`. + This function will be UI for the aforementioned TSDB API. + +Using the `info` function, we can simplify the previously given PromQL join example as follows: + +``` +sum by (k8s_cluster_name, http_status_code) ( + info( + rate(http_server_request_duration_seconds_count[2m]), + {k8s_cluster_name=~".+"} + ) +) +``` +TODO: * Make it concise and **simple**; put diagrams; be concrete, avoid using “really”, “amazing” and “great” (: * How you will test and verify? * How you will migrate users, without downtime. How we solve incompatibilities? From 88feea21401dada2a10a9d35dd48a4fe15dff441 Mon Sep 17 00:00:00 2001 From: Arve Knudsen Date: Fri, 24 May 2024 12:18:47 +0200 Subject: [PATCH 03/16] Specify TSDB API in depth Signed-off-by: Arve Knudsen --- ...-native-support-for-info-metrics-metadata.md | 17 ++++++++++++++++- 1 file changed, 16 insertions(+), 1 deletion(-) diff --git a/proposals/2024-04-10-native-support-for-info-metrics-metadata.md b/proposals/2024-04-10-native-support-for-info-metrics-metadata.md index ebcec59..3528017 100644 --- a/proposals/2024-04-10-native-support-for-info-metrics-metadata.md +++ b/proposals/2024-04-10-native-support-for-info-metrics-metadata.md @@ -65,7 +65,22 @@ Goals and use cases for the solution as proposed in [How](#how): * Track when info metrics' set of identifying labels changes. This shouldn't be a frequent occurrence, but it should be handled. * Automatically treat the old version of an info metric as stale for query result enriching purposes, when its data labels change (producing a new time series, but with same identity). * Add TSDB API for, given a certain time series and a certain timestamp, getting data labels, potentially filtered by certain matchers, from info metrics with identifying labels in common with the time series in question. -* Simplify enriching of query results with info metric labels in PromQL, e.g. via a new function. + * If no data label matchers are provided, _all_ the data labels of found info metrics are added to the resulting time series. + * If data label matchers are provided, only info metrics with matching data labels are considered. + * If data label matchers are provided, _precisely_ the data labels specified by the label matchers are added to the returned time series. + * If data label matchers are provided, time series are only included in the result if matching data labels from info metrics were found. + * A data label matcher like `k8s_cluster_name=~".+"` guarantees that each returned time series has a non-empty `k8s_cluster_name` label, implying that time series for which no matching info metrics have a data label named `k8s_cluster_name` (including the case where no matching info metric exists at all) will be excluded from the result. + * A special case: If a data label matcher allows empty labels (equivalent to missing labels, e.g. `k8s_cluster_name=~".*"`), it will not exclude time series from the result even if there's no matching info metric. + * A data label matcher like `__name__="target_info"` can be used to restrict the info metrics used. + However, the `__name__` label itself will not be copied. + * Label collisions: The input instant vector could already contain labels that are also part of the data labels of a matching info metric. + Furthermore, since the info function might find multiple differently named info metrics with matching identifying labels, those might have overlapping data labels. + In this case, the info function has to check if the values of the affected labels match or are different. + The former case is not really a label collision and therefore causes no problem. + In the latter case, however, the function has to return an error. + The collision can be resolved by constraining the labels via the optional label-selector argument of the info function. + And of course, the user always has the option to go back to the original join syntax (or, even better, avoiding ingesting conflicting info metrics in the first place). +* Simplify enriching of query results with info metric data (non-identifying) labels in PromQL, e.g. via a new function, based on aforementioned TSDB API. ### Audience From 7fd4887d4e31a9bdf88af86e9f3ace1c005bc534 Mon Sep 17 00:00:00 2001 From: Arve Knudsen Date: Fri, 31 May 2024 14:24:26 +0200 Subject: [PATCH 04/16] Change proposal title after reviewer feedback Signed-off-by: Arve Knudsen --- .../2024-04-10-native-support-for-info-metrics-metadata.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/proposals/2024-04-10-native-support-for-info-metrics-metadata.md b/proposals/2024-04-10-native-support-for-info-metrics-metadata.md index 3528017..431c4c6 100644 --- a/proposals/2024-04-10-native-support-for-info-metrics-metadata.md +++ b/proposals/2024-04-10-native-support-for-info-metrics-metadata.md @@ -1,4 +1,4 @@ -# Add 1st class feature to PromQL for handling info type metrics +# Simplify joins with info metrics in PromQL * **Owners:** * Arve Knudsen [@aknuds1](https://github.com/aknuds1) [arve.knudsen@grafana.com](mailto:arve.knudsen@grafana.com) @@ -13,7 +13,7 @@ * [Special treatment of info metrics in Prometheus](https://docs.google.com/document/d/1ebhGNLs3uhdeprJCullM-ywA9iMRDg_mmnuFAQCloqY/edit#heading=h.2rmzk7oo6tu8) * [Scenarios scratch pad](https://docs.google.com/document/d/1nV6N3pDfvZhmG2658huNbFSkz2rsM6SpkHabp9VVpw0/edit#heading=h.luf3yapzr29e) -> This proposal collects the requirements and implementation proposals for adding a 1st class feature to PromQL for handling info type metrics. +> This proposal collects the requirements and implementation proposals for simplifying joins with info type metrics in PromQL. ## Why From d42a059f72d89925fe7464b317b51c0287214e16 Mon Sep 17 00:00:00 2001 From: Arve Knudsen Date: Mon, 17 Jun 2024 14:10:05 +0200 Subject: [PATCH 05/16] Refine proposal Signed-off-by: Arve Knudsen --- ...ative-support-for-info-metrics-metadata.md | 32 ++++++++++++------- 1 file changed, 20 insertions(+), 12 deletions(-) diff --git a/proposals/2024-04-10-native-support-for-info-metrics-metadata.md b/proposals/2024-04-10-native-support-for-info-metrics-metadata.md index 431c4c6..cf7ee83 100644 --- a/proposals/2024-04-10-native-support-for-info-metrics-metadata.md +++ b/proposals/2024-04-10-native-support-for-info-metrics-metadata.md @@ -38,8 +38,8 @@ Based on user demand, it would be preferable if Prometheus were to have better U There are other problems with Prometheus' current method of including info metric labels in queries, beyond just the technical barrier: * Explicit knowledge of each info metric's identifying labels must be embedded in join queries for when you wish to enrich queries with data (non-identifying) labels from info metrics. * A certain pair of OTel resource attributes (`service.name` and `service.instance.id`) are currently assumed to be the identifying pair and mapped to `target_info`'s `job` and `instance` labels respectively, but this may become a dynamic property of the OTel model. - * Both attributes are in reality optional, so either of them might be empty (`service.name` is only mandatory for OTel SDK clients). - * If both identifying attributes are empty, `target_info` isn't generated (there being no identifying labels to join against). + * Both attributes are in reality optional, so either of them might be missing (`service.name` is only mandatory for OTel SDK clients). + * If both identifying attributes are missing, `target_info` isn't generated (there being no identifying labels to join against). * If an info metric's data (non-identifying) labels change (a situation that should become more frequent with OTel in the future, as the model will probably start allowing for non-identifying resource attribute mutations), join queries against the info metric (e.g. `target_info`) will temporarily fail due to resolving the join keys to two different metrics, until the old metric is marked stale (by default after five minutes). If Prometheus could persist info metrics' identifying labels (e.g. `job` and `instance` for `target_info`), human knowledge of the correct identifying labels may become unnecessary when "joining" with info metrics. @@ -61,9 +61,11 @@ Neither can you build dedicated UI for OTel resource attributes (or other info m Goals and use cases for the solution as proposed in [How](#how): -* Persist info metrics with labels categorized as either identifying or non-identifying. +* Persist info metrics with labels categorized as either identifying or non-identifying (i.e. data labels). * Track when info metrics' set of identifying labels changes. This shouldn't be a frequent occurrence, but it should be handled. -* Automatically treat the old version of an info metric as stale for query result enriching purposes, when its data labels change (producing a new time series, but with same identity). + * When enriching a query result's labels with data labels from info metrics, it should be considered per timestamp what are each potentially matching info metric's identifying labels (since the identifying label set may change over time). +* Automatically treat the old version of an info metric as stale for query result enriching purposes, when its data labels change (producing a new time series, but with same identity from an info metric perspective). + * When enriching a query result's labels with data labels from info metrics, and there are several matches with equally named info metrics (e.g. `target_info`) for a timestamp, the one with the newest sample wins (others are considered stale). * Add TSDB API for, given a certain time series and a certain timestamp, getting data labels, potentially filtered by certain matchers, from info metrics with identifying labels in common with the time series in question. * If no data label matchers are provided, _all_ the data labels of found info metrics are added to the resulting time series. * If data label matchers are provided, only info metrics with matching data labels are considered. @@ -74,11 +76,11 @@ Goals and use cases for the solution as proposed in [How](#how): * A data label matcher like `__name__="target_info"` can be used to restrict the info metrics used. However, the `__name__` label itself will not be copied. * Label collisions: The input instant vector could already contain labels that are also part of the data labels of a matching info metric. - Furthermore, since the info function might find multiple differently named info metrics with matching identifying labels, those might have overlapping data labels. - In this case, the info function has to check if the values of the affected labels match or are different. + Furthermore, since multiple differently named info metrics with matching identifying labels might be found, those might have overlapping data labels. + In this case, the implementation has to check if the values of the affected labels match or are different. The former case is not really a label collision and therefore causes no problem. - In the latter case, however, the function has to return an error. - The collision can be resolved by constraining the labels via the optional label-selector argument of the info function. + In the latter case, however, an error has to be returned to the user. + The collision can be resolved by constraining the labels via data label matchers. And of course, the user always has the option to go back to the original join syntax (or, even better, avoiding ingesting conflicting info metrics in the first place). * Simplify enriching of query results with info metric data (non-identifying) labels in PromQL, e.g. via a new function, based on aforementioned TSDB API. @@ -92,7 +94,15 @@ Prometheus maintainers. * Introduce a new info metric sample type, to track the info metric's identifying label set over time (in case it changes). * Augment the head and block indexes with indexes of info metrics, for easy finding of info metrics matching time series. + * The TSDB head and every block register their respective info metrics in a corresponding index, with the different identifying label sets each info metric has had over its lifetime. +* Augment the OTLP endpoint to specify `target_info`'s identifying labels when ingesting write requests, and to store it as the native info metric type. * Add a method to the TSDB API for matching info metric data labels to a time series, given a certain timestamp and potentially data label matchers - the method will use the aforementioned head and block info metric indexes. + * Candidate info metrics are found by searching the info metric index for info metrics with identifying labels contained in the input label set. + * Each candidate info metric's identifying label set _for the timestamp in question_, is obtained from the info metric's samples. + * If that identifying label set is not a match, the info metric is ignored. + * If several info metrics with the same name are found, the one with the latest sample is chosen (i.e., older metrics are considered stale). + * Data labels are picked from the found info metrics according to the rules defined in the Goals section. + * Each info metric's data labels are determined by taking those of the metric's labels which are not in the identifying label set. * Simplify the inclusion of info metric labels in PromQL through a new `info` function: `info(v instant-vector[, ls label-selector])`. This function will be UI for the aforementioned TSDB API. @@ -108,10 +118,8 @@ sum by (k8s_cluster_name, http_status_code) ( ``` TODO: -* Make it concise and **simple**; put diagrams; be concrete, avoid using “really”, “amazing” and “great” (: -* How you will test and verify? -* How you will migrate users, without downtime. How we solve incompatibilities? -* What open questions are left? (“Known unknowns”) + +* Specify detection of info metric identifying labels for other ingestion methods than OTLP. ## Alternatives From 3f3b7f5b6f3beb402be4997707d1c346c95014e1 Mon Sep 17 00:00:00 2001 From: Arve Knudsen Date: Tue, 18 Jun 2024 14:26:22 +0200 Subject: [PATCH 06/16] Add OOO TODO Signed-off-by: Arve Knudsen --- proposals/2024-04-10-native-support-for-info-metrics-metadata.md | 1 + 1 file changed, 1 insertion(+) diff --git a/proposals/2024-04-10-native-support-for-info-metrics-metadata.md b/proposals/2024-04-10-native-support-for-info-metrics-metadata.md index cf7ee83..3991e9d 100644 --- a/proposals/2024-04-10-native-support-for-info-metrics-metadata.md +++ b/proposals/2024-04-10-native-support-for-info-metrics-metadata.md @@ -120,6 +120,7 @@ sum by (k8s_cluster_name, http_status_code) ( TODO: * Specify detection of info metric identifying labels for other ingestion methods than OTLP. +* Define how this functionality would work together with OOO samples. ## Alternatives From 32707a8311be3bbc6bb5d63b30ec8071e34ea89d Mon Sep 17 00:00:00 2001 From: Arve Knudsen Date: Wed, 26 Jun 2024 16:55:32 -0700 Subject: [PATCH 07/16] Remove non-goals section Signed-off-by: Arve Knudsen --- .../2024-04-10-native-support-for-info-metrics-metadata.md | 2 -- 1 file changed, 2 deletions(-) diff --git a/proposals/2024-04-10-native-support-for-info-metrics-metadata.md b/proposals/2024-04-10-native-support-for-info-metrics-metadata.md index 3991e9d..c52db62 100644 --- a/proposals/2024-04-10-native-support-for-info-metrics-metadata.md +++ b/proposals/2024-04-10-native-support-for-info-metrics-metadata.md @@ -88,8 +88,6 @@ Goals and use cases for the solution as proposed in [How](#how): Prometheus maintainers. -## Non-Goals - ## How * Introduce a new info metric sample type, to track the info metric's identifying label set over time (in case it changes). From 7fdf82ae3a924e4654a0040243cfe4d3ad7d751a Mon Sep 17 00:00:00 2001 From: Arve Knudsen Date: Wed, 26 Jun 2024 17:03:13 -0700 Subject: [PATCH 08/16] Add two goals after feedback from Ying Signed-off-by: Arve Knudsen --- .../2024-04-10-native-support-for-info-metrics-metadata.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/proposals/2024-04-10-native-support-for-info-metrics-metadata.md b/proposals/2024-04-10-native-support-for-info-metrics-metadata.md index c52db62..75b374d 100644 --- a/proposals/2024-04-10-native-support-for-info-metrics-metadata.md +++ b/proposals/2024-04-10-native-support-for-info-metrics-metadata.md @@ -83,6 +83,8 @@ Goals and use cases for the solution as proposed in [How](#how): The collision can be resolved by constraining the labels via data label matchers. And of course, the user always has the option to go back to the original join syntax (or, even better, avoiding ingesting conflicting info metrics in the first place). * Simplify enriching of query results with info metric data (non-identifying) labels in PromQL, e.g. via a new function, based on aforementioned TSDB API. +* Ensure backwards compatibility with current Prometheus usage. +* Minimize potential conflicts with existing metric labels. ### Audience From e26521771889da38779f3e31637ae2ce45f49515 Mon Sep 17 00:00:00 2001 From: Arve Knudsen Date: Mon, 1 Jul 2024 16:11:11 +0200 Subject: [PATCH 09/16] Augment Why section, based on reviewer feedback Signed-off-by: Arve Knudsen --- .../2024-04-10-native-support-for-info-metrics-metadata.md | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/proposals/2024-04-10-native-support-for-info-metrics-metadata.md b/proposals/2024-04-10-native-support-for-info-metrics-metadata.md index 75b374d..30e1ed5 100644 --- a/proposals/2024-04-10-native-support-for-info-metrics-metadata.md +++ b/proposals/2024-04-10-native-support-for-info-metrics-metadata.md @@ -17,6 +17,11 @@ ## Why +Info metrics are [defined by the OpenMetrics specification](https://github.com/OpenObservability/OpenMetrics/blob/main/specification/OpenMetrics.md#info) as "used to expose textual information which SHOULD NOT change during process lifetime". +Furthermore the OpenMetrics specification states that info metrics ["MUST have the suffix `_info`"](https://github.com/OpenObservability/OpenMetrics/blob/main/specification/OpenMetrics.md#info-1). +Despite the latter OpenMetrics requirement, there are metrics with the info metric usage pattern that don't have the `_info` suffix, e.g. `kube_pod_labels`. +In this proposal, we shall include the latter in the definition of info metrics. + Currently, enriching Prometheus query results with corresponding labels from info metrics is challenging. More specifically, it requires writing advanced PromQL to join with the info metric in question. Take as an example querying HTTP request rates per K8s cluster and status code, while having to join with the `target_info` metric to obtain the `k8s_cluster_name` label: From 7c8f527e67cd667c99d9a23b14a6829a527bafcf Mon Sep 17 00:00:00 2001 From: Arve Knudsen Date: Mon, 1 Jul 2024 16:40:05 +0200 Subject: [PATCH 10/16] Apply reviewer feedback Signed-off-by: Arve Knudsen --- proposals/2024-04-10-native-support-for-info-metrics-metadata.md | 1 + 1 file changed, 1 insertion(+) diff --git a/proposals/2024-04-10-native-support-for-info-metrics-metadata.md b/proposals/2024-04-10-native-support-for-info-metrics-metadata.md index 30e1ed5..9aaae6c 100644 --- a/proposals/2024-04-10-native-support-for-info-metrics-metadata.md +++ b/proposals/2024-04-10-native-support-for-info-metrics-metadata.md @@ -50,6 +50,7 @@ There are other problems with Prometheus' current method of including info metri If Prometheus could persist info metrics' identifying labels (e.g. `job` and `instance` for `target_info`), human knowledge of the correct identifying labels may become unnecessary when "joining" with info metrics. Information about info metric identifying labels is present in at least the OpenMetrics protobuf exposition format (the OpenMetrics text exposition format unfortunately lacks this capability). It can also easily be deduced when ingesting metrics from OTLP (OTel Protocol). +Most info metrics' identifying labels will be `job` and `instance`, but there are some exceptions (e.g. `kube_pod_labels`). Intrinsic knowledge of info metrics' identifying labels could also help in solving temporary conflicts between old and new versions of info metrics, when data (non-identifying) labels change. Another possible positive outcome might be dedicated support in UIs (e.g. Grafana) for visualizing the resource attributes of OTel metrics. From 5cfeef324cd3b98245c7c31ffb6e2074dd4b248f Mon Sep 17 00:00:00 2001 From: Arve Knudsen Date: Mon, 1 Jul 2024 16:46:53 +0200 Subject: [PATCH 11/16] Add alternative Signed-off-by: Arve Knudsen --- ...10-native-support-for-info-metrics-metadata.md | 15 +++++++++++++-- 1 file changed, 13 insertions(+), 2 deletions(-) diff --git a/proposals/2024-04-10-native-support-for-info-metrics-metadata.md b/proposals/2024-04-10-native-support-for-info-metrics-metadata.md index 9aaae6c..b8806a5 100644 --- a/proposals/2024-04-10-native-support-for-info-metrics-metadata.md +++ b/proposals/2024-04-10-native-support-for-info-metrics-metadata.md @@ -130,9 +130,20 @@ TODO: ## Alternatives -The section stating potential alternatives. Highlight the objections reader should have towards your proposal as they read it. Tell them why you still think you should take this path [[ref](https://twitter.com/whereistanya/status/1353853753439490049)] +### Add metadata as prefixed labels -1. This is why not solution Z... +Instead of encoding metadata, e.g. OTel resource attributes, as info metric labels, add them directly as labels to corresponding metrics. + +#### Pros + +* Simplicity, removes need for joining with info metrics + +#### Cons + +* Metrics will have potentially far more labels than what's strictly necessary to identify them +* Temporary series churn when migrating existing metrics to this new scheme +* Increased series churn when metadata labels change +* More labels per metric increases CPU/memory usage ## Action Plan From ceead4a79213e7d5a66d3642a7147cddec39099c Mon Sep 17 00:00:00 2001 From: Arve Knudsen Date: Thu, 11 Jul 2024 08:25:06 +0200 Subject: [PATCH 12/16] Remove TSDB details, leaving these for separate proposal. In agreement w/ @codesome, leave TSDB details for a separate TSDB proposal. Signed-off-by: Arve Knudsen --- ...10-native-support-for-info-metrics-metadata.md | 15 ++++++++------- 1 file changed, 8 insertions(+), 7 deletions(-) diff --git a/proposals/2024-04-10-native-support-for-info-metrics-metadata.md b/proposals/2024-04-10-native-support-for-info-metrics-metadata.md index b8806a5..0331ee0 100644 --- a/proposals/2024-04-10-native-support-for-info-metrics-metadata.md +++ b/proposals/2024-04-10-native-support-for-info-metrics-metadata.md @@ -98,17 +98,18 @@ Prometheus maintainers. ## How -* Introduce a new info metric sample type, to track the info metric's identifying label set over time (in case it changes). -* Augment the head and block indexes with indexes of info metrics, for easy finding of info metrics matching time series. - * The TSDB head and every block register their respective info metrics in a corresponding index, with the different identifying label sets each info metric has had over its lifetime. -* Augment the OTLP endpoint to specify `target_info`'s identifying labels when ingesting write requests, and to store it as the native info metric type. -* Add a method to the TSDB API for matching info metric data labels to a time series, given a certain timestamp and potentially data label matchers - the method will use the aforementioned head and block info metric indexes. - * Candidate info metrics are found by searching the info metric index for info metrics with identifying labels contained in the input label set. - * Each candidate info metric's identifying label set _for the timestamp in question_, is obtained from the info metric's samples. +* Track the info metric's identifying label set over time (in case it changes) - storage model details to be elaborated in separate proposal. +* Keep info metric indexes in storage - storage model details to be elaborated in separate proposal. + * Info metric indexes maintaining per info metric the different identifying label sets it has had over its lifetime. + * Indexing the different identifying label sets an info metric has had over its lifetime allows determining which are potential matches for a given metric, before considering the time dimension. +* Add a method to the TSDB API for matching info metric data labels to a time series, given a certain timestamp and potentially data label matchers - the method will use the aforementioned info metric indexes. + * Candidate info metrics are found by searching info metric indexes for info metrics with identifying labels contained in the input label set. + * Each candidate info metric's identifying label set is obtained _for the timestamp in question_ - storage model details to be elaborated in separate proposal. * If that identifying label set is not a match, the info metric is ignored. * If several info metrics with the same name are found, the one with the latest sample is chosen (i.e., older metrics are considered stale). * Data labels are picked from the found info metrics according to the rules defined in the Goals section. * Each info metric's data labels are determined by taking those of the metric's labels which are not in the identifying label set. +* Augment the OTLP endpoint to specify `target_info`'s identifying labels when ingesting write requests. * Simplify the inclusion of info metric labels in PromQL through a new `info` function: `info(v instant-vector[, ls label-selector])`. This function will be UI for the aforementioned TSDB API. From e2ccddacf33b43c84e4f06e532b275eeff255a1f Mon Sep 17 00:00:00 2001 From: Arve Knudsen Date: Tue, 23 Jul 2024 15:23:42 +0200 Subject: [PATCH 13/16] Add tasks Signed-off-by: Arve Knudsen --- .../2024-04-10-native-support-for-info-metrics-metadata.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/proposals/2024-04-10-native-support-for-info-metrics-metadata.md b/proposals/2024-04-10-native-support-for-info-metrics-metadata.md index 0331ee0..d82e68e 100644 --- a/proposals/2024-04-10-native-support-for-info-metrics-metadata.md +++ b/proposals/2024-04-10-native-support-for-info-metrics-metadata.md @@ -150,5 +150,5 @@ Instead of encoding metadata, e.g. OTel resource attributes, as info metric labe The tasks to do in order to migrate to the new idea. -* [ ] Task one -* [ ] Task two ... +* [ ] [Add experimental PromQL `info` function MVP](https://github.com/prometheus/prometheus/pull/14495) +* [ ] Extend `info` function MVP with the ability to support `info` metrics in general, with persistence of info metrics metadata From 01692a6fea367e5328afc74588cf5949419c4ac1 Mon Sep 17 00:00:00 2001 From: Arve Knudsen Date: Tue, 23 Jul 2024 16:11:58 +0200 Subject: [PATCH 14/16] Add alternative: Require specifying info metric(s) Signed-off-by: Arve Knudsen --- ...04-10-native-support-for-info-metrics-metadata.md | 12 ++++++++++++ 1 file changed, 12 insertions(+) diff --git a/proposals/2024-04-10-native-support-for-info-metrics-metadata.md b/proposals/2024-04-10-native-support-for-info-metrics-metadata.md index d82e68e..5d9cc68 100644 --- a/proposals/2024-04-10-native-support-for-info-metrics-metadata.md +++ b/proposals/2024-04-10-native-support-for-info-metrics-metadata.md @@ -146,6 +146,18 @@ Instead of encoding metadata, e.g. OTel resource attributes, as info metric labe * Increased series churn when metadata labels change * More labels per metric increases CPU/memory usage +### Make the `info` function require specifying the info metric(s) + +Instead of letting `info` by default join with all matching info metrics, have it require specifying the info metric name(s). + +#### Pros + +* The user won't be confused about data labels being included from info metrics they didn't expect + +#### Cons + +* The UX becomes more complex, as the user is required to specify which info metric(s) to join with + ## Action Plan The tasks to do in order to migrate to the new idea. From c54c0a4f674b4f78edf034c443c96fdcec721522 Mon Sep 17 00:00:00 2001 From: Arve Knudsen Date: Sun, 28 Jul 2024 19:58:35 +0200 Subject: [PATCH 15/16] Remove TODOs after reviewer feedback Signed-off-by: Arve Knudsen --- .../2024-04-10-native-support-for-info-metrics-metadata.md | 5 ----- 1 file changed, 5 deletions(-) diff --git a/proposals/2024-04-10-native-support-for-info-metrics-metadata.md b/proposals/2024-04-10-native-support-for-info-metrics-metadata.md index 5d9cc68..aec9d32 100644 --- a/proposals/2024-04-10-native-support-for-info-metrics-metadata.md +++ b/proposals/2024-04-10-native-support-for-info-metrics-metadata.md @@ -124,11 +124,6 @@ sum by (k8s_cluster_name, http_status_code) ( ) ``` -TODO: - -* Specify detection of info metric identifying labels for other ingestion methods than OTLP. -* Define how this functionality would work together with OOO samples. - ## Alternatives ### Add metadata as prefixed labels From 612b41bb70ea31d4719e96a618619623cb8d75de Mon Sep 17 00:00:00 2001 From: Arve Knudsen Date: Sun, 28 Jul 2024 20:36:50 +0200 Subject: [PATCH 16/16] Apply reviewer feedback Signed-off-by: Arve Knudsen --- ...ative-support-for-info-metrics-metadata.md | 36 ++++++++----------- 1 file changed, 14 insertions(+), 22 deletions(-) diff --git a/proposals/2024-04-10-native-support-for-info-metrics-metadata.md b/proposals/2024-04-10-native-support-for-info-metrics-metadata.md index aec9d32..b3d4d6b 100644 --- a/proposals/2024-04-10-native-support-for-info-metrics-metadata.md +++ b/proposals/2024-04-10-native-support-for-info-metrics-metadata.md @@ -72,7 +72,17 @@ Goals and use cases for the solution as proposed in [How](#how): * When enriching a query result's labels with data labels from info metrics, it should be considered per timestamp what are each potentially matching info metric's identifying labels (since the identifying label set may change over time). * Automatically treat the old version of an info metric as stale for query result enriching purposes, when its data labels change (producing a new time series, but with same identity from an info metric perspective). * When enriching a query result's labels with data labels from info metrics, and there are several matches with equally named info metrics (e.g. `target_info`) for a timestamp, the one with the newest sample wins (others are considered stale). -* Add TSDB API for, given a certain time series and a certain timestamp, getting data labels, potentially filtered by certain matchers, from info metrics with identifying labels in common with the time series in question. +* Simplify enriching of query results with info metric data (non-identifying) labels in PromQL, e.g. via a new function. +* Ensure backwards compatibility with current Prometheus usage. +* Minimize potential conflicts with existing metric labels. + +### Audience + +Prometheus maintainers. + +## How + +* Simplify the inclusion of info metric labels in PromQL through a new `info` function: `info(v instant-vector[, ls data-label-selector])`. * If no data label matchers are provided, _all_ the data labels of found info metrics are added to the resulting time series. * If data label matchers are provided, only info metrics with matching data labels are considered. * If data label matchers are provided, _precisely_ the data labels specified by the label matchers are added to the returned time series. @@ -81,6 +91,7 @@ Goals and use cases for the solution as proposed in [How](#how): * A special case: If a data label matcher allows empty labels (equivalent to missing labels, e.g. `k8s_cluster_name=~".*"`), it will not exclude time series from the result even if there's no matching info metric. * A data label matcher like `__name__="target_info"` can be used to restrict the info metrics used. However, the `__name__` label itself will not be copied. + * In the case of multiple versions of the same info metric being found (with the same identifying labels), the one with the newest sample wins. * Label collisions: The input instant vector could already contain labels that are also part of the data labels of a matching info metric. Furthermore, since multiple differently named info metrics with matching identifying labels might be found, those might have overlapping data labels. In this case, the implementation has to check if the values of the affected labels match or are different. @@ -88,30 +99,11 @@ Goals and use cases for the solution as proposed in [How](#how): In the latter case, however, an error has to be returned to the user. The collision can be resolved by constraining the labels via data label matchers. And of course, the user always has the option to go back to the original join syntax (or, even better, avoiding ingesting conflicting info metrics in the first place). -* Simplify enriching of query results with info metric data (non-identifying) labels in PromQL, e.g. via a new function, based on aforementioned TSDB API. -* Ensure backwards compatibility with current Prometheus usage. -* Minimize potential conflicts with existing metric labels. - -### Audience - -Prometheus maintainers. - -## How - -* Track the info metric's identifying label set over time (in case it changes) - storage model details to be elaborated in separate proposal. +* Track each info metric's identifying label set over time (in case it changes) - storage model details to be elaborated in separate proposal. + * This allows determining on a per-timestamp basis, which are an identifying metric's identifying labels. * Keep info metric indexes in storage - storage model details to be elaborated in separate proposal. * Info metric indexes maintaining per info metric the different identifying label sets it has had over its lifetime. * Indexing the different identifying label sets an info metric has had over its lifetime allows determining which are potential matches for a given metric, before considering the time dimension. -* Add a method to the TSDB API for matching info metric data labels to a time series, given a certain timestamp and potentially data label matchers - the method will use the aforementioned info metric indexes. - * Candidate info metrics are found by searching info metric indexes for info metrics with identifying labels contained in the input label set. - * Each candidate info metric's identifying label set is obtained _for the timestamp in question_ - storage model details to be elaborated in separate proposal. - * If that identifying label set is not a match, the info metric is ignored. - * If several info metrics with the same name are found, the one with the latest sample is chosen (i.e., older metrics are considered stale). - * Data labels are picked from the found info metrics according to the rules defined in the Goals section. - * Each info metric's data labels are determined by taking those of the metric's labels which are not in the identifying label set. -* Augment the OTLP endpoint to specify `target_info`'s identifying labels when ingesting write requests. -* Simplify the inclusion of info metric labels in PromQL through a new `info` function: `info(v instant-vector[, ls label-selector])`. - This function will be UI for the aforementioned TSDB API. Using the `info` function, we can simplify the previously given PromQL join example as follows: