OM 2.0: Consider using complex values instead of suffixes #283

dashpole · 2024-12-02T21:04:34Z

This deserves its own proposal, but i'll outline the broad idea here start the discussion and gather high-level feedback.

Idea

The idea is that we could use complex values for fixed bucket histograms, summaries, and counters, similar to what we plan to do for native histograms. In OM 1.0 and in the Prometheus text format, those types are represented using multiple series with suffixes and special labels (e.g. _bucket suffix, or the le label for histograms). Counters are included here because they have a "total" and a "start time".

Advantages

Solves Handle OpenMetrics ..._created lines prometheus#6541 by supporting the created timestamp in the value.
Prevents collisions between suffixed series names and unsuffixed series names of a different type.
- E.g. histogram: my_metric vs gauge: my_metric_bucket
Removes PromQL differences between querying native and classic histograms: https://grafana.com/docs/mimir/latest/visualize/native-histograms/#query-your-histograms-count-or-sum (couldn't find Prometheus documentation)
Names in PromQL queries always match names defined in the application.
- This is helpful for metrics defined in other metric libraries, such as OpenTelemetry, where users don't expect suffixes.

Disadvantages

Breaks existing user expectations and queries. Suffixes are very deeply embedded in the Prometheus ecosystem, and this would be a large change for many users.
PromQL queries become more complicated because accessing fields requires functions.
- E.g. sum(request_duration_seconds_count) -> histogram_count(sum(request_duration_seconds)).
- This would happen anyways if the user migrates to native histograms.
Little benefit for summaries and histograms, as we expect/recommend users adopt native histograms anyways.
Is there a text format representation that is readable AND easy to generate AND efficient enough to parse, for such a model?
It would make OM 2.0 text significantly different to 1.0 and Prometheus text, so some education and big change in parsers/generators would be needed. Not a blocker, but something to keep in mind as a con.

Alternatives

We could only use a complex value for counters, to support the start time in addition to the value. For fixed-bucket histograms and summaries, keep the existing suffixes and labels, but use the "complex counter" value for cumulative series. For users that have migrated from summaries and fixed-bucket histograms to native histograms, this has most of the advantages of the above, without many of the disadvantages for users using summaries or fixed-bucket histograms.

The text was updated successfully, but these errors were encountered:

ArthurSens · 2024-12-04T14:28:42Z

With Native Histograms stabilizing, is there a scenario where a user would prefer classic histograms over native histograms? If not, should we invest effort in this? 😅

Solves prometheus/prometheus#6541 by supporting the created timestamp in the value.

We also planned to move _created to something similar to the way we handle metadata. Would that be enough for classic histograms as well?

In our first call, we mentioned that one of our intentions with 2.0 is not to break the Prometheus user base to make the spec more favorable to other specs, so I'm not sure how we could commit to this one 😬

bwplotka · 2024-12-04T17:13:30Z

I think this would be great to consider, thanks! Essentially it removes metric family notion. We kind of do similar in protobuf format already.

Also parses could generate non-complex types for classic histograms and summaries if needed.

Two extra downsides to think about:

Is there a text format representation that is readable AND easy to generate AND efficient enough to parse, for such a model?
It would make OM 2.0 text significantly different to 1.0 and Prometheus text, so some education and big change in parsers/generators would be needed. Not a blocker, but something to keep in mind as a con.

bwplotka · 2024-12-04T17:14:18Z

With Native Histograms stabilizing, is there a scenario where a user would prefer classic histograms over native histograms? If not, should we invest effort in this? 😅

Yes and we could simply do the NHCB (custom buckets) straight in the text too.

bwplotka · 2024-12-04T17:15:21Z

In our first call, we mentioned that one of our intentions with 2.0 is not to break the Prometheus user base to make the spec more favorable to other specs, so I'm not sure how we could commit to this one 😬

How it is breaking? While it would cause parser redesign, you can represent the current Prometheus model just fine with this idea, no?

dashpole · 2024-12-04T17:32:53Z

Added to the list of cons.

dashpole · 2024-12-05T20:52:57Z

Unsurprisingly, i'm not the first one to suggest this. @bwplotka pointed me to https://github.com/prometheus/OpenMetrics/blob/main/legacy/markdown/protobuf_vs_text.md#implied-data-model, by @beorn7 which contains a much more thorough description of the tradeoffs involved with using single-line representations of complex types.

dashpole · 2024-12-05T20:55:10Z

Regarding PromQL queries become more complicated because accessing fields requires functions., I learned that this migration may be happening independent of this proposal based on https://github.com/prometheus/proposals/blob/main/proposals/2024-01-26_classic-histograms-stored-as-native-histograms.md#reading-via-promql.

bwplotka · 2024-12-20T13:48:57Z

Relevant thoughts from @beorn7 on the should PromQL use dots thread around complex types.

Prometheus has been plagued by magic suffixes from the very
beginning. In my understanding of Prometheus history, suffixing
components of a summary or a histogram with _count, _sum,
_bucket was a means to get an MVP running. I think it was a
mistake to reify this concept (originally constrained to TSDB and
PromQL) by letting it leak into the exposition format. OpenMetrics
made things worse by introducing more magic suffixes (_info got
introduced, and _total got a promotion from a mere recommendation
to another magic suffix, and arguably the same happened to the
unit). The problem with magic suffixes is namespace pollution: It
prevents usage of any of the magic suffixes in metric names. Or to
be precise: It's even worse, the usage is not really technically
forbidde. You can still do it, but then you might run into
surprising and confusing namespace collisions that might very well
show up in the worst of moments. A way out of this is to use a
separator character for the magic suffixes that is not a valid
character in names otherwise. And now guess which character comes
to mind for that... [This is another point in the category "future
feature has a hard time blocking a feature proposal for the
present". This was concretely considered when OpenMetrics was
designed, but it got rejected by the OpenMetrics team. So there is
a non-zero chance that it will be on the table again when we try to
"fix" OpenMetrics. A counter point is again that another character
could be used, at the price of using something that is much less
intuitive to learn and read.]

Native histograms introduced the first instance of a "structured
metric", but more could happen in the future. Accessing "fields" in
this structure is currently done by bespoke functions in PromQL
(histogram_count(request_latency_seconds),
histogram_sum(request_latency_seconds)), but it would be a quite
obvious alternative to allow something like
request_latency_seconds.count and request_latency_seconds.sum,
which is actually more than just syntactic sugar because we could
implement "field access" as a different thing from "function
call". The latter is an evaluation, it changes the timestamp, and
it cannot be used in a range selector
(histogram_count(request_latency_seconds)[5m] is invalid syntax),
while we could make request_latency_seconds.count[5m] valid
without changing the language fundamentals. [This is much more
tangible than (4) and (5), but not decided yet, so it still asks us
to reject a concrete feature request in the name of a possible
future feature. And again, there are other ways of implementing
this, avoiding the usage of a . operator, at the price of not
doing the most obvious.]

dashpole added this to Open Metrics 2.0 Dec 2, 2024

dashpole moved this to Todo in Open Metrics 2.0 Dec 2, 2024

ArthurSens mentioned this issue Dec 9, 2024

Inconsistent requirements for suffixes #284

Open

bwplotka mentioned this issue Dec 9, 2024

Add native histogram specification prometheus/docs#2539

Merged

beorn7 mentioned this issue Dec 11, 2024

OM 2.0: Consider relaxing the requirement for counters timeseries to have a _total suffix #282

Open

This was referenced Dec 11, 2024

OM 2.0: Native Histogram Support in Text format. #279

Open

proposal: Update Created Timestamps syntax for OpenMetrics prometheus/proposals#43

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

OM 2.0: Consider using complex values instead of suffixes #283

OM 2.0: Consider using complex values instead of suffixes #283

dashpole commented Dec 2, 2024 •

edited

Loading

ArthurSens commented Dec 4, 2024

bwplotka commented Dec 4, 2024

bwplotka commented Dec 4, 2024

bwplotka commented Dec 4, 2024

dashpole commented Dec 4, 2024

dashpole commented Dec 5, 2024

dashpole commented Dec 5, 2024

bwplotka commented Dec 20, 2024

OM 2.0: Consider using complex values instead of suffixes #283

OM 2.0: Consider using complex values instead of suffixes #283

Comments

dashpole commented Dec 2, 2024 • edited Loading

Idea

Advantages

Disadvantages

Alternatives

ArthurSens commented Dec 4, 2024

bwplotka commented Dec 4, 2024

bwplotka commented Dec 4, 2024

bwplotka commented Dec 4, 2024

dashpole commented Dec 4, 2024

dashpole commented Dec 5, 2024

dashpole commented Dec 5, 2024

bwplotka commented Dec 20, 2024

dashpole commented Dec 2, 2024 •

edited

Loading