-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Proposal for Native Histogram Text Format #32
Conversation
Create an initial prosal for supporting native histograms in the text format. It mainly links to the design doc for ease of comments while we decide on a path forward. After a proposal from the design doc has some support the rest of the markdown proposal will be filled out. Signed-off-by: Chris Marchbanks <[email protected]>
Signed-off-by: Chris Marchbanks <[email protected]>
894def9
to
e06d3d7
Compare
I have updated this proposal with the option from the design doc that gained consensus, and described the proposed format in specific detail. All reviews here are now welcome :) |
e06d3d7
to
0ef8516
Compare
Signed-off-by: Chris Marchbanks <[email protected]>
0ef8516
to
b1283f1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey 👋
Do I understand correctly that the breaking change here is because you are using the same metric type as the classic histogram, but a different format for the exposed measurements?
If we're moving forward with OpenMetrics 2.0, we might want to align different workforces that are happening towards that, e.g. different format for created timestamps
I think this could be done with a minor version bump of OpenMetrics, but if it requires going to 2.0 then yes we would coordinate with the other changes. Until then it would be behind feature flags and we would need a version like version=1.1.0-alpha.1 for content negotiation with clients. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to approve the PR to give you the green light to experiment with things, but as mentioned in the dev summit I don't have the confidence to approve changes to OM.
My initial concerns are:
- Possible breaking change that requires OM 2.0 - You're using
# TYPE <metric name> histogram
followed by a line that has a different format from the current histogram format. Current scrapers will break when seeing this.
We could use # TYPE <metric name> native_histogram
instead, and change it back to histogram
in 2.0, if the scraper silently ignores unknown metric types, but I'm almost certain that's not the current behavior 😬
- What feature flag are you planning to release this under? The existing
native-histogram
flag? A new one?
Reasonable concerns! I added a section on backwards compatibility/versioning with the reason I believe this change to be backwards-compatible and releasable as part of OpenMetrics 1.1. Does that section help? As far as feature flags go, I think reusing the same native-histogram flag would be the best option (plus content negotiation). I am ok with a new one as well though if others would like to see a new flag. That decision could probably happen at PR review time? |
dce5799
to
8e19761
Compare
Signed-off-by: Chris Marchbanks <[email protected]>
8e19761
to
f80fafa
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great. Especially the clarifying part about the meaning of forward and backward compatibility in OM makes me happy.
I have just a few optional suggestions for additions.
Let's get a few more thumbs up on this and then approve.
I'm pretty sure that OM is fairly strict with this. While you are free to implement a tolerant parser, it is not required. A fully OM 1.0 compliant parser may (and maybe even should) complain about any unknown metric types. A behavior like "new metric types maybe added, and old parsers should just ignore them" would need to be clearly specified in the spec. Cf. how protobuf is doing it. The explanation of how forward and backward compatibility works as it is part of this proposal IMHO makes a lot of sense (and I wish it would have been explained that way in the OM spec). To paraphrase: A minor version bump is sufficient if a v1.1 scraper can read a v1.0 exposition without any precaution, i.e. it doesn't even have to know that it is reading v1.0 and not v1.1. It is OK and expected that a v1.0 scraper will be unable to parse a v1.1 exposition. The linked explanation of content negotiation directly from the OM spec supports that: The scraper can ask for v1.1, but it might just get v1.0 back, which is fine. If on the other hand the scraper asks for v1.0, it must not get v1.1 back, but the producer must be capable of generating valid v1.0 and serve it as such (or, I guess, return an error).
Since this is just about the protocol, and there are many possible implementers (scrapers as well as producers), I won't discuss feature flags (which are specific to selected implementers, like in this case the Prometheus server). However, it might be good to sketch out a plan how to version the implementation as long as there is no official OM v1.1. (Something like v1.1-pre or v1.1-alpha? Not sure how those things are handled in network protocols generally.) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's do it :)
0627868
to
a3463fa
Compare
I added an option to the doc, happy to iterate on this now or later. It's always possible to amend these proposals as the work becomes more concrete. |
The significant changes are: 1. Add additional examples. 2. Add a proposal for how content negotiation will be done until OM 1.1 is released. 3. Plural form of arrays in the schema. 4. Added support for multiple exemplars as a non-goal. Signed-off-by: Chris Marchbanks <[email protected]>
a3463fa
to
2920fb2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, easy to convert to internal data. I added some minor comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall LGTM - modulo versioning. Added some suggestions though. Good work!
``` | ||
Accept: application/openmetrics-text;version=1.1.0-nativehistogram.*,application/openmetrics-text;version=1.0.0,text/plain;version=0.0.4 | ||
``` | ||
would mean the consumer can accept any nativehistogram enabled pre-release version of OpenMetrics 1.1.0, the base 1.0.0 version of OpenMetrics, or the 0.0.4 version of the classic Prometheus text format. Producers must include the proper content type for their version, such as the first nativehistogram pre-release: | ||
``` | ||
Content-Type: application/openmetrics-text;version=1.1.0-nativehistogram.0 | ||
``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, this is interesting. What's "release identifier? I assume you want to version SPEC and native histogram feature separately? Why? Can we make it simpler?
I think I would expect something like:
Accept: application/openmetrics-text;version=1.1.0;feature=nativehistogram,application/openmetrics-text;version=1.0.0,text/plain;version=0.0.4
.. then we could simply do 1.1.1 or 1.2.0 if you have significant histogram change. Let's not create sub versions of every feature, this will make it bit painful to use and maintain. Speaking from experience when we evaled different options for remote write v2: prometheus/docs#2462
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See https://semver.org/#spec-item-9, it is just a pre-release, IMO if version=1.1.0
is there without any sort of prerelease suffix native histograms must be fully supported and there shouldn't be extra feature tags. I would also be ok with something like alpha.0
, alpha.1
, etc... if we don't want to specifically include nativehistogram in the version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to put some thoughts into versioning. The current proposal does not cover all of native histograms, Exemplars, created timestamps are out-of-scope.
Now, how will content negotiation work if we add these in the future? How will a client library know whether to include a list of exemplars in native histogram text format or not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would opt for some way of opting into the feature as an experimental one, without any changes to the standard yet. Based on the experience made, we can make the cut what should land in v1.1 later (including maybe other changes being discussed). In other words: I don't think this design doc has to exactly describe v1.1 already.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am happy to implement some other versioning scheme, the important part to me is that any exposers add some sort of pre-release identifier until OM 1.1 is actually released, and force ingesters to specify that they are ok with a pre-release version. A features section alongside a pre-release version seems reasonable to me, then becomes unnecessary once 1.1 is released. That allows us some flexibility to make breaking changes to the unreleased features. E.g. if we change around where the list of exemplars are stored, or choose to use a less compact format after feedback.
Content negotiation for features added in 1.2, 1.3, etc... will require client libraries to keep track of the features added in each minor version and leave them out if the requested version is lower. That could mean fall all the way back to 1.0, or maybe have support for a few versions (1.0, 1.1, 1.4) and choose the highest version less than or equal to what is requested by the ingester. E.g. if Prometheus requests 1.3, the client would return 1.1 in that example. 1.0 is the only version that all producers MUST support.
Also note that created timestamps have always been in scope (no change from current text format), and I just added a proposal for exemplar support.
Extend the OpenMetrics text format to allow structured values instead of only float values. This structured value will be used to encode a structure with the same fields as is exposed using the [protobuf exposition format](https://github.com/prometheus/client_model/blob/master/io/prometheus/client/metrics.proto). Starting with examples and then breaking up the format: | ||
``` | ||
# TYPE nativehistogram histogram | ||
nativehistogram {count:24,sum:100,schema:0,zero_threshold:0.001,zero_count:4,positive_spans:[0:2,1:2],negative_spans:[0:2,1:2],positive_deltas:[2,1,-2,3],negative_deltas:[2,1,-2,3]} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we establish some simple forward compatibility rules for this JSON e.g. Parsers MUST ignore unknown fields? Would help in future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would be ok with that, but I am also not sure it is necessary with content negotiation as new fields would require part of a new version of OpenMetrics. I.e. I am unsure if we want to impose the level of strictness a given parser needs to implement, and it might be ok for different parsers to make different choices.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW: It's not JSON.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While I like a way of adding stuff that old parsers will just ignore, this concept doesn't exist in OM so far, so it might be weird to introduce it in a sub-set of the protocol. Let's maybe solve that more generally in OM v2.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, thanks. Let's make it clear it's not a JSON somewhere please 🙈
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any suggestions as to how to make it more clear that it is not JSON? In the format definition I have "The value for each series of a native histogram is a custom struct format" and I specifically compare it to JSON with the phrase "Compared to JSON this avoids consistently repeating keys and curly braces" after the type definitions.
* Tightened up some language of how to tell classic and native histograms apart. * Added that whitespace is not allowed inside of a value. Signed-off-by: Chris Marchbanks <[email protected]>
202deb7
to
cec73ef
Compare
Illustrate that ordering matters for this new format to aid parsing. Signed-off-by: Chris Marchbanks <[email protected]>
Provide a paragraph describing support for multiple exemplars as part of this proposal. Currently it is left ambiguous if a list of exemplars can be included on more than native histograms, we can define that explicitly when writing the new OpenMetrics spec. Signed-off-by: Chris Marchbanks <[email protected]>
4c65b52
to
7af9827
Compare
Explicitly state that changing created timestamp semantics and allowing custom buckets as part of the native histogram structure are not goals for this proposal. Signed-off-by: Chris Marchbanks <[email protected]>
I have added this as an agenda item for a dev summit. It seems like there are a couple to be decided points still, mainly around minor or major version bump. Hopefully having the sync discussion will allow us to come to a conclusion. |
We had a few consensus items at the dev summit that I think will allow this to move forward:
Looking forward to seeing this becoming a reality! |
Talked offline and the recent updates have alleviated concerns.
🎉 With the consensus items from the dev summit, are there any objections to merging this as is? We can continue to tweak it as development work proceeds. If I don't hear any objections I will plan to merge sometime next week. |
I think the only somewhat open point is how to mark pre-release features in content negotiation. At the same time, I think that's an aspect where we don't have to be terribly descriptive and precise in the first place, because it is just for pre-releases and precisely not for the stable standard. @csmarchbanks rightfully pointed to https://semver.org/#spec-item-9 , which essentially allow almost anything after the hyphen, and what's proposed here in the doc is semver compliant and also makes sense to me. As a design doc is not describing the final implementation, we have some wiggle room here, so I would propose to merge this as-is and see how it shakes out in the implementation. |
Wonder, what's relation of this to OpenMetrics 2.0? 🤔 Do we plan to change both Prometheus text AND OM 2.0 with this? |
I don't think we should change the Prometheus text format, if we can avoid it. As we concluded above, this could be part of OM 1.1. If it is part of OM 1.1, it should also be part of OM 2.0. If OM 2.0 happens soon, we might never have an OM 1.1, then this would only go into OM 2.0. |
FYI: prometheus/OpenMetrics#279 - I propose this should be in 2.0, no 1.1 is planned at the moment. |
Create an initial prosal for supporting native histograms in the text format. It mainly links to the design doc for ease of comments while we decide on a path forward. After a proposal from the design doc has some support the rest of the markdown proposal will be filled out.
Leaving the PR as a draft until the markdown is ready to be published, but comments here or in the Google doc are welcome.