Add proposal: Simplify joins with info metrics in PromQL #37
---
"By default" the old metric is marked as stale as soon as the scrape happens that contains the new metric. The problem is that this doesn't work if ingesting via OTLP, which is obviously common in an OTel use case. Only in that case, you run into the problem of five minutes of duplication. (Which is the reason behind a minimal proposal to "fix" the problem (discussed elsewhere): Somehow make staleness handling work with OTLP ingestion.)
---
You may be interested in the current OTel Entities proposal, which is designed to carry non-identifying resource information: https://github.com/open-telemetry/oteps/blob/6d6febfaf05f130c703e2dd0fa91dfacada82a7d/text/entities/0256-entities-data-model.md. My hope would be that we can potentially map these to OM info metrics in the future.
edit: I forgot that I already left the same comment on the previous proposal docs.
---
@dashpole yes, this point in fact refers to the proposed Entity model, since it will allow for non-identifying resource/entity attributes to change.
---
And metrics that are "info metrics in spirit only" (`kube_pod_labels`) aren't covered at all in OM, because it would be invalid in OM to mark a metric as an info metric if its name doesn't end in `_info`. (Quote from OM spec: "The Sample MetricName for the value of a MetricPoint for a MetricFamily of type Info MUST have the suffix '_info'.") I believe that to unleash the full potential of info metrics, we need to relax this requirement in OM.

Just saying this here for the record. We don't need to discuss it in this proposal (or maybe just as a footnote).
---
I also worry about this being limited to OM, as most of the ecosystem seems to still be using the Prometheus text/protobuf formats today.

Did we consider expanding this to also include gauge metrics with an `_info` suffix? Many OTel Prometheus exporters still export target_info as a gauge due to lack of OM support, or because the Prometheus client doesn't support info metrics.
---
Even OM needs an amendment for its text version.

So the best outcome here is to "fix" OpenMetrics (including marking identifying labels in the text format and allowing info metrics that do not end in `_info`, but also a number of other things that have hampered adoption so far), so that OM finally gets adopted everywhere.

The second best outcome is to also amend the classic Prometheus exposition formats. (It's a bit weird, because I'd rather see the effort of changing the exposition format lead directly to OM everywhere, rather than to more versions of the many formats that we already have.)

In the meantime, I would go for scrape config options to mark metrics of whatever source and format as info metrics, including the option to mark certain labels as identifying labels. (The heuristic "target labels → identifying labels, other labels → data labels" should be fine in most cases.) A rule to say "all metrics ending in `_info` are info metrics" would be easy to configure. And then special cases could be configured to say "`kube_pod_labels` coming from kube-state-metrics should be considered an info metric with `namespace` and `pod` as the identifying labels". Etc.
---
Maybe we could mention here that info metrics with additional identifying labels in the exposition format are rare. Usually, just the target labels added upon ingestion are the identifying labels, so this would be a good first guess. (Works for `build_info`, but doesn't work for `kube_pod_labels`.)
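To illustrate the heuristic with the two examples just mentioned (hypothetical series; `label_team` stands in for a pod label as exposed by kube-state-metrics):

```promql
# build_info: the target labels (job, instance) are the identifying labels.
build_info{job="api", instance="a:80", version="1.2.3", go_version="go1.22.3"}

# kube_pod_labels: the identifying labels would have to be namespace and pod,
# which are not target labels of kube-state-metrics itself.
kube_pod_labels{namespace="default", pod="api-0", label_team="search"}
```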
---
I'm adding the following sentence; does it work for you @beorn7?
---
Well, the damage is much bigger -- we essentially use all labels as "identifying", causing massive churn and a cardinality bomb. With this feature we essentially increase the use of info metrics, so that's perhaps relevant to fix.
---
Perhaps you could mention that there is room for a future iteration addressing these issues in various forms; one could think of info metrics with persistence, or some other kind of metadata store.
---
@jesusvazquez should that perhaps be mentioned in the Goals section? Before revising the proposal after @codesome's feedback, solving these problems in Prometheus was part of the goals. My understanding of the "Pitfalls of the current solution" section is that it should refer to the status quo in Prometheus, before the proposed solution.

To be honest, I'm a bit unsure of what to put in the Pitfalls section currently, until @codesome reviews and the final form of the proposal becomes clearer.
---
Is the goal to also select certain metrics based on a certain target_info non-identifying label?

E.g. would I be able to do something like `info(http_request{...}, {"go_version" = "1.22.3"})`, i.e. filtering http_requests by a label which only exists on the target, e.g. the version of Go that the target was compiled with?
---
I added more specification about the new TSDB API and its use of label matchers. If I understand you correctly, the answer is yes:
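For illustration, a sketch of such a query under the proposed design (metric and label names are taken from the question above; the exact selector syntax is still subject to this proposal):

```promql
# Keep only http_request series whose matching info metric (e.g. target_info)
# carries go_version="1.22.3" as a data label.
info(http_request, {go_version="1.22.3"})
```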
---
Nice! Then I wonder if a simpler solution would be worth discussing as an alternative, e.g. special prefixed labels: https://cloud.google.com/stackdriver/docs/managed-prometheus/promql#metadata-labels 🤔
---
I will have a read, thanks.
---
After reading the link about metadata labels: as I understand Bartek, the idea is to perhaps use metadata-like syntax instead of introducing an `info` function. I think the underlying TSDB API would be the same. Alternative syntax to `info` is definitely on the table; @jesusvazquez has also thought about this.
---
If I understand the Stackdriver documentation correctly, metadata labels are, from the outside API/PromQL perspective, equivalent to adding all OTel resource attributes (or the version number from `build_info`, or pod labels from `kube_pod_labels`) to all metrics as label values (just with a prefix, so that you have some hope (not a strategy!) that no label name collisions will happen, and so that users can recognize which labels are coming from metadata).

Obviously, a backend can be implemented in a way that makes this efficient, so the efficiency concerns can be addressed. However, there are still concerns about UX, like every metric now having a ginormous number of labels, which needs additional tooling to get under control (like displaying metadata labels differently). It is a long-held Prometheus best practice to not add the same information (like the version number of the exposing binary) to every metric exposed, but to bundle it in an info metric (like `build_info`), and the motivation for that was not just a concern about storage efficiency in the backend, but also UX.

My understanding is that this whole proposal is an attempt to solve the problem in the "classic" Prometheus way, by making info metrics more powerful and more usable. The proposal to add all of the "metadata" to every metric in the form of metadata labels is still on the table, but it's a different proposal.
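To make the contrast concrete, a sketch (illustrative metric and label names, not actual Stackdriver syntax):

```promql
# Metadata-label approach: every series carries prefixed metadata labels.
http_requests_total{job="api", metadata_go_version="go1.22.3"}

# Info-metric approach: the metadata lives on one series per target...
build_info{job="api", instance="a:80", go_version="go1.22.3"}

# ...and is joined in only when a query actually needs it.
http_requests_total * on (job, instance) group_left (go_version) build_info
```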
---
Thanks for sharing those insights @beorn7 :)
---
Makes sense, thanks!
---
Similar question as for the data labels: if the identifying labels change at, let's say, t1 (with t0 being an older time and t2 being a future time), do we use the old identifying labels for queries touching t0 to t1, and then the new identifying labels for queries after t1? Let's document this.
---
Maybe a change of identifying labels should really be seen as a completely independent new series, without any staleness marker insertion. As said, this should be rare (while the fallout of a change in the data labels in the absence of staleness markers, e.g. what you get when ingesting OTLP, triggered the whole effort). And it could actually mean that the new info series is meant as a completely independent new series.
---
@codesome yes, that is my current thinking at least. My thinking is to track, through the info metric's samples, what the identifying label set is at any given point in time. Agreed on documenting it properly; I'll give my proposal a re-read to see whether I need to clarify further.

@beorn7 there shouldn't be any need for staleness marker insertion when the identifying label set changes, so long as the entire series' label set remains the same. My thinking is to just record a new sample for the info metric that records the new set of identifying labels. Does that make sense to you as well?
---
It probably works (but again, I would almost see this as an implementation detail).

Staleness markers will be inserted anyway if you scrape Prometheus-style, and the problems you have because of staleness markers not happening with OTLP ingestion will happen anyway, too. It's not just for info metrics, and I think we don't need a bespoke staleness-marker ingestion solution here.
---
Let's say a time series starts at time t0, the data labels of the info metric change at time t1, and t2 is a future time (t2 > t1). In this scenario, when I do a range query of the info() function for the time range t0 through t2, which data labels will be used? Will it use the old data labels between t0 and t1 and then the new data labels between t1 and t2, or the new data labels for all of t0 through t2? It would be good to document this, because "stale" makes me think of new labels for everything, but I believe the right thing to do is the old labels for t0 to t1 (I think this is the plan, right? I forgot).
---
Yes, I think it should always be the relevant labels for the respective timestamp. I guess the term "stale" comes from looking at a possible implementation detail, which is to insert a staleness marker for the "old series". However, if the identity of the info series is only defined by the identifying labels, then there is actually no new series. Only the existing series changes its value (where the value is the data labels). (This is the "logical view", if you want. I can imagine it is hard to implement that way directly in Prometheus. And I haven't looked at @aknuds1's PoC to check how it is done there.)
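A worked illustration of the per-timestamp behavior described above (hypothetical series and timestamps):

```promql
# target_info{job="j", instance="i", version="v1"}  has samples in [t0, t1)
# target_info{job="j", instance="i", version="v2"}  has samples from t1 on
#
# A range query over [t0, t2] of
info(rate(http_requests_total[2m]))
# would attach version="v1" to output samples with timestamps before t1,
# and version="v2" to output samples at t1 and later.
```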
---
I'm not sure if the prototype currently works this way, since I did some additional thinking when drafting relevant scenarios for `info`, but my current thinking is indeed that the `info` function should pick the correct data labels on a sample-by-sample basis. Does that answer your question @codesome?

@beorn7 I haven't changed the fact that there will be a new time series when the data labels change. I just use a head/block index to associate info metrics with their identifying label sets, so you can find the different time series (as data labels change) for the same identifying labels.
---
I would like to clarify whether "ensure backward compatibility with current Prometheus usage and minimize potential conflicts with existing metric labels" is part of the goals. Especially when considering metadata labels as an alternative solution, this clarity will guide whether that approach is viable, and whether additional tools or documentation are needed to help users manage such conflicts.
---
Thanks, this is good feedback. It's good to explicitly call out an implicit motivation behind the `info` function design, i.e. backwards compatibility (improving the existing solution of info metrics).

Do you mean here that adding e.g. OTel resource attributes as labels could lead to conflicts?

I've added the following two points to the goals:
---
Would be nice to know how exactly 🙈
---
Yes, I'm trying to derive how to best model the info metric indexes in the scenarios scratch pad. I'm currently just focusing on the TSDB head; the block index is trickier to design due to being on disk.

I think I need some feedback on whether my idea of tracking changes to info metrics' identifying labels through a native sample type is a good one. @beorn7 suggests that we might persist these info metrics completely outside of the TSDB, but I have the impression that would require reinventing the wheel wrt. querying the info metrics themselves (rather than just including their labels via the `info` function).

I'm on holidays until the 28th of May. When I come back, I'll seek feedback on how to solve this particular design problem.
---
Scratch that; in retrospect I noticed a footnote in the doc where @beorn7 does say we need to keep storing info metric samples in TSDB to support querying:

@beorn7 my question is: does it make sense to you to use the identifying label set as the info metric sample value? I.e., it would be a new info metric sample type that also pretends to have a float value of 1 (backwards-compatible behaviour). From my prototyping, this is what would make sense; otherwise you would need to model changes over time in the info metric index. It seems better to me to model this via the usual TSDB time series mechanism instead.
---
I don't have a detailed implementation idea here. You probably have the most experience with this by now through your prototyping.

However, it feels like you wanted to say "does it make sense to you to use the data label set as the info metric sample value?". The identifying labels are like "normal" labels, by which you want to find an info metric; you then take the data labels from its sample value and add them to the metric you want to join with. Right?
---
@beorn7 why do you think the data label set is better suited for the sample value? My reasoning is that for any given info metric, when you ingest a sample, the identifying label set may have changed while the label set as a whole is unchanged. I.e., it's the same metric to Prometheus, but you need to track that the identifying label set has changed at that point in time. I should think you could instead use the data label set for the sample value, as it's just the inverse of the identifying label set. However, isn't it more natural to track that the identifying label set changed, as that's what's declared over the ingestion protocol?

Maybe I should clarify that I currently just store the label indices in samples, not the strings, so the data label values would be extracted from the metric's label set. Not 100% sure if it's a good design choice, but that's how it's currently done. Make sense?
---
Maybe I have understood the actual issue only now. Let me re-phrase: what you are discussing here is an info metric `foo{a="aa", b="bb", c="cc"}` where `b` and `c` used to be data labels (and `a` an identifying label), and then a change happens where `b` gets "promoted" to an identifying label, so nothing really changes from the point of view of the Prometheus data model as it exists right now.

That's a bit different from what I understood at first.

How to implement this is again a question that you can probably answer better than anyone by now. It depends how "thoroughly" the new data model will be implemented in PromQL.

From a pure "new data model" perspective, the info metric after the change is a different metric (because it has different identifying labels). The `info` function might now join it with different other metrics, and it will often add other labels to its output.
---
I've thus far chosen the "simplistic" route of not changing Prometheus' core time series identity model (so that when the identifying label set changes, as you describe above, it's the same time series according to Prometheus). Then, for the purpose of implementing the `info` function, the info metric's identifying label set is determined on a per-timestamp basis (via the info metric's samples). I haven't implemented the latter, it's just design at the moment, but I think it should work. I think I will return to work some more on the prototype soon.

You don't see any real problems with my proposed design, as I describe it above, I hope?
---
I cannot say at the moment.
In any case, changing which labels are identifying should be a rare thing to happen, so even if it isn't solved in the ideal way, it shouldn't hurt much.
This is very different from the current state, where a change in a target_info label creates duplicates because staleness isn't handled properly when ingesting OTLP. The latter happens all the time in regular operation.
---
Thanks! :) I'll try fixing up the prototype to match the design in the proposal, then.
---
I've fixed up the prototype, so it matches the design in the proposal (info metric samples define the identifying label set).
---
Can you help me understand this example? I know the current rule for aggregating rates is "rate then sum", never "sum then rate".

Now, will this turn into "rate then info then sum", or could you also do "info then rate then sum"?

Not sure if there is any specific reason to do `rate()` before `info()`, but `sum()` after `info()`.
---
Disclaimer: I just copied this example from Beorn's original document. I'll answer based on my knowledge of `info` though.

`rate(http_server_request_duration_seconds_count[2m])` becomes a sub-query to the `info` function, which adds the `k8s_cluster_name` label from `target_info` to the `http_server_request_duration_seconds_count` time series resulting from the sub-query. The `sum` expression aggregates by the labels `k8s_cluster_name` and `http_status_code`. If you don't apply `sum` after `info`, the `k8s_cluster_name` label won't be available.

As for whether you could do `info` before `rate`: I haven't tested, but I think it would work. However, it looks less logical/comprehensible to me, as you really want the rate of `http_server_request_duration_seconds_count[2m]`.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@aknuds1 already explained nicely why you have to do the `sum` last.

In current PromQL, the order before the `sum` has to be range selector → `rate` → `info`.

The range selector is a construct that only works directly with a metric, i.e. `info(http_server_request_duration_seconds_count)[2m]` doesn't even parse. (You could do a sub-query, i.e. `info(http_server_request_duration_seconds_count)[2m:]`, but that's quite a different thing.)

Neither can you apply `info` to a range selector: `info(http_server_request_duration_seconds_count[2m])` doesn't parse either, because the grammar knows that `info` applies to an instant vector, not a range vector. (And the concept of overloaded functions, i.e. functions with different type signatures but the same name, doesn't exist in PromQL. Within the current framework, you could only implement a separate `info_range` function or something, but I hope we don't need that.)
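Assembled from the fragments above, the ordering that parses in current PromQL (matching the proposal's example) is:

```promql
# range selector → rate → info → sum
sum by (k8s_cluster_name, http_status_code) (
  info(
    rate(http_server_request_duration_seconds_count[2m])
  )
)
```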
---
This could describe what I alluded to above, i.e. "adding all resource attributes and build versions and pod labels and everything as prefixed labels" with its pros and cons.
---
Good suggestion.
---
I quickly drafted a corresponding alternative; it could be expanded upon. PTAL.
---
This might have been discussed before, but I wonder if we are missing the simplest solution, query rewriting, as a considered alternative. Given the example query on line 119, can we rewrite it into a join and let the engine execute it with the existing semantics of PromQL? Information needed for the join can either be implicitly deduced or specified by the user as arguments.
---
The `info` function can't simply be reduced to syntactic sugar for joins though, for at least the following reasons:

- The `info` function automatically finds suitable info metrics to join with (the MVP only supports `target_info`, but we aim to lift that limitation in the future).
- The `info` function automatically knows which are an info metric's identifying labels, i.e. which labels to join on (the MVP assumes `instance` and `job`, but we aim to lift that limitation in the future).
- The `info` function solves the problem of temporary join query conflicts between info metrics, due to non-identifying labels changing, until the old version goes stale.

Assuming we were to use query rewriting, how would you deduce which info metric(s) to join against (when `info` is no longer limited to just the `target_info` metric)? If the user, as in join queries, has to supply which info metric to join with and which labels to join on, plus which labels to include, have we really gained much versus the status quo?
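For comparison, a sketch of the status-quo join that `info` is meant to replace; it only works because the user supplies up front which info metric to join with, which labels to join on, and which labels to pull in:

```promql
rate(http_server_request_duration_seconds_count[2m])
* on (job, instance) group_left (k8s_cluster_name)
  target_info
```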
---
I can only see this working well for a static list of well-known info metrics like `target_info`, or by specifying explicitly which info metrics to use. If I have multiple info metrics that match by accident and don't specify matchers, I probably get confusing labels added. That's not possible if I specify the info metric explicitly or constrain it to a well-known list like `target_info`. Prometheus is too flexible to make such convenience safe, I think.
---
@MichaHoffmann if that is your concern with the `info` function (and @fpetkovski's?), I think that should be an explicit criticism of the proposed design, rather than suggesting that `info` as proposed here can simply be reduced to syntactic sugar. If you think the design should be explicit about picking info metrics rather than automatic (although the selection can be filtered down via the `__name__` label matcher), I think there should be an alternative proposal. I'm not sure, though, whether it's more feasible to list competing proposals in the same document, or whether they should be separate.
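A sketch of the `__name__` filtering mentioned above (hypothetical, assuming the proposed data-label selector argument):

```promql
# Restrict info() to one explicitly named info metric instead of letting it
# match against all info metrics known to the TSDB.
info(rate(http_server_request_duration_seconds_count[2m]), {__name__="target_info"})
```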
---
Oh, sorry, I didn't want to propose something different; I was just thinking my concerns out loud. What I want to say is that the name label matcher for the info metric should probably be a requirement for this to be generally least surprising for the user.
---
Thanks @MichaHoffmann, I can note your wish about the name label matcher being a requirement, so a consensus can be reached.
---
I should note that my original idea was indeed that there is generally a relatively low number of info metrics, so that a default matching behavior of "all the info metrics known to the TSDB" is feasible, which is in fact one of the reasons the "joining" is easier here.

Note that with a "general metadata store", you would probably also end up with "give me all the metadata you know for this metric", which is (more or less) equivalent to "give me all the info metrics that have an identifying-label match with this metric".
---
I believe we do have an action plan now, right? Something along the lines of implementing the info function without TSDB modifications, and then iterating further with a new proposal addressing the issues listed above.
---
Thanks, I'll add some.
---
I added two tasks, could you please take a look?