Use new compaction interval metric in compaction failed alert. #293
base: main
Signed-off-by: Callum Styan <[email protected]>
LGTM, thanks!
I think this PR conflicts with #294. We should have already covered this case in #294 and, actually, due to how the compactor works and how cortex_compactor_last_successful_run_timestamp_seconds is updated, it may take longer than 2x the compaction interval to complete a compaction run (and the compactor may be perfectly fine anyway).
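To make the concern above concrete, here is a minimal sketch in Python. It assumes the timestamp metric only advances when a run finishes successfully, and that the alert compares the time since that timestamp against 2x the interval; both are assumptions for illustration, and `alert_fires` and all the numbers are hypothetical, not the actual alert rule.

```python
HOUR = 3600


def alert_fires(now_s: float, last_success_s: float, interval_s: float,
                multiplier: float = 2.0) -> bool:
    """Hypothetical check: fire when the time since the last successful
    run exceeds multiplier * compaction interval."""
    return (now_s - last_success_s) > multiplier * interval_s


# Hypothetical scenario: 1h interval, previous run finished at t=0,
# and the next run needs 2.5h to complete.
interval_s = 1 * HOUR
last_success_s = 0.0
run_duration_s = 2.5 * HOUR

# At t=2.25h the run is still in progress and healthy, but the
# timestamp has not moved, so the check already trips.
still_running_at_s = 2.25 * HOUR
fires_while_healthy = alert_fires(still_running_at_s, last_success_s, interval_s)
print(fires_while_healthy)
```

Under these assumptions the check fires while the compactor is still working normally, which is the false positive described above.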
Can we go back to using |
Yeah, my understanding is that we can't do this in PromQL right now. @beorn7 would be able to confirm. |
Are you talking about using the value of a metric rather than the duration literal 2h in [2h]? If that's the case, then the answer is: no, that's currently not possible in PromQL. I do think we should make that possible, but it's one of those "not as easy as it looks" problems. Right now, the PromQL engine can find out what time range the query will access in the TSDB just from static analysis of the query. If the value were allowed to come from another expression, you would first have to evaluate that "inner query" to find out what time range the "outer query" needs to access. For reference, my brainstorming doc about timestamps and durations: https://docs.google.com/document/d/1jMeDsLvDfO92Qnry_JLAXalvMRzMSB1sBr9V7LolpYM/edit#heading=h.vmb7pe7hp12
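As a rough illustration of the static-analysis point, the Python sketch below derives a lookback window from the query text alone, without touching any data. It is a toy, not the PromQL parser: `static_lookback_seconds` is a hypothetical helper, it only recognises single-unit literals such as `[2h]`, and it ignores offsets and subqueries.

```python
import re

# Seconds per unit for simple single-unit duration literals.
_UNIT_SECONDS = {"ms": 0.001, "s": 1, "m": 60, "h": 3600,
                 "d": 86400, "w": 604800, "y": 31536000}
_RANGE = re.compile(r"\[(\d+)(ms|s|m|h|d|w|y)\]")


def static_lookback_seconds(query: str) -> float:
    """Return the largest literal range found in the query text, or 0."""
    ranges = [int(n) * _UNIT_SECONDS[u] for n, u in _RANGE.findall(query)]
    return max(ranges, default=0)


# A literal range is known before anything is evaluated.
literal = static_lookback_seconds("increase(some_counter[2h])")

# A range given by another expression (not valid PromQL today) yields
# nothing here: the inner expression would have to be evaluated first.
dynamic = static_lookback_seconds("increase(some_counter[some_interval_metric])")
print(literal, dynamic)
```

The second call returning nothing usable is the crux: with a metric in place of the literal, the window is unknown until the inner expression has been evaluated.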
Can you confirm that we can't query the last time (in seconds) at which a metric increased?
…On Wed, Apr 21, 2021 at 11:19 PM Björn Rabenstein ***@***.***> wrote:
Are you talking about using the value of a metric rather than the duration
literal 2h in [2h]?
I believe there is no straightforward way to do this within PromQL. But perhaps an expert can prove me wrong here? There is the |
Signed-off-by: Callum Styan [email protected]
What this PR does:
Changes the compaction failed alert to use the new cortex_compactor_compaction_interval_seconds metric, so that we can more accurately alert when two compactions in a row have failed, regardless of whether the compaction interval is 2h (the value currently used within the increase function of the alert) or not.
Not sure if there should be a changelog entry here.
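The motivation can be sketched with a little arithmetic in Python. The helper names and the 4h interval are hypothetical, and this models only the idea of sizing the window from the interval, not the alert expression itself.

```python
HOUR = 3600


def expected_runs_in_window(window_s: float, interval_s: float) -> float:
    """How many scheduled runs fall into a lookback window, on average."""
    return window_s / interval_s


def window_for_consecutive_runs(interval_s: float, runs: int = 2) -> float:
    """Window long enough to span `runs` scheduled compactions."""
    return runs * interval_s


# With a hard-coded 2h window and a hypothetical 4h interval, the window
# covers only half a run, so it cannot represent "two runs in a row".
fixed_window_runs = expected_runs_in_window(2 * HOUR, 4 * HOUR)

# Deriving the window from the interval keeps it at two runs.
derived_window_s = window_for_consecutive_runs(4 * HOUR)
derived_window_runs = expected_runs_in_window(derived_window_s, 4 * HOUR)
print(fixed_window_runs, derived_window_s, derived_window_runs)
```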
Checklist
CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]