Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

metric: windowed histograms and polling discrepancy #98621

Closed
aadityasondhi opened this issue Mar 14, 2023 · 2 comments
Closed

metric: windowed histograms and polling discrepancy #98621

aadityasondhi opened this issue Mar 14, 2023 · 2 comments
Assignees
Labels
A-observability-inf C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.

Comments

@aadityasondhi
Copy link
Collaborator

aadityasondhi commented Mar 14, 2023

Describe the problem

Our current windowed histogram implementation does not match its intended purpose. Ideally, we want it to capture the sample data in the polling interval of its only user which is tsdb. This should result in every datapoint in tsdb having its data collected for the interval [T-t, T] where T is the current time, and t is the time of last poll.

Currently, we have windows that are set to 60s by default and are configurable. This means that when tsdb polls the histogram for data it gets data in an interval that could be any duration from 0-60s. This would mean that every subsequent poll calculates the moving average over a period that was already polled and then after a tick, it resets that histogram, potentially losing values that came after the previous poll.

Reference discussion in #98266.

Proposed Solution

From #98266:

Lift the windowing logic into the poller. This makes sure that each value that we store into tsdb is of the window [T-t, T], where t is the last poll time and T is the current time.

Jira issue: CRDB-25368

@aadityasondhi
Copy link
Collaborator Author

Draft PR: #101947

aadityasondhi added a commit to aadityasondhi/cockroach that referenced this issue May 23, 2023
Prior to this change, the metric.Histogram maintained two seperate
histograms under the hood: cumulative and windowed. The reason for the
windowed histogram was to be able to export the most recent recordings
to TSDB. However, the windowed histogram's ticks were not synchronized
with the polling of metrics done by TSDB. This caused problems where the
TSDB polling would read an empty window and report gaps in metrics as
seen in: cockroachdb#98266.

This patch seperates the windowed logic from pkg/metric and makes TSDB
polling keep track of deltas using a singular cumulative histogram.

Fixes cockroachdb#98266
Fixes cockroachdb#98621

Release note: None
@aadityasondhi
Copy link
Collaborator Author

Closing based on:
#103814 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-observability-inf C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant