pkg/util/metric: period spike in alloc_objects from prometheus.Histogram metrics on poll #113508

Closed
abarganier opened this issue Oct 31, 2023 · 1 comment

Comments


abarganier commented Oct 31, 2023

Describe the problem

To start, please note that this issue has not led to any cluster/node instability. It is an observation of a periodic increase in allocated objects on the heap, which could be a good candidate for optimization.

About a year ago, work was done to migrate our histogram libraries away from the HDR histogram, to instead be backed by Prometheus histograms (see #85990). In short, this was done because the HDR histograms generated hundreds of histogram buckets, which strained downstream systems such as Prometheus. The Prometheus-backed histogram uses a static number of buckets.

More recently, a bug was found with the windowed Prometheus-backed histograms causing periodic gaps in reported data (see #98266). This was fixed in #104088, which updated the logic which merged histogram windows.

It was recently discovered that the migration from HDR to Prometheus-backed histograms increased the number of heap-allocated objects. The increase is triggered every 10 seconds, when the internal TSDB polls all metric registries to write the latest values to the TSDB (it also happens periodically to generate the NodeStatus report). These polls require a data transformation from prometheus.Histogram (our internal representation) to the prometheusgo.Metric type. It's the call to (*prometheus.Histogram).Write(prometheusgo.Metric) in this process that triggers the allocations - something that happens within the vendored library, not our own code.
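
For reference, here is a minimal, self-contained sketch (not CockroachDB code) of the transformation described above - each poll asks the histogram to write a snapshot of itself into a freshly allocated prometheusgo (client_model/go) Metric, and those per-poll allocations are what show up in the profiles:

package main

import (
	"fmt"

	"github.com/prometheus/client_golang/prometheus"
	prometheusgo "github.com/prometheus/client_model/go"
)

func main() {
	h := prometheus.NewHistogram(prometheus.HistogramOpts{
		Name:    "example_latency_seconds",
		Buckets: prometheus.DefBuckets,
	})
	h.Observe(0.42)

	// Each poll allocates a new Metric (and, inside Write, a new bucket
	// slice) and asks the histogram to populate it.
	m := &prometheusgo.Metric{}
	if err := h.Write(m); err != nil {
		panic(err)
	}
	fmt.Println(m.Histogram.GetSampleCount(), m.Histogram.GetSampleSum())
}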

This originally appeared when the migration from HDR to Prometheus first happened. However, the issue was exacerbated by the fixes made to how histogram windows are merged, which added an additional call to (*prometheus.Histogram).Write(prometheusgo.Metric), effectively doubling the number of allocated objects.

The offending code is here - note the two calls to .Write():

cur := &prometheusgo.Metric{}
prev := &prometheusgo.Metric{}
if err := h.windowed.cur.Write(cur); err != nil {
	panic(err)
}
if h.windowed.prev != nil {
	if err := h.windowed.prev.Write(prev); err != nil {
		panic(err)
	}
	MergeWindowedHistogram(cur.Histogram, prev.Histogram)
}
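
For context, the merge step itself is cheap - conceptually it just folds the previous window's bucket counts and sample totals into the current snapshot, roughly as in the sketch below (the real MergeWindowedHistogram in pkg/util/metric may differ in detail; prometheusgo is the client_model/go package). The allocations at issue come from the two Write calls above, not from the merge:

// Sketch only: fold prev's cumulative bucket counts and sample totals into cur.
func mergeWindowedHistogram(cur, prev *prometheusgo.Histogram) {
	for i := range cur.Bucket {
		*cur.Bucket[i].CumulativeCount += prev.Bucket[i].GetCumulativeCount()
	}
	*cur.SampleCount += prev.GetSampleCount()
	*cur.SampleSum += prev.GetSampleSum()
}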

In the extreme case - heap profiles timed to overlap with the TSDB polling that happens every 10s - these (*histogram).Write() calls accounted for nearly 30% of cumulative allocated objects on the heap. For example:

$ for i in $(ls ./memprof*.pprof |  sort -R | head -n 30); do go tool pprof -top -alloc_objects -nodecount=1 "$i" | tail -n2 ; done
      flat  flat%   sum%        cum   cum%
13237168657 27.51% 27.51% 13237168657 27.51%  github.com/prometheus/client_golang/prometheus.(*histogram).Write
      flat  flat%   sum%        cum   cum%
 591896349 30.39% 30.39%  591896349 30.39%  github.com/prometheus/client_golang/prometheus.(*histogram).Write
      flat  flat%   sum%        cum   cum%
12157289072 27.69% 27.69% 12157289072 27.69%  github.com/prometheus/client_golang/prometheus.(*histogram).Write
      flat  flat%   sum%        cum   cum%
13236705516 27.51% 27.51% 13236705516 27.51%  github.com/prometheus/client_golang/prometheus.(*histogram).Write
      flat  flat%   sum%        cum   cum%
11887529425 27.76% 27.76% 11887529425 27.76%  github.com/prometheus/client_golang/prometheus.(*histogram).Write
      flat  flat%   sum%        cum   cum%
  46995371 24.05% 24.05%   46995371 24.05%  github.com/prometheus/client_golang/prometheus.(*histogram).Write
      flat  flat%   sum%        cum   cum%
 319519281 28.70% 28.70%  319519281 28.70%  github.com/prometheus/client_golang/prometheus.(*histogram).Write
      flat  flat%   sum%        cum   cum%
 594184736 30.40% 30.40%  594184736 30.40%  github.com/prometheus/client_golang/prometheus.(*histogram).Write
      flat  flat%   sum%        cum   cum%
12968362073 27.90% 27.90% 12968362073 27.90%  github.com/prometheus/client_golang/prometheus.(*histogram).Write

While the allocations happen within a vendored library, there is still potential for optimization here. We could, for example, rewrite the .Write() function ourselves to be more conservative with allocations.
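
As a rough illustration of that direction (a sketch only, not necessarily what the eventual fix does), a reimplemented write path could allocate the prometheusgo.Metric, its Histogram, and the bucket slice once per windowed histogram and reuse them on every poll. This assumes access to the histogram's internal per-bucket counts and upper bounds, which the upstream library does not expose; all names below are hypothetical:

// cachedSnapshot reuses one prometheusgo.Metric across polls instead of
// building a fresh one (and a fresh bucket slice) on every Write call.
type cachedSnapshot struct {
	metric  prometheusgo.Metric
	hist    prometheusgo.Histogram
	buckets []prometheusgo.Bucket
}

// write fills the cached snapshot from the histogram's internal state
// (per-bucket counts, bucket upper bounds, sample sum/count), which a
// forked implementation of (*histogram).Write would have access to.
func (c *cachedSnapshot) write(counts []uint64, bounds []float64, sum float64, n uint64) *prometheusgo.Metric {
	if c.buckets == nil {
		// One-time setup: allocate the bucket slice and wire up pointers.
		c.buckets = make([]prometheusgo.Bucket, len(bounds))
		refs := make([]*prometheusgo.Bucket, len(bounds))
		for i := range c.buckets {
			c.buckets[i].CumulativeCount = new(uint64)
			c.buckets[i].UpperBound = new(float64)
			refs[i] = &c.buckets[i]
		}
		c.hist.Bucket = refs
		c.hist.SampleCount = new(uint64)
		c.hist.SampleSum = new(float64)
		c.metric.Histogram = &c.hist
	}
	cumulative := uint64(0)
	for i, count := range counts {
		cumulative += count
		*c.buckets[i].CumulativeCount = cumulative
		*c.buckets[i].UpperBound = bounds[i]
	}
	*c.hist.SampleCount = n
	*c.hist.SampleSum = sum
	return &c.metric
}

Because the snapshot is reused, callers would need to consume (or copy) the returned Metric before the next poll.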

Jira issue: CRDB-33029

@abarganier added the C-bug (Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.) and T-observability-inf labels on Oct 31, 2023
@abarganier removed the C-bug label on Oct 31, 2023
@abarganier changed the title from "pkg/util/metric: period spike in alloc_objects from prometheus.Histogram metrics" to "pkg/util/metric: period spike in alloc_objects from prometheus.Histogram metrics on poll" on Oct 31, 2023
@abarganier (Contributor, Author) commented:
This was fixed as part of #116581
