storage: track cache disk metrics separately #24138

Merged
merged 6 commits into from
Nov 19, 2024

Contributor

@nvartolomei nvartolomei commented Nov 15, 2024

Additional test in #24140

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v24.2.x
  • v24.1.x
  • v23.3.x

Release Notes

Bug Fixes

  • Previously, if Redpanda was configured with different mountpoints for the data and cache directories, disk metrics were reported only for the cache directory. Now the original storage_disk_{total,free}_bytes metric reports the data directory mountpoint, and a new storage_cache_disk_{total,free}_bytes metric reports the cache directory mountpoint. The two metrics are equivalent when both directories are on the same mountpoint.

Previously, if the data and cache directories were mounted on different
disks, we would override the metrics with information from the cache
disk only.

This commit introduces a separate metric for the cache disk to fix the
bug and also allow greater visibility into the underlying system.
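
To illustrate the idea, here is a minimal sketch (not the code from this PR) that queries the data and cache directories independently, so that neither result overwrites the other. The `disk_space`, `storage_disk_snapshot`, `query_disk`, and `snapshot` names are hypothetical, and `std::filesystem::space` stands in for whatever call the storage layer actually uses.

```cpp
#include <cassert>
#include <cstdint>
#include <filesystem>

// Hypothetical container for the two values each metric pair exposes.
struct disk_space {
    std::uintmax_t total_bytes;
    std::uintmax_t free_bytes;
};

// Query the filesystem that backs `path`.
inline disk_space query_disk(const std::filesystem::path& path) {
    const std::filesystem::space_info info = std::filesystem::space(path);
    return {info.capacity, info.free};
}

// Hypothetical snapshot with one entry per directory, so the cache disk
// no longer overrides the data disk.
struct storage_disk_snapshot {
    disk_space data;  // would feed storage_disk_{total,free}_bytes
    disk_space cache; // would feed storage_cache_disk_{total,free}_bytes
};

inline storage_disk_snapshot snapshot(
  const std::filesystem::path& data_dir,
  const std::filesystem::path& cache_dir) {
    return {query_disk(data_dir), query_disk(cache_dir)};
}
```

Calling `snapshot(data_dir, cache_dir)` with both paths on the same mountpoint should yield the same capacity for both entries, matching the "equivalent if both are on the same mountpoint" note in the release notes.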

I considered introducing a label rather than a new metric name, but that
would break backwards compatibility of the metrics, since the cardinality
of the existing metric would change from 1 to 2, and it could also break
dashboards and our tests. From the point of view of time series storage
engines like Prometheus, there is no difference between a new metric
name and a new label.
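
The trade-off can be sketched as follows. This is an illustrative model only: a series is treated as a metric name plus a label set, and the `disk` label is a hypothetical name for the option that was not taken. Both options produce two series in total, but only the label option changes how many series the existing metric name matches.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <utility>

// A time series is identified by its metric name plus its full label set.
using label_set = std::map<std::string, std::string>;
using series_id = std::pair<std::string, label_set>;

// Option not taken: keep one name and add a hypothetical "disk" label.
inline std::set<series_id> with_label() {
    return {
      series_id{"storage_disk_free_bytes", label_set{{"disk", "data"}}},
      series_id{"storage_disk_free_bytes", label_set{{"disk", "cache"}}},
    };
}

// Option taken in this PR: keep the old name as-is and add a new name.
inline std::set<series_id> with_new_name() {
    return {
      series_id{"storage_disk_free_bytes", label_set{}},
      series_id{"storage_cache_disk_free_bytes", label_set{}},
    };
}

// Number of series that a lookup by exact metric name would match.
inline std::size_t series_for_name(
  const std::set<series_id>& all, const std::string& name) {
    std::size_t n = 0;
    for (const auto& [metric, labels] : all) {
        if (metric == name) {
            ++n;
        }
    }
    return n;
}
```

With the label option, `series_for_name(with_label(), "storage_disk_free_bytes")` matches two series, which is the cardinality change that could break existing dashboards and tests. With the new name, the same lookup still matches exactly one.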

Fixes https://redpandadata.atlassian.net/browse/CORE-1609
Fixes redpanda-data#15223
Before

> ```
> storage space alert: free space at 61.734% on /var/lib/redpanda/data:
> 307.545GiB total, 189.861GiB free, min. free 0.000bytes. Please adjust
> retention policies as needed to allow writing again
> ```

After

> ```
> space alert: free space at 61.455% on /var/lib/redpanda/data: 307.545GiB
> total, 189.001GiB free, min. free for alert 0.000bytes, min. free for
> degraded 1024.000PiB. Please adjust retention policies as needed to
> allow writing again
> ```
Member

@dotnwat dotnwat left a comment

lgtm

human::bytes(disk.total), // NOLINT narrowing conv.
human::bytes(disk.free), // NOLINT " "
human::bytes(min_space), // NOLINT " "
human::bytes(disk.total), // NOLINT narrowing conv.
Member

nit: you can silence a specific check with NOLINT(actual-name-of-check)
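
A minimal sketch of what that could look like, assuming the warning comes from clang-tidy's `cppcoreguidelines-narrowing-conversions` check (the check that actually fires in this build is not stated in the thread). `to_gib` and `free_gib` are hypothetical stand-ins, not the `human::bytes` helper from the diff.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper that takes a floating-point byte count.
inline double to_gib(double bytes) {
    return bytes / (1024.0 * 1024.0 * 1024.0);
}

inline double free_gib(std::uint64_t free_bytes) {
    // The implicit uint64_t -> double conversion is intentional, so only the
    // named check is suppressed on this line; other checks stay active.
    return to_gib(free_bytes); // NOLINT(cppcoreguidelines-narrowing-conversions)
}
```

Naming the check keeps the suppression narrow: a bare `NOLINT` would also hide any unrelated diagnostic that later appears on the same line.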

@nvartolomei nvartolomei merged commit 0f29250 into redpanda-data:dev Nov 19, 2024
17 checks passed
@nvartolomei
Contributor Author

/backport v24.3.x
