Expose metrics related to memory arbitration #7940
@bikramSingh91 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
@bikramSingh91 LGTM % minors. Thanks!
velox/common/base/Counters.cpp
Outdated
DEFINE_METRIC(
    kMetricArbitratorFailuresCount, facebook::velox::StatType::COUNT);

// Tracks the arbitration request queue times in range of [0, 100s] and
We might need to increase the queue time range later?
Increased the range to match that of the reclaim time, since the two should be roughly the same order of magnitude.
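To illustrate why the range in this thread matters, here is a minimal sketch that assumes a fixed-width bucketed histogram which clamps out-of-range samples into its edge buckets. `BoundedHistogram` and its methods are hypothetical names for illustration; they are not the implementation behind `DEFINE_HISTOGRAM_METRIC`, and the real bucketing may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical fixed-range histogram, for illustration only. This is not
// the histogram used by DEFINE_HISTOGRAM_METRIC.
class BoundedHistogram {
 public:
  BoundedHistogram(int64_t bucketWidth, int64_t min, int64_t max)
      : bucketWidth_(bucketWidth), min_(min), max_(max) {
    assert(bucketWidth > 0 && max > min && (max - min) % bucketWidth == 0);
    counts_.assign(static_cast<std::size_t>((max - min) / bucketWidth), 0);
  }

  // Samples outside [min, max) are clamped into the first or last bucket.
  void add(int64_t value) {
    const int64_t clamped = std::max(min_, std::min(value, max_ - 1));
    ++counts_[static_cast<std::size_t>((clamped - min_) / bucketWidth_)];
    ++total_;
  }

  // Returns the upper bound of the bucket that contains the percentile.
  int64_t percentileUpperBound(int pct) const {
    assert(pct >= 0 && pct <= 100 && total_ > 0);
    // Rank of the requested sample: ceil(total * pct / 100), at least 1.
    const uint64_t rank = std::max<uint64_t>(
        1, (total_ * static_cast<uint64_t>(pct) + 99) / 100);
    uint64_t seen = 0;
    for (std::size_t i = 0; i < counts_.size(); ++i) {
      seen += counts_[i];
      if (seen >= rank) {
        return min_ + static_cast<int64_t>(i + 1) * bucketWidth_;
      }
    }
    return max_;
  }

 private:
  int64_t bucketWidth_;
  int64_t min_;
  int64_t max_;
  std::vector<uint64_t> counts_;
  uint64_t total_{0};
};
```

Under these assumptions, a 250 s queue time recorded into a [0, 100s] histogram can never report above 100 s at P100, which is the motivation for widening the range when long waits are expected.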

// Tracks the arbitration run times in range of [0, 100s] and reports P50,
// P90, P99, and P100.
DEFINE_HISTOGRAM_METRIC(
ditto
Increased the range to match that of the reclaim time, since the two should be roughly the same order of magnitude.
velox/dwio/common/SortingWriter.cpp
Outdated
@@ -94,6 +96,7 @@ uint64_t SortingWriter::reclaim(
  }

  if (!isRunning()) {
    RECORD_METRIC_VALUE(kMetricMemoryNonReclaimableCount);
We don't need this here. For now, kMetricMemoryNonReclaimableCount only tracks reclaim attempts that land in a non-reclaimable section because of insufficient memory reservation. Thanks!
Thanks for the clarification. I have added context on the meaning of this metric to both the metrics.rst doc and the cpp file that declares it. I do, however, feel that we should rename it to make it explicit that it only counts that kind of non-reclaimable failure. Please let me know what you think.
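The distinction discussed in this thread can be sketched as follows. This is a simplified illustration, not the SortingWriter code: `FakeWriter`, `reclaimSketch`, and `nonReclaimableCount` are hypothetical stand-ins, and the counter is a plain atomic rather than `RECORD_METRIC_VALUE`. The assumption is that the metric is bumped only when a reclaim attempt hits a non-reclaimable section, and not when the writer has simply stopped running.

```cpp
#include <algorithm>
#include <atomic>
#include <cstdint>

// Hypothetical stand-in for kMetricMemoryNonReclaimableCount.
std::atomic<uint64_t> nonReclaimableCount{0};

// Hypothetical, simplified writer state.
struct FakeWriter {
  bool running;
  bool inNonReclaimableSection;
  uint64_t reclaimableBytes;
};

// Returns the number of bytes reclaimed from the writer.
uint64_t reclaimSketch(FakeWriter& writer, uint64_t targetBytes) {
  if (!writer.running) {
    // Nothing to reclaim, but this is not a non-reclaimable-section
    // failure, so the metric is left untouched.
    return 0;
  }
  if (writer.inNonReclaimableSection) {
    // The only path that counts toward the metric in this sketch.
    ++nonReclaimableCount;
    return 0;
  }
  const uint64_t reclaimed = std::min(targetBytes, writer.reclaimableBytes);
  writer.reclaimableBytes -= reclaimed;
  return reclaimed;
}
```

Keeping the increment on a single path is what makes the counter interpretable; a more explicit metric name, as suggested above, would make that contract visible at the call site.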
Force-pushed: d8e94b3 to 31fb677
Force-pushed: 31fb677 to e40e02b
@bikramSingh91 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
@bikramSingh91 merged this pull request in fa82b94.
Conbench analyzed the 1 benchmark run on this commit. There were 2 benchmark results indicating a performance regression.
The full Conbench report has more details.
This adds the following stats:
"velox.arbitrator_requests_count"
"velox.arbitrator_aborted_count"
"velox.arbitrator_failures_count"
"velox.arbitrator_queue_time_ms"
"velox.arbitrator_arbitration_time_ms"
"velox.arbitrator_free_capacity_bytes"
Also fixed accounting for:
"velox.memory_non_reclaimable_count"
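
As a rough illustration of how the count and timing stats listed above might be fed, here is a minimal sketch using an in-memory reporter. `InMemoryReporter` and `recordArbitration` are hypothetical names introduced for this example; the real code records these values through the Velox metric macros and a registered stats reporter, and the exact call sites are not shown in this PR excerpt.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical in-memory reporter, for illustration only.
struct InMemoryReporter {
  std::map<std::string, uint64_t> counts;
  std::map<std::string, std::vector<uint64_t>> samples;

  // Adds delta to a count-style stat.
  void addCount(const std::string& name, uint64_t delta = 1) {
    counts[name] += delta;
  }

  // Appends one observation to a histogram-style stat.
  void addSample(const std::string& name, uint64_t value) {
    samples[name].push_back(value);
  }
};

// Hypothetical hook invoked once per finished arbitration request.
void recordArbitration(
    InMemoryReporter& reporter,
    bool succeeded,
    uint64_t queueTimeMs,
    uint64_t arbitrationTimeMs) {
  reporter.addCount("velox.arbitrator_requests_count");
  if (!succeeded) {
    reporter.addCount("velox.arbitrator_failures_count");
  }
  reporter.addSample("velox.arbitrator_queue_time_ms", queueTimeMs);
  reporter.addSample("velox.arbitrator_arbitration_time_ms", arbitrationTimeMs);
}
```

The split between counts and samples mirrors the two macro kinds visible in the diff: `DEFINE_METRIC` with `StatType::COUNT` for the counters, and `DEFINE_HISTOGRAM_METRIC` for the timings reported as percentiles.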