Add stats collection to memory reclaimer #6895
✅ Deploy Preview for meta-velox canceled.
@tanjialiang LGTM. Thanks!
struct Stats {
  /// The total number of times of the reclaim attempts that end up failing
  /// due to reclaiming at non-reclaimable stage.
  std::atomic<uint64_t> numNonReclaimableAttempts{0};
Reclaim is executed by a single thread, so we can just use a plain uint64_t.
@@ -200,5 +200,6 @@ class SharedArbitrator : public MemoryArbitrator {
  tsan_atomic<uint64_t> numShrunkBytes_{0};
  tsan_atomic<uint64_t> numReclaimedBytes_{0};
  tsan_atomic<uint64_t> reclaimTimeUs_{0};
+ MemoryReclaimer::Stats reclaimerStats_;
tsan_atomic<uint64_t> nonReclaimableAttempts_{0};
@@ -404,7 +404,7 @@ uint64_t SharedArbitrator::reclaim(
  try {
    freedBytes = pool->shrink(targetBytes);
    if (freedBytes < targetBytes) {
-     pool->reclaim(targetBytes - freedBytes);
+     pool->reclaim(targetBytes - freedBytes, reclaimerStats_);
Can we pass the stats as a local variable here?
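A minimal sketch of what this suggestion could look like, combined with the earlier one about keeping a single atomic counter on the arbitrator. All types here (`Stats`, `FakePool`, `FakeArbitrator`) are hypothetical stand-ins rather than the Velox classes, and `std::atomic` stands in for `tsan_atomic`:

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for MemoryReclaimer::Stats. Plain uint64_t, on the
// assumption (from the review) that one reclaim runs on a single thread.
struct Stats {
  uint64_t numNonReclaimableAttempts{0};
};

// Hypothetical pool: reports into the caller-provided stats object.
struct FakePool {
  bool reclaimable{true};

  uint64_t reclaim(uint64_t targetBytes, Stats& stats) {
    if (!reclaimable) {
      // The attempt landed at a non-reclaimable stage.
      ++stats.numNonReclaimableAttempts;
      return 0;
    }
    return targetBytes;
  }
};

// Hypothetical arbitrator: owns only the aggregated counter. The Stats object
// is a local that lives for one reclaim call and is folded in afterwards.
struct FakeArbitrator {
  std::atomic<uint64_t> numNonReclaimableAttempts_{0};

  uint64_t reclaim(FakePool& pool, uint64_t targetBytes) {
    Stats localStats;
    const uint64_t freedBytes = pool.reclaim(targetBytes, localStats);
    numNonReclaimableAttempts_ += localStats.numNonReclaimableAttempts;
    return freedBytes;
  }
};
```

With this shape the arbitrator never shares a mutable `Stats` between concurrent reclaims; only the final addition touches shared state.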
Force-pushed from 52ca239 to 6a581b3.
@tanjialiang thanks for the update!
@@ -252,6 +256,12 @@ FOLLY_ALWAYS_INLINE std::ostream& operator<<(
  /// through techniques such as disks spilling.
  class MemoryReclaimer {
   public:
+   struct Stats {
/// Used to collect memory reclaim execution stats.
- ASSERT_EQ(pool->reclaim(kMaxMemory), 0);
+ ASSERT_EQ(pool->reclaim(0, stats_), 0);
+ ASSERT_EQ(pool->reclaim(100, stats_), 0);
+ ASSERT_EQ(pool->reclaim(kMaxMemory, stats_), 0);
Can you check that the stats_ counters are all zero?
@@ -591,15 +595,16 @@ TEST_F(MemoryReclaimerTest, orderedReclaim) {
  // child and 4 from 4th child and 1 from 2nd.
  // So expected reclaimable allocation units are {0, 0, 2, *0*, *0*}
  ASSERT_EQ(
-     root->reclaimer()->reclaim(root.get(), 10 * allocUnitBytes),
+     root->reclaimer()->reclaim(root.get(), 10 * allocUnitBytes, stats_),
Can we have a fake test here to verify that the stats have been bumped and aggregated? Thanks!
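One possible shape for such a fake test, sketched outside the Velox test fixtures. The `FakeLeafReclaimer` and `FakeRootReclaimer` types are hypothetical; the only thing taken from the PR is the idea that one `Stats` object is threaded through every child reclaim so the counters aggregate at the root:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for MemoryReclaimer::Stats.
struct Stats {
  uint64_t numNonReclaimableAttempts{0};
};

// Hypothetical leaf reclaimer: either frees bytes or bumps the counter.
struct FakeLeafReclaimer {
  uint64_t reclaimableBytes;
  bool reclaimable;

  uint64_t reclaim(uint64_t targetBytes, Stats& stats) {
    if (!reclaimable) {
      ++stats.numNonReclaimableAttempts;
      return 0;
    }
    const uint64_t freedBytes = std::min(targetBytes, reclaimableBytes);
    reclaimableBytes -= freedBytes;
    return freedBytes;
  }
};

// Hypothetical root reclaimer: visits children in order and passes the same
// Stats object to each, so child counters accumulate in one place.
struct FakeRootReclaimer {
  std::vector<FakeLeafReclaimer> children;

  uint64_t reclaim(uint64_t targetBytes, Stats& stats) {
    uint64_t freedBytes = 0;
    for (auto& child : children) {
      if (freedBytes >= targetBytes) {
        break;
      }
      freedBytes += child.reclaim(targetBytes - freedBytes, stats);
    }
    return freedBytes;
  }
};
```

A test built on this would set up a mix of reclaimable and non-reclaimable children, call the root once, and assert both the freed bytes and the aggregated counter.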
  allocation1.pool = op->pool();
  allocation1.size = 16 * MB;

  TestAllocation ret;
return TestAllocation{}
You didn't change?
Force-pushed from 9f328d6 to bdb72fe.
@tanjialiang thanks for the update!
@@ -243,6 +244,17 @@ void MemoryReclaimer::abort(MemoryPool* pool, const std::exception_ptr& error) {
    });
  }

+ bool MemoryReclaimer::Stats::operator==(
+     const MemoryReclaimer::Stats& other) const {
+   return std::tie(numNonReclaimableAttempts) ==
No need for std::tie, as there is only one element.
It is there for later extension. But yeah, we can just compare the single variable for now.
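To illustrate the trade-off in this thread, here is a sketch of how `std::tie` pays off once a second field exists. The `reclaimExecTimeUs` field is hypothetical and added only for illustration; with the single counter in this PR, a direct `==` on that counter is equivalent:

```cpp
#include <cassert>
#include <cstdint>
#include <tuple>

// Hypothetical stand-in for MemoryReclaimer::Stats.
struct Stats {
  uint64_t numNonReclaimableAttempts{0};
  // Hypothetical second counter, not part of the PR under review.
  uint64_t reclaimExecTimeUs{0};

  // std::tie builds tuples of references, so adding a field means adding one
  // name on each side instead of chaining another && clause.
  bool operator==(const Stats& other) const {
    return std::tie(numNonReclaimableAttempts, reclaimExecTimeUs) ==
        std::tie(other.numNonReclaimableAttempts, other.reclaimExecTimeUs);
  }

  bool operator!=(const Stats& other) const {
    return !(*this == other);
  }
};
```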
    // The hash table itself in the grouping set is not cleared so it still
    // uses some memory.
    ASSERT_LT(op->pool()->currentBytes(), usedMemory);
  } else {
    VELOX_ASSERT_THROW(
        op->reclaim(
-           folly::Random::oneIn(2) ? 0 : folly::Random::rand32(rng_)),
+           folly::Random::oneIn(2) ? 0 : folly::Random::rand32(rng_),
+           reclaimerStats_),
Can you add stats checks to these tests to improve coverage? Thanks!
@@ -216,6 +216,7 @@ class OrderByTest : public OperatorTestBase {
  }

  folly::Random::DefaultGenerator rng_;
+ memory::MemoryReclaimer::Stats reclaimerStats_;
Ditto
@@ -845,6 +845,7 @@ class HashJoinTest : public HiveConnectorTestBase {
  RowTypePtr probeType_;
  RowTypePtr buildType_;

+ memory::MemoryReclaimer::Stats reclaimerStats_;
ditto
Force-pushed from cfcb787 to 99e204a.
@tanjialiang LGTM. We could consider adding the actual reclaim time and the reclaim wait time to these stats, to tell how much time is spent on spilling and how much is spent waiting for the task to pause.
@tanjialiang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
@tanjialiang thanks for the update!
velox/exec/tests/HashJoinTest.cpp (outdated)
@@ -5538,6 +5557,7 @@ DEBUG_ONLY_TEST_F(HashJoinTest, reclaimDuringWaitForProbe) {
  task.reset();

  taskThread.join();
+ ASSERT_EQ(reclaimerStats_, memory::MemoryReclaimer::Stats{});
Consider providing empty() in a follow-up. Thanks!
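A sketch of what such an `empty()` helper might look like, assuming it simply reports whether every counter is still zero. The struct is a hypothetical stand-in for `MemoryReclaimer::Stats`, and the method is not part of this PR:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for MemoryReclaimer::Stats.
struct Stats {
  uint64_t numNonReclaimableAttempts{0};

  // Hypothetical follow-up helper: true while no counter has been bumped.
  // Any counter added later would need to be checked here as well.
  bool empty() const {
    return numNonReclaimableAttempts == 0;
  }
};
```

Under that assumption, the assertion above could be written as `ASSERT_TRUE(reclaimerStats_.empty());` instead of comparing against a default-constructed `Stats{}`.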
Force-pushed from 99e204a to 56a5ecd.
@tanjialiang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Force-pushed from 56a5ecd to 88a7f41.
@tanjialiang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
@tanjialiang merged this pull request in 89a9eb1.
Conbench analyzed the 1 benchmark run on commit There were no benchmark performance regressions. 🎉 The full Conbench report has more details.
Summary: Some information also needs to be collected during reclaim. This adds a stats collection framework to facilitate collecting reclaim stats.
Pull Request resolved: facebookincubator#6895
Reviewed By: xiaoxmeng
Differential Revision: D50020233
Pulled By: tanjialiang
fbshipit-source-id: 9ebd78e9b0bfdf26652fd7fa9a452a2ffaf93869