Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add stats collection to memory reclaimer #6895

Closed

Conversation

tanjialiang
Copy link
Contributor

During reclaiming there are information that also needs to be collected. Adding the stats collection framework to facilitate reclaiming stats collection.

@netlify
Copy link

netlify bot commented Oct 4, 2023

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit 88a7f41
🔍 Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/6520fa38acb813000886e187

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Oct 4, 2023
Copy link
Contributor

@xiaoxmeng xiaoxmeng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tanjialiang LGTM. Thanks!

struct Stats {
/// The total number of times of the reclaim attempts that end up failing
/// due to reclaiming at non-reclaimable stage.
std::atomic<uint64_t> numNonReclaimableAttempts{0};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reclaim is single thread executed so we can just use uint64_t.

@@ -200,5 +200,6 @@ class SharedArbitrator : public MemoryArbitrator {
tsan_atomic<uint64_t> numShrunkBytes_{0};
tsan_atomic<uint64_t> numReclaimedBytes_{0};
tsan_atomic<uint64_t> reclaimTimeUs_{0};
MemoryReclaimer::Stats reclaimerStats_;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tsan_atomic<uint64_t> nonReclaimableAttempts_{0};

@@ -404,7 +404,7 @@ uint64_t SharedArbitrator::reclaim(
try {
freedBytes = pool->shrink(targetBytes);
if (freedBytes < targetBytes) {
pool->reclaim(targetBytes - freedBytes);
pool->reclaim(targetBytes - freedBytes, reclaimerStats_);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can pass the stats as a local variable here?

Copy link
Contributor

@xiaoxmeng xiaoxmeng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tanjialiang thanks for the update!

@@ -252,6 +256,12 @@ FOLLY_ALWAYS_INLINE std::ostream& operator<<(
/// through techniques such as disks spilling.
class MemoryReclaimer {
public:
struct Stats {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/// Used to collect memory reclaim execution stats.

ASSERT_EQ(pool->reclaim(kMaxMemory), 0);
ASSERT_EQ(pool->reclaim(0, stats_), 0);
ASSERT_EQ(pool->reclaim(100, stats_), 0);
ASSERT_EQ(pool->reclaim(kMaxMemory, stats_), 0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you check the stats_ are all zero?

@@ -591,15 +595,16 @@ TEST_F(MemoryReclaimerTest, orderedReclaim) {
// child and 4 from 4th child and 1 from 2nd.
// So expected reclaimable allocation units are {0, 0, 2, *0*, *0*}
ASSERT_EQ(
root->reclaimer()->reclaim(root.get(), 10 * allocUnitBytes),
root->reclaimer()->reclaim(root.get(), 10 * allocUnitBytes, stats_),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have a fake test here to verify the stats has been bumped and aggregated? Thanks!

velox/exec/OrderBy.h Show resolved Hide resolved
allocation1.pool = op->pool();
allocation1.size = 16 * MB;

TestAllocation ret;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return TestAllocation{}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You didn't change?

@tanjialiang tanjialiang force-pushed the reclaim_stats branch 5 times, most recently from 9f328d6 to bdb72fe Compare October 6, 2023 01:51
Copy link
Contributor

@xiaoxmeng xiaoxmeng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tanjialiang thanks for the update!

@@ -243,6 +244,17 @@ void MemoryReclaimer::abort(MemoryPool* pool, const std::exception_ptr& error) {
});
}

bool MemoryReclaimer::Stats::operator==(
const MemoryReclaimer::Stats& other) const {
return std::tie(numNonReclaimableAttempts) ==
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't need for std::tie as there is only one element

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For later extension. But yeah we can just compare the single variable for now.

// The hash table itself in the grouping set is not cleared so it still
// uses some memory.
ASSERT_LT(op->pool()->currentBytes(), usedMemory);
} else {
VELOX_ASSERT_THROW(
op->reclaim(
folly::Random::oneIn(2) ? 0 : folly::Random::rand32(rng_)),
folly::Random::oneIn(2) ? 0 : folly::Random::rand32(rng_),
reclaimerStats_),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add check stats for these tests to improve coverage? thanks!

@@ -216,6 +216,7 @@ class OrderByTest : public OperatorTestBase {
}

folly::Random::DefaultGenerator rng_;
memory::MemoryReclaimer::Stats reclaimerStats_;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dittos

@@ -845,6 +845,7 @@ class HashJoinTest : public HiveConnectorTestBase {
RowTypePtr probeType_;
RowTypePtr buildType_;

memory::MemoryReclaimer::Stats reclaimerStats_;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

allocation1.pool = op->pool();
allocation1.size = 16 * MB;

TestAllocation ret;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You didn't change?

@tanjialiang tanjialiang force-pushed the reclaim_stats branch 2 times, most recently from cfcb787 to 99e204a Compare October 6, 2023 07:29
Copy link
Contributor

@xiaoxmeng xiaoxmeng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tanjialiang LGTM. We could consider to add actual reclaim time and reclaim wait time in this stats to tell how much time spent on spilling and how much time spent on waiting for task to pause.

@facebook-github-bot
Copy link
Contributor

@tanjialiang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Copy link
Contributor

@xiaoxmeng xiaoxmeng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tanjialiang thanks for the update!

@@ -5538,6 +5557,7 @@ DEBUG_ONLY_TEST_F(HashJoinTest, reclaimDuringWaitForProbe) {
task.reset();

taskThread.join();
ASSERT_EQ(reclaimerStats_, memory::MemoryReclaimer::Stats{});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider to provide empty() in followup. Thanks!

@facebook-github-bot
Copy link
Contributor

@tanjialiang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@tanjialiang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@tanjialiang merged this pull request in 89a9eb1.

@conbench-facebook
Copy link

Conbench analyzed the 1 benchmark run on commit 89a9eb1f.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.

ericyuliu pushed a commit to ericyuliu/velox that referenced this pull request Oct 12, 2023
Summary:
During reclaiming there are information that also needs to be collected. Adding the stats collection framework to facilitate reclaiming stats collection.

Pull Request resolved: facebookincubator#6895

Reviewed By: xiaoxmeng

Differential Revision: D50020233

Pulled By: tanjialiang

fbshipit-source-id: 9ebd78e9b0bfdf26652fd7fa9a452a2ffaf93869
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. Merged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants