Skip to content

Commit

Permalink
De-flake MultiFragmentTest::exchangeStatsOnFailure (facebookincubator…
Browse files Browse the repository at this point in the history
…#11094)

Summary:

The test is flaky for two independent reasons:
a) The producer task is writing a large payload may take more than 3sec (timeout value) to finish. The solution is decrease the payload. 

b) The second reason for the flakiness is a bug in ExchangeClient. ExchangeClient::close() was not correctly closing all the ExchangeSources (it keeps shared_ptr of ExchangeSource in a queue). As a result, Task::Close()-->ExchangeClient::close() doesn't cleanup the memory if there is another shared_ptr of same ExchangeClient is alive. Another ExchangeClient ptr can be alive as  ExchangeClient::request() passes shared_from_this() to a future.

Reviewed By: xiaoxmeng

Differential Revision: D63374267
  • Loading branch information
pansatadru authored and facebook-github-bot committed Oct 3, 2024
1 parent 8369142 commit 2ee7f30
Show file tree
Hide file tree
Showing 2 changed files with 5 additions and 1 deletion.
4 changes: 4 additions & 0 deletions velox/exec/ExchangeClient.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -70,13 +70,17 @@ void ExchangeClient::noMoreRemoteTasks() {

void ExchangeClient::close() {
std::vector<std::shared_ptr<ExchangeSource>> sources;
std::queue<ProducingSource> producingSources;
std::queue<std::shared_ptr<ExchangeSource>> emptySources;
{
std::lock_guard<std::mutex> l(queue_->mutex());
if (closed_) {
return;
}
closed_ = true;
sources = std::move(sources_);
producingSources = std::move(producingSources_);
emptySources = std::move(emptySources_);
}

// Outside of mutex.
Expand Down
2 changes: 1 addition & 1 deletion velox/exec/tests/MultiFragmentTest.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1891,7 +1891,7 @@ DEBUG_ONLY_TEST_F(MultiFragmentTest, exchangeStatsOnFailure) {
});

auto producerPlan = PlanBuilder()
.values({data}, false, 100)
.values({data}, false, 30)
.partitionedOutput({}, 1)
.planNode();

Expand Down

0 comments on commit 2ee7f30

Please sign in to comment.