GH-38438: [C++] Dataset: Trying to fix the async bug in Parquet dataset #38466
Conversation
@pitrou @bkietz I've tried to fix #38438 here. Also cc @austin3dickey. I've reproduced this bug on my local machine with ASAN and fixed it with this change. I'm not so familiar with building Python, so you can try to build and verify it here.
I'll give this a more thorough review tomorrow, for now:
Force-pushed from ad5d79b to 843c575
@ursabot please benchmark name=dataset-serialize lang=Python
Benchmark runs are scheduled for commit 843c575. Watch https://buildkite.com/apache-arrow and https://conbench.ursa.dev for updates. A comment will be posted here when the runs are complete.
Could you please add your reproducer from #38438 (comment) as a test?
My code is far from the point where we hit the segfault. I build a Scanner and call Head on it multiple times. Let me use Parquet to reproduce it.
@bkietz For testing: how can I mock a stream for this? The test needs to issue an IO in the RowGroup RecordBatch generator and return without awaiting the value. Then, when the IO completes, the IO thread calls the callback, which triggers the segfault. (This is triggered because @austin3dickey uses ...)
Yes, this pattern is used elsewhere in testing too.
Just adding the regression test to ensure future refactorings don't reintroduce the segfault should be sufficient.
The benchmark passed, which is exactly what we're looking for. Thanks! (The notification hasn't been posted here yet because the baseline commit hasn't finished its own benchmarks yet.)
Aha, the regression test is using a large local file.
Do we have some way to break the stack? 😅 I've reproduced the case, but ASAN didn't report an error.
When the IO task finishes:
```cpp
// A BufferReader whose ReadAsync is artificially delayed, so the reader can
// be released while the IO is still in flight.
class DelayedBufferReader : public ::arrow::io::BufferReader {
 public:
  explicit DelayedBufferReader(const std::shared_ptr<::arrow::Buffer>& buffer)
      : ::arrow::io::BufferReader(buffer) {}

  ::arrow::Future<std::shared_ptr<Buffer>> ReadAsync(
      const ::arrow::io::IOContext& io_context, int64_t position,
      int64_t nbytes) override {
    read_async_count.fetch_add(1);
    // Sleep so the caller has time to drop its references before the read completes.
    std::this_thread::sleep_for(std::chrono::seconds(10));
    return ::arrow::io::BufferReader::ReadAsync(io_context, position, nbytes);
  }

  std::atomic<int> read_async_count{0};
};

TEST(TestArrowReadWrite, ScanContentsGracefulShutdown) {
  ArrowReaderProperties properties = default_arrow_reader_properties();
  properties.set_batch_size(256);
  properties.set_pre_buffer(true);
  properties.set_use_threads(true);
  auto cache_options = ::arrow::io::CacheOptions::LazyDefaults();
  cache_options.hole_size_limit = 1;
  cache_options.range_size_limit = 1;
  properties.set_cache_options(cache_options);

  const int num_rows = 1024;
  const int row_group_size = 16;
  const int num_columns = 1;
  std::shared_ptr<Table> table;
  ASSERT_NO_FATAL_FAILURE(MakeDoubleTable(num_columns, num_rows, 1, &table));
  std::shared_ptr<Buffer> buffer;
  ASSERT_NO_FATAL_FAILURE(WriteTableToBuffer(table, row_group_size,
                                             default_arrow_writer_properties(), &buffer));
  auto mock_input_stream = std::make_shared<DelayedBufferReader>(buffer);

  std::vector<std::unique_ptr<::arrow::DelayedExecutor>> delayed_executors;
  delayed_executors.resize(3);
  // Vector of futures kept alive beyond the readers' lifetimes.
  std::vector<::arrow::Future<std::shared_ptr<::arrow::RecordBatch>>> futures;
  for (int idx = 0; idx < 3; ++idx) {
    delayed_executors[idx] = std::make_unique<::arrow::DelayedExecutor>();
    auto& delayed_executor = *delayed_executors[idx];
    std::shared_ptr<FileReader> reader;
    {
      std::unique_ptr<FileReader> unique_reader;
      FileReaderBuilder builder;
      ASSERT_OK(builder.Open(mock_input_stream));
      ASSERT_OK(builder.properties(properties)->Build(&unique_reader));
      reader = std::move(unique_reader);
    }
    {
      ASSERT_OK_AND_ASSIGN(
          auto batch_generator,
          reader->GetRecordBatchGenerator(reader, {0, 1}, {0}, &delayed_executor, 256));
      auto fut1 = batch_generator();
      auto fut2 = batch_generator();
      futures.push_back(fut1);
      futures.push_back(fut2);
    }
    // Clear the reader while the generated futures are still pending.
    reader = nullptr;
  }

  auto pool = ::arrow::internal::ThreadPool::Make(2).ValueOrDie();
  pool->Submit([&]() {
        // Run the captured tasks after the readers are gone.
        for (int idx = 0; idx < 3; ++idx) {
          auto& delayed_executor = *delayed_executors[idx];
          while (!delayed_executor.captured_tasks.empty()) {
            std::cout << "count:" << delayed_executor.captured_tasks.size() << std::endl;
            auto callbacks = std::move(delayed_executor.captured_tasks);
            for (auto& callback : callbacks) {
              std::move(callback)();
            }
          }
        }
      })
      .ValueOrDie()
      .Wait();
}
```
I've tried an ugly test like this, but ASAN still didn't report the error.
Thanks for your patience. Conbench analyzed the 1 benchmarking run that has been run so far on PR commit 843c575. There was 1 benchmark result indicating a performance regression.
The full Conbench report has more details.
I also cannot reproduce it by: ...
@bkietz I've tried, but I currently fail to reproduce it without the sample file... Need some help here 😭
Sadly, I've triggered another problem while trying to reproduce the bug. 😅
@bkietz When we have too many Futures, completing one might notify too many "done" callbacks in a row, which might cause a stack overflow...
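(For context, a minimal standalone sketch of that hazard, in plain C++ rather than Arrow code, with an arbitrary chain length: when completing a future fires its continuations synchronously, a long chain of continuations turns into equally deep recursion.)

```cpp
#include <functional>
#include <utility>

int main() {
  // Each "continuation" synchronously invokes the next one, the way inline
  // future callbacks do when fired on completion.
  std::function<void()> chain = [] {};
  for (int i = 0; i < 100000; ++i) {
    chain = [next = std::move(chain)] { next(); };
  }
  chain();  // 100000 nested calls: may overflow the stack.
  return 0;
}
```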
```cpp
// Like the earlier version, but the delay happens on the IO thread via
// SubmitIO, so ReadAsync itself returns immediately.
class DelayedBufferReader : public ::arrow::io::BufferReader {
 public:
  explicit DelayedBufferReader(const std::shared_ptr<::arrow::Buffer>& buffer)
      : ::arrow::io::BufferReader(buffer) {}

  ::arrow::Future<std::shared_ptr<Buffer>> ReadAsync(
      const ::arrow::io::IOContext& io_context, int64_t position,
      int64_t nbytes) override {
    read_async_count.fetch_add(1);
    auto self = std::dynamic_pointer_cast<DelayedBufferReader>(shared_from_this());
    return DeferNotOk(::arrow::io::internal::SubmitIO(
        io_context, [self, position, nbytes]() -> Result<std::shared_ptr<Buffer>> {
          // Delay on the IO thread, then complete the read.
          std::this_thread::sleep_for(std::chrono::seconds(1));
          return self->DoReadAt(position, nbytes);
        }));
  }

  std::atomic<int> read_async_count{0};
};

TEST_F(TestParquetFileFormat, MultithreadedScanUnsafe) {
  auto reader = MakeGeneratedRecordBatch(schema({field("utf8", utf8())}), 10000, 100);
  ASSERT_OK_AND_ASSIGN(auto buffer, ParquetFormatHelper::Write(reader.get()));

  std::vector<Future<>> completes;
  std::vector<std::shared_ptr<arrow::internal::ThreadPool>> pools;
  for (int idx = 0; idx < 3; ++idx) {
    auto buffer_reader = std::make_shared<DelayedBufferReader>(buffer);
    auto source = std::make_shared<FileSource>(buffer_reader, buffer->size());
    auto fragment = MakeFragment(*source);
    std::shared_ptr<Scanner> scanner;
    {
      auto options = std::make_shared<ScanOptions>();
      ASSERT_OK_AND_ASSIGN(auto thread_pool, arrow::internal::ThreadPool::Make(1));
      pools.emplace_back(thread_pool);
      options->io_context =
          ::arrow::io::IOContext(::arrow::default_memory_pool(), pools.back().get());
      auto fragment_scan_options = std::make_shared<ParquetFragmentScanOptions>();
      fragment_scan_options->arrow_reader_properties->set_pre_buffer(true);
      options->fragment_scan_options = fragment_scan_options;
      options->use_threads = true;
      ScannerBuilder builder(ArithmeticDatasetFixture::schema(), fragment, options);
      ASSERT_OK(builder.UseThreads(true));
      ASSERT_OK(builder.BatchSize(10000));
      ASSERT_OK_AND_ASSIGN(scanner, builder.Finish());
    }
    ASSERT_OK_AND_ASSIGN(auto batch, scanner->Head(10000));
    auto fut = scanner->ScanBatchesUnorderedAsync();
    // ASSERT_OK_AND_ASSIGN(batch, scanner->Head(8000 * 10));
    // Random ReadAsync calls.
    for (int yy = 0; yy < 16; yy++) {
      completes.emplace_back(buffer_reader->ReadAsync(::arrow::io::IOContext(), 0, 1001));
    }
    // Drop the scanner while reads are still in flight.
    scanner = nullptr;
  }
  for (auto& f : completes) {
    f.Wait();
  }
}
```
Finally I reproduced the issue, with the same stack I hit before. Should I use this, @bkietz? (This test might run for a long time...)
```cpp
  std::vector<Future<>> completes;
  std::vector<std::shared_ptr<arrow::internal::ThreadPool>> pools;

  for (int idx = 0; idx < 2; ++idx) {
```
Using 2 to make the state machine a bit more complex, and also to make sure we use another Executor.
```cpp
  options->use_threads = true;
  ScannerBuilder builder(ArithmeticDatasetFixture::schema(), fragment, options);

  ASSERT_OK(builder.UseThreads(true));
```
Threads are necessary for this test.
```cpp
  {
    auto options = std::make_shared<ScanOptions>();
    ASSERT_OK_AND_ASSIGN(auto thread_pool, arrow::internal::ThreadPool::Make(1));
    pools.emplace_back(thread_pool);
```
The pool is for the case mentioned in the description.
This is ready for review now :-|
Force-pushed from 3223950 to a93ea70
Thank you for continuing to work on this, @mapleFU ! I'll review soon |
Thanks, @mapleFU !
After merging your PR, Conbench analyzed the 5 benchmarking runs that have been run so far on merge-commit 951d92a. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about 12 possible false positives for unstable benchmarks that are known to sometimes produce them. |
GH-38438: [C++] Dataset: Trying to fix the async bug in Parquet dataset (#38466)

### Rationale for this change

Originally reported in #38438:

1. When PreBuffer is enabled by default, the code in `RowGroupGenerator::FetchNext` switches to async mode, which makes the state handling more complex.
2. In `RowGroupGenerator::FetchNext`, `[this]` is captured without `shared_from_this`. This is not bad by itself; however, `this->executor_` may point to an invalid address if `this` has already been destructed.

This patch also fixes a lifetime issue I found in CSV handling.

### What changes are included in this PR?

1. Fix the handling in `cpp/src/parquet/arrow/reader.cc` as described above.
2. Fix a lifetime problem in CSV.

### Are these changes tested?

I tested it locally, but don't know how to write a unit test here. Feel free to help.

### Are there any user-facing changes?

Bugfix.

* Closes: #38438 (`pyarrow.dataset.write_dataset` with dataset source read with `pre_buffer=True`)

Authored-by: mwish <[email protected]>
Signed-off-by: Benjamin Kietzman <[email protected]>
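For readers following along, here is a minimal standalone sketch of the capture hazard described above (plain C++, not the actual Arrow code; `Generator`, `FetchNextUnsafe`, and `FetchNextSafe` are hypothetical names):

```cpp
#include <future>
#include <iostream>
#include <memory>

struct Generator : std::enable_shared_from_this<Generator> {
  int state_ = 42;

  // Hazardous variant: the task captures raw `this`. If the Generator is
  // destroyed before the task runs, `state_` is read through a dangling pointer.
  std::future<int> FetchNextUnsafe() {
    return std::async(std::launch::async, [this] { return state_; });
  }

  // Fixed variant: the task co-owns the Generator via shared_from_this(),
  // keeping it alive until the task finishes.
  std::future<int> FetchNextSafe() {
    auto self = shared_from_this();
    return std::async(std::launch::async, [self] { return self->state_; });
  }
};

int main() {
  std::future<int> fut;
  {
    auto gen = std::make_shared<Generator>();
    fut = gen->FetchNextSafe();
  }  // `gen` goes out of scope here; the async task may still be pending.
  std::cout << fut.get() << std::endl;  // Safe: prints 42.
  return 0;
}
```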