GH-40036: [C++] Azure file system write buffering & async writes #43096

Merged: 14 commits, Aug 21, 2024

Contributor

@OliLay OliLay commented Jul 1, 2024

Rationale for this change

See #40036.

What changes are included in this PR?

Write buffering and async writes (similar to what the S3 file system does) in the ObjectAppendStream for the Azure file system.

With write buffering and async writes, the input scenario creation runtime in the tests (which uses the ObjectAppendStream against Azurite) decreased from ~25s (see here) to ~800ms:

[ RUN      ] TestAzuriteFileSystem.OpenInputFileMixedReadVsReadAt
[       OK ] TestAzuriteFileSystem.OpenInputFileMixedReadVsReadAt (787 ms)
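
As a rough illustration of the write buffering described above, the following is a minimal, self-contained sketch and not the code from this PR. `FakeBlob`, `BufferedWriter`, and the block size are hypothetical names and values chosen for the example; error handling, block IDs, and background uploads are left out.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a remote blob: it records each uploaded block.
struct FakeBlob {
  std::vector<std::string> blocks;
  void StageBlock(const char* data, std::size_t size) {
    blocks.emplace_back(data, size);
  }
};

// Buffers small writes and uploads one block whenever a full block is available.
// Assumes block_size > 0.
class BufferedWriter {
 public:
  BufferedWriter(FakeBlob* blob, std::size_t block_size)
      : blob_(blob), block_size_(block_size) {}

  void Write(const char* data, std::size_t nbytes) {
    // Fast path: nothing is buffered, so full blocks can be uploaded
    // directly from the caller's memory without an intermediate copy.
    if (buffer_.empty()) {
      while (nbytes >= block_size_) {
        blob_->StageBlock(data, block_size_);
        data += block_size_;
        nbytes -= block_size_;
      }
    }
    // Slow path: top up the buffer and upload it each time it fills.
    while (nbytes > 0) {
      const std::size_t to_copy =
          std::min(nbytes, block_size_ - buffer_.size());
      buffer_.append(data, to_copy);
      data += to_copy;
      nbytes -= to_copy;
      if (buffer_.size() == block_size_) Flush();
    }
  }

  // Uploads whatever is currently buffered, even if it is a partial block.
  void Flush() {
    if (!buffer_.empty()) {
      blob_->StageBlock(buffer_.data(), buffer_.size());
      buffer_.clear();
    }
  }

 private:
  FakeBlob* blob_;
  std::size_t block_size_;
  std::string buffer_;
};
```

With a block size of 4, writing "ab" uploads nothing, and a following write of "cdefgh" produces the two blocks "abcd" and "efgh". Many small writes therefore turn into far fewer upload requests, which is the likely source of the speedup against Azurite.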

Are these changes tested?

Added some tests with background writes enabled and disabled (some were taken from the S3 tests). Everything changed should be covered.

Are there any user-facing changes?

AzureOptions now allows for background_writes to be set (default: true). No breaking changes.

Notes

  • The code in DoWrite is very similar to the code in the S3 FS. Maybe this could be unified? I didn't see this in the scope of the PR though.

github-actions bot commented Jul 1, 2024

⚠️ GitHub issue #40036 has been automatically assigned in GitHub to PR creator.

Contributor Author

OliLay commented Jul 18, 2024

Hi @kou, sorry for directly pinging you again, but do you maybe know who would be an appropriate person to look at this PR? Same for #43098

Member

kou commented Jul 19, 2024

No problem. Sorry for not reviewing this and #43098.
@Tom-Newton, @felipecrv, and I implemented the Azure filesystem, so any one of us is an appropriate person to ping.

Member

@kou kou left a comment

The code in DoWrite is very similar to the code in the S3 FS. Maybe this could be unified? I didn't see this in the scope of the PR though.

It's better if it improves maintainability. We can work on it as a follow-up task.

}

Future<> FlushAsync() {
RETURN_NOT_OK(CheckClosed("flush"));
Member

Can we move

RETURN_NOT_OK(CheckClosed("flush"));
if (!initialised_) {
  // If the stream has not been successfully initialized then there is nothing to
  // flush. This also avoids some unhandled errors when flushing in the destructor.
  return Status::OK();
}
from Flush()?
It seems that we don't need to execute CheckClosed("flush") and if (!initialized_) in both Flush() and FlushAsync().

Contributor Author

I moved this to Flush, since this is the public API.

Contributor

@Tom-Newton Tom-Newton Aug 3, 2024

I think I agree with @kou. Also the S3 filesystem implements Flush() as

  Status Flush() override {
    auto fut = FlushAsync();
    return fut.status();
  }

I think it would be nice if we could do the same because it ensures that Flush and FlushAsync have the same behaviour.

Contributor Author

I tried to unify the sync and async flush implementations, but unfortunately that did not work out due to lifetime issues in the async case. When an ObjectAppendStream is destroyed (RAII), its destructor calls Close(), and therefore Flush(). If Close() called FlushAsync(), we would need to create a shared_ptr from this (to keep this alive until the lambda actually runs), but we cannot create a shared_ptr from this while the object is being destroyed. Hence Close() must always do a synchronous Flush(), which does not need these lifetime guarantees.
TLDR: Flush() and FlushAsync() impls are similar, but slightly different and decoupled implementations.
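
The lifetime constraint can be reproduced in isolation. The sketch below is a hedged illustration, not code from the PR: `Stream` is a hypothetical class, and the only library facility used is `std::enable_shared_from_this` (with `weak_from_this()`, which requires C++17). It records whether a self-owning `shared_ptr` could be obtained while the destructor runs.

```cpp
#include <memory>

// Hypothetical stream type, used only to illustrate the lifetime constraint.
class Stream : public std::enable_shared_from_this<Stream> {
 public:
  explicit Stream(bool* could_share_in_destructor)
      : could_share_in_destructor_(could_share_in_destructor) {}

  ~Stream() {
    // By the time the destructor runs, the last owning shared_ptr is gone,
    // so the internal weak reference has expired and cannot be locked.
    // Calling shared_from_this() here would throw std::bad_weak_ptr instead.
    *could_share_in_destructor_ = (weak_from_this().lock() != nullptr);
  }

  // While an owning shared_ptr is alive, a self-reference can be obtained,
  // which is what an async continuation would capture to keep `this` alive.
  bool CanShareNow() { return weak_from_this().lock() != nullptr; }

 private:
  bool* could_share_in_destructor_;
};
```

For an object created with `std::make_shared<Stream>(&flag)`, `CanShareNow()` returns true while the pointer is held, and `flag` ends up false once the object has been destroyed. This is why a continuation that must outlive the call cannot be set up from a destructor-triggered Close().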

Comment on lines 1131 to 1180
auto advance_ptr = [&data_ptr, &nbytes](const int64_t offset) {
  data_ptr += offset;
  nbytes -= offset;
};
Member

How about also updating pos_ and content_length_ in this lambda, and improving its name to reflect the change?
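
One possible reading of this suggestion, as a standalone sketch: a single lambda advances the input cursor and the stream bookkeeping together, so the two cannot drift apart. `ConsumeInChunks`, its parameters, and the lambda name `advance` are hypothetical; the actual fields in the PR are members of the stream class rather than pointers passed in.

```cpp
#include <algorithm>
#include <cstdint>

// Walks over `nbytes` of input in chunks of at most `chunk` bytes (chunk > 0)
// and returns how many chunks were consumed. `pos` and `content_length` stand
// in for the stream's position and total-size bookkeeping.
std::int64_t ConsumeInChunks(const std::uint8_t* data_ptr, std::int64_t nbytes,
                             std::int64_t chunk, std::int64_t* pos,
                             std::int64_t* content_length) {
  // One helper updates the input cursor and the stream state in lockstep.
  auto advance = [&](const std::int64_t consumed) {
    data_ptr += consumed;
    nbytes -= consumed;
    *pos += consumed;
    *content_length += consumed;
  };

  std::int64_t chunks = 0;
  while (nbytes > 0) {
    const std::int64_t n = std::min(chunk, nbytes);
    // A real implementation would buffer or upload [data_ptr, data_ptr + n) here.
    advance(n);
    ++chunks;
  }
  return chunks;
}
```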

std::shared_ptr<Buffer> buffer;
do {
ASSERT_OK_AND_ASSIGN(buffer, input->Read(128 * 1024));
ASSERT_TRUE(buffer);
Member

Why do we need this?

Contributor Author

I guess it asserts that reading the input in fact worked (because that also uses the underlying Azure file system implementation). We can also get rid of it; I don't have a strong opinion, but I also don't see any harm in keeping it.

Member

OK. Let's remove it. The preceding ASSERT_OK_AND_ASSIGN(buffer) should already detect any invalid situation.

auto data = SetUpPreexistingData();
const auto path = data.ContainerPath("test-write-object");
ASSERT_OK_AND_ASSIGN(auto output, fs->OpenOutputStream(path, {}));
std::array<std::int64_t, 3> sizes{2570 * 1024, 258 * 1024, 259 * 1024};
Member

Could you add a comment explaining why we use these sizes?

Contributor Author

These sizes were fairly arbitrary (I did not introduce them), so I changed them to more reasonable sizes. The rationale of the test is just to issue writes of different sizes that trigger different mechanisms (e.g. buffering, direct upload).

std::string(sizes[2], 'C'),
};
auto expected = std::int64_t{0};
for (auto i = 0; i != 3; ++i) {
Member

Can we use i < buffers.size() here?
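
One reason to prefer the container's size over a literal 3 is that the loop stays correct if the number of buffers changes. A minimal sketch, assuming `buffers` is a `std::array` of strings as in the surrounding test; `TotalSize` is a hypothetical helper. The index is declared as `std::size_t` because the `int` deduced from `auto i = 0` would otherwise be compared against the unsigned result of `buffers.size()`.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <string>

// Sums the sizes of all buffers, bounding the loop by the container itself.
std::int64_t TotalSize(const std::array<std::string, 3>& buffers) {
  std::int64_t expected = 0;
  for (std::size_t i = 0; i < buffers.size(); ++i) {
    expected += static_cast<std::int64_t>(buffers[i].size());
  }
  return expected;
}
```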

Comment on lines 1529 to 1534
std::shared_ptr<io::OutputStream> stream;
auto data = SetUpPreexistingData();
const std::string path = data.ContainerPath("test-write-object");
constexpr auto payload = PreexistingData::kLoremIpsum;

ASSERT_OK_AND_ASSIGN(stream, fs->OpenOutputStream(path));
Member

Suggested change
- std::shared_ptr<io::OutputStream> stream;
- auto data = SetUpPreexistingData();
- const std::string path = data.ContainerPath("test-write-object");
- constexpr auto payload = PreexistingData::kLoremIpsum;
- ASSERT_OK_AND_ASSIGN(stream, fs->OpenOutputStream(path));
+ auto data = SetUpPreexistingData();
+ const std::string path = data.ContainerPath("test-write-object");
+ constexpr auto payload = PreexistingData::kLoremIpsum;
+ ASSERT_OK_AND_ASSIGN(auto stream, fs->OpenOutputStream(path));

ASSERT_OK_AND_ASSIGN(stream, fs->OpenOutputStream(path));
ASSERT_OK(stream->Write(payload));
// Destructor implicitly closes stream and completes the multipart upload.
// GH-37670: Testing it doesn't matter whether flush is triggered asynchronously
Member

It seems that GH-37670 is for S3 filesystem.
Is this true for Azure filesystem too?

Contributor Author

Yes, though not with the exact wording: we don't use a "multipart upload", but we also trigger the upload of the last data, if any, and then commit it. So there is also some I/O when the stream is closed, which we need to make sure we wait on. I clarified the comment and removed the reference to the GitHub issue.

Comment on lines 1550 to 1555
std::shared_ptr<io::OutputStream> stream;
constexpr auto* payload = "new data";
auto data = SetUpPreexistingData();
const std::string path = data.ContainerPath("test-write-object");

ASSERT_OK_AND_ASSIGN(stream, fs->OpenOutputStream(path));
Member

Suggested change
- std::shared_ptr<io::OutputStream> stream;
- constexpr auto* payload = "new data";
- auto data = SetUpPreexistingData();
- const std::string path = data.ContainerPath("test-write-object");
- ASSERT_OK_AND_ASSIGN(stream, fs->OpenOutputStream(path));
+ constexpr auto* payload = "new data";
+ auto data = SetUpPreexistingData();
+ const std::string path = data.ContainerPath("test-write-object");
+ ASSERT_OK_AND_ASSIGN(auto stream, fs->OpenOutputStream(path));

Comment on lines 1166 to 1167
ARROW_ASSIGN_OR_RAISE(current_block_, io::BufferOutputStream::Create(
kBlockUploadSize, io_context_.pool()));
Member

Can we reuse Reset()-ed current_block_ instead of creating a new one?

Contributor Author

OliLay commented Jul 19, 2024

The code in DoWrite is very similar to the code in the S3 FS. Maybe this could be unified? I didn't see this in the scope of the PR though.

It's better if it improves maintainability. We can work on it as a follow-up task.

Yeah, I have to admit that I haven't come up with a good way to factor this out and provide a good abstraction on top. It would be worthwhile, though, because this functionality may be needed by any file system that implements write buffering.

@OliLay OliLay requested review from kou and felipecrv July 19, 2024 07:55
@OliLay OliLay requested a review from felipecrv July 22, 2024 07:37
Contributor

@felipecrv felipecrv left a comment

This is a complicated change (at least for me) and I don't currently have time to give a full review.

@OliLay OliLay requested a review from felipecrv August 1, 2024 07:08
Member

pitrou commented Aug 21, 2024

A more detailed leak trace (obtained using LSAN_OPTIONS=fast_unwind_on_malloc=0:malloc_context_size=100) is the following:

Direct leak of 968 byte(s) in 1 object(s) allocated from:
    #0 0x5600b66641e8 in calloc (/build/cpp/debug/arrow-azurefs-test+0x4c31e8) (BuildId: 5f129cb231bb2651f5fb432f4c8d3c2f40506da3)
    #1 0x7f85322b25a4  (/lib/x86_64-linux-gnu/libxml2.so.2+0xe25a4) (BuildId: aebf8e42966c3ce475ff9d9d51a762831adcbb61)
    #2 0x7f85322a5fb4 in __xmlDefaultBufferSize (/lib/x86_64-linux-gnu/libxml2.so.2+0xd5fb4) (BuildId: aebf8e42966c3ce475ff9d9d51a762831adcbb61)
    #3 0x7f8532240d37 in xmlBufferCreate (/lib/x86_64-linux-gnu/libxml2.so.2+0x70d37) (BuildId: aebf8e42966c3ce475ff9d9d51a762831adcbb61)
    #4 0x7f8547ec6f74 in Azure::Storage::_internal::XmlWriter::XmlWriter() /build/cpp/_deps/azure_sdk-src/sdk/storage/azure-storage-common/src/xml_wrapper.cpp:532:19
    #5 0x7f8547e87621 in Azure::Storage::Blobs::_detail::BlockBlobClient::CommitBlockList(Azure::Core::Http::_internal::HttpPipeline&, Azure::Core::Url const&, Azure::Storage::Blobs::_detail::BlockBlobClient::CommitBlockBlobBlockListOptions const&, Azure::Core::Context const&) /build/cpp/_deps/azure_sdk-src/sdk/storage/azure-storage-blobs/src/rest_client.cpp:7944:30
    #6 0x7f8547de3149 in Azure::Storage::Blobs::BlockBlobClient::CommitBlockList(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, Azure::Storage::Blobs::CommitBlockListOptions const&, Azure::Core::Context const&) const /build/cpp/_deps/azure_sdk-src/sdk/storage/azure-storage-blobs/src/block_blob_client.cpp:512:12
    #7 0x7f854618c14c in arrow::fs::(anonymous namespace)::CommitBlockList(std::shared_ptr<Azure::Storage::Blobs::BlockBlobClient>, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, Azure::Storage::Blobs::CommitBlockListOptions const&) /arrow/cpp/src/arrow/filesystem/azurefs.cc:948:24
    #8 0x7f8546193aab in arrow::fs::(anonymous namespace)::ObjectAppendStream::FlushAsync()::'lambda'()::operator()() const /arrow/cpp/src/arrow/filesystem/azurefs.cc:1158:14
    #9 0x7f85461933eb in std::enable_if<((!(std::is_void<arrow::Status>::value)) && (!(is_future<arrow::Status>::value))) && ((!(arrow::Future<arrow::internal::Empty>::is_empty)) || (std::is_same<arrow::Status, arrow::Status>::value)), void>::type arrow::detail::ContinueFuture::operator()<arrow::fs::(anonymous namespace)::ObjectAppendStream::FlushAsync()::'lambda'(), arrow::Status, arrow::Future<arrow::internal::Empty> >(arrow::Future<arrow::internal::Empty>, arrow::fs::(anonymous namespace)::ObjectAppendStream::FlushAsync()::'lambda'()&&) const /arrow/cpp/src/arrow/util/future.h:150:23

Member

pitrou commented Aug 21, 2024

The leak manifests in TestAzuriteFileSystem.OpenOutputStreamCloseAsync and TestAzuriteFileSystem.OpenOutputStreamAsyncDestructor. However, TestAzuriteFileSystem.OpenOutputStreamAsyncDestructorNoBackgroundWrites is fine.

Contributor Author

OliLay commented Aug 21, 2024

Interesting, I don't really see where this could come from. Judging by the affected tests, it looks like only the path through CloseAsync()/FlushAsync() is the issue. The sync and async implementations behave very similarly, so I don't see what could cause this, especially since we don't control the Azure allocations in any way (the XmlBuffer is apparently the problematic allocation?).
In fact, everything Azure-related should be deallocated at the latest in CloseAsync(), when the BlockBlobClient should get destructed because we remove the shared_ptr reference.

Member

pitrou commented Aug 21, 2024

@OliLay There's already a skip related to a libxml2 leak with threaded operation, so I added another one.

This might actually be fixed by Azure/azure-sdk-for-cpp#5767 which was recently merged in Azure C++.

Also cc @felipecrv

Member

pitrou commented Aug 21, 2024

Actually, no, we may need something like this hack from ClickHouse:
https://github.com/ClickHouse/ClickHouse/blob/054b38d4ebeabc33010fb532766f937af2d44c03/programs/server/Server.cpp#L987-L998

It was added in this PR: ClickHouse/ClickHouse#45796

Member

pitrou commented Aug 21, 2024

@github-actions crossbow submit -g cpp

@pitrou pitrou force-pushed the azure-async-writes branch from ed47d2c to d337e09 Compare August 21, 2024 10:48
Member

pitrou commented Aug 21, 2024

Hmm, I rebased for CI.

Member

pitrou commented Aug 21, 2024

@github-actions crossbow submit -g cpp

Revision: d337e09

Submitted crossbow builds: ursacomputing/crossbow @ actions-0a8f77add1

Task Status
test-alpine-linux-cpp GitHub Actions
test-build-cpp-fuzz GitHub Actions
test-conda-cpp GitHub Actions
test-conda-cpp-valgrind GitHub Actions
test-cuda-cpp GitHub Actions
test-debian-12-cpp-amd64 GitHub Actions
test-debian-12-cpp-i386 GitHub Actions
test-fedora-39-cpp GitHub Actions
test-ubuntu-20.04-cpp GitHub Actions
test-ubuntu-20.04-cpp-bundled GitHub Actions
test-ubuntu-20.04-cpp-minimal-with-formats GitHub Actions
test-ubuntu-20.04-cpp-thread-sanitizer GitHub Actions
test-ubuntu-22.04-cpp GitHub Actions
test-ubuntu-22.04-cpp-20 GitHub Actions
test-ubuntu-22.04-cpp-emscripten GitHub Actions
test-ubuntu-22.04-cpp-no-threading GitHub Actions
test-ubuntu-24.04-cpp GitHub Actions
test-ubuntu-24.04-cpp-gcc-13-bundled GitHub Actions
test-ubuntu-24.04-cpp-gcc-14 GitHub Actions

Member

pitrou commented Aug 21, 2024

macOS build failure is unrelated as it happened elsewhere: https://github.com/apache/arrow/actions/runs/10481257647/job/29030430952

@pitrou pitrou merged commit e1e7c50 into apache:main Aug 21, 2024
37 of 39 checks passed

After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit e1e7c50.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.
