Fix Presto serde bug when deserializing large payload #11177

Closed
wants to merge 1 commit into from

Conversation

@tanjialiang tanjialiang commented Oct 7, 2024

The codec returns chained IOBufs when the payload is very large. We previously handled only the unchained case and got away with it. This change fixes the large-payload case and adds a unit test for it. The estimated buffer size that triggers this bug is around 80MB with GZIP, ZSTD, and ZLIB.
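
To illustrate the failure mode, here is a minimal sketch. It does not use folly: `ChainedBuf`, `headOnlyLength`, `chainLength`, `makeChain`, and `demoUndercount` are hypothetical stand-ins invented for this illustration, and the singly linked layout is a simplification of a real chained buffer. It only shows why inspecting the head element alone undercounts the payload once the uncompressed result spans several buffers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for one element of a chained buffer (not folly::IOBuf).
struct ChainedBuf {
  std::vector<uint8_t> data;
  std::unique_ptr<ChainedBuf> next; // nullptr marks the end of the chain
};

// Bytes visible when only the head element is inspected (the buggy pattern).
size_t headOnlyLength(const ChainedBuf& head) {
  return head.data.size();
}

// Bytes across the whole chain (what the reader actually needs).
size_t chainLength(const ChainedBuf& head) {
  size_t total = 0;
  for (const ChainedBuf* cur = &head; cur != nullptr; cur = cur->next.get()) {
    total += cur->data.size();
  }
  return total;
}

// Builds a chain of numBufs elements, each holding bytesPerBuf zero bytes.
// Returns nullptr when numBufs is 0.
std::unique_ptr<ChainedBuf> makeChain(size_t numBufs, size_t bytesPerBuf) {
  std::unique_ptr<ChainedBuf> head;
  for (size_t i = 0; i < numBufs; ++i) {
    auto buf = std::make_unique<ChainedBuf>();
    buf->data.assign(bytesPerBuf, 0);
    buf->next = std::move(head);
    head = std::move(buf);
  }
  return head;
}

// Usage: a three-element chain exposes only a third of its bytes via the head.
void demoUndercount() {
  auto chain = makeChain(3, 4);
  assert(headOnlyLength(*chain) == 4);
  assert(chainLength(*chain) == 12);
}
```

With a single-element chain the two lengths agree, which is consistent with small payloads never hitting the bug.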

@tanjialiang tanjialiang requested a review from xiaoxmeng October 7, 2024 00:36
@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Oct 7, 2024

netlify bot commented Oct 7, 2024

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit 0451b14
🔍 Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/670464fab150ba00074a63af

@facebook-github-bot

@tanjialiang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@xiaoxmeng xiaoxmeng left a comment

@tanjialiang good catch. The fix looks good modulo minor comments. Thanks!

auto uncompress =
    codec->uncompress(compressBuf.get(), header.uncompressedSize);
ByteRange byteRange{
    uncompress->writableData(), (int32_t)uncompress->length(), 0};
std::vector<ByteRange> byteRanges;

I'm not sure whether we already have a similar utility for this. Can you add one in velox/common/base/IOUtil.h, with a unit test? Something like:

std::vector<ByteRange> = writableByteRangesFromIOBuf?
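
A rough sketch of what such a utility could look like, written against stand-in types so it compiles on its own. `Segment`, `Range`, and `writableRangesFromChain` are hypothetical names for this illustration, not the Velox or folly API. `Range` simply mirrors the three-field initializer used in the snippet above (pointer, 32-bit size, starting position), and the null-terminated chain is an assumption of the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one element of a chained buffer.
struct Segment {
  std::vector<uint8_t> bytes;
  Segment* next; // nullptr marks the end of the chain (assumption of this sketch)
};

// Hypothetical stand-in for a byte range: pointer, size, and read position.
struct Range {
  uint8_t* buffer;
  int32_t size;
  int32_t position;
};

// Emits one writable range per chain element, in chain order.
// A null head yields an empty vector.
std::vector<Range> writableRangesFromChain(Segment* head) {
  std::vector<Range> ranges;
  for (Segment* cur = head; cur != nullptr; cur = cur->next) {
    // The range size is 32-bit, so each element must fit in int32_t.
    assert(cur->bytes.size() <= static_cast<size_t>(INT32_MAX));
    ranges.push_back(
        Range{cur->bytes.data(), static_cast<int32_t>(cur->bytes.size()), 0});
  }
  return ranges;
}
```

Returning one range per element avoids copying the chain into a single contiguous buffer, which matters most for exactly the large payloads that produce chains in the first place.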

@@ -4220,12 +4220,19 @@ void PrestoVectorSerde::deserialize(
auto compressBuf = folly::IOBuf::create(header.compressedSize);
source->readBytes(compressBuf->writableData(), header.compressedSize);
compressBuf->append(header.compressedSize);

// Process chained uncompressed results IOBufs.

Hmm, have we never run into this issue in production? Or have we not turned on compression? I think compression is at least enabled for spill.

Contributor Author

A single production buffer is small enough not to trigger this bug.


What buffer size triggers this? Can you mention it in the PR description? Thanks!

@tanjialiang tanjialiang force-pushed the presto_serde branch 2 times, most recently from 96e4d46 to 1bef62b Compare October 7, 2024 17:40
@xiaoxmeng xiaoxmeng left a comment

@tanjialiang thanks for the update. Looks good modulo nits.

namespace facebook::velox::common {

std::vector<ByteRange> writableByteRangesFromIOBuf(folly::IOBuf* iobuf);

Maybe just call it byteRangesFromIOBuf. It is used for an input stream, but ByteRange only takes a non-const pointer.


TEST_F(IOUtilsTest, iobufConsume) {
  const uint64_t bufCapacity = 1024;
  const uint64_t numChainedBuf = 64;

Can you test the empty, single-buffer, and multiple-buffer cases? Thanks!
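
One way the three requested cases could be covered, sketched with stand-in types rather than folly or gtest so it is self-contained. `Segment`, `Range`, `rangesFromChain`, `buildChain`, and `rangesMatchChain` are hypothetical names for this illustration. Treating "empty" as a null chain head is an assumption of the sketch; the 1024 and 64 values in the usage note are borrowed from the test snippet above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Hypothetical stand-ins for a chained buffer element and a byte range.
struct Segment {
  std::vector<uint8_t> bytes;
  Segment* next; // nullptr marks the end of the chain
};

struct Range {
  uint8_t* buffer;
  int32_t size;
  int32_t position;
};

// Function under test: one range per chain element, in chain order.
std::vector<Range> rangesFromChain(Segment* head) {
  std::vector<Range> ranges;
  for (Segment* cur = head; cur != nullptr; cur = cur->next) {
    assert(cur->bytes.size() <= static_cast<size_t>(INT32_MAX));
    ranges.push_back(
        Range{cur->bytes.data(), static_cast<int32_t>(cur->bytes.size()), 0});
  }
  return ranges;
}

// Builds numBufs linked segments of bufCapacity bytes each inside `storage`.
// std::deque keeps references to existing elements valid across push_back.
// Returns the chain head, or nullptr when numBufs is 0.
Segment* buildChain(
    std::deque<Segment>& storage, size_t numBufs, size_t bufCapacity) {
  storage.clear();
  for (size_t i = 0; i < numBufs; ++i) {
    storage.push_back(Segment{
        std::vector<uint8_t>(bufCapacity, static_cast<uint8_t>(i)), nullptr});
    if (i > 0) {
      storage[i - 1].next = &storage[i];
    }
  }
  return storage.empty() ? nullptr : &storage.front();
}

// Returns true when the ranges cover every segment exactly, in order.
bool rangesMatchChain(size_t numBufs, size_t bufCapacity) {
  std::deque<Segment> storage;
  Segment* head = buildChain(storage, numBufs, bufCapacity);
  const std::vector<Range> ranges = rangesFromChain(head);
  if (ranges.size() != numBufs) {
    return false;
  }
  for (size_t i = 0; i < numBufs; ++i) {
    if (ranges[i].buffer != storage[i].bytes.data() ||
        ranges[i].size != static_cast<int32_t>(bufCapacity) ||
        ranges[i].position != 0) {
      return false;
    }
  }
  return true;
}
```

Calling `rangesMatchChain(0, 1024)`, `rangesMatchChain(1, 1024)`, and `rangesMatchChain(64, 1024)` exercises the empty, single-buffer, and multiple-buffer cases respectively.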

tanjialiang added a commit to tanjialiang/velox-1 that referenced this pull request Oct 7, 2024
…tor#11177)

Summary:
codec gives back chained iobufs when payload is very large. We only take care of unchained scenario and was able to get away with it. Fix the case for large payload and added unit test for it. The estimated buffer size to trigger this bug is around 80MB in GZIP, ZSTD and ZLIB


Reviewed By: xiaoxmeng

Differential Revision: D63962642

Pulled By: tanjialiang
@facebook-github-bot

This pull request was exported from Phabricator. Differential Revision: D63962642

@facebook-github-bot

@tanjialiang merged this pull request in 22d73a0.

Conbench analyzed the 1 benchmark run on commit 22d73a05.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.

athmaja-n pushed a commit to athmaja-n/velox that referenced this pull request Jan 10, 2025
…tor#11177)

Summary:
codec gives back chained iobufs when payload is very large. We only take care of unchained scenario and was able to get away with it. Fix the case for large payload and added unit test for it. The estimated buffer size to trigger this bug is around 80MB in GZIP, ZSTD and ZLIB

Pull Request resolved: facebookincubator#11177

Reviewed By: xiaoxmeng

Differential Revision: D63962642

Pulled By: tanjialiang

fbshipit-source-id: cbaf2bb5518de786c69461b5f8d725732c9f6fe8
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. fb-exported Merged