Scatter struct nulls when deserializing Presto wire format (facebookincubator#8318)

Summary:
When reading spill serialization, struct nulls are written before the struct columns, so the reading can proceed in a single pass.

In this way, nulls from enclosing structs are passed down when reading. They are combined with the nulls of the contained column so that the contained column also has a null for rows where the enclosing struct is null.
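
The combining step can be sketched as follows. This is an illustration only, not the Velox code: `combineNulls` is a hypothetical helper, the masks are plain `std::vector<bool>` rather than Velox null buffers, `true` means "null", and both masks are assumed to be aligned one entry per top-level row.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not a Velox API). 'true' marks a null row.
// Returns the mask the contained column would end up with: a row is null
// if the enclosing struct is null or the child itself is null.
std::vector<bool> combineNulls(
    const std::vector<bool>& parentNulls,
    const std::vector<bool>& childNulls) {
  // Assumption: both masks cover the same rows.
  assert(parentNulls.size() == childNulls.size());
  std::vector<bool> result(parentNulls.size());
  for (std::size_t i = 0; i < parentNulls.size(); ++i) {
    result[i] = parentNulls[i] || childNulls[i];
  }
  return result;
}
```

For example, parent nulls `{true, false, false}` and child nulls `{false, true, false}` combine to `{true, true, false}`.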

When reading Presto Pages, struct nulls come after the child columns. A separate pass scatters the child column values so as to create a null gap for the rows where the containing struct is null.
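
A minimal sketch of the scatter pass, under stated assumptions: the child column is taken to hold one entry per non-null struct row (dense), and `scatterChild` is a hypothetical name, not a function in Velox. The sketch writes to a new vector for clarity; it makes no claim about how the actual implementation lays out or moves the values.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical helper (not a Velox API).
// 'dense' has one entry per non-null struct row; std::nullopt is a child-level null.
// 'structNulls' has one entry per struct row; 'true' means the struct is null.
// Returns one entry per struct row, with a null gap wherever the struct is null.
std::vector<std::optional<int>> scatterChild(
    const std::vector<std::optional<int>>& dense,
    const std::vector<bool>& structNulls) {
  std::vector<std::optional<int>> result(structNulls.size());
  std::size_t source = 0;
  for (std::size_t row = 0; row < structNulls.size(); ++row) {
    if (!structNulls[row]) {
      assert(source < dense.size());
      result[row] = dense[source++];
    }
    // Rows where the struct is null keep the default std::nullopt.
  }
  // Every dense value must have been placed.
  assert(source == dense.size());
  return result;
}
```

With dense values `{10, null, 30}` and struct nulls `{false, true, false, false}`, the scattered column is `{10, null, null, 30}`: row 1 is the gap created for the null struct, and row 2 is the child's own null.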

Adds a test for encoding-preserving round trips. Adds a test for concatenating different encodings in a message, e.g. constant, dictionary, and flat, in all combinations of same/different encoding and value domain. This functionality applies only to nulls-first representations. It will apply to Presto pages when the struct nulls are read before constructing the struct. See PR 8152 for the end state.

Pull Request resolved: facebookincubator#8318

Reviewed By: xiaoxmeng

Differential Revision: D52682198

Pulled By: oerling

fbshipit-source-id: 4253727392ecae2caca92e79799710703370a287
Orri Erling authored and facebook-github-bot committed Jan 19, 2024
1 parent 3154f7a commit 2645613
Showing 4 changed files with 525 additions and 250 deletions.
4 changes: 2 additions & 2 deletions velox/exec/SpillFile.cpp
@@ -181,7 +181,7 @@ uint64_t SpillWriter::write(
     MicrosecondTimer timer(&timeUs);
     if (batch_ == nullptr) {
       serializer::presto::PrestoVectorSerde::PrestoOptions options = {
-          kDefaultUseLosslessTimestamp, compressionKind_};
+          kDefaultUseLosslessTimestamp, compressionKind_, true};
       batch_ = std::make_unique<VectorStreamGroup>(pool_);
       batch_->createStreamTree(
           std::static_pointer_cast<const RowType>(rows->type()),
@@ -297,7 +297,7 @@ SpillReadFile::SpillReadFile(
       numSortKeys_(numSortKeys),
       sortCompareFlags_(sortCompareFlags),
       compressionKind_(compressionKind),
-      readOptions_{kDefaultUseLosslessTimestamp, compressionKind_},
+      readOptions_{kDefaultUseLosslessTimestamp, compressionKind_, true},
       pool_(pool) {
   constexpr uint64_t kMaxReadBufferSize =
       (1 << 20) - AlignedBuffer::kPaddedSize; // 1MB - padding.