-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add serialization API for set of rows #7883
Conversation
✅ Deploy Preview for meta-velox canceled.
|
@oerling has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
auto nulls = nullsHolder.get(bits::nwords(numRows)); | ||
simd::gatherBits(rawNulls, rows, nulls); | ||
auto nonNulls = nonNullsHolder.get(numRows); | ||
const auto numNonNull = simd::indicesOfSetBits(nulls, 0, numRows, nonNulls); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a comment that we expect mostly set bits so prefer materializing indices to maximize instruction level parallelism
auto size = serializer->maxSerializedSize(); | ||
LOG(INFO) << "Size=" << size << " estimate=" << sizeEstimate << " " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove logging or turn it into an assertion
@oerling There are also some tests failing in the internal builds |
Edited it.
From: Jimmy Lu ***@***.***>
Sent: Friday, December 8, 2023 7:36 AM
To: facebookincubator/velox ***@***.***>
Cc: oerling ***@***.***>; Mention ***@***.***>
Subject: Re: [facebookincubator/velox] Add serialization API for set of rows (PR #7883)
@Yuhta commented on this pull request.
_____
In velox/serializers/PrestoSerializer.cpp <#7883 (comment)> :
+ uint8_t nullsByte = *reinterpret_cast<const uint8_t*>(nulls);
+ numNonNull = __builtin_popcount(nullsByte);
+ nonNullIndices =
+ numNonNull == numRows ? nullptr : simd::byteSetBits(nullsByte);
+ } else {
+ auto mutableIndices = nonNullHolder.get(numRows);
+ numNonNull = simd::indicesOfSetBits(nulls, 0, numRows, mutableIndices);
+ nonNullIndices = numNonNull == numRows ? nullptr : mutableIndices;
+ }
+ stream->appendNulls(nulls, 0, rows.size(), numNonNull);
+ ByteStream& out = stream->values();
+
+ if constexpr (sizeof(T) == 8) {
+ AppendWindow<int64_t> window(out, scratch);
+ int64_t* output = window.get(numNonNull);
+ if (numNonNull == numRows) {
Remove this 2 lines
_____
In velox/serializers/PrestoSerializer.cpp <#7883 (comment)> :
+template <TypeKind Kind>
+void estimateFlatSerializedSize(
+ const BaseVector* vector,
+ const folly::Range<const vector_size_t*>& rows,
+ vector_size_t** sizes,
+ Scratch& scratch) {
+ const auto valueSize = vector->type()->cppSizeInBytes();
+ const auto numRows = rows.size();
+ if (vector->mayHaveNulls()) {
+ auto rawNulls = vector->rawNulls();
+ ScratchPtr<uint64_t, 4> nullsHolder(scratch);
+ ScratchPtr<int32_t, 64> nonNullsHolder(scratch);
+ auto nulls = nullsHolder.get(bits::nwords(numRows));
+ simd::gatherBits(rawNulls, rows, nulls);
+ auto nonNulls = nonNullsHolder.get(numRows);
+ const auto numNonNull = simd::indicesOfSetBits(nulls, 0, numRows, nonNulls);
Add a comment that we expect mostly set bits so prefer materializing indices to maximize instruction level parallelism
_____
In velox/serializers/tests/PrestoSerializerTest.cpp <#7883 (comment)> :
auto size = serializer->maxSerializedSize();
+ LOG(INFO) << "Size=" << size << " estimate=" << sizeEstimate << " "
Remove logging or turn it into an assertion
_____
In velox/serializers/tests/PrestoSerializerTest.cpp <#7883 (comment)> :
+ uint64_t irTime{0};
+ uint64_t rrTime{0};
+
+ std::string toString() {
+ return fmt::format(
+ "{} of {} {} bit {}%null: {} ir / {} rr",
+ numSelected,
+ vectorSize,
+ bits,
+ nullPct,
+ irTime,
+ rrTime);
+ }
+};
+
+TEST_P(PrestoSerializerTest, timeFlat) {
We probably should split this into its own benchmark file, there is no validation or dependency on the rest of the file.
—
Reply to this email directly, view it on GitHub <#7883 (review)> , or unsubscribe <https://github.com/notifications/unsubscribe-auth/AKPPPT666J6H74WF34SVMF3YIMXU3AVCNFSM6AAAAABAIEX4C6VHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMYTONZSGUZDOMBRG4> .
You are receiving this because you were mentioned.Message ID: ***@***.***>
|
@oerling has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
@oerling has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
@oerling has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
Top level rows are more efficiently serialized in row sets rather than arrays or ranges. Arrays of ranges are still useful for repeated nested content. The row set path can uses SIMD to gather nulls and extract idices of non-null values for serialization. A Scratch objett is added to signatures to pass reusable scratch memory also for top level calls to range serializing serializatin functions. This can remove malloc use for temporary vectors. The API is tested standalone but is not connected to running code, so this diff does not affect running systens.
Conbench analyzed the 1 benchmark run on commit There were no benchmark performance regressions. 🎉 The full Conbench report has more details. |
Top level rows are more efficiently serialized in row sets rather than arrays or ranges. Arrays of ranges are still useful for repeated nested content. The row set path can uses SIMD to gather nulls and extract idices of non-null values for serialization.
A Scratch objett is added to signatures to pass reusable scratch memory also for top level calls to range serializing serializatin functions. This can remove malloc use for temporary vectors.
The API is tested standalone but is not connected to running code, so this diff does not affect running systens.