-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Limit Dictionary to Single Level #10942
base: main
Are you sure you want to change the base?
Conversation
oerling
commented
Sep 6, 2024
- Modifies wrapInDictionary to flatten indices of a wrapped dictionary iwth the wrapping indices.
- Makes lazy loading of a dictionary encoded column to combine the indices with a dictionary wrapper if loading with lazy wrapped in a dictionary.
- Adds functions to transpose dictionaries with and without nulls.
- Changes NestedLoopJoin and MergeJoin so that they wrap their input only after the wrapping indices are known. Previously these would wrap first and only then fill in the indices.
- Checks that we do not come across multiple nested dictionaries.
✅ Deploy Preview for meta-velox canceled.
|
@oerling has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
@oerling has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
@oerling has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
ddec93a
to
baae829
Compare
@oerling has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
CMakeLists.txt
Outdated
@@ -508,8 +508,7 @@ if(${VELOX_BUILD_TESTING}) | |||
set_source(c-ares) | |||
resolve_dependency(c-ares) | |||
|
|||
set_source(gRPC) | |||
resolve_dependency(gRPC) | |||
# set_source(gRPC) resolve_dependency(gRPC) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👀
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please address comments and make sure CI is all green.
@oerling has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@oerling thanks for the update % minors. Thanks!
velox/vector/DecodedVector.cpp
Outdated
@@ -177,6 +179,10 @@ void DecodedVector::combineWrappers( | |||
setBaseData(*values, rows); | |||
return; | |||
case VectorEncoding::Simple::DICTIONARY: { | |||
if (!wasLazy) { | |||
// LOG(ERROR) << "Multilevel dict "; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove this log
velox/vector/BaseVector.cpp
Outdated
@@ -1007,6 +1031,98 @@ std::string printIndices( | |||
return out.str(); | |||
} | |||
|
|||
// static | |||
void BaseVector::transposeIndices( | |||
const vector_size_t* base, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/base/baseIndices/
s/size/wrapSize/
s/indices/wrapIndices/
s/result/resultIndices/
velox/vector/BaseVector.h
Outdated
vector_size_t size, | ||
const vector_size_t* wrapIndices, | ||
const uint64_t* wrapNulls, | ||
vector_size_t* result, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/result/resultIndices/
velox/vector/BaseVector.cpp
Outdated
if (isLazyNotLoaded(*base)) { | ||
// It is OK to rewrap a lazy. It is an error to wrap a lazy in multiple | ||
// different dictionaries. | ||
base->containsLazyAndIsWrapped_ = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any check on this if possible? containsLazyAndIsWrapped_ can never be true?
velox/vector/BaseVector.cpp
Outdated
base->containsLazyAndIsWrapped_ = false; | ||
} | ||
auto* rawNulls = vector->rawNulls(); | ||
if (indices->refCount() > 1) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
velox/vector/BaseVector.cpp
Outdated
@@ -154,6 +157,37 @@ VectorPtr BaseVector::wrapInDictionary( | |||
shouldFlatten = !isLazyNotLoaded(*base) && (base->size() / 8) > size; | |||
} | |||
|
|||
if (vector->encoding() == VectorEncoding::Simple::DICTIONARY) { | |||
auto base = vector->valueVector(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/base/baseValue/
velox/vector/BaseVector.cpp
Outdated
// different dictionaries. | ||
base->containsLazyAndIsWrapped_ = false; | ||
} | ||
auto* rawNulls = vector->rawNulls(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/rawNulls/rawBaseNulls/
- Modifies wrapInDictionary to flatten indices of a wrapped dictionary iwth the wrapping indices. - Makes lazy loading of a dictionary encoded column to combine the indices with a dictionary wrapper if loading with lazy wrapped in a dictionary. - Adds functions to transpose dictionaries with and without nulls. - Changes NestedLoopJoin and MergeJoin so that they wrap their input only after the wrapping indices are known. Previously these would wrap first and only then fill in the indices. - Checks that we do not come across multiple nested dictionaries.
@oerling has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
@@ -819,7 +819,7 @@ class QueryConfig { | |||
} | |||
|
|||
bool validateOutputFromOperators() const { | |||
return get<bool>(kValidateOutputFromOperators, false); | |||
return get<bool>(kValidateOutputFromOperators, true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might want set this back to false before land
@@ -174,6 +174,8 @@ TEST_P(PeeledEncodingBasicTests, allCommonDictionaryLayers) { | |||
std::vector<VectorPtr> peeledVectors; | |||
auto peeledEncoding = PeeledEncoding::peel( | |||
{input1, input2, input3}, rows, localDecodedVector, true, peeledVectors); | |||
ASSERT_TRUE(peeledEncoding == nullptr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to remove before land.
@@ -191,7 +193,7 @@ TEST_P(PeeledEncodingBasicTests, someCommonDictionaryLayers) { | |||
// Dict1(Dict2(Dict3(Flat2))) | |||
// Peeled Vectors: Dict2(Flat), Const1, Dict2(Dict3(Flat2)) | |||
// Peel: Dict1 | |||
|
|||
GTEST_SKIP(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why skip?
@@ -368,6 +368,7 @@ TEST_P(PeeledEncodingBasicTests, dictionaryLayersHavingNulls) { | |||
// Peeled Vectors: DictWithNulls(Flat1), Const1, | |||
// DictWithNulls(Dict3(Flat2)) | |||
// Peel: DictNoNulls | |||
GTEST_SKIP(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
@@ -419,6 +420,7 @@ TEST_P(PeeledEncodingBasicTests, constantResize) { | |||
} | |||
|
|||
TEST_P(PeeledEncodingBasicTests, intermidiateLazyLayer) { | |||
GTEST_SKIP(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
// wrap nulls must be applied to all columns. | ||
Buffer* nulls; | ||
|
||
// Set of distinct wrappers with its transpose result as second. These are |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update the comment?
if (wrapState.transposeResults.empty()) { | ||
wrapState.nulls = wrapNulls.get(); | ||
} else { | ||
VELOX_CHECK( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
VELOX_CHECK_EQ
void projectChildren( | ||
std::vector<VectorPtr>& projectedChildren, | ||
const RowVectorPtr& src, | ||
const std::vector<IdentityProjection>& projections, | ||
int32_t size, | ||
const BufferPtr& mapping); | ||
const BufferPtr& mapping, | ||
WrapState* state = nullptr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WrapState must be provided so consider to change to & the same as WrapOne
@@ -154,7 +185,8 @@ void projectChildren( | |||
const std::vector<VectorPtr>& src, | |||
const std::vector<IdentityProjection>& projections, | |||
int32_t size, | |||
const BufferPtr& mapping); | |||
const BufferPtr& mapping, | |||
WrapState* state = nullptr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
if (auto projectNode = | ||
std::dynamic_pointer_cast<const core::ProjectNode>(next)) { | ||
std::shared_ptr<const core::ProjectNode> projectNode; | ||
if (FLAGS_merge_project && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why add gflag here?