Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Limit Dictionary to Single Level #10942

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

oerling
Copy link
Contributor

@oerling oerling commented Sep 6, 2024

  • Modifies wrapInDictionary to flatten indices of a wrapped dictionary iwth the wrapping indices.
  • Makes lazy loading of a dictionary encoded column to combine the indices with a dictionary wrapper if loading with lazy wrapped in a dictionary.
  • Adds functions to transpose dictionaries with and without nulls.
  • Changes NestedLoopJoin and MergeJoin so that they wrap their input only after the wrapping indices are known. Previously these would wrap first and only then fill in the indices.
  • Checks that we do not come across multiple nested dictionaries.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Sep 6, 2024
Copy link

netlify bot commented Sep 6, 2024

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit d121f54
🔍 Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/676362d50e32ee0008cca9b1

@facebook-github-bot
Copy link
Contributor

@oerling has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@oerling has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@oerling has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@oerling oerling force-pushed the sdict2-pr branch 2 times, most recently from ddec93a to baae829 Compare November 6, 2024 23:00
@facebook-github-bot
Copy link
Contributor

@oerling has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

CMakeLists.txt Outdated
@@ -508,8 +508,7 @@ if(${VELOX_BUILD_TESTING})
set_source(c-ares)
resolve_dependency(c-ares)

set_source(gRPC)
resolve_dependency(gRPC)
# set_source(gRPC) resolve_dependency(gRPC)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👀

Copy link
Contributor

@pedroerp pedroerp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please address comments and make sure CI is all green.

@facebook-github-bot
Copy link
Contributor

@oerling has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Copy link
Contributor

@xiaoxmeng xiaoxmeng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@oerling thanks for the update % minors. Thanks!

@@ -177,6 +179,10 @@ void DecodedVector::combineWrappers(
setBaseData(*values, rows);
return;
case VectorEncoding::Simple::DICTIONARY: {
if (!wasLazy) {
// LOG(ERROR) << "Multilevel dict ";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this log

velox/vector/BaseVector.cpp Show resolved Hide resolved
velox/vector/BaseVector.cpp Outdated Show resolved Hide resolved
@@ -1007,6 +1031,98 @@ std::string printIndices(
return out.str();
}

// static
void BaseVector::transposeIndices(
const vector_size_t* base,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/base/baseIndices/
s/size/wrapSize/
s/indices/wrapIndices/
s/result/resultIndices/

vector_size_t size,
const vector_size_t* wrapIndices,
const uint64_t* wrapNulls,
vector_size_t* result,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/result/resultIndices/

if (isLazyNotLoaded(*base)) {
// It is OK to rewrap a lazy. It is an error to wrap a lazy in multiple
// different dictionaries.
base->containsLazyAndIsWrapped_ = false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any check on this if possible? containsLazyAndIsWrapped_ can never be true?

base->containsLazyAndIsWrapped_ = false;
}
auto* rawNulls = vector->rawNulls();
if (indices->refCount() > 1) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

@@ -154,6 +157,37 @@ VectorPtr BaseVector::wrapInDictionary(
shouldFlatten = !isLazyNotLoaded(*base) && (base->size() / 8) > size;
}

if (vector->encoding() == VectorEncoding::Simple::DICTIONARY) {
auto base = vector->valueVector();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/base/baseValue/

// different dictionaries.
base->containsLazyAndIsWrapped_ = false;
}
auto* rawNulls = vector->rawNulls();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/rawNulls/rawBaseNulls/

velox/vector/BaseVector.cpp Show resolved Hide resolved
- Modifies wrapInDictionary to flatten indices of a wrapped dictionary iwth the wrapping indices.
- Makes lazy loading of a dictionary encoded column to combine the indices with a dictionary wrapper if loading with lazy wrapped in a dictionary.
- Adds functions to transpose dictionaries with and without nulls.
- Changes  NestedLoopJoin and MergeJoin so  that they wrap their input only after the wrapping indices are known. Previously these would wrap first and only then fill in the indices.
- Checks that we do not come across multiple nested dictionaries.
@facebook-github-bot
Copy link
Contributor

@oerling has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@@ -819,7 +819,7 @@ class QueryConfig {
}

bool validateOutputFromOperators() const {
return get<bool>(kValidateOutputFromOperators, false);
return get<bool>(kValidateOutputFromOperators, true);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might want set this back to false before land

@@ -174,6 +174,8 @@ TEST_P(PeeledEncodingBasicTests, allCommonDictionaryLayers) {
std::vector<VectorPtr> peeledVectors;
auto peeledEncoding = PeeledEncoding::peel(
{input1, input2, input3}, rows, localDecodedVector, true, peeledVectors);
ASSERT_TRUE(peeledEncoding == nullptr);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to remove before land.

@@ -191,7 +193,7 @@ TEST_P(PeeledEncodingBasicTests, someCommonDictionaryLayers) {
// Dict1(Dict2(Dict3(Flat2)))
// Peeled Vectors: Dict2(Flat), Const1, Dict2(Dict3(Flat2))
// Peel: Dict1

GTEST_SKIP();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why skip?

@@ -368,6 +368,7 @@ TEST_P(PeeledEncodingBasicTests, dictionaryLayersHavingNulls) {
// Peeled Vectors: DictWithNulls(Flat1), Const1,
// DictWithNulls(Dict3(Flat2))
// Peel: DictNoNulls
GTEST_SKIP();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

@@ -419,6 +420,7 @@ TEST_P(PeeledEncodingBasicTests, constantResize) {
}

TEST_P(PeeledEncodingBasicTests, intermidiateLazyLayer) {
GTEST_SKIP();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

// wrap nulls must be applied to all columns.
Buffer* nulls;

// Set of distinct wrappers with its transpose result as second. These are
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update the comment?

if (wrapState.transposeResults.empty()) {
wrapState.nulls = wrapNulls.get();
} else {
VELOX_CHECK(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

VELOX_CHECK_EQ

void projectChildren(
std::vector<VectorPtr>& projectedChildren,
const RowVectorPtr& src,
const std::vector<IdentityProjection>& projections,
int32_t size,
const BufferPtr& mapping);
const BufferPtr& mapping,
WrapState* state = nullptr);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WrapState must be provided so consider to change to & the same as WrapOne

@@ -154,7 +185,8 @@ void projectChildren(
const std::vector<VectorPtr>& src,
const std::vector<IdentityProjection>& projections,
int32_t size,
const BufferPtr& mapping);
const BufferPtr& mapping,
WrapState* state = nullptr);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

if (auto projectNode =
std::dynamic_pointer_cast<const core::ProjectNode>(next)) {
std::shared_ptr<const core::ProjectNode> projectNode;
if (FLAGS_merge_project &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why add gflag here?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants