Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unnest array of rows based on unnestArrayOfRows #7477

Closed
wants to merge 1 commit into from

Conversation

aditi-pandit
Copy link
Collaborator

@aditi-pandit aditi-pandit commented Nov 8, 2023

Velox unnests array of row type into a single column of the
element row type. Presto also supports unnesting such arrays
into multiple columns one for each child type of element row
type. Adds unnestArrayOfRows to add such unnesting support
in Unnest PlanNode.

Needed for prestodb/presto#20643

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Nov 8, 2023
@aditi-pandit aditi-pandit marked this pull request as draft November 8, 2023 23:50
Copy link

netlify bot commented Nov 8, 2023

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit 72d07d4
🔍 Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/6581ed91418d6a000895bd26

@aditi-pandit aditi-pandit changed the title Fix unnest for array of rows Unnest array of rows based on legacyUnnest Nov 14, 2023
@aditi-pandit aditi-pandit force-pushed the unnest_rows branch 4 times, most recently from 6405b9e to 6221695 Compare November 15, 2023 06:07
@aditi-pandit aditi-pandit marked this pull request as ready for review November 15, 2023 06:07
velox/exec/Unnest.cpp Outdated Show resolved Hide resolved
@aditi-pandit
Copy link
Collaborator Author

@mbasmanova : Updated the code. PTAL.

@aditi-pandit aditi-pandit force-pushed the unnest_rows branch 2 times, most recently from e8fad87 to 22a676e Compare December 1, 2023 20:11
velox/core/PlanNode.h Outdated Show resolved Hide resolved
Copy link
Contributor

@mbasmanova mbasmanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aditi-pandit Aditi, thank you for working on this. Overall looks good % a comment about parameter name, another comment about output of UnnestNode::toString() and some comments about tests.

velox/core/PlanNode.cpp Outdated Show resolved Hide resolved
velox/core/PlanNode.cpp Outdated Show resolved Hide resolved
velox/core/PlanNode.cpp Outdated Show resolved Hide resolved
velox/core/PlanNode.cpp Outdated Show resolved Hide resolved
velox/core/PlanNode.cpp Show resolved Hide resolved
velox/exec/Unnest.cpp Show resolved Hide resolved
velox/exec/tests/utils/PlanBuilder.h Outdated Show resolved Hide resolved
velox/exec/tests/utils/PlanBuilder.cpp Outdated Show resolved Hide resolved
velox/exec/tests/UnnestTest.cpp Outdated Show resolved Hide resolved
velox/exec/tests/UnnestTest.cpp Show resolved Hide resolved
@aditi-pandit aditi-pandit changed the title Unnest array of rows based on legacyUnnest Unnest array of rows based on unnestArrayOfRows Dec 8, 2023
@aditi-pandit aditi-pandit force-pushed the unnest_rows branch 3 times, most recently from 3aa76f7 to 7d91e78 Compare December 15, 2023 22:35
@aditi-pandit
Copy link
Collaborator Author

@mbasmanova : Have addressed all the comments. PTAL. #8077 is separated for refactoring the long method in Unnest logic pointed in review.

@aditi-pandit aditi-pandit force-pushed the unnest_rows branch 2 times, most recently from 435ff31 to 4efbdc6 Compare December 16, 2023 07:09
Velox unnests array of row type into a single column of the
element row type. Presto also supports unnesting such arrays
into multiple columns one for each child type of element row
type. Adds unnestArrayOfRows to add such unnesting support
in Unnest PlanNode.
@mbasmanova
Copy link
Contributor

@aditi-pandit CI is red. Would you try rebasing?

Copy link
Contributor

@mbasmanova mbasmanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aditi-pandit Aditi, this PR is shaping up nicely. A few remaining comments.

/// @param ordinalityName Optional name for the ordinality columns. If not
/// present, ordinality column is not produced.
/// @param unnestArrayOfRows Expands array of rows to multiple columns, one
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: If true, expands .... Otherwise, ...

unnestVariables,
unnestNames,
ordinalityName,
true,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious if default should be false. Doesn't current logic match unnestArrayOfRows = false?

@@ -665,9 +667,11 @@ number starting with 1.
* - unnestVariables
- Input columns of type array or map to expand.
* - unnestNames
- Names to use for expanded columns. One name per array column. Two names per map column.
- Names to use for expanded columns. One name per array column. Two names per map column. If array of rows (and not legacyUnnest), then one column per child type of the row.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

legacyUnnest

Should we update this?


ASSERT_EQ("-- Unnest\n", plan->toString());
ASSERT_EQ(
"-- Unnest[a0, unnest arrays of rows] -> c1:INTEGER, a0_e:SMALLINT\n",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this example use array(row) and not array(smallint)?

@@ -493,4 +494,23 @@ TEST_F(QueryAssertionsTest, intervalDayTime) {
assertQuery(plan, "SELECT * FROM tmp");
}

TEST_F(QueryAssertionsTest, equalPlans) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this test for?


// Tests with an array with dictionary encoded elements.
VectorFuzzer::Options opts;
opts.containerVariableLength = false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why false? Usually, these kinds of tests have a loop with 100 or so iterations and allow Fuzzer to generate whatever encoding it chooses using fuzzInputRow.


// Test with array of rows (one of which is a constant). Each array is
// encoded.
vector_size_t size = array1.size();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's use fuzzer.

.planNode();
};

auto noUnnestPlan = [&](const VectorPtr& child1,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should use the same plans but first use data generated by Fuzzer, then use same data but flattened. The results should be the same.

@@ -1426,8 +1426,10 @@ PlanBuilder& PlanBuilder::nestedLoopJoin(
PlanBuilder& PlanBuilder::unnest(
const std::vector<std::string>& replicateColumns,
const std::vector<std::string>& unnestColumns,
const std::optional<std::string>& ordinalColumn) {
const std::optional<std::string>& ordinalColumn,
const bool unnestArrayOfRows) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove const

velox/exec/tests/utils/QueryAssertions.h Show resolved Hide resolved
Copy link

stale bot commented Mar 20, 2024

This pull request has been automatically marked as stale because it has not had recent activity. If you'd still like this PR merged, please comment on the PR, make sure you've addressed reviewer comments, and rebase on the latest main. Thank you for your contributions!

@stale stale bot added the stale label Mar 20, 2024
@stale stale bot closed this Apr 11, 2024
@aditi-pandit aditi-pandit reopened this Apr 26, 2024
@stale stale bot removed the stale label Apr 26, 2024
Copy link

stale bot commented Jul 26, 2024

This pull request has been automatically marked as stale because it has not had recent activity. If you'd still like this PR merged, please comment on the PR, make sure you've addressed reviewer comments, and rebase on the latest main. Thank you for your contributions!

@stale stale bot added the stale label Jul 26, 2024
@stale stale bot closed this Aug 9, 2024
@aditi-pandit aditi-pandit reopened this Aug 10, 2024
@stale stale bot removed the stale label Aug 10, 2024
Copy link

stale bot commented Nov 9, 2024

This pull request has been automatically marked as stale because it has not had recent activity. If you'd still like this PR merged, please comment on the PR, make sure you've addressed reviewer comments, and rebase on the latest main. Thank you for your contributions!

@stale stale bot added the stale label Nov 9, 2024
@stale stale bot closed this Nov 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. stale
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants