Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace tableScan API with tableScanBuilder #7769

Closed

Conversation

majetideepak
Copy link
Collaborator

@majetideepak majetideepak commented Nov 28, 2023

Remove the tableScan API that is less popular to keep the planBuilder API clean.

Copy link

netlify bot commented Nov 28, 2023

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit 8451d92
🔍 Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/6571c52d12c18900084f6a15

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Nov 28, 2023
@majetideepak
Copy link
Collaborator Author

@mbasmanova any thoughts on this change? Thanks.

Copy link
Contributor

@mbasmanova mbasmanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@majetideepak Overall looks good. The new code is longer, but maybe it is easier to read. I have some questions.

auto op = PlanBuilder(planNodeIdGenerator, pool_.get())
.startTableScan()
.outputType(scanOutputType)
.tableHandle(makeTableHandle(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this can be simplified to makeTableHandle(), no?

assignments)
.startTableScan()
.outputType(rowType)
.tableHandle(makeTableHandle(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this could be just

.remainingFilter("e.a is null")

.outputType(ROW({{"d", BIGINT()}}))
.tableHandle(makeTableHandle(
SubfieldFilters{},
parseExpr("e.a is null or e.b is null", rowType)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

.startTableScan()
.outputType(
outputColumn == kNoOutput ? ROW({"d"}, {BIGINT()}) : rowType)
.tableHandle(makeTableHandle(SubfieldFilters{}, remainingFilter))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

.tableScan(rowType, makeTableHandle(), assignments)
.startTableScan()
.outputType(rowType)
.tableHandle(makeTableHandle())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this needed?

.tableScan(ROW({}, {}), {"c1 <= 4000.1"}, "", dataColumns)
.startTableScan()
.outputType(ROW({}, {}))
.subfieldFilters({"c1 <= 4000.1"})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it makes sense to allow specifying individual filters separately:

.subfieldFilters("a < 5")
.subfieldFilters("b is null")

@@ -1725,7 +1828,7 @@ TEST_F(TableScanTest, statsBasedSkippingNulls) {

auto assertQuery = [&](const std::string& filter) {
return TableScanTest::assertQuery(
PlanBuilder().tableScan(rowType, {filter}).planNode(),
PlanBuilder(pool_.get()).tableScan(rowType, {filter}).planNode(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to pass pool here?

@@ -2477,7 +2625,7 @@ TEST_F(TableScanTest, remainingFilterSkippedStrides) {
}
createDuckDbTable(vectors);
core::PlanNodeId tableScanNodeId;
auto plan = PlanBuilder()
auto plan = PlanBuilder(pool_.get())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to pass pool here?

@majetideepak
Copy link
Collaborator Author

@mbasmanova I addressed your comments. Can you make another pass? I removed most of the redundant makeTableHandle() calls.
The fuzzer failure seems to be unrelated.

Copy link
Contributor

@mbasmanova mbasmanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice. Thank you.

@facebook-github-bot
Copy link
Contributor

@mbasmanova has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@mbasmanova merged this pull request in fc0538d.

Copy link

Conbench analyzed the 1 benchmark run on commit fc0538d9.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.

mbasmanova pushed a commit to mbasmanova/velox-1 that referenced this pull request Dec 8, 2023
Summary:
Remove the tableScan API that is less popular to keep the planBuilder API clean.

Pull Request resolved: facebookincubator#7769

Reviewed By: xiaoxmeng

Differential Revision: D51971797

Pulled By: mbasmanova

fbshipit-source-id: 3f63f937a9371ec176213260d369d87d49ba1c51
@majetideepak majetideepak deleted the tablescanpart2 branch January 3, 2024 14:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. Merged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants