
feat(connector): Support reading Iceberg split with equality deletes #11088

Open · wants to merge 1 commit into base: main
Conversation

yingsu00
Collaborator

This PR introduces EqualityDeleteFileReader, which is used to read
Iceberg splits with equality delete files.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Sep 25, 2024

netlify bot commented Sep 25, 2024

Deploy Preview for meta-velox canceled.

🔨 Latest commit: 82b5f50
🔍 Latest deploy log: https://app.netlify.com/sites/meta-velox/deploys/675a64fecf33d3000861cb02

@yingsu00
Collaborator Author

@Yuhta This PR replaces #9063 and is ready for review again. As you suggested, we now insert new columns into and remove them from the existing ScanSpec instead of cloning the whole ScanSpec. Thanks a lot for reviewing!

@czentgr
Collaborator

czentgr commented Oct 24, 2024

@yingsu00 can you please rebase again? There is a conflict with other changes.

@yingsu00 yingsu00 force-pushed the iceberg9.3 branch 2 times, most recently from 1ce2688 to 72b4cef Compare November 13, 2024 00:44
@yingsu00
Collaborator Author

@Yuhta Just rebased, appreciate your review again. Thanks!

@yingsu00 yingsu00 force-pushed the iceberg9.3 branch 3 times, most recently from b18f5c0 to b15537f Compare November 14, 2024 22:55
@yingsu00 yingsu00 changed the title Support reading Iceberg split with equality deletes Feature(connector): Support reading Iceberg split with equality deletes Nov 14, 2024
@yingsu00 yingsu00 changed the title Feature(connector): Support reading Iceberg split with equality deletes feat(connector): Support reading Iceberg split with equality deletes Nov 14, 2024
@yingsu00 yingsu00 requested a review from rui-mo November 22, 2024 06:19
@rui-mo
Collaborator

rui-mo commented Nov 22, 2024

cc: @liujiayi771 Would you like to take a review? Thanks.

@FelixYBW

> cc: @liujiayi771 Would you like to take a review? Thanks.

@liujiayi771 Can you take a look at the PR if possible? Should we add something on the Gluten side after this PR is merged? It's requested by a Gluten customer.

@liujiayi771
Contributor

liujiayi771 commented Nov 26, 2024

@FelixYBW Yes, Gluten needs to make some minor changes to accommodate this PR. However, Spark cannot produce equality delete files, so we need to use Flink to generate Iceberg tables with equality delete files for testing. I will run some tests this week.

Collaborator

@rui-mo rui-mo left a comment


Thanks. Added some nits.

velox/connectors/hive/iceberg/CMakeLists.txt (outdated, resolved)
velox/connectors/hive/iceberg/EqualityDeleteFileReader.cpp (outdated, resolved)
velox/connectors/hive/iceberg/EqualityDeleteFileReader.cpp (outdated, resolved)
velox/connectors/hive/iceberg/tests/IcebergReadTest.cpp (outdated, resolved)
velox/dwio/common/ScanSpec.h (outdated, resolved)
velox/dwio/common/ScanSpec.h (outdated, resolved)
    nullAllowed = true;
  } else {
    if constexpr (std::is_same_v<U, Timestamp>) {
      values.emplace_back(simpleValues->valueAt(i).toMillis());
Collaborator

Do we need toMicros for Spark? cc: @liujiayi771

Contributor

@rui-mo It will create a BigIntRange to filter data values. I'm not quite sure why we need to convert the Timestamp type to bigint here; shouldn't we be using TimestampRange instead?

Collaborator Author

@rui-mo @liujiayi771 this toValues() function is copied from velox/functions/prestosql/InPredicate.cpp. We could extract it to velox/common but there isn't a good folder yet. Any good ideas? cc @Yuhta

Collaborator

@yingsu00 I see. The InPredicate is also registered by Spark SQL and might need to be fixed by providing configurable behavior. For the one in 'hive/iceberg/FilterUtil.cpp', I assume a configuration 'kReadTimestampUnit' in the HiveConfig might help adapt to the different precisions used by Presto and Spark. It's fine for me to leave a TODO here and focus on the primary implementation first. Thanks.


Contributor

@rui-mo Is the issue here that the precision of a Timestamp in Spark is in microseconds, and converting it to milliseconds for comparison would result in a loss of precision? For example, Timestamp(1, 999000) and Timestamp(1, 998000) would be considered the same?


Contributor

@Yuhta Yuhta left a comment

Overall this looks good. I'll raise one discussion about the bookkeeping of temporary scan spec nodes and filters; the rest is just wiring cleanup.

@@ -672,6 +672,7 @@ bool applyPartitionFilter(
VELOX_FAIL(
"Bad type {} for partition value: {}", type->kind(), partitionValue);
}
return true;
Contributor

Do not need this

@@ -87,6 +90,8 @@ class SplitReader {

void resetSplit();

std::shared_ptr<const dwio::common::TypeWithId> baseFileSchema();
Contributor

This is only needed inside IcebergSplitReader

@@ -215,7 +215,10 @@ std::unique_ptr<SplitReader> HiveDataSource::createSplitReader() {
ioStats_,
fileHandleFactory_,
executor_,
scanSpec_);
scanSpec_,
remainingFilterExprSet_,
Contributor

You are not really using this inside IcebergSplitReader (instead you have your own deleteExprSet_), so let's avoid sharing and passing it.

filters_.push_back(std::move(filter));
}

void updateFilter(std::unique_ptr<Filter> newFilter) {
Contributor

Let's call it pushFilter and popFilter to make it more explicit that such filters are temporary

ScanSpec* getOrCreateChild(const Subfield& subfield);
ScanSpec* getOrCreateChild(const Subfield& subfield, bool isTempNode = false);

void deleteTempNodes();
Contributor

Since all the temp nodes are top level, can we do this less intrusively by keeping a list of temp node/filter names inside IcebergSplitReader and removing them after we're done with them? Then we don't need the extra isTempNode_ and hasTempFilter_ state here.

Collaborator Author

@Yuhta Thanks for the suggestion. Yes, in our last discussion we thought we could just add/remove top-level columns, but when I looked again I found it might still be necessary to mark the nodes with these temp tags, because the Iceberg delete file may contain subfields. For example, consider this query:

-- create table t1 (c_row(c1 integer, c2 integer, c3 integer), c_char char);
-- insert some rows

select c_char
from t1
where c_row.c2 > 2;

The base ScanSpec would contain the `c_row` child, which in turn has three children for `c1`, `c2`, and `c3`. In this case, all three subfields would be null constants, but only the `c2` node would have the filter `c_row.c2 > 2`, while the `c1` and `c3` nodes have no filters. Now, if the Iceberg equality delete file contains the predicate `c_row.c1 = 1 && c_row.c2 IN {2,3}`, then we need to remove these values by adding a filter `c_row.c1 <> 1` to the `c_row.c1` node, and merging the filter `c_row.c2 <> 2 && c_row.c2 <> 3` with the existing filter `c_row.c2 > 2` on the `c_row.c2` node, so it becomes `c_row.c2 > 3`. When this split finishes, we need to remove the filter `c_row.c1 <> 1` from `c_row.c1` and restore the filter on `c_row.c2` to its original state, but not delete the whole `c_row` column from the ScanSpec. Therefore we need to tag the nodes in the ScanSpec so we can check whether a node is a temp node and has temp filters. Do you think this makes sense?

@@ -722,6 +724,12 @@ class ExprSet {
core::ExecCtx* execCtx,
bool enableConstantFolding = true);

ExprSet(
Contributor

It seems these two are no longer needed now that the delete expr is separated from the remaining filter expr.

@yingsu00 yingsu00 force-pushed the iceberg9.3 branch 3 times, most recently from 1b58a55 to 82b5f50 Compare December 12, 2024 04:22
This commit introduces EqualityDeleteFileReader, which is used to read
Iceberg splits with equality delete files.

Co-authored-by: Naveen Kumar Mahadevuni <[email protected]>