-
Notifications
You must be signed in to change notification settings - Fork 5.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[iceberg] [native] Add support for V1 tables #21584
[iceberg] [native] Add support for V1 tables #21584
Conversation
presto-native-execution/presto_cpp/presto_protocol/Connectors.cpp
Outdated
Show resolved
Hide resolved
presto-native-execution/presto_cpp/main/types/PrestoToVeloxQueryPlan.cpp
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @imjalpreet.
Would you be able to split out the protocol changes from the plan conversion changes in this PR ? That would be easier to review and maintain.
@aditi-pandit can you let me know how you want to split the changes? Did you mean in a separate PR? In this PR I had kept separate commits for the protocol and Presto to Velox plan conversion changes. |
@imjalpreet : Yeah, would be great to have a single PR with just the changes you have in commit Add Iceberg Connector to parse Java Iceberg Plan Fragment. After that checked-in its easier to look at how the protocol pieces are mapped into Prestissimo. |
Please separate the Prestissimo bits into their own PR. |
62163c3
to
70146b3
Compare
@@ -852,6 +853,11 @@ std::vector<std::string> PrestoServer::registerConnectors( | |||
PRESTO_STARTUP_LOG(INFO) << "Registering catalog " << catalogName | |||
<< " using connector " << connectorName; | |||
|
|||
// TODO: Temporary Change to use HiveDataSource for Iceberg Connector | |||
if (connectorName == "iceberg") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this really happen? Is the connectorName not from presto-native-execution/etc/catalog? There is no iceberg.properties there and in hive.properties there is only connector.name=hive
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we would need to add the iceberg catalog properties file in presto-native-execution/etc/catalog
if someone wants to read iceberg tables
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we want to add a default file, I can push one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@imjalpreet yes please add one
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW I think we should directly identify the connector name as hive in iceberg.properties. This catalog property file is for Presto Native, which doesn't have Iceberg connector, but just Hive connector.
connector.name=hive
presto-native-execution/presto_cpp/presto_protocol/Connectors.cpp
Outdated
Show resolved
Hide resolved
@@ -142,6 +142,18 @@ std::shared_ptr<connector::ColumnHandle> toColumnHandle( | |||
toRequiredSubfields(hiveColumn->requiredSubfields)); | |||
} | |||
|
|||
if (auto icebergColumn = | |||
dynamic_cast<const protocol::IcebergColumnHandle*>(column)) { | |||
// TODO(imjalpreet): Modify 'hiveType' argument of the 'HiveColumnHandle' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What needs to be done on hiveType?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the case of the hive connector, HiveColumnHandle(Presto) also has a parameter of type HiveType
which is being used while creating the HiveColumnHandle velox object. We don't have the same in IcebergColumnHandle yet.
The prestissimo change was added as part of this commit in the case of hive connector: 6570b00
f0ff681
to
2e229ac
Compare
5f71cfe
to
51050fc
Compare
@imjalpreet is this ready for review? |
Can you update the description? |
@majetideepak Yes, this is ready for review. I have updated the description as well. |
@@ -852,6 +853,11 @@ std::vector<std::string> PrestoServer::registerConnectors( | |||
PRESTO_STARTUP_LOG(INFO) << "Registering catalog " << catalogName | |||
<< " using connector " << connectorName; | |||
|
|||
// TODO: Temporary Change to use HiveDataSource for Iceberg Connector | |||
if (connectorName == "iceberg") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW I think we should directly identify the connector name as hive in iceberg.properties. This catalog property file is for Presto Native, which doesn't have Iceberg connector, but just Hive connector.
connector.name=hive
presto-native-execution/presto_cpp/main/types/PrestoToVeloxQueryPlan.cpp
Outdated
Show resolved
Hide resolved
} | ||
std::unordered_map<std::string, std::string> customSplitInfo; | ||
|
||
std::shared_ptr<std::string> extraFileInfo; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove
|
||
std::shared_ptr<std::string> extraFileInfo; | ||
|
||
std::optional<int> tableBucketNumber; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove
partitionKeys, | ||
tableBucketNumber, | ||
customSplitInfo, | ||
extraFileInfo), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just change to nullptr
icebergSplit->start, | ||
icebergSplit->length, | ||
partitionKeys, | ||
tableBucketNumber, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nullopt
Use Presto IcebergColumnHandle to create Velox HiveColumnHandle Use Presto IcebergTableLayoutHandle/IcebergTableHandle to create Velox HiveTableHandle
51050fc
to
f9dfb73
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tdcmeehan @majetideepak @aditi-pandit @mbasmanova Do you want to review this PR again?
@imjalpreet, @yingsu00 can you share some details of the tests that were run? What scale factor was used? How many nodes? Was there any comparison made with the Hive Connector? |
I believe only Copy-on-Write tables are supported correct? Can you clarify the scope? |
presto-native-execution/presto_cpp/main/types/PrestoToVeloxQueryPlan.cpp
Show resolved
Hide resolved
presto-native-execution/presto_cpp/main/types/PrestoToVeloxSplit.cpp
Outdated
Show resolved
Hide resolved
presto-native-execution/presto_cpp/main/types/PrestoToVeloxSplit.cpp
Outdated
Show resolved
Hide resolved
Co-authored-by: Jalpreet Singh Nanda (:imjalpreet) <[email protected]>
f9dfb73
to
1ec8e47
Compare
@majetideepak Currently, we have only done correctness verifications with smaller scale factors on local machines. We are in the process of testing it with sf1k and sf10k and working on generating the datasets. If you would like to see performance comparison for small scale factor, I can share that or we can wait for the tests with sf1k and sf10k.
Yes, currently this PR would enable Prestissimo to be able to read iceberg v1 tables. Since v2 is not fully supported yet, merge on read tables won't be supported with this PR since they are only available with iceberg v2 tables. |
@majetideepak Reetika has a branch with the TPCH/DS tiny e2e tests here https://github.com/agrawalreetika/prestodb/commits/icebergPrestissimo-tests-tpch-v1/ . Since it depends on this PR, we will submit a PR after this one is merged. |
@@ -0,0 +1,4 @@ | |||
# The Presto "iceberg" catalog is handled by the hive connector in Presto native execution. | |||
connector.name=hive |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can add a mapping from iceberg
name to hive
on the Velox side. Similar to how hive-hadoop2 is mapped. Not a blocker for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Created a PR here facebookincubator/velox#8765
1ec8e47
to
7aa5ab1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, @imjalpreet
Just in case: if this doesn't fail hard on V2 tables, can we add a small followup to do that until V2 support is finished? |
@tdcmeehan Yes, sure. The follow-up PR for V2 support should also be ready soon. But I will raise the above fail-fast change in the meantime. |
Description
This PR includes the changes required to convert the Presto Iceberg plan and splits into the Velox plan and splits.
Motivation and Context
facebookincubator/velox#5977
Impact
With these changes, we will be able to read iceberg v1 tables from Prestissimo
Test Plan
Successfully ran all TPC-H queries with iceberg tables (both partitioned and non-partitioned)
Contributor checklist
Release Notes
Please follow release notes guidelines and fill in the release notes below.