-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support reading Iceberg positional delete files #7847
Conversation
✅ Deploy Preview for meta-velox canceled.
|
e65a2e1
to
12b6e7a
Compare
124d5de
to
75c4d21
Compare
9f67b80
to
e1111b9
Compare
100dec9
to
276994f
Compare
276994f
to
6cc07d5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think overall it's good. Some of the utility is not really needed in delta reader (makeScanSpec
, testFilters
), but it does not hurt to make standalone functions of them. Can we extract the creation of HiveConnectorUtil.h
and refactor of HiveDataSource
into a different PR to be merged before the delta readers?
auto deleteTableHandle = std::make_shared<HiveTableHandle>( | ||
connectorId, deleteFileName, false, createFilters(), nullptr); | ||
|
||
auto scanSpec = makeScanSpec( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't need makeScanSpec
, just make createFilters()
return the scan spec with filters set as I mentioned here: #7362 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Yuhta I tried what you suggested, but it throws exception because of the channel out of bound for the _pos column. Note that we only project out _pos column, but when we add it through addAllChildFields(), it assigns the channel 1 for _pos column. Adding more lines to fix this may not be as neat as just calling makeScanSpec so I converted it back to makeScanSpec().
#include "velox/connectors/hive/FileHandle.h" | ||
|
||
namespace facebook::velox { | ||
class BaseVector; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These forward declaration are overkill, since this is not a public interface, we can just include these dependencies in the header
Sure. Created #8104 |
6cc07d5
to
aad4c9b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Yuhta Addressed all comments, can you please review again? Many thanks!
auto deleteTableHandle = std::make_shared<HiveTableHandle>( | ||
connectorId, deleteFileName, false, createFilters(), nullptr); | ||
|
||
auto scanSpec = makeScanSpec( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Yuhta I tried what you suggested, but it throws exception because of the channel out of bound for the _pos column. Note that we only project out _pos column, but when we add it through addAllChildFields(), it assigns the channel 1 for _pos column. Adding more lines to fix this may not be as neat as just calling makeScanSpec so I converted it back to makeScanSpec().
5f49d04
to
633da9a
Compare
d0732ab
to
9979b00
Compare
9979b00
to
cbb8efe
Compare
The test failure is because of #8708 |
@Yuhta has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
58eee69
to
afa6151
Compare
@Yuhta All tests passed, will you please review again? Thanks! |
@yingsu00 Some test failing under ASAN:
|
4d5201e
to
f4331a1
Compare
In this commit we introduces IcebergSplitReader which supports reading Iceberg splits with positional delete files. Add subfield filter for the delete file path column
f4331a1
to
ac818fd
Compare
@Yuhta Thanks for the logs. I have fixed the problem, can you please try importing it again? Thanks! |
@Yuhta has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
Conbench analyzed the 1 benchmark run on commit There were no benchmark performance regressions. 🎉 The full Conbench report has more details. |
…end (#4779) Velox add iceberg mor table read support in facebookincubator/velox#7847. This PR supports read iceberg mor table for Velox backend.
…end (apache#4779) Velox add iceberg mor table read support in facebookincubator/velox#7847. This PR supports read iceberg mor table for Velox backend.
…end (apache#4779) Velox add iceberg mor table read support in facebookincubator/velox#7847. This PR supports read iceberg mor table for Velox backend.
…end (apache#4779) Velox add iceberg mor table read support in facebookincubator/velox#7847. This PR supports read iceberg mor table for Velox backend.
In this PR we introduces IcebergSplitReader which supports reading
Iceberg splits with positional delete files