-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactoring HiveConnectorTestBase::writeToFile() to use dynamic cast for writer of DWRF or Parquet file format #10150
Conversation
✅ Deploy Preview for meta-velox canceled.
|
12768e4
to
5e5ac4f
Compare
c514e37
to
51e36d8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, a few comments.
2ea8297
to
fbdec9d
Compare
@minhancao please squash your commits and take a look at the failures. |
6ac9dd9
to
6338048
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly looks good!
ce31b16
to
789723c
Compare
@yingsu00 I have rebased with main and fixed merge conflicts, please review when you can, thank you! I've noticed that dwrf configs are passed to HiveConnectorTestBase::writeToFile() quite often in tests so I decided to create my own function for agnostic use, HiveConnectorTestBase::writeToFileCommon(). I suggest maybe also changing writeToFile() to be named writeToFileDwrf() since it seems to only pertain to dwrf file format based from its usage in tests. What do you think? I've also noticed that in velox/dwio/common/Options.h, there is the struct WriterOptions and it has properties that both pertain to dwrf and parquet. It looks like the current approach here is to populate the writer options accordingly based on the WriterFactory's file format. If we go with the current approach, we would be further adding on much more properties to the common struct WriterOptions when more file formats are supported later on... is that ok? Or maybe I can try another route and try having the common WriterOptions as a pointer parameter for HiveConnectorTestBase::writeToFileCommon() and then dynamically casting it to its respective file format? velox_hive_iceberg_test results:
velox_hive_connector_test results:
|
d0ef807
to
20e1d39
Compare
@minhancao Can you please rebase this PR after #10470 and #10505 are merged? Thanks! |
20e1d39
to
78704f5
Compare
@yingsu00 I have rebased with main to bring in your PR and Pedro's PR, please review when you can, thanks! I had one error while running the HiveIcebergTest:
Do you maybe have any insight for this error? Ying's PR: Pedro's PRs: |
@yingsu00 @czentgr I noticed from Pedro's recent PR #10470, he moved the parquet specific option properties in dwio/common/Options.h to only be in the parquet::WriterOptions in dwio/parquet/writer/Writer.h. In parquet/writer/Writer.cpp, he also rewrote the createWriter() function to be:
The current route that my PR is taking is providing all options of DWRF and Parquet into dwio/common/Options.h and populating it accordingly based on the file format for its respective writers. I have edited the code above to go to my route when the dynamic cast fails.
Is this ok? Or is there a better way to approach this? It seems to me that we are in conflict now of whether we should take the route of providing all file format options into dwio/common/Options.h and populating it accordingly OR providing the specific file format WriterOptions at the beginning in the test or wherever it's called and dynamically casting it when it is received to its respective file format type like in Pedro's PR. |
Hi @pedroerp! I saw from your recent PR #10470 you moved the Parquet specific options from dwio/common/Options.h to be in the specific Parquet writer options. Do you think the DWRF options that are in dwio/common/Options.h should be moved to the specific DWRF writer options as well and refactor accordingly? This PR was meant to refactor HiveConnectorTestBase::writeToFile() to be able to be agnostic and use DWRF or Parquet writer depending on the file format that is passed in. |
Hi @minhancao, yes, for sure. I have a draft PR for that, but it turns out to be a bit more refactoring that I expected, but I'm planning to get back at it next week. |
@pedroerp Ok awesome! Can you please link me the PR whenever it's ready? |
65a840a
to
bad4b33
Compare
bad4b33
to
7d20189
Compare
Hi @pedroerp! Just wanted to check in if you were working on a PR for the DWRF options that are in dwio/common/Options.h to be moved to the specific DWRF writer options just like how you did with Parquet options? If not, I can take up this work. |
@minhancao I haven't had time to look into that yet, but I'm happy to help with code review if this is something you would like to work on. @Yuhta has a lot more context on that code though. The one tricky part is to make sure we maintain the same behavior and default values; I remember seeing that there a couple of different places where these are defined for DWRF today. |
Looks like DWRF and Parquet writer options were both refactored to be using dynamic_cast now thanks to this PR 2 weeks ago: |
7d20189
to
e3ce941
Compare
110593a
to
621beb6
Compare
Refactored HiveConnectorTestBase::writeToFile(), HiveIcebergTest, and TableScanTest based on #10915. |
621beb6
to
ef0c390
Compare
7429386
to
3e06627
Compare
… file formats and refactoring tests accordingly
3e06627
to
d1d9dc6
Compare
CC: @svm1 |
This pull request has been automatically marked as stale because it has not had recent activity. If you'd still like this PR merged, please comment on the PR, make sure you've addressed reviewer comments, and rebase on the latest main. Thank you for your contributions! |
Refactoring HiveConnectorTestBase::writeToFile() to use dynamic cast for writer of DWRF or Parquet file format. Also refactoring HiveIcebergTest and TableScanTest since they use HiveConnectorTestBase class.
Related to work in:
#9973
#10915
Resolves:
#9171