-
Notifications
You must be signed in to change notification settings - Fork 2.9k
Core, Spark 4.0: Remove usage of deprecated avro/DataReader class #14387
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Core, Spark 4.0: Remove usage of deprecated avro/DataReader class #14387
Conversation
|
|
||
| /** | ||
| * @deprecated will be removed in 2.0.0; use {@link PlannedDataReader} instead. | ||
| * @deprecated will be removed in 1.12.0; use {@link PlannedDataReader} instead. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to ask you to check this with the community first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
deprecating this earlier makes sense to me
|
FYI, If there is a community support for this, I can do the same changes in older Spark versions within the same PR. |
| @Test | ||
| public void testPositionDeletes() throws Exception { | ||
| @ParameterizedTest | ||
| @ValueSource(strings = {"avro", "parquet", "orc"}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure why we're changing the test here when switching to PlannedDataReader
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason is that the code I changed had no test coverage at all: we only tested rewriting Parquet pos deletes, but the change was on the Avro path. I can split that into a separate PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hm ok I see, thanks for the clarification. We should probably do the parameterization at the class level, since we'll need to run this test class also with different format versions. Btw there's also #14351 that is working on some aspects of this, so I might be good to sync up the efforts on this test class
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pointing me to that PR! I left my thoughts there. I can wait for that other one to be merged, and add the file format dimension after, or in case we do it in parallel someone will have to solve git conflicts, and maybe there is overlapping work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nastra I gave this a second thought: file format doesn't matter much for most of the tests in this suite, because they rewrite metadata files (only Avro atm) and some test also rewrite positional delete files. I observed that this suite runs pretty long, over 5 mins without parameterization, but once I added the 3 file formats as params, the runtime grew to almost 15 mins.
So I'd be careful how much test suite level dimension we introduce here as it can significantly increase runtime and probably doesn't add much to the test coverage. I'd prefer to add the file format param only to the tests that are relevant for pos deletes. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
my main issue is that using @ParameterizedTest for one test and using @TestTemplate (we eventually want to run this test with format version 2 and 3, which is being added by #14351) for everything else is not compatible. Alternatively maybe do the parameterization differently by having a testPositionDeletes(String format) without a @Test annotation and then have @Test testPositionDeletesWithAvro() / @Test testPositionDeletesWithParquet() which both call testPositionDeletes(String format)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the feedback!
I found 4 tests that use positional deletes. I changed them to exercise all 3 formats. Let me know if this is what you meant!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think just having testPositionDeletes(String format) and run it across formats is fine IMO
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, I reduced the scope of the test change to a single test.
5f5fcc3 to
6e6ef08
Compare
|
Thanks for taking a look @nastra , @huaxingao , @pvary ! |
6e6ef08 to
ed2be06
Compare
|
Thanks for the reviews @nastra @huaxingao ! |
No description provided.