Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove Hive 2 #10996

Closed
wants to merge 1 commit into from
Closed

Remove Hive 2 #10996

wants to merge 1 commit into from

Conversation

manuzhang
Copy link
Collaborator

@manuzhang manuzhang commented Aug 23, 2024

This PR removes Hive 2 dependency

  • Remove Hive dependency from iceberg-mr and merge all Hive related codes into iceberg-hive3
  • Upgrade Hive 2 dependency to Hive 3 in other modules
  • Set default Hive version to 3

@manuzhang manuzhang force-pushed the drop-hive2 branch 6 times, most recently from b87a7e6 to f55a620 Compare August 26, 2024 06:45
@manuzhang manuzhang changed the title Remove Hive 2 dependency Remove Hive 2 and Hadoop 2 Aug 26, 2024
@manuzhang manuzhang closed this Aug 26, 2024
@manuzhang manuzhang reopened this Aug 26, 2024
@manuzhang manuzhang closed this Aug 26, 2024
@manuzhang manuzhang reopened this Aug 26, 2024
@github-actions github-actions bot removed the ALIYUN label Aug 27, 2024
@manuzhang manuzhang changed the title Remove Hive 2 and Hadoop 2 Remove Hive 2 Aug 27, 2024
@pvary
Copy link
Contributor

pvary commented Nov 15, 2024

I expect that there will be users trying to use HiveCatalog with old Hive versions.
Do we want to (could we) throw a meaningful exception when this happens?

Maybe we can throw an exception in the HiveVersion.calculalte()? I am not sure about this, as this will cause an error to be thrown when Hive is on the classpath, even if it is not used.

Is HiveVersion used anywhere after this change?

private static final DynConstructors.Ctor<AbstractMapredIcebergRecordReader>
HIVE_VECTORIZED_RECORDREADER_CTOR;
private static DynConstructors.Ctor<AbstractMapredIcebergRecordReader>
hiveVectorizedRecordreaderCtor;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure why this is non-final now?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HiveIcebergVectorizedRecordReader is in iceberg-hive3 module and can not be loaded in iceberg-mr tests. Hence, we need to catch the exception and reassign it to null. Same for HiveVectorizedReader.

@@ -213,11 +212,11 @@ private static final class IcebergRecordReader<T> extends RecordReader<Void, T>

private static final String HIVE_VECTORIZED_READER_CLASS =
"org.apache.iceberg.mr.hive.vector.HiveVectorizedReader";
private static final DynMethods.StaticMethod HIVE_VECTORIZED_READER_BUILDER;
private static DynMethods.StaticMethod hiveVectorizedReaderBuilder;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above. why is this non-final now?

@manuzhang
Copy link
Collaborator Author

I expect that there will be users trying to use HiveCatalog with old Hive versions.

@pvary We don't expect them to use iceberg 1.8+ with old Hive versions, right?

@manuzhang
Copy link
Collaborator Author

6 tests in iceberg-mr failed due to HIVE-21584.

TestHiveIcebergWithHiveAutogatherEnable > initializationError FAILED
    java.lang.RuntimeException: Error applying authorization policy on hive configuration: class jdk.internal.loader.ClassLoaders$AppClassLoader cannot be cast to class java.net.URLClassLoader (jdk.internal.loader.ClassLoaders$AppClassLoader and java.net.URLClassLoader are in module java.base of loader 'bootstrap')
        at org.apache.hive.service.cli.CLIService.init(CLIService.java:118)
        at org.apache.hive.service.CompositeService.init(CompositeService.java:59)
        at org.apache.hive.service.server.HiveServer2.init(HiveServer2.java:229)
        at org.apache.iceberg.mr.hive.TestHiveShell.start(TestHiveShell.java:96)
        at org.apache.iceberg.mr.hive.HiveIcebergStorageHandlerTestUtils.shell(HiveIcebergStorageHandlerTestUtils.java:76)
        at org.apache.iceberg.mr.hive.TestHiveIcebergWithHiveAutogatherEnable.beforeClass(TestHiveIcebergWithHiveAutogatherEnable.java:74)

        Caused by:
        java.lang.ClassCastException: class jdk.internal.loader.ClassLoaders$AppClassLoader cannot be cast to class java.net.URLClassLoader (jdk.internal.loader.ClassLoaders$AppClassLoader and java.net.URLClassLoader are in module java.base of loader 'bootstrap')
            at org.apache.hadoop.hive.ql.session.SessionState.<init>(SessionState.java:413)
            at org.apache.hadoop.hive.ql.session.SessionState.<init>(SessionState.java:389)
            at org.apache.hive.service.cli.CLIService.applyAuthorizationConfigPolicy(CLIService.java:128)
            at org.apache.hive.service.cli.CLIService.init(CLIService.java:115)
            ... 5 more

It's been back-ported to hive 3.1, but there's no new release yet. @pvary @nastra thoughts?

@pvary
Copy link
Contributor

pvary commented Nov 15, 2024

I expect that there will be users trying to use HiveCatalog with old Hive versions.

@pvary We don't expect them to use iceberg 1.8+ with old Hive versions, right?

I meant, that they will try to do it, and it might be nice to throw an exception with a meaningful message...
But I take this back... they just need to learn

@pvary
Copy link
Contributor

pvary commented Nov 15, 2024

It's been back-ported to hive 3.1, but there's no new release yet. @pvary @nastra thoughts?

What about using Hive4 instead?

@pvary
Copy link
Contributor

pvary commented Nov 16, 2024

It's been back-ported to hive 3.1, but there's no new release yet. @pvary @nastra thoughts?

Basically the missing release means that there is no Hive 3 with Java 11 (contrary to our previous expectations).

If we start using Hive 4, we have to exclude the embedded Iceberg from the classpath.

@manuzhang
Copy link
Collaborator Author

@pvary @nastra @Fokko
Given Hive 3.1 is already broken on JDK 11+ before this PR, how about skipping these failed tests not to block removing Hive 2? I think upgrading to Hive 4 is a bigger change and effort, and needs more time and discussion.

@manuzhang manuzhang force-pushed the drop-hive2 branch 2 times, most recently from ce3f136 to 27befad Compare November 18, 2024 03:56
@pvary
Copy link
Contributor

pvary commented Nov 18, 2024

@pvary @nastra @Fokko Given Hive 3.1 is already broken on JDK 11+ before this PR, how about skipping these failed tests not to block removing Hive 2? I think upgrading to Hive 4 is a bigger change and effort, and needs more time and discussion.

Which tests are failing? I'm afraid that most of the Hive execution based tests use HiveShell, which is not able to start. This would mean that we effectively remove Hive execution testing.
I would much prefer to move to Hive4, and keep testing.

BTW I see Spark test failures. Old Spark used embedded Hive 1.2. That was the reason for some of the Hive version check in the HiveTableOpereation code. You might need to check that too.

Thanks for working on this!
Peter

@manuzhang manuzhang force-pushed the drop-hive2 branch 3 times, most recently from dd01844 to cbdf523 Compare November 18, 2024 09:06
@manuzhang
Copy link
Collaborator Author

manuzhang commented Nov 18, 2024

@pvary The failed tests list

  • TestHiveIcebergStorageHandlerLocalScan
  • TestHiveIcebergStorageHandlerNoScan
  • TestHiveIcebergStorageHandlerTimezone
  • TestHiveIcebergStorageHandlerWithEngine
  • TestHiveIcebergStorageHandlerWithMultipleCatalogs
  • TestHiveIcebergWithHiveAutogatherEnable

I've restored checking of hive version in iceberg-hive-metastore module for spark tests to pass. HiveVersion is now only used in that module.

This would mean that we effectively remove Hive execution testing.

Do we have these testing for Hive 3 on main branch now?

@pvary
Copy link
Contributor

pvary commented Nov 18, 2024

The failed tests list [..]
Do we have these testing for Hive 3 on main branch now?

I think these tests should be running fine with Java 8. The issue is that with Java 11 we can not use this Hive 3 release.

@manuzhang
Copy link
Collaborator Author

@pvary I mean these tests already fail for hive 3 after dropping JDK 8 support. It's not caught before due to Hive 3 tests not running. Please check #11584, where that is fixed and those tests fail for Hive 3.

@pvary
Copy link
Contributor

pvary commented Nov 19, 2024

Let me consult with the Hive folks and come back soon with a proposal.
Thanks,
Peter

@pvary
Copy link
Contributor

pvary commented Nov 19, 2024

I have learned that there are no Hive 3 releases are planned. So these tests will not work with Hive3.

I would like to see the Hive connector upgraded to Hive 4 which supports Java 11, and based on the informations I have learned from the Hive folks, Hive 4.1.0 will support Java 17 too.

@gaborkaszab mentioned that he could make some time to check if the upgrade is possible - if that is something that the community thinks would be useful. I would suggest to discuss this with a wide audience on the dev list before starting to work on it.

@manuzhang
Copy link
Collaborator Author

@pvary @nastra @Fokko @gaborkaszab I've sent out a discussion email. Please share your thoughts there.

@manuzhang
Copy link
Collaborator Author

We have a consensus on the dev list to drop hive-runtime and upgrade to Hive 4. I've submitted #11750 and will rework this PR once that's in.

Copy link

This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the [email protected] list. Thank you for your contributions.

@github-actions github-actions bot added the stale label Jan 12, 2025
@manuzhang manuzhang closed this Jan 12, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants