Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-38051: [Java] Remove Java 8 support #43139

Merged
merged 5 commits into from
Jul 17, 2024

Conversation

laurentgo
Copy link
Collaborator

@laurentgo laurentgo commented Jul 3, 2024

What changes are included in this PR?

  • Remove support for Java 8 in Github actions and other CI/CD tasks and make Java 11 now the default version
  • Make Java 11 the minimum version required to build and run Arrow by changing the Maven project configuration:
    • Change minimum java version and source/target/release compiler properties to 11
    • Remove maven modules
    • Remove jdk11+ profiles and integrate their content into the main section
    • Let maven-compiler-plugin process module-info.java files and address several declaration issues
    • Exclude non modularized modules from javadoc aggregate tasks
    • Exclude module-info.class files from shaded jars as it is not representative of the whole content and may actually directly coming from a 3rd party dependency.
  • Update documentation

Are these changes tested?

Through CI/CD.

Are there any user-facing changes?

Yes. Java 11 is now required to run any Arrow code

This PR includes breaking changes to public APIs.

Copy link

github-actions bot commented Jul 3, 2024

⚠️ GitHub issue #38501 has been automatically assigned in GitHub to PR creator.

@laurentgo
Copy link
Collaborator Author

It should be merged only after the 17.0.0 has been released.

As for crossbow, I tried to do some local testing for it and it mostly works except maybe for some spark issues.

Copy link
Member

@danepitkin danepitkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for working on this!

<filters>
<filter>
<excludes>
<exclude>**/module-info.class</exclude>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this save us from "module hell"? aka publishing a package that has automatic module names auto-generated from filenames?

e.g.

[WARNING] * Required filename-based automodules detected: [flatbuffers-java-24.3.25.jar, jsr305-3.0.2.jar, protobuf-java-3.25.1.jar, protobuf-java-util-3.25.1.jar]. Please don't publish this project to a public artifact repository! *

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure., maybe we can add Automatic-Module-Name attribute in the shaded jars to make the name not filename dependent.

What it helps with though is having shaded jars which exposes the wrong module information (like the jackson or gson ones which are copied over under META-INF/versions/9/module-info.class)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW I did request flatbuffers to add the Automatic-Module-Name attribute: google/flatbuffers#8348

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no built-in way to merge all module-info.class files into a single one, one need to analyze what needs to be exported and how to deal also with relocation.

Specifically for this module, there are 3 different shaded jars with different set of dependencies included and different set of relocations and I'm not sure which ones are supposed to be used internally and which ones are to be used externally. Knowing the intended usage may also guide us in deciding if those artifacts should be modularized or not

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still willing to get rid of the shaded artifact here...It causes confusion and complicates the build

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I should open a separate issue/pull request? I tried to analyze how many shaded artifacts are generated and how many are used.

Here's the list of shaded artifacts:

  • arrow-tools-{version}-jar-with-dependencies.jar
  • flight-sql-jdbc-driver-{version}.jar
  • flight-integration-tests-{version}-jar-with-dependencies.jar
  • flight-core-{version}-shaded.jar
  • flight-core-{version}-shaded-ext.jar
  • flight-core-{version}-jar-with-dependencies.jar
  • arrow-performance-{version}-benchmarks.jar
  • arrow-vector-{version}-shade-format-flatbuffers

arrow-vector is used by other modules to not expose flatbuffers directly.

I can also see arrow-tools/flight-integration-tests used by some tests defined in dev/tasks.yml, and arrow-perfornance uber jar is common for people who wants to run jmh from commandline: those shaded jars are useful and I do not expect that we need modularization for those jars

The jdbc driver is shaded because people expect it to be self contained. It could benefit from having its own module-info.java which would only allow exposes the driver + public api, but all relocated classes would be not accessible.

Finally there are all the flight-core shaded jars and I haven't figured out if they are used or not

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a separate pull request for shaded artifacts is okay. We should also get rid of the warning I pasted initially before releasing v18. Someone on the original GH issue recommended using shading to prevent us from exposing those module names. Maybe that can be a separate issue, too.

@github-actions github-actions bot added awaiting committer review Awaiting committer review and removed awaiting review Awaiting review labels Jul 3, 2024
@danepitkin
Copy link
Member

@github-actions crossbow submit -g java

Copy link

github-actions bot commented Jul 3, 2024

Revision: 851a6c5

Submitted crossbow builds: ursacomputing/crossbow @ actions-29a2a39d61

Task Status
java-jars GitHub Actions
test-conda-python-3.10-spark-v3.5.0 GitHub Actions
test-conda-python-3.11-spark-master GitHub Actions
test-conda-python-3.8-spark-v3.5.0 GitHub Actions
verify-rc-source-java-linux-almalinux-8-amd64 GitHub Actions
verify-rc-source-java-linux-conda-latest-amd64 GitHub Actions
verify-rc-source-java-linux-ubuntu-20.04-amd64 GitHub Actions
verify-rc-source-java-linux-ubuntu-22.04-amd64 GitHub Actions
verify-rc-source-java-macos-amd64 GitHub Actions

@danepitkin danepitkin changed the title GH-38501: [Java] Remove Java 8 support GH-38051: [Java] Remove Java 8 support Jul 3, 2024
Copy link

github-actions bot commented Jul 3, 2024

⚠️ GitHub issue #38051 has been automatically assigned in GitHub to PR creator.

Copy link
Member

@lidavidm lidavidm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally looks good to me, thanks - just have to work through the CI

<filters>
<filter>
<excludes>
<exclude>**/module-info.class</exclude>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still willing to get rid of the shaded artifact here...It causes confusion and complicates the build

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting committer review Awaiting committer review labels Jul 4, 2024
dev/tasks/tasks.yml Outdated Show resolved Hide resolved
@laurentgo laurentgo force-pushed the laurentgo/drop-java8 branch from 851a6c5 to 4c64ea4 Compare July 10, 2024 18:15
@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Jul 10, 2024
@laurentgo
Copy link
Collaborator Author

@github-actions crossbow submit -g java

Copy link

Revision: 4c64ea4

Submitted crossbow builds: ursacomputing/crossbow @ actions-1b9da10481

Task Status
java-jars GitHub Actions
test-conda-python-3.11-spark-master GitHub Actions
verify-rc-source-java-linux-almalinux-8-amd64 GitHub Actions
verify-rc-source-java-linux-conda-latest-amd64 GitHub Actions
verify-rc-source-java-linux-ubuntu-20.04-amd64 GitHub Actions
verify-rc-source-java-linux-ubuntu-22.04-amd64 GitHub Actions
verify-rc-source-java-macos-amd64 GitHub Actions

@laurentgo
Copy link
Collaborator Author

Seems like most of the tests are now passing. The remaining issues seems to be linked to #43204 and apache/arrow-java#82 (didn't find a ticket for the go test issues but I did find several pull requests having the exact same failed tests)

Copy link
Member

@danepitkin danepitkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

When this lands, we can create followup issues for 1) shaded jars and 2) filename-based automodules warning.

@laurentgo
Copy link
Collaborator Author

laurentgo commented Jul 11, 2024

LGTM!

When this lands, we can create followup issues for 1) shaded jars and 2) filename-based automodules warning.

Already filed #43217 for 1)

@danepitkin
Copy link
Member

Thank you! I'll file the second one.

Also, looks like there's a merge conflict.

@danepitkin
Copy link
Member

Issue for 2) apache/arrow-java#67

@laurentgo laurentgo force-pushed the laurentgo/drop-java8 branch 2 times, most recently from d2f2d8a to 4d96f14 Compare July 16, 2024 14:56
@danepitkin
Copy link
Member

I will merge this once Arrow v17 is officially released, which should happen in the next few days.

@jduo
Copy link
Member

jduo commented Jul 16, 2024

LGTM. I feel like we should investigate if we can refactor some of the memory manager code to avoid using reflection/unsafe and use MethodHandles instead now that we're off JDK 8 but it probably should be separate from this work.

@laurentgo
Copy link
Collaborator Author

LGTM. I feel like we should investigate if we can refactor some of the memory manager code to avoid using reflection/unsafe and use MethodHandles instead now that we're off JDK 8 but it probably should be separate from this work.

I actually did some of the work to use MethodHandles in my Java 21 change but that said using MethodHandles doesn't bring that many benefits compared to using sun.misc.Unsafe directly (it would have been a requirement if using --release 8 flag with javac because of https://bugs.openjdk.org/browse/JDK-8206937).
What may bring some improvement is the use of VarHandle to access a ByteBuffer efficiently and safely, or the new Foreign Memory API introduced in Java 22.

@laurentgo laurentgo force-pushed the laurentgo/drop-java8 branch from 4d96f14 to 93f17aa Compare July 17, 2024 02:11
Remove build support for Java 8 and make Java 11 the minimum version to
use to build Arrow in github actions and ci tasks
Channge minimum Java version to build Arrow to 11:
* Change minimum java version and source/target/release compiler
  properties to 11
* Remove maven modules
* Remove jdk11+ profiles and integrate their content into the main
  section
* Let maven-compiler-plugin process `module-info.java` files and address
  several declaration issues
* Exclude non modularized modules from javadoc aggregate tasks
* Exclude module-info.class files from shaded jars as it is not
  representative of the whole content and may actually directly coming
  from a 3rd party dependency.
@laurentgo laurentgo force-pushed the laurentgo/drop-java8 branch from 93f17aa to 5dd7042 Compare July 17, 2024 17:08
@laurentgo
Copy link
Collaborator Author

@github-actions crossbow submit -g java

Copy link

Revision: 5dd7042

Submitted crossbow builds: ursacomputing/crossbow @ actions-91691f3234

Task Status
java-jars GitHub Actions
test-conda-python-3.11-spark-master GitHub Actions
verify-rc-source-java-linux-almalinux-8-amd64 GitHub Actions
verify-rc-source-java-linux-conda-latest-amd64 GitHub Actions
verify-rc-source-java-linux-ubuntu-20.04-amd64 GitHub Actions
verify-rc-source-java-linux-ubuntu-22.04-amd64 GitHub Actions
verify-rc-source-java-macos-amd64 GitHub Actions

@danepitkin
Copy link
Member

@github-actions crossbow submit -g java

@danepitkin
Copy link
Member

danepitkin commented Jul 17, 2024

The Dev / Source Release and Merge Script is failing, I thought it might have been caused by the remaining arrow maven plugin artifacts, but perhaps not.

Edit: Probably because its building from cache -
This build will only read from the build cache, since the clean lifecycle is not part of the build invocation.

Copy link

Revision: 5dd7042

Submitted crossbow builds: ursacomputing/crossbow @ actions-89524a3647

Task Status
java-jars GitHub Actions
test-conda-python-3.11-spark-master GitHub Actions
verify-rc-source-java-linux-almalinux-8-amd64 GitHub Actions
verify-rc-source-java-linux-conda-latest-amd64 GitHub Actions
verify-rc-source-java-linux-ubuntu-20.04-amd64 GitHub Actions
verify-rc-source-java-linux-ubuntu-22.04-amd64 GitHub Actions
verify-rc-source-java-macos-amd64 GitHub Actions

@danepitkin danepitkin merged commit 4161898 into apache:main Jul 17, 2024
56 of 60 checks passed
@danepitkin danepitkin removed the awaiting change review Awaiting change review label Jul 17, 2024
@danepitkin
Copy link
Member

Thank you @laurentgo ! This PR was immensely helpful to the project.

@laurentgo laurentgo deleted the laurentgo/drop-java8 branch July 17, 2024 20:34
@laurentgo
Copy link
Collaborator Author

laurentgo commented Jul 17, 2024

This build will only read from the build cache, since the clean lifecycle is not part of the build invocation.

This is a common message I see all the time and I think it's a gradle develocity limitation. If the clean lifecycle is not invoked, then the plugin does not save into the local cache the plugin does not upload the cached artifacts to https://ge.apache.org (it looks like remote caching is not enabled for the project)

@danepitkin
Copy link
Member

Whoops it wasn't a caching issue: #43313

Copy link

After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit 4161898.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 1 possible false positive for unstable benchmarks that are known to sometimes produce them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants