Support gcs-connector 3.x in GcsUtil #33368
GoogleCloudStorage get(
    GoogleCloudStorageOptions options,
    Storage storage,
    Credentials credentials,
    HttpRequestInitializer httpRequestInitializer);
These four params should cover both the 2.x constructor and the 3.x Builder.
Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment …
cc @Abacn: wdyt of this workaround for gcs-connector 3.x?
Hi, thanks for the investigation. Is the Builder also supported on 2.x? If so, we can just switch to using it in all cases, with no need to expose extra options to users.
Unfortunately, it isn't :/ There is no way to construct a …
Understood, thanks!
Thanks @Abacn! Should I update CHANGES.md for the …
Java precommit timed out on several reruns. It's passing on HEAD; could you please take a look at whether it is related to this change?
After multiple reruns, there is indeed a related test failure: https://github.com/apache/beam/runs/34888346578
It looks like a test is written to prevent the exposure of unwanted classes. Refactoring in a way that does not leak these classes may fix it.
Thanks for looking into it @Abacn! Hmm, this seems challenging to refactor, since the exposure is coming from …
Hey @Abacn! Just wanted to ping this thread before the PR goes stale :)
A last resort would be to use reflection to handle both the constructor and the Builder. Say, wrap the constructor call in a try block: with the 3.x dependency it will throw an error, so catch that exception and construct the instance via reflection instead. Is that possible to do?
I think that makes sense... so we'd keep the official Beam dependency on 2.x, but try to reflectively construct a 3.x instance first?
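The reflection-first idea can be sketched in plain Java. This is a minimal illustration, not the code in the PR: Gcs2, Gcs3, setOptions and create are hypothetical stand-ins, and the sketch assumes a 3.x-style class exposes a public static builder() whose builder has setters and a build() method. A real version would also pass the storage, credentials and initializer arguments.

```java
import java.lang.reflect.Method;

final class ReflectiveFactorySketch {

  /** Hypothetical 3.x-style stand-in: private constructor, static builder() only. */
  static final class Gcs3 {
    final String options;

    private Gcs3(String options) {
      this.options = options;
    }

    public static Builder builder() {
      return new Builder();
    }

    public static final class Builder {
      private String options;

      public Builder setOptions(String options) {
        this.options = options;
        return this;
      }

      public Gcs3 build() {
        return new Gcs3(options);
      }
    }
  }

  /** Hypothetical 2.x-style stand-in: plain public constructor, no builder. */
  static final class Gcs2 {
    final String options;

    public Gcs2(String options) {
      this.options = options;
    }
  }

  /**
   * Tries clazz.builder().setOptions(options).build() via reflection first; if the class has
   * no such methods, falls back to a public constructor taking the same argument.
   */
  static Object create(Class<?> clazz, String options) {
    try {
      Method builderFactory = clazz.getMethod("builder");
      Object builder = builderFactory.invoke(null); // static method, so no receiver
      Class<?> builderClass = builder.getClass();
      builderClass.getMethod("setOptions", String.class).invoke(builder, options);
      return builderClass.getMethod("build").invoke(builder);
    } catch (NoSuchMethodException noBuilder) {
      // No usable builder: assume a 2.x-style public constructor exists.
      try {
        return clazz.getConstructor(String.class).newInstance(options);
      } catch (ReflectiveOperationException e) {
        throw new IllegalStateException("Neither builder nor constructor is usable", e);
      }
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("Builder invocation failed", e);
    }
  }
}
```

Because the builder is looked up by name at runtime, a missing builder() shows up as a checked NoSuchMethodException instead of a compile error, which is what lets a single binary compiled against 2.x still run against either version.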
Seems to work! Tested it on 2.x and 3.x 👍
return new GoogleCloudStorageImpl(options, storage, credentials);
try {
  // Attempt to construct gcs-connector 3.x-style GoogleCloudStorage, which is created
  // exclusively via Builder method; this can be replaced once Java 8 is dropped and
Could add a // TODO tag, like:
// TODO eliminate reflection once Beam drops Java 8 support and upgrade to use gcsio 3.x
Good idea, updated 👍
There is an integration failure, but it doesn't look related:
Sorry for another late request. I've been a little worried about relying on the …; since using reflection is already a hack, I don't see a good solution to this either. But could we try {return new GoogleCloudStorageImpl(...)} first? That way, with the default dependency versions (gcs-connector 2.x), no exception is thrown (even a handled one) after this change.
That makes sense @Abacn! Updated so that we try the 2.x constructor first and attempt the 3.x Builder only if that throws a …
Looks like there was a SpotBugs error; looking into it now.
Yeah, it can be found in "SpotBugs Results": https://github.com/apache/beam/actions/runs/12768890245?pr=33368
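The constructor-first ordering can be sketched as below. This assumes, as the PR description reports, that running code compiled against 2.x on gcs-connector 3.x surfaces the missing constructor as a NoSuchMethodError. ConstructorFirstFactory and its two Suppliers are hypothetical placeholders (for the direct 2.x constructor call and a reflective 3.x Builder call), not Beam classes.

```java
import java.util.function.Supplier;

/**
 * Hypothetical sketch: call the compiled-in constructor first and only fall back to the
 * reflective Builder path when that call fails to link at runtime.
 */
final class ConstructorFirstFactory<T> {
  private final Supplier<T> directConstructor;
  private final Supplier<T> reflectiveBuilder;
  private int fallbacks = 0;

  ConstructorFirstFactory(Supplier<T> directConstructor, Supplier<T> reflectiveBuilder) {
    this.directConstructor = directConstructor;
    this.reflectiveBuilder = reflectiveBuilder;
  }

  T create() {
    try {
      // With the default 2.x dependency this succeeds, so nothing is thrown or caught.
      return directConstructor.get();
    } catch (NoSuchMethodError e) {
      // The constructor exists at compile time but not at runtime (3.x): use the Builder.
      fallbacks++;
      return reflectiveBuilder.get();
    }
  }

  /** Number of times the fallback path was taken (for illustration only). */
  int fallbacks() {
    return fallbacks;
  }
}
```

With the default 2.x dependency, create() never enters the catch block, which is what the reviewer asked for: the reflective path only runs when the 2.x constructor is actually missing.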
...oogle-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/GcsUtil.java
* Parameterize GoogleCloudStorage provider in GcsUtil to unblock gcs-connector 3.x
* Use Reflection to attempt 3.x Builder, and fall back to 2.x Constructor
* Attempt constructing 3.x-style GCS via reflection; fall back to 2.x constructor
* Update CHANGES.md
* Add TODO note on reflection block
* Try non-reflected construction first
* Fix SpotBugs error
Rationale:
I would like to use gcs-connector 3.x, which supports the new Parquet VectorIO feature. However, gcs-connector 3.x also drops Java 8 and targets Java 11, which blocks us from upgrading it directly in Beam, since Beam still targets Java 8 (see #31678).
Additionally, as a Beam user, I can't just upgrade gcs-connector on my end, due to breaking changes in how GoogleCloudStorageImpl is instantiated: 2.x has public constructors, but 3.x drops them and enforces a Builder pattern. Therefore, when running on gcs-connector 3.x, my pipeline throws a NoSuchMethodError from org.apache.beam.sdk.extensions.gcp.util.GcsUtil when it tries to invoke the 2.x constructor: https://github.com/apache/beam/blob/v2.61.0/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/GcsUtil.java#L727

This PR adds a pipeline option for a GoogleCloudStorage provider, so that users who want to use gcs-connector 3.x are unblocked. It defaults to invoking the gcs-connector 2.x public constructor, but 3.x users can override it to use the Builder.
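The provider-option idea can be sketched in plain Java. This is a minimal illustration, not Beam's code: GcsProvider, GcsOptions, CONSTRUCTOR_2X and BUILDER_3X are hypothetical names, and String stand-ins replace GoogleCloudStorageOptions, Storage, Credentials and HttpRequestInitializer so the sketch compiles without gcs-connector. The 2.x default mirrors the three-argument constructor call in GcsUtil; the 3.x override assumes the Builder may also consume the initializer.

```java
import java.util.Objects;

final class ProviderOptionSketch {
  /** Hypothetical stand-in for the GoogleCloudStorage interface. */
  interface Gcs {
    String describe();
  }

  /** Hypothetical provider with the four-argument shape from the review thread. */
  @FunctionalInterface
  interface GcsProvider {
    Gcs get(String options, String storage, String credentials, String httpRequestInitializer);
  }

  /** Default: mimics the 2.x public constructor, which takes three of the four arguments. */
  static final GcsProvider CONSTRUCTOR_2X =
      (options, storage, credentials, initializer) ->
          () -> "2.x constructor(" + options + ", " + storage + ", " + credentials + ")";

  /** Override a 3.x user could plug in: mimics a Builder that sets all four values. */
  static final GcsProvider BUILDER_3X =
      (options, storage, credentials, initializer) ->
          () ->
              "3.x builder(" + options + ", " + storage + ", " + credentials + ", "
                  + initializer + ")";

  /** Hypothetical options holder that defaults to the 2.x provider. */
  static final class GcsOptions {
    private GcsProvider gcsProvider = CONSTRUCTOR_2X;

    GcsProvider getGcsProvider() {
      return gcsProvider;
    }

    void setGcsProvider(GcsProvider provider) {
      this.gcsProvider = Objects.requireNonNull(provider);
    }
  }
}
```

In this sketch a 3.x user would call setGcsProvider(BUILDER_3X), or pass their own lambda, before the pipeline runs; everyone else keeps the 2.x default and never touches the Builder.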