-
Notifications
You must be signed in to change notification settings - Fork 316
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for IPv6 and iceberg with spark >= 3.4 #3206
base: master
Are you sure you want to change the base?
Conversation
Thank you for opening this pull request! 🙌 These tips will help get your PR across the finish line:
|
Code Review Agent Run #3a1fdfActionable Suggestions - 1
Review Details
|
Changelist by BitoThis pull request implements the following key changes.
|
wget https://archive.apache.org/dist/spark/spark-3.2.1/spark-3.2.1-bin-hadoop3.2.tgz -O spark-dist.tgz | ||
echo '224e058cb0c6fb68b39896427a3ccd11ae2246e9bf465b5e29e4fb192d39a59c spark-dist.tgz' | sha256sum --check | ||
wget https://archive.apache.org/dist/spark/spark-3.5.5/spark-3.5.5-bin-hadoop3.tgz -O spark-dist.tgz | ||
echo 'ec5ff678136b1ff981e396d1f7b5dfbf399439c5cb853917e8c954723194857607494a89b7e205fce988ec48b1590b5caeae3b18e1b5db1370c0522b256ff376 spark-dist.tgz' | sha512sum --check |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The checksum verification has been updated from SHA-256 to SHA-512, but the command format is incorrect. The SHA-512 checksum string should not be followed by a filename in the echo
command when using sha512sum --check
. Consider removing the filename from the echo command or using the correct format for SHA-512 verification.
Code suggestion
Check the AI-generated fix before applying
echo 'ec5ff678136b1ff981e396d1f7b5dfbf399439c5cb853917e8c954723194857607494a89b7e205fce988ec48b1590b5caeae3b18e1b5db1370c0522b256ff376 spark-dist.tgz' | sha512sum --check | |
echo 'ec5ff678136b1ff981e396d1f7b5dfbf399439c5cb853917e8c954723194857607494a89b7e205fce988ec48b1590b5caeae3b18e1b5db1370c0522b256ff376 spark-dist.tgz' > spark-dist.tgz.sha512 && sha512sum --check spark-dist.tgz.sha512 |
Code Review Run #3a1fdf
Should Bito avoid suggestions like this for future reviews? (Manage Rules)
- Yes, avoid them
Signed-off-by: Julian <[email protected]>
Signed-off-by: Julian <[email protected]>
Code Review Agent Run #655175Actionable Suggestions - 0Review Details
|
Code Review Agent Run #46748bActionable Suggestions - 0Review Details
|
Signed-off-by: Julian <[email protected]>
Hi @Future-Outlier, I signed the last commit but one action failed with a timeout on the previous run. integration (ubuntu-latest, 3.9, integration_test_codecov)
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you run un a flyte cluster to prove it works?
Code Review Agent Run #6b0815Actionable Suggestions - 0Review Details
|
Signed-off-by: Julian <[email protected]>
@Future-Outlier, to test this PR I did run spark tasks on k8s submitted with the build docker image from this PR without the ipv6 hack and it worked as expected. I also added jars for iceberg support and gave the spark user access to the jars dir so one can add jars in the spark config to download at runtime. The image is published at https://hub.docker.com/r/juliastreibel/flyte-spark-plugin. The iceberg tasks also run as expected now. |
Code Review Agent Run #31c59cActionable Suggestions - 1
Review Details
|
Signed-off-by: Julian <[email protected]>
Code Review Agent Run #f98abcActionable Suggestions - 0Review Details
|
Signed-off-by: Julian <[email protected]>
Code Review Agent Run #b3690bActionable Suggestions - 0Review Details
|
Why are the changes needed?
The upgrade to spark >= 3.4 is needed to support IPv6 and iceberg. This is very useful for k8s deployments and is currently breaking our pipelines. We implemented an ugly fix overwriting arguments with ImageSpecs.
Without this we are seeing issues where the ip is not wrapped in [] fixed in
apache/spark#36868
What changes were proposed in this pull request?
Upgrade from spark 3.2.1 to 3.5.5
How was this patch tested?
Ran test of spark plugin successfully
Check all the applicable boxes
Summary by Bito
This PR upgrades Flytekit Spark integration to support Spark 3.4+ with IPv6 and Iceberg support for Kubernetes deployments. Updates include a new Spark base image, revised hadoop-aws dependencies, and modified installation scripts. The PR also fixes file permissions for spark jars directory, locks pyspark version to prevent compatibility issues, and resolves pipeline issues in Kubernetes deployments.Unit tests added: False
Estimated effort to review (1-5, lower is better): 1