Add support for IPv6 and iceberg with spark >= 3.4 #3206

Open. julianStreibel wants to merge 6 commits into base: master

julianStreibel commented Mar 21, 2025

Why are the changes needed?

Upgrading to Spark >= 3.4 is needed to support IPv6 and Iceberg. IPv6 support is very useful for k8s deployments, and its absence is currently breaking our pipelines. As a workaround, we implemented an ugly fix that overwrites arguments with ImageSpecs.
Without the upgrade, we see issues where the IP is not wrapped in [], which was fixed in
apache/spark#36868
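
To illustrate why the missing brackets matter: once a port is appended, an unbracketed IPv6 literal is ambiguous, because the address itself contains colons. The following is a minimal Python sketch of the bracketing rule, not Spark's actual code; the function name `format_host_port` and the example addresses are hypothetical.

```python
import ipaddress


def format_host_port(host: str, port: int) -> str:
    """Join host and port, wrapping IPv6 literals in square brackets."""
    try:
        is_ipv6 = ipaddress.ip_address(host).version == 6
    except ValueError:
        # Not a bare IP literal (for example a DNS name), so leave it as-is.
        is_ipv6 = False
    return f"[{host}]:{port}" if is_ipv6 else f"{host}:{port}"


if __name__ == "__main__":
    print(format_host_port("10.0.0.12", 7078))
    print(format_host_port("fd00::1234", 7078))
```

IPv4 addresses and hostnames pass through unchanged, so only IPv6 literals gain brackets.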

What changes were proposed in this pull request?

Upgrade from spark 3.2.1 to 3.5.5

How was this patch tested?

Ran the Spark plugin tests successfully.

Check all the applicable boxes

  • I updated the documentation accordingly.
  • All new and existing tests passed.
  • All commits are signed-off.

Summary by Bito

This PR upgrades Flytekit Spark integration to support Spark 3.4+ with IPv6 and Iceberg support for Kubernetes deployments. Updates include a new Spark base image, revised hadoop-aws dependencies, and modified installation scripts. The PR also fixes file permissions for spark jars directory, locks pyspark version to prevent compatibility issues, and resolves pipeline issues in Kubernetes deployments.

Unit tests added: False

Estimated effort to review (1-5, lower is better): 1

welcome bot commented Mar 21, 2025

Thank you for opening this pull request! 🙌

These tips will help get your PR across the finish line:

  • Most of the repos have a PR template; if not, fill it out to the best of your knowledge.
  • Sign off your commits (Reference: DCO Guide).

flyte-bot commented Mar 21, 2025

Code Review Agent Run #3a1fdf

Actionable Suggestions - 1
  • plugins/flytekit-spark/scripts/flytekit_install_spark3.sh - 1
    • Incorrect SHA-512 checksum verification format · Line 26-26
Review Details
  • Files reviewed - 1 · Commit Range: a0694d5..a0694d5
    • plugins/flytekit-spark/scripts/flytekit_install_spark3.sh
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful

Bito Usage Guide

Commands

Type the following command in the pull request comment and save the comment.

  • /review - Manually triggers a full AI review.

Refer to the documentation for additional commands.

Configuration

This repository uses code_review_bito. You can customize the agent settings here or contact your Bito workspace admin at [email protected].

Documentation & Help

AI Code Review powered by Bito Logo

flyte-bot commented Mar 21, 2025

Changelist by Bito

This pull request implements the following key changes.

Key Change: Feature Improvement - Spark Upgrade and Dependency Improvements

Files Impacted:

  • Dockerfile - Updated base image from Spark 3.3.1 to 3.4.0, revised Maven dependency URLs to fetch hadoop-aws 3.4.0 and added Iceberg jars, and fixed jar directory permissions.
  • flytekit_install_spark3.sh - Modified installation commands to download Spark 3.4.0 and updated dependency downloads for hadoop-aws and aws-java-sdk-bundle with new version references and checksums.
  • setup.py - Locked the pyspark version requirement to 3.4.0 to ensure compatibility with the upgraded Spark environment.

plugins/flytekit-spark/scripts/flytekit_install_spark3.sh
- wget https://archive.apache.org/dist/spark/spark-3.2.1/spark-3.2.1-bin-hadoop3.2.tgz -O spark-dist.tgz
- echo '224e058cb0c6fb68b39896427a3ccd11ae2246e9bf465b5e29e4fb192d39a59c spark-dist.tgz' | sha256sum --check
+ wget https://archive.apache.org/dist/spark/spark-3.5.5/spark-3.5.5-bin-hadoop3.tgz -O spark-dist.tgz
+ echo 'ec5ff678136b1ff981e396d1f7b5dfbf399439c5cb853917e8c954723194857607494a89b7e205fce988ec48b1590b5caeae3b18e1b5db1370c0522b256ff376 spark-dist.tgz' | sha512sum --check

Incorrect SHA-512 checksum verification format

The checksum verification has been updated from SHA-256 to SHA-512, but the command format is incorrect. The SHA-512 checksum string should not be followed by a filename in the echo command when using sha512sum --check. Consider removing the filename from the echo command or using the correct format for SHA-512 verification.

Code suggestion
Check the AI-generated fix before applying
Suggested change
- echo 'ec5ff678136b1ff981e396d1f7b5dfbf399439c5cb853917e8c954723194857607494a89b7e205fce988ec48b1590b5caeae3b18e1b5db1370c0522b256ff376 spark-dist.tgz' | sha512sum --check
+ echo 'ec5ff678136b1ff981e396d1f7b5dfbf399439c5cb853917e8c954723194857607494a89b7e205fce988ec48b1590b5caeae3b18e1b5db1370c0522b256ff376 spark-dist.tgz' > spark-dist.tgz.sha512 && sha512sum --check spark-dist.tgz.sha512
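
Both variants hand `sha512sum --check` a line containing a digest and a filename, one through stdin and one through a file. For reference, the underlying check (hash the downloaded archive and compare it with a pinned digest) can be sketched in Python as below. This is an illustrative sketch only and not part of the PR; the helper name `sha512_matches` is hypothetical.

```python
import hashlib
import hmac
from pathlib import Path


def sha512_matches(path: Path, expected_hex: str, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file's SHA-512 digest equals the expected hex string."""
    digest = hashlib.sha512()
    with path.open("rb") as fh:
        # Read in chunks so a large archive is never held in memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    # Normalise whitespace and case before comparing the hex strings.
    return hmac.compare_digest(digest.hexdigest(), expected_hex.strip().lower())
```

A caller would pass the path of the downloaded `spark-dist.tgz` together with the pinned digest and abort the install when the function returns False.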

Code Review Run #3a1fdf


Signed-off-by: Julian <[email protected]>
Signed-off-by: Julian <[email protected]>

flyte-bot commented Mar 21, 2025

Code Review Agent Run #655175

Actionable Suggestions - 0
Review Details
  • Files reviewed - 1 · Commit Range: 91b3466..f4c0bbe
    • plugins/flytekit-spark/scripts/flytekit_install_spark3.sh
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful

julianStreibel changed the title from "Add support for ipv6 with spark >= 3.4" to "Add support for IPv6 with spark >= 3.4" on Mar 22, 2025
julianStreibel changed the title from "Add support for IPv6 with spark >= 3.4" to "Add support for IPv6 and iceberg with spark >= 3.4" on Mar 23, 2025

flyte-bot commented Mar 23, 2025

Code Review Agent Run #46748b

Actionable Suggestions - 0
Review Details
  • Files reviewed - 2 · Commit Range: f4c0bbe..d0184c5
    • plugins/flytekit-spark/Dockerfile
    • plugins/flytekit-spark/scripts/flytekit_install_spark3.sh
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful

Signed-off-by: Julian <[email protected]>
julianStreibel (Author) commented

Hi @Future-Outlier, I signed the last commit but one action failed with a timeout on the previous run.

integration (ubuntu-latest, 3.9, integration_test_codecov)
failed 53 minutes ago in 1h 22m 37s

SSH: ssh [email protected]
or: ssh -i <path-to-private-SSH-key> [email protected]
SSH: ssh [email protected]
or: ssh -i <path-to-private-SSH-key> [email protected]
Error: The action 'Setup tmate session' has timed out after 60 minutes.

Future-Outlier (Member) left a comment

Can you run it on a Flyte cluster to prove it works?

flyte-bot commented Mar 24, 2025

Code Review Agent Run #6b0815

Actionable Suggestions - 0
Review Details
  • Files reviewed - 2 · Commit Range: 91b3466..5e75d32
    • plugins/flytekit-spark/Dockerfile
    • plugins/flytekit-spark/scripts/flytekit_install_spark3.sh
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful

Signed-off-by: Julian <[email protected]>

julianStreibel commented Mar 25, 2025

@Future-Outlier, to test this PR I ran Spark tasks on k8s, submitted with the Docker image built from this PR and without the IPv6 hack, and they worked as expected. I also added jars for Iceberg support and gave the spark user access to the jars dir, so one can list jars in the Spark config to be downloaded at runtime. The image is published at https://hub.docker.com/r/juliastreibel/flyte-spark-plugin. The Iceberg tasks also run as expected now.
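
As an illustration of what listing jars in the Spark config could look like, here is a hedged Python sketch that builds a Spark configuration dictionary for Iceberg. The comment above does not say which settings were actually used; `spark.jars.packages` is one standard Spark option for fetching jars at runtime. The function name, catalog name, warehouse path, and Iceberg version are all assumptions made for the example.

```python
from typing import Dict


def iceberg_spark_conf(catalog: str, warehouse: str, iceberg_version: str) -> Dict[str, str]:
    """Build Spark settings that pull the Iceberg runtime and register a catalog."""
    # Assumption: Spark 3.5 built for Scala 2.12; adjust the artifact name otherwise.
    runtime = f"org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:{iceberg_version}"
    return {
        # Maven coordinates that Spark resolves and downloads at startup.
        "spark.jars.packages": runtime,
        "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
        f"spark.sql.catalog.{catalog}": "org.apache.iceberg.spark.SparkCatalog",
        # Assumption: a Hadoop-style catalog backed by a warehouse path.
        f"spark.sql.catalog.{catalog}.type": "hadoop",
        f"spark.sql.catalog.{catalog}.warehouse": warehouse,
    }


if __name__ == "__main__":
    # Placeholder values, not taken from the PR.
    for key, value in iceberg_spark_conf("demo", "s3a://my-bucket/warehouse", "1.8.1").items():
        print(f"{key}={value}")
```

A dictionary like this could then be passed as the `spark_conf` of a flytekit Spark task config.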

flyte-bot commented Mar 25, 2025

Code Review Agent Run #31c59c

Actionable Suggestions - 1
  • plugins/flytekit-spark/Dockerfile - 1
    • Consider matching Hadoop and Spark versions · Line 15-18
Review Details
  • Files reviewed - 2 · Commit Range: 5e75d32..db4d9e6
    • plugins/flytekit-spark/Dockerfile
    • plugins/flytekit-spark/setup.py
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful

Signed-off-by: Julian <[email protected]>

flyte-bot commented Mar 26, 2025

Code Review Agent Run #f98abc

Actionable Suggestions - 0
Review Details
  • Files reviewed - 1 · Commit Range: db4d9e6..f60ee18
    • plugins/flytekit-spark/Dockerfile
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful

Signed-off-by: Julian <[email protected]>
julianStreibel requested a review from pingsutw on March 26, 2025, 12:19

flyte-bot commented Mar 26, 2025

Code Review Agent Run #b3690b

Actionable Suggestions - 0
Review Details
  • Files reviewed - 1 · Commit Range: f60ee18..a2c1565
    • plugins/flytekit-spark/setup.py
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
