
Upgrade AWS SDK to V2 #2972

Open
wants to merge 7 commits into master
Conversation

@lliangyu-lin commented on Apr 25, 2024

Which Delta project/connector is this regarding?

  • Spark
  • Standalone
  • Flink
  • Kernel
  • Storage
  • storageS3DynamoDB

Description

The AWS SDK for Java 1.x is deprecated: it enters maintenance mode on July 31, 2024, and reaches end of support on December 31, 2025 (Official Announcement Link).
To address the SDK deprecation, we'll need to upgrade from AWS SDK for Java 1.x to AWS SDK for Java 2.x.
SDK v2 is a major rewrite of the version 1.x code base. For detailed differences, please refer to What's different between the AWS SDK for Java 1.x and 2.x.
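To give a feel for the scale of that rewrite, here is a minimal sketch of the client-construction change (illustrative only, not code from this PR; the region is a placeholder). v1 clients live under com.amazonaws.* and use mutable withXxx() setters, while v2 clients live under software.amazon.awssdk.* and are built from immutable, fluent builders:

import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.dynamodb.DynamoDbClient;

public class ClientMigrationSketch {
    public static void main(String[] args) {
        // v1: com.amazonaws.* packages, builder with withXxx() setters
        AmazonDynamoDB v1Client = AmazonDynamoDBClientBuilder.standard()
                .withRegion("us-east-1")
                .build();

        // v2: software.amazon.awssdk.* packages, immutable client from a fluent builder
        DynamoDbClient v2Client = DynamoDbClient.builder()
                .region(Region.US_EAST_1)
                .build();

        v1Client.shutdown();
        v2Client.close();
    }
}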

List of files in the Delta main branch that currently use AWS SDK v1 APIs; these are the files that need to be updated for this upgrade. The sketch below shows the typical change pattern.
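The change in the affected files is largely mechanical: request objects become immutable builders, and service exceptions move to new packages. A hedged sketch of the v2 pattern (the table name, attribute name, and values here are placeholders, not taken from this PR):

import java.util.Map;
import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
import software.amazon.awssdk.services.dynamodb.model.AttributeValue;
import software.amazon.awssdk.services.dynamodb.model.ConditionalCheckFailedException;
import software.amazon.awssdk.services.dynamodb.model.PutItemRequest;

public class PutItemMigrationSketch {
    // Conditional write in the v2 style. In v1 the request was mutable
    // (new PutItemRequest().withTableName(...)) and the exception came from
    // com.amazonaws.services.dynamodbv2.model.
    static void putIfAbsent(DynamoDbClient ddb, String tableName) {
        try {
            ddb.putItem(PutItemRequest.builder()
                    .tableName(tableName)
                    .item(Map.of("entryKey", AttributeValue.builder().s("example-value").build()))
                    .conditionExpression("attribute_not_exists(entryKey)")
                    .build());
        } catch (ConditionalCheckFailedException e) {
            // The entry already exists; handle as appropriate for the caller.
        }
    }
}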

How was this patch tested?

Unit Tests

  • build/sbt storageS3DynamoDB/test: passed
  • build/sbt storage/test: passed
  • build/sbt spark/'testOnly org.apache.spark.sql.delta.coordinatedcommits.*': passed

S3 LogStore Integration Test

run-integration-tests.py --s3-log-store-util-only
[info] - setup empty delta log
[info] - empty
[info] - small
[info] - medium
[info] - large
[info] S3LogStoreUtilTest:
[info] Run completed in 22 seconds, 503 milliseconds.
[info] Total number of tests run: 5
[info] Suites: completed 3, aborted 0
[info] Tests: succeeded 5, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[success] Total time: 24 s, completed Apr 23, 2024, 9:36:04 AM

Manual Testing

spark-sql \
--conf spark.delta.logStore.s3a.impl=io.delta.storage.S3DynamoDBLogStore \
--conf spark.io.delta.storage.S3DynamoDBLogStore.ddb.tableName=delta_log1 \
--conf spark.io.delta.storage.S3DynamoDBLogStore.ddb.region=us-east-1 \
--conf spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension \
--conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog \
--jars /usr/share/aws/delta/lib/delta-storage-s3-dynamodb.jar
CREATE TABLE my_delta_table_1 (
id INT,
value INT
) USING delta;

INSERT INTO my_delta_table_1
VALUES
(1, 100),
(2, 200),
(3, 300),
(4, 400),
(5, 500),
(6, 600),
(7, 700),
(8, 800),
(9, 900),
(10, 1000);

select * from my_delta_table_1;
6	600
7	700
3	300
4	400
5	500
6	600
7	700
8	800
9	900
10	1000
3	300
4	400
5	500
8	800
9	900
10	1000
1	100
2	200
1	100
2	200
Time taken: 1.175 seconds, Fetched 20 row(s)

Does this PR introduce any user-facing changes?

Yes. Users will need to specify an SDK v2 credential provider instead of an SDK v1 provider in Delta configurations.
Ex: io.delta.storage.credentials.provider=com.amazonaws.auth.profile.ProfileCredentialsProvider -> software.amazon.awssdk.auth.credentials.ProfileCredentialsProvider
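One practical difference behind this config change (a minimal sketch; the profile name below is a placeholder): SDK v2 credential providers are generally obtained via static create() factory methods rather than the public constructors that v1 providers exposed, which matters for any code or configuration that instantiates a provider by class name:

import software.amazon.awssdk.auth.credentials.AwsCredentialsProvider;
import software.amazon.awssdk.auth.credentials.ProfileCredentialsProvider;

public class CredentialProviderSketch {
    public static void main(String[] args) {
        // v2 providers expose static create() factories instead of the
        // public constructors that v1 providers had.
        AwsCredentialsProvider defaultProfile = ProfileCredentialsProvider.create();
        AwsCredentialsProvider namedProfile = ProfileCredentialsProvider.create("my-profile");
        System.out.println(defaultProfile + " " + namedProfile);
    }
}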

Closes: #3556
