Make delta-lake shim dependencies parametrizable [databricks] #11697

Open: wants to merge 22 commits into base branch-24.12

@gerashegalov (Collaborator) commented Nov 6, 2024

Introduce properties in the parent pom that Spark shim profiles can override to specify the set of delta-lake shims for a particular Spark shim.
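
For illustration, a minimal sketch of what such overridable properties could look like as parent-pom fragments. Only rapids.delta.artifactId1 and rapids.delta.artifactId2 appear in this thread (in the enforcer output further down); rapids.delta.artifactId3, the fallback defaults, the profile contents and the artifact id value are hypothetical placeholders, not the PR's actual pom:

```xml
<!-- Parent pom fragment (sketch). In this sketch rapids.delta.artifactId1
     has no default here; each Spark release profile sets it. -->
<properties>
  <!-- Hypothetical: unused extra slots fall back to the first artifact -->
  <rapids.delta.artifactId2>${rapids.delta.artifactId1}</rapids.delta.artifactId2>
  <rapids.delta.artifactId3>${rapids.delta.artifactId1}</rapids.delta.artifactId3>
</properties>

<profiles>
  <profile>
    <id>release344</id>
    <!-- activation omitted for brevity -->
    <properties>
      <!-- Illustrative placeholder value, not taken from the PR -->
      <rapids.delta.artifactId1>rapids-4-spark-delta-EXAMPLE_2.12</rapids.delta.artifactId1>
    </properties>
  </profile>
</profiles>
```

A profile that needs a second delta-lake artifact would additionally override rapids.delta.artifactId2; every other profile touches only the first slot.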

Add a single reusable list of delta-lake shim dependencies to the aggregator pom. This relies on Maven deduplicating repeated dependency declarations.
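
A sketch of such a fixed list, assuming three slots (inferred from the three identical errors at lines 80, 86 and 92 in the CI log further down, not from the PR diff) and assuming the version and classifier come from project.version and a spark.version.classifier property. When a shim needs only one delta-lake artifact, the unused properties resolve to the same coordinates as the first, and the PR relies on Maven treating the repeated declarations as a single dependency:

```xml
<!-- Aggregator pom fragment (sketch): the same declarations serve every Spark shim -->
<dependencies>
  <dependency>
    <groupId>com.nvidia</groupId>
    <artifactId>${rapids.delta.artifactId1}</artifactId>
    <version>${project.version}</version>
    <!-- assumed property name for the spark3xx classifier -->
    <classifier>${spark.version.classifier}</classifier>
  </dependency>
  <dependency>
    <groupId>com.nvidia</groupId>
    <artifactId>${rapids.delta.artifactId2}</artifactId>
    <version>${project.version}</version>
    <classifier>${spark.version.classifier}</classifier>
  </dependency>
  <dependency>
    <groupId>com.nvidia</groupId>
    <artifactId>${rapids.delta.artifactId3}</artifactId>
    <version>${project.version}</version>
    <classifier>${spark.version.classifier}</classifier>
  </dependency>
</dependencies>
```

With this layout, no per-release profile in the aggregator has to restate the dependency list.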

Drop a verbose mirror of the Spark release profiles from the aggregator pom.

Fix ./build/make-scala-version-build-files.sh, which can currently fail silently without fully processing the poms.

Context #11692 (comment)

gerashegalov and others added 14 commits November 4, 2024 11:17
- cleanup aggregator
- create a parametrizable delta-lake dependency
- use profiles only to add extra delta-lake dependencies

Signed-off-by: Gera Shegalov <[email protected]>
@gerashegalov (Collaborator, Author)

build

@gerashegalov gerashegalov changed the title Make delta-lake shim dependencies parametrizable Make delta-lake shim dependencies parametrizable [databricks] Nov 6, 2024
@gerashegalov (Collaborator, Author)

build

@jlowe (Member) left a comment


Nice code reduction. My main concern with this change is that it makes it easy to accidentally drop Delta Lake support, because the delta-lake dependency defaults to the stub. The stub should be the exception, not the norm, but this change makes the build "work" when someone forgets to set it. I'd rather see the build fail if the Delta Lake version is not explicitly set, so that using the stub is a conscious decision rather than the result of an oversight or an accidental merge conflict. For example, if we don't reconcile this with #11692, the 3.4.4 shim will silently drop Delta Lake support without the build complaining. That's not ideal.

@gerashegalov (Collaborator, Author)

Added a check for a proper override. A release profile that does not set the delta-lake artifact properties now fails the build:

15:12:07,938 [ERROR] Failed to execute goal org.apache.maven.plugins:maven-enforcer-plugin:3.5.0:enforce (enforce-maven) on project rapids-4-spark-parent_2.13: 
15:12:07,938 [ERROR] Rule 1: org.apache.maven.enforcer.rules.property.RequireProperty failed with message:
15:12:07,938 [ERROR] At least one of rapids.delta.artifactId1, rapids.delta.artifactId2 ... is required in the POM profile "release344"
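
One way to express such a check with the enforcer's requireProperty rule, sketched under assumptions: the plugin version, the execution id enforce-maven and the message text come from the log above, while the ${buildver} interpolation in the message and the absence of a parent-pom default for rapids.delta.artifactId1 are assumptions. The log labels the failing rule "Rule 1", so the real execution presumably has another rule before it, omitted here:

```xml
<!-- Sketch: fail the build when a Spark shim profile does not choose its
     delta-lake artifact. Assumes the parent pom gives
     rapids.delta.artifactId1 no default value. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-enforcer-plugin</artifactId>
  <version>3.5.0</version>
  <executions>
    <execution>
      <id>enforce-maven</id>
      <goals>
        <goal>enforce</goal>
      </goals>
      <configuration>
        <rules>
          <!-- In this sketch only the first slot is checked; extra slots default to it -->
          <requireProperty>
            <property>rapids.delta.artifactId1</property>
            <message>At least one of rapids.delta.artifactId1, rapids.delta.artifactId2 ... is required in the POM profile "release${buildver}"</message>
          </requireProperty>
        </rules>
      </configuration>
    </execution>
  </executions>
</plugin>
```

Choosing the stub then requires an explicit override in the release profile, which is the conscious decision the review asked for.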

gerashegalov and others added 2 commits November 6, 2024 21:44
@gerashegalov (Collaborator, Author)

build

@jlowe previously approved these changes Nov 7, 2024
@gerashegalov (Collaborator, Author)

build

@jlowe (Member) commented Nov 7, 2024

build

@gerashegalov (Collaborator, Author)

build

@gerashegalov (Collaborator, Author)

build

@pxLi (Collaborator) commented Nov 8, 2024


[2024-11-07T20:54:31.725Z] + echo 'Done with installation of Databricks dependencies, removing /tmp/install-databricks-deps-IdpEJ2-pom.xml'
[2024-11-07T20:54:31.725Z] Done with installation of Databricks dependencies, removing /tmp/install-databricks-deps-IdpEJ2-pom.xml
[2024-11-07T20:54:31.725Z] + rm /tmp/install-databricks-deps-IdpEJ2-pom.xml
[2024-11-07T20:54:31.725Z] + [[ '' == \1 ]]
[2024-11-07T20:54:31.725Z] + MVN_PHASES='clean package'
[2024-11-07T20:54:31.725Z] + mvn -Dmaven.wagon.http.retryHandler.count=3 -B -Ddatabricks -Dbuildver=332db clean package -DskipTests
[2024-11-07T20:54:32.656Z] [INFO] Scanning for projects...
[2024-11-07T20:54:32.656Z] [ERROR] [ERROR] Some problems were encountered while processing the POMs:
[2024-11-07T20:54:32.656Z] [ERROR] 'dependencies.dependency.artifactId' for com.nvidia:rapids-4-spark-delta-${spark.version.classifer}_2.12:jar:spark332db with value 'rapids-4-spark-delta-${spark.version.classifer}_2.12' does not match a valid id pattern. @ line 80, column 25
[2024-11-07T20:54:32.656Z] [ERROR] 'dependencies.dependency.artifactId' for com.nvidia:rapids-4-spark-delta-${spark.version.classifer}_2.12:jar:spark332db with value 'rapids-4-spark-delta-${spark.version.classifer}_2.12' does not match a valid id pattern. @ line 86, column 25
[2024-11-07T20:54:32.656Z] [ERROR] 'dependencies.dependency.artifactId' for com.nvidia:rapids-4-spark-delta-${spark.version.classifer}_2.12:jar:spark332db with value 'rapids-4-spark-delta-${spark.version.classifer}_2.12' does not match a valid id pattern. @ line 92, column 25
[2024-11-07T20:54:32.656Z] [WARNING] 'build.plugins.plugin.(groupId:artifactId)' must be unique but found duplicate declaration of plugin org.apache.maven.plugins:maven-antrun-plugin @ line 521, column 21
[2024-11-07T20:54:32.656Z]  @ 
[2024-11-07T20:54:32.656Z] [ERROR] The build could not read 1 project -> [Help 1]
[2024-11-07T20:54:32.656Z] [ERROR]   
[2024-11-07T20:54:32.656Z] [ERROR]   The project com.nvidia:rapids-4-spark-aggregator_2.12:24.12.0-SNAPSHOT (/home/ubuntu/spark-rapids/aggregator/pom.xml) has 3 errors
[2024-11-07T20:54:32.656Z] [ERROR]     'dependencies.dependency.artifactId' for com.nvidia:rapids-4-spark-delta-${spark.version.classifer}_2.12:jar:spark332db with value 'rapids-4-spark-delta-${spark.version.classifer}_2.12' does not match a valid id pattern. @ line 80, column 25
[2024-11-07T20:54:32.656Z] [ERROR]     'dependencies.dependency.artifactId' for com.nvidia:rapids-4-spark-delta-${spark.version.classifer}_2.12:jar:spark332db with value 'rapids-4-spark-delta-${spark.version.classifer}_2.12' does not match a valid id pattern. @ line 86, column 25
[2024-11-07T20:54:32.656Z] [ERROR]     'dependencies.dependency.artifactId' for com.nvidia:rapids-4-spark-delta-${spark.version.classifer}_2.12:jar:spark332db with value 'rapids-4-spark-delta-${spark.version.classifer}_2.12' does not match a valid id pattern. @ line 92, column 25
[2024-11-07T20:54:32.656Z] [ERROR] 
[2024-11-07T20:54:32.656Z] [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[2024-11-07T20:54:32.656Z] [ERROR] Re-run Maven using the -X switch to enable full debug logging.
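
The three errors refer to lines 80, 86 and 92 of aggregator/pom.xml and all show the literal text ${spark.version.classifer} inside the artifactId, while the classifier of the same dependency did resolve to spark332db. That pattern is consistent with a misspelled property name (classifer instead of classifier): Maven leaves references to undefined properties unexpanded, and the leftover $, { and } are not among the characters Maven accepts in an id. A minimal sketch of the override with the spelling fixed, assuming the intended property is spark.version.classifier and that the value is otherwise as shown in the log (where this property is defined is also an assumption):

```xml
<!-- Sketch of a release profile's override (location hypothetical).
     An undefined property such as spark.version.classifer stays literal after
     interpolation, so the artifactId keeps "$", "{" and "}" and fails the
     [A-Za-z0-9_\-.]+ id check. -->
<properties>
  <!-- assumed correct spelling: spark.version.classifier -->
  <rapids.delta.artifactId1>rapids-4-spark-delta-${spark.version.classifier}_2.12</rapids.delta.artifactId1>
</properties>
```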

@gerashegalov (Collaborator, Author)

build

3 participants