HBASE-29016 Refactor assembly creation to use only DependencySets and… #6519

Closed
wants to merge 6 commits into from

stoty commented Dec 4, 2024

… move cached classpath creation to a new module

stoty commented Dec 5, 2024

I have tested this on a pseudo-distributed cluster, starting and stopping HBase, and running a simple smoke test via HBase shell, and the rowCounter MR job.

<modelVersion>4.0.0</modelVersion>
<parent>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-build-configuration</artifactId>
Contributor Author

Better names welcome

@NihalJain (Contributor), Dec 6, 2024

Maybe hbase-dev-generate-classpath? That way we can prefix all dev-related modules with "hbase-dev-*" in case we add more in the future.

Contributor Author

I will hold off changing that in case we get other suggestions.

… move cached classpath creation to a new module

also remove explicit jaxws-ri dependency from assembly
<exclude>org.apache.yetus:audience-annotations</exclude>
<exclude>org.slf4j:*</exclude>
<exclude>org.apache.logging.log4j:*</exclude>
<!-- TODO shouldn't we also exclude duplicate io.opentelemetry.* jars which are added to client-facing-thirdparty ? -->
@NihalJain (Contributor), Dec 6, 2024

Hi @stoty, is there any reason for not excluding io.opentelemetry.* here?

@stoty (Contributor Author), Dec 6, 2024

I wasn't 100% sure that the current version doesn't do this on purpose, and I wanted to minimize the changes.

We can add the exclusion here, or we can open a follow-up ticket for that.

Contributor

Sounds good either way.

<!-- Exclude the Ruby jar that goes in the lib/ruby directory -->
<exclude>org.jruby:jruby-complete</exclude>
<!-- Exclude jars that go into the lib/client-facing-thirdparty directory -->
<exclude>com.github.stephenc.findbugs:findbugs-annotations</exclude>
@NihalJain (Contributor), Dec 6, 2024

So the excludes block has to be kept in sync with any dependencies we add to a sub-folder of lib. Maybe add a note somewhere to keep dependencies from getting duplicated?

Contributor Author

Good idea.
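
A minimal sketch of what such a note could look like in the assembly descriptor. It assumes one dependencySet for the main lib/ directory and one for lib/client-facing-thirdparty; the layout and the include list are illustrative and are not the contents of the actual patch. Only the coordinates quoted in this review are taken from it.

```xml
<dependencySets>
  <!-- Main lib/ directory.
       NOTE: keep the <excludes> below in sync with every dependencySet that
       writes into a sub-directory of lib/, otherwise the same jar ends up
       in the tarball twice. -->
  <dependencySet>
    <outputDirectory>lib</outputDirectory>
    <useTransitiveDependencies>true</useTransitiveDependencies>
    <excludes>
      <!-- Goes into lib/ruby -->
      <exclude>org.jruby:jruby-complete</exclude>
      <!-- Goes into lib/client-facing-thirdparty -->
      <exclude>com.github.stephenc.findbugs:findbugs-annotations</exclude>
    </excludes>
  </dependencySet>

  <!-- Sub-directory set (illustrative): everything included here
       must also be listed in the excludes of the main set above. -->
  <dependencySet>
    <outputDirectory>lib/client-facing-thirdparty</outputDirectory>
    <includes>
      <include>com.github.stephenc.findbugs:findbugs-annotations</include>
    </includes>
  </dependencySet>
</dependencySets>
```

Keeping the reminder next to the excludes list, rather than in separate documentation, puts it where a developer adding a new sub-directory is most likely to see it.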

@@ -143,6 +51,36 @@
</files>

<dependencySets>
<dependencySet>
Contributor

Are client.xml and hadoop-three-compat copies of each other now?

Contributor Author

Yes.
The differences were already minimal, I think they were accidental, too.

Contributor

Sure, we can look at whether we should keep both as a follow-up. I am good as long as the generated assembly is the same.

<artifactId>hbase-mapreduce</artifactId>
<type>test-jar</type>
</dependency>
<!-- To dump tools in hbase-procedure into cached_classpath.txt. -->
Contributor

This comment is no longer needed, right?

Contributor

Ah, it explains why we added hbase-procedure here. Ignore my comment.

Contributor Author

Actually, I think that this comment IS redundant, I don't see why hbase-procedure would be different than the rest of the artifacts.

Contributor

exactly

Contributor Author

I think that some of the scope changes, etc. are also unnecessary in this case, but I ran out of steam when cleaning up the pom.

NihalJain commented Dec 6, 2024

Is there any difference between the tarball generated with this change and the one generated without it? If both are the same, it's a +1 from me, as long as we are all fine with moving to DependencySets only; this is a lot cleaner and avoids mixing approaches.

It is important to note, though, that this comes with an extra maintenance step: any jar that goes into a sub-directory of lib must be excluded from the main set. I am fine with this, given that we rarely add new sub-directories and we already plan to add a note in the code reminding devs to take care of this.

stoty commented Dec 6, 2024

Yes, there are changes.
I have added a list of the changes with analysis to the JIRA.

About half of them are due to the jaxws-ri removal, which could be split into a different issue, if desired.

NihalJain commented Dec 6, 2024

> Yes, there are changes. I have added a list of the changes with analysis to the JIRA.
>
> About half of them are due to the jaxws-ri removal, which could be split into a different issue, if desired.

I would prefer to have it separate if it is not too much effort; otherwise, if we do more here, this JIRA should be titled "Refactor and cleanup".

Also, thinking about the future use case of having a second assembly without the Hadoop jars (assembly-without-hadoop-jars): have you thought about how the dependencySet mechanism will handle it? Would we need to create a new module for assembly-without-hadoop-jars? I am not sure how we can support multiple assemblies from a single dependency set without managing inclusion/exclusion lists for each assembly.xml. Or do we plan to follow the inclusion/exclusion list approach only?

stoty commented Dec 6, 2024

I have also added some reasoning on why we need the assembly changes to the JIRA.

@NihalJain (Contributor)

> I have also added some reasoning on why we need the assembly changes to the JIRA.

Just saw the JIRA title, sounds good to me.

stoty commented Dec 6, 2024

For the hadoop-less assembly, I plan to copy this assembly, and add the Hadoop artifacts as provided scope dependencies.

That is the best (only) way I know to remove both the Hadoop artifacts, and their exclusive transitive dependencies, while keeping HBase's transitive dependencies.

The only other way I know would be excluding and including everything by hand, which would be super fragile.
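
The provided-scope approach described above could look roughly like the following pom fragment. This is a sketch under assumptions: the module it lives in is a hypothetical copy of the assembly module, hbase-server and hadoop-common stand in for the full artifact lists, and versions are assumed to come from the parent's dependencyManagement. It relies on the assembly plugin's dependencySet resolving the runtime scope by default, which does not contain provided dependencies.

```xml
<!-- pom.xml of a hypothetical hadoop-less assembly module (illustrative) -->
<dependencies>
  <!-- HBase modules keep the default (compile) scope, so they and their
       transitive dependencies are still packaged by the dependencySets. -->
  <dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-server</artifactId>
  </dependency>

  <!-- Hadoop artifacts are declared directly with provided scope. The direct
       declaration decides the scope of the artifact itself, and dependencies
       that are reachable only through Hadoop become provided as well, so
       neither is picked up by a runtime-scoped dependencySet. -->
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <scope>provided</scope>
  </dependency>
</dependencies>
```

A dependency that HBase also pulls in through a non-Hadoop path keeps its compile or runtime scope, which is what lets HBase's own transitive dependencies survive without hand-written include lists.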

stoty commented Dec 6, 2024

Re-added jaxws-ri

<include>org.apache.hbase:hbase-shaded-client-byo-hadoop</include>
</includes>
<outputDirectory>lib</outputDirectory>
<useTransitiveDependencies>true</useTransitiveDependencies>
@NihalJain (Contributor), Dec 6, 2024

nit: <!-- Exclude artifacts added in the sub-directories to avoid duplication -->

Contributor Author

Synced to the other xml

@NihalJain (Contributor) left a comment

+1 LGTM

stoty commented Dec 6, 2024

I have checked the external and internal classpaths, and determined that duplicating the OpenTelemetry JARs is not needed.
The shaded classpaths don't care, and the internal classpath had duplicated copies.

stoty commented Dec 6, 2024

So I removed the duplication.

@@ -245,20 +211,36 @@
<groupId>io.opentelemetry.javaagent</groupId>
<artifactId>opentelemetry-javaagent</artifactId>
</dependency>
<!-- We don't really add this to assembly tarball, we retain it here just to dump it into
cached_classpath.txt ! See HBASE-28433 for more info. -->
<!-- This is an optional dependency of hbase-external-blockcache.
Contributor

If so, let's just remove the optional flag in hbase-external-blockcache's pom file?

@stoty (Contributor Author), Dec 9, 2024

I took another look at this.

I guess that the original intent was to NOT include it in the assembly, it just got lost somehow.

The last version is also from 2017, so this doesn't look like a maintained library.

Based on the comments, getting this right is far from trivial, and this probably has few users, so it might be better to keep it optional, and just leave it out from assembly.

That way most users will get fewer JARs, and the few (if any) users of it would have to provide it themselves.
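
For illustration, keeping the dependency optional would mean a declaration along these lines in hbase-external-blockcache's pom. The coordinates are an assumption based on the spymemcached library named later in this thread, and the version is assumed to be managed in the parent pom.

```xml
<!-- hbase-external-blockcache/pom.xml (sketch; coordinates are assumed) -->
<dependency>
  <groupId>net.spy</groupId>
  <artifactId>spymemcached</artifactId>
  <!-- Optional dependencies are not passed on transitively, so modules that
       depend on hbase-external-blockcache (including the assembly) do not
       get this jar unless they declare it themselves. -->
  <optional>true</optional>
</dependency>
```

With this in place, leaving the jar out of the tarball only requires that the assembly module does not declare it directly.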

Contributor Author

Or we can do the removal in another ticket, so that it's better documented.

Contributor Author

I have opened a [DISCUSS] thread for spymemcached removal.

Apache9 commented Dec 9, 2024

Let's have a try first.

@Apache-HBase

🎊 +1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:-------:|:--------|
| +0 🆗 | reexec | 0m 35s | | Docker mode activated. |
| | | | | _ Prechecks _ |
| +1 💚 | dupname | 0m 1s | | No case conflicting files found. |
| +0 🆗 | codespell | 0m 0s | | codespell was not available. |
| +0 🆗 | detsecrets | 0m 0s | | detect-secrets was not available. |
| +1 💚 | @author | 0m 0s | | The patch does not contain any @author tags. |
| | | | | _ master Compile Tests _ |
| +0 🆗 | mvndep | 0m 30s | | Maven dependency ordering for branch |
| +1 💚 | mvninstall | 4m 19s | | master passed |
| +1 💚 | compile | 10m 12s | | master passed |
| +1 💚 | spotless | 0m 54s | | branch has no errors when running spotless:check. |
| -0 ⚠️ | patch | 1m 9s | | Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary. |
| | | | | _ Patch Compile Tests _ |
| +0 🆗 | mvndep | 0m 18s | | Maven dependency ordering for patch |
| +1 💚 | mvninstall | 3m 43s | | the patch passed |
| +1 💚 | compile | 9m 33s | | the patch passed |
| +1 💚 | javac | 9m 33s | | the patch passed |
| +1 💚 | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 💚 | xmllint | 0m 0s | | No new issues. |
| +1 💚 | hadoopcheck | 13m 19s | | Patch does not cause any errors with Hadoop 3.3.6 3.4.0. |
| +1 💚 | spotless | 0m 54s | | patch has no errors when running spotless:check. |
| | | | | _ Other Tests _ |
| +1 💚 | asflicense | 0m 34s | | The patch does not generate ASF License warnings. |
| | | 52m 59s | | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6519/4/artifact/yetus-general-check/output/Dockerfile |
| GITHUB PR | #6519 |
| Optional Tests | dupname asflicense javac codespell detsecrets xmllint hadoopcheck spotless compile |
| uname | Linux a29ca77ca737 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/hbase-personality.sh |
| git revision | master / 94734e7 |
| Default Java | Eclipse Adoptium-17.0.11+9 |
| Max. process+thread count | 188 (vs. ulimit of 30000) |
| modules | C: hbase-assembly hbase-dev-generate-classpath . U: . |
| Console output | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6519/4/console |
| versions | git=2.34.1 maven=3.9.8 xmllint=20913 |
| Powered by | Apache Yetus 0.15.0 https://yetus.apache.org |

This message was automatically generated.

stoty commented Jan 9, 2025

I have already merged this back in December.

stoty closed this Jan 9, 2025