TEZ-4520: Enable Parallel Compilation for TEZ #315

Merged 1 commit on Jan 4, 2024

JiaLiangC
Contributor

Tez could not use Maven's multi-threaded build because of module dependency issues, which made builds slow. This patch fixes the module dependencies so that the parallel build works. With Maven's parallel build enabled, the build ran several times faster.


@tez-yetus

💔 -1 overall

| Vote | Subsystem | Runtime | Comment |
|------|-----------|---------|---------|
| +0 🆗 | reexec | 23m 24s | Docker mode activated. |
| | | | _ Prechecks _ |
| +1 💚 | dupname | 0m 0s | No case conflicting files found. |
| +1 💚 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 ❌ | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| | | | _ master Compile Tests _ |
| +1 💚 | mvninstall | 18m 8s | master passed |
| +1 💚 | compile | 0m 31s | master passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu122.04 |
| +1 💚 | compile | 0m 30s | master passed with JDK Private Build-1.8.0_382-8u382-ga-1~22.04.1-b05 |
| +1 💚 | javadoc | 1m 2s | master passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu122.04 |
| +1 💚 | javadoc | 0m 26s | master passed with JDK Private Build-1.8.0_382-8u382-ga-1~22.04.1-b05 |
| | | | _ Patch Compile Tests _ |
| +1 💚 | mvninstall | 0m 14s | the patch passed |
| +1 💚 | compile | 0m 14s | the patch passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu122.04 |
| +1 💚 | javac | 0m 14s | the patch passed |
| +1 💚 | compile | 0m 13s | the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~22.04.1-b05 |
| +1 💚 | javac | 0m 13s | the patch passed |
| +1 💚 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 💚 | xml | 0m 1s | The patch has no ill-formed XML file. |
| +1 💚 | javadoc | 0m 9s | the patch passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu122.04 |
| +1 💚 | javadoc | 0m 9s | the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~22.04.1-b05 |
| | | | _ Other Tests _ |
| +1 💚 | unit | 0m 13s | tez-dist in the patch passed. |
| +1 💚 | asflicense | 0m 20s | The patch does not generate ASF License warnings. |
| | | 46m 37s | |
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-315/1/artifact/out/Dockerfile
GITHUB PR #315
JIRA Issue TEZ-4520
Optional Tests dupname asflicense javac javadoc unit xml compile
uname Linux f8476b8bfdc6 4.15.0-213-generic #224-Ubuntu SMP Mon Jun 19 13:30:12 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality personality/tez.sh
git revision master / 51d6f53
Default Java Private Build-1.8.0_382-8u382-ga-1~22.04.1-b05
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu122.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_382-8u382-ga-1~22.04.1-b05
Test Results https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-315/1/testReport/
Max. process+thread count 77 (vs. ulimit of 5500)
modules C: tez-dist U: tez-dist
Console output https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-315/1/console
versions git=2.34.1 maven=3.6.3
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@abstractdog
Contributor

thanks @JiaLiangC for this patch!
what's the exact maven compile command that's slow without this patch? is it simply about turning on -T2C?
what exactly proves that due to dependency issues, the maven build falls back to single-threaded mode? (I guess it doesn't fail but finishes slowly)

@ayushtkn
Member

ayushtkn commented Dec 4, 2023

It fails

[INFO] Reactor Summary for tez 0.10.3-SNAPSHOT:
[INFO] 
[INFO] tez ................................................ SUCCESS [  1.291 s]
[INFO] hadoop-shim ........................................ SUCCESS [  1.528 s]
[INFO] tez-api ............................................ SUCCESS [  4.666 s]
[INFO] tez-build-tools .................................... SUCCESS [  0.412 s]
[INFO] tez-common ......................................... SUCCESS [  1.119 s]
[INFO] tez-runtime-internals .............................. SUCCESS [  1.736 s]
[INFO] tez-runtime-library ................................ SUCCESS [  3.070 s]
[INFO] tez-mapreduce ...................................... SUCCESS [  2.239 s]
[INFO] tez-examples ....................................... SUCCESS [  1.010 s]
[INFO] tez-dag ............................................ SUCCESS [  4.501 s]
[INFO] tez-tests .......................................... SUCCESS [  2.003 s]
[INFO] tez-ext-service-tests .............................. SUCCESS [  2.059 s]
[INFO] tez-ui ............................................. SUCCESS [ 20.937 s]
[INFO] tez-plugins ........................................ SUCCESS [  0.121 s]
[INFO] tez-protobuf-history-plugin ........................ SUCCESS [  1.437 s]
[INFO] tez-yarn-timeline-history .......................... SUCCESS [  1.858 s]
[INFO] tez-yarn-timeline-history-with-acls ................ SUCCESS [  1.838 s]
[INFO] tez-yarn-timeline-cache-plugin ..................... SUCCESS [ 10.969 s]
[INFO] tez-yarn-timeline-history-with-fs .................. SUCCESS [  1.870 s]
[INFO] tez-history-parser ................................. SUCCESS [  8.592 s]
[INFO] tez-aux-services ................................... SUCCESS [  6.105 s]
[INFO] tez-tools .......................................... SUCCESS [  0.121 s]
[INFO] tez-perf-analyzer .................................. SUCCESS [  0.420 s]
[INFO] tez-job-analyzer ................................... SKIPPED
[INFO] tez-javadoc-tools .................................. SUCCESS [  0.895 s]
[INFO] hadoop-shim-impls .................................. SUCCESS [  0.121 s]
[INFO] hadoop-shim-2.8 .................................... SUCCESS [  0.772 s]
[INFO] tez-dist ........................................... FAILURE [  1.252 s]
[INFO] Tez ................................................ SUCCESS [  0.367 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  29.025 s (Wall Clock)
[INFO] Finished at: 2023-12-04T13:36:45+05:30
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-assembly-plugin:3.2.0:single (package-tez) on project tez-dist: Failed to create assembly: Artifact: org.apache.tez:tez-job-analyzer:jar:0.10.3-SNAPSHOT (included by module) does not have an artifact with a file. Please ensure the package phase is run before the assembly is generated. -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR] 
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <args> -rf :tez-dist
ayushsaxena@ayushsaxena tez % 

@JiaLiangC
Contributor Author

@ayushtkn @abstractdog
If you build in parallel without this patch, Maven reports the error you posted. This is because tez-job-analyzer has not been packaged yet when tez-dist runs its assembly. Therefore, tez-job-analyzer needs to be added to the dependencies of tez-dist.

Applying this patch should resolve the issue.
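
For anyone who wants to confirm locally that the dependency is declared, the check could be sketched in Python as follows. This is only an illustration: the pom excerpt is a hypothetical reconstruction rather than the actual diff from this PR, the `<version>` expression is an assumption, and `declares_dependency` is a made-up helper name.

```python
import xml.etree.ElementTree as ET

# Hypothetical excerpt of tez-dist/pom.xml with the dependency declared.
# The <version> expression is an assumption; the real pom may manage the
# version elsewhere (for example in dependencyManagement).
POM_EXCERPT = """
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <artifactId>tez-dist</artifactId>
  <dependencies>
    <dependency>
      <groupId>org.apache.tez</groupId>
      <artifactId>tez-job-analyzer</artifactId>
      <version>${project.version}</version>
    </dependency>
  </dependencies>
</project>
"""

# POM files use this default XML namespace.
NS = {"m": "http://maven.apache.org/POM/4.0.0"}


def declares_dependency(pom_xml, group_id, artifact_id):
    """Return True if the pom lists group_id:artifact_id under <dependencies>."""
    root = ET.fromstring(pom_xml)
    for dep in root.findall("m:dependencies/m:dependency", NS):
        same_group = dep.findtext("m:groupId", namespaces=NS) == group_id
        same_artifact = dep.findtext("m:artifactId", namespaces=NS) == artifact_id
        if same_group and same_artifact:
            return True
    return False


print(declares_dependency(POM_EXCERPT, "org.apache.tez", "tez-job-analyzer"))
```

To run it against a real checkout, read the contents of tez-dist/pom.xml into a string and pass that instead of `POM_EXCERPT`.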

@JiaLiangC
Contributor Author

@abstractdog @ayushtkn
Apologies for not giving a detailed explanation earlier; I was busy. Here are the details:

The command mvn clean package -Dtar -Dhadoop.version=${HADOOP_VERSION} -Phadoop28 -DskipTests builds without errors. The parallel build, however, fails with the error that @ayushtkn posted above.

The failure occurs because a parallel build does not walk the module list in order. Maven schedules modules from the dependency graph, and a module can start as soon as the modules it declares as dependencies have finished.

In a normal sequential build, tez-dist is the last module to be built, so all other modules are already built by the time it runs. In a parallel build, tez-dist can run concurrently with modules it does not declare as dependencies. Therefore, tez-dist must declare a dependency on every module it packages, which ensures that tez-dist is built only after those modules are done.

With the parallel command mvn -T2C clean package -Dtar -Dhadoop.version=${HADOOP_VERSION} -Phadoop28 -DskipTests, the reported error is caused by tez-dist not declaring a dependency on tez-job-analyzer. Adding this dependency prevents the error during parallel builds.
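
The ordering argument above can be illustrated with a toy scheduler in Python. This is a sketch under loud assumptions: the module names come from the build log, but the dependency edges and durations are invented for illustration and are not Tez's real graph, and the model assumes unlimited threads. It is not Maven's actual scheduler.

```python
# Toy model of a parallel reactor build. Edges and durations are made up.
DURATIONS = {"tez-api": 5, "tez-dag": 5, "tez-job-analyzer": 3, "tez-dist": 1}

# Hypothetical graph where tez-dist does not declare tez-job-analyzer.
BEFORE = {
    "tez-dag": ["tez-api"],
    "tez-job-analyzer": ["tez-dag"],
    "tez-dist": ["tez-dag"],
}
# Same graph with the missing dependency declared on tez-dist.
AFTER = {**BEFORE, "tez-dist": ["tez-dag", "tez-job-analyzer"]}


def schedule(deps, durations):
    """Return (start, finish) times when each module starts as soon as all
    of its declared dependencies have finished."""
    start, finish = {}, {}

    def visit(module):
        if module not in finish:
            start[module] = max(
                (visit(d) for d in deps.get(module, ())), default=0
            )
            finish[module] = start[module] + durations[module]
        return finish[module]

    for module in durations:
        visit(module)
    return start, finish


def dist_starts_too_early(deps):
    """True if tez-dist may start before tez-job-analyzer has finished."""
    start, finish = schedule(deps, DURATIONS)
    return start["tez-dist"] < finish["tez-job-analyzer"]


print("without the dependency:", dist_starts_too_early(BEFORE))
print("with the dependency:   ", dist_starts_too_early(AFTER))
```

In this toy graph, tez-dist becomes ready at the same moment as tez-job-analyzer when the edge is missing, so the assembly can run before the jar exists. Once the edge is declared, tez-dist has to wait for tez-job-analyzer to finish.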

I also recommend mvn dependency:tree and maven-to-plantuml for troubleshooting the dependency issues that prevent a parallel build:

mvn dependency:tree: This Maven goal prints each module's dependencies as a tree, which helps you see how the modules depend on one another.

maven-to-plantuml: This is a project for analyzing and visualizing Maven project dependencies. It generates a PlantUML diagram from the output of mvn dependency:tree. Here are the steps:

Download maven-to-plantuml: wget https://github.com/phxql/maven-to-plantuml/releases/download/v1.0/maven-to-plantuml-1.0.jar
Generate the dependency tree: mvn dependency:tree > dep.txt
Generate a PlantUML diagram from the dependency tree: java -jar maven-to-plantuml-1.0.jar --input dep.txt --output dep.puml

Together, these tools show the dependencies between the modules in your project and help identify the ones that break a parallel build.
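
If a diagram is more than you need, the text output of mvn dependency:tree can also be scanned directly. Below is a minimal Python sketch, assuming the usual `[INFO]`-prefixed tree layout. The sample text is invented for illustration and is not real output from the Tez build, and `direct_reactor_dependencies` is a hypothetical helper name.

```python
import re

# Invented sample in the usual dependency:tree layout (not real Tez output).
SAMPLE = r"""
[INFO] org.apache.tez:tez-dist:pom:0.10.3-SNAPSHOT
[INFO] +- org.apache.tez:tez-dag:jar:0.10.3-SNAPSHOT:compile
[INFO] |  \- org.apache.tez:tez-common:jar:0.10.3-SNAPSHOT:compile
[INFO] \- org.apache.tez:tez-history-parser:jar:0.10.3-SNAPSHOT:compile
"""

# A module root line looks like group:artifact:packaging:version.
ROOT = re.compile(r"^[\w.\-]+:[\w.\-]+:[\w\-]+:[\w.\-]+$")


def direct_reactor_dependencies(tree_text, module, group_id="org.apache.tez"):
    """Collect artifactIds from group_id that `module` depends on directly."""
    current = None
    found = set()
    for raw in tree_text.splitlines():
        line = raw[len("[INFO] "):] if raw.startswith("[INFO] ") else raw
        if line[:3] in ("+- ", "\\- "):
            # First-level entry: a direct dependency of the current module.
            coords = line[3:].split(":")
            if current == module and len(coords) >= 2 and coords[0] == group_id:
                found.add(coords[1])
        elif ROOT.match(line):
            # Start of a new module's tree.
            current = line.split(":")[1]
    return found


direct = direct_reactor_dependencies(SAMPLE, "tez-dist")
print(sorted(direct))
print("tez-job-analyzer declared:", "tez-job-analyzer" in direct)
```

To use it on a real build, pass the contents of dep.txt from the steps above in place of `SAMPLE`, then compare the result with the modules that tez-dist packages.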

@abstractdog
Contributor

confirmed locally, this change makes the -T2C compilation work

@abstractdog abstractdog self-requested a review January 4, 2024 15:25
Contributor

@abstractdog abstractdog left a comment

+1

@abstractdog abstractdog merged commit 2dcdeba into apache:master Jan 4, 2024
4 checks passed