TEZ-4520: Enable Parallel Compilation for TEZ #315
💔 -1 overall
This message was automatically generated.
Thanks @JiaLiangC for this patch!
It fails
@ayushtkn @abstractdog
@abstractdog @ayushtkn The command mvn clean package -Dtar -Dhadoop.version=${HADOOP_VERSION} -Phadoop28 -DskipTests compiles without errors. However, parallel compilation fails with the error posted by ayushtkn in the last comment.

The reason is that a parallel build schedules modules from the dependency tree instead of building them strictly one after another. In a normal sequential build, tez-dist is the last module to compile, so all other modules and their dependencies are already built by the time it runs. In a parallel build, tez-dist is compiled concurrently with other modules. tez-dist must therefore declare dependencies on all other modules, which ensures that it is built only after they have been built.

With the command mvn -T2C clean package -Dtar -Dhadoop.version=${HADOOP_VERSION} -Phadoop28 -DskipTests, the reported error is caused by tez-dist missing the tez-job-analyzer dependency. Adding this dependency prevents the error during parallel compilation.

I also recommend mvn dependency:tree and maven-to-plantuml for troubleshooting dependency issues that prevent parallel compilation. To investigate the dependency relationships between project modules, you can use the following tools:

mvn dependency:tree: This Maven command prints the project dependencies as a tree. It helps you visualize the dependencies between modules.

maven-to-plantuml: This is a project for analyzing and visualizing Maven project dependencies. You can use it to generate a PlantUML diagram from the dependency tree produced by mvn dependency:tree.

Here are the steps:

Download maven-to-plantuml: wget https://github.com/phxql/maven-to-plantuml/releases/download/v1.0/maven-to-plantuml-1.0.jar
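To make the scheduling argument above concrete, here is a minimal Python sketch of a toy model, not Maven's actual scheduler. It assumes a module may start as soon as every dependency it declares has finished, and it assigns each module to the earliest "wave" in which it could start. The module names come from this thread, but the dependency edges in `deps_before` are illustrative assumptions and are not taken from the real Tez POMs.

```python
def build_waves(deps):
    """Map each module to the earliest wave in which it can start.

    A module with no declared dependencies starts in wave 0; otherwise it
    starts one wave after the latest of its declared dependencies.
    """
    waves = {}

    def wave(module):
        if module not in waves:
            waves[module] = 1 + max((wave(d) for d in deps[module]), default=-1)
        return waves[module]

    for module in deps:
        wave(module)
    return waves


# Hypothetical dependency edges, for illustration only.
deps_before = {
    "tez-api": [],
    "tez-dag": ["tez-api"],
    "tez-job-analyzer": ["tez-dag"],
    "tez-dist": ["tez-api", "tez-dag"],  # tez-job-analyzer not declared
}

# Same graph with the missing dependency declared on tez-dist.
deps_after = dict(deps_before)
deps_after["tez-dist"] = deps_before["tez-dist"] + ["tez-job-analyzer"]

before = build_waves(deps_before)
after = build_waves(deps_after)

print("before:", before)
print("after: ", after)
```

In this model, tez-dist and tez-job-analyzer land in the same wave before the change, so they may run concurrently. Once the dependency is declared, tez-dist moves to a strictly later wave.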
Confirmed locally: this change makes the -T2C compilation work.
+1
Tez could not use Maven's multi-threaded build because of missing inter-module dependencies, which kept compilation slow. Declaring the missing module dependencies resolved the problem, and with Maven's parallel build enabled, compilation is several times faster.