  Hadoop Common / HADOOP-19019

Parallel Maven Build Support for Apache Hadoop


Details

    • Improvement
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • None
    • 3.5.0
    • build
    • Reviewed
    • Supports parallel builds with a Maven command such as `mvn -T 8 clean package -DskipTests`.

    Description

      Why compilation is slow: the Hadoop project has many modules, and because they cannot be compiled in parallel, the build is slow. For instance, the first compilation of Hadoop can take several hours, and even when all Maven dependencies are already in the local repository, a subsequent compilation still takes close to 40 minutes.

      How to solve it: Use mvn dependency:tree and maven-to-plantuml to investigate the dependency issues that prevent parallel compilation.

      • Investigate the dependencies between project modules.
      • Analyze the dependencies in multi-module Maven projects.
      • Download maven-to-plantuml:

       
      wget https://github.com/phxql/maven-to-plantuml/releases/download/v1.0/maven-to-plantuml-1.0.jar

      • Generate a dependency tree:

       
      mvn dependency:tree > dep.txt

      • Generate a UML diagram from the dependency tree:

       
      java -jar maven-to-plantuml-1.0.jar --input dep.txt --output dep.puml

      For more information, see the maven-to-plantuml GitHub repository: https://github.com/phxql/maven-to-plantuml
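      The diagram covers the whole tree, but when the question is only which Hadoop modules a given module declares directly, a text filter over dep.txt can be enough. The function below is a minimal sketch, not part of the original patch. The name `direct_hadoop_deps` is hypothetical, and it assumes the default text output of `mvn dependency:tree` (lines prefixed with `[INFO] `, a `... @ <module> ---` header per module, direct dependencies marked `+- ` or `\- `) and that Hadoop's own modules use the groupId `org.apache.hadoop`.

```shell
# Hypothetical helper (not from the Hadoop patch): print "module -> artifactId"
# for every direct org.apache.hadoop dependency found in `mvn dependency:tree`
# output. Assumes the default text format of the maven-dependency-plugin.
direct_hadoop_deps() {
  awk '
    # Per-module header, e.g.
    # [INFO] --- maven-dependency-plugin:3.6.1:tree (default-cli) @ hadoop-yarn-project ---
    /:tree .*@ / {
      module = $0
      sub(/.*@ /, "", module)    # drop everything up to "@ "
      sub(/ ---.*/, "", module)  # drop the trailing " ---"
      next
    }
    # Direct (depth-1) dependencies start with "+- " or "\- " right after "[INFO] ".
    # Deeper levels begin with "|" or extra spaces and are skipped.
    /^\[INFO\] [+\\]- org\.apache\.hadoop:/ {
      split($3, coords, ":")     # groupId:artifactId:type:version:scope
      print module " -> " coords[2]
    }
  ' "$1"
}
```

      For example, after `mvn dependency:tree > dep.txt`, running `direct_hadoop_deps dep.txt | grep '^hadoop-yarn-project '` lists the Hadoop modules that hadoop-yarn-project declares directly.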

       

      Hadoop Parallel Compilation: Logic Behind the Patch

      1. Reasons for Parallel Compilation Failure
        • In sequential compilation, modules are built one at a time in a fixed sequence, so every module earlier in that sequence has already been built when a later module runs. No errors occur because the build simply follows the module order.
        • In parallel compilation, by contrast, independent modules are built at the same time, and the build order is determined only by the declared inter-module dependencies: if Module A depends on Module B, Module B is built before Module A.
          When Hadoop is built in parallel, the declared dependencies are correct as far as compilation is concerned; hadoop-yarn-project is one example. The problem arises in the dist packaging stage, because the dist step packages the output of other modules that are not necessarily declared as dependencies.
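      The ordering argument can be checked on a toy model. The sketch below uses the POSIX `tsort` utility as a stand-in for Maven's scheduler; this is an analogy, not how Maven is implemented. Each input line `X Y` is a declared edge meaning X must be built before Y, and any order `tsort` prints respects exactly those edges and nothing else. The helper name `precedes` and the module edges used with it are hypothetical illustrations, not Hadoop's actual dependency graph.

```shell
# Hypothetical helper: succeed if FIRST is placed before SECOND in a
# topological order of EDGES_FILE (one "X Y" pair per line, X built before Y).
# tsort only guarantees the order of pairs it is given, just as a parallel
# Maven build only guarantees the order of declared <dependency> edges.
precedes() {
  edges_file="$1"; first="$2"; second="$3"
  order=$(tsort "$edges_file") || return 2   # treat any tsort error as failure
  # Line number of each module in the computed order (exact, fixed-string match).
  pos_first=$(printf '%s\n' "$order" | grep -nxF "$first" | cut -d: -f1)
  pos_second=$(printf '%s\n' "$order" | grep -nxF "$second" | cut -d: -f1)
  [ -n "$pos_first" ] && [ -n "$pos_second" ] && [ "$pos_first" -lt "$pos_second" ]
}
```

      With an edge such as `hadoop-yarn-server-router hadoop-yarn-project` in the file, the router module is guaranteed to come first; remove that line and nothing forces it to, which mirrors a packaging input that hadoop-yarn-project uses but does not declare.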

      Behavior of hadoop-yarn-project in Serial Compilation:

        • In serial compilation, Maven builds the modules listed in the pom one at a time, in sequence. Only after all of them are built does it build hadoop-yarn-project. During the prepare-package phase, the maven-assembly-plugin runs and repackages all packages as described in hadoop-assemblies/src/main/resources/assemblies/hadoop-yarn-dist.xml.

      Behavior of hadoop-yarn-project in Parallel Compilation:

        • Parallel compilation builds modules in the order implied by their dependencies. Modules that do not declare a <dependency> on each other are built concurrently. Following the dependencies declared in the pom of hadoop-yarn-project, Maven builds those dependencies first and then builds hadoop-yarn-project, which runs its maven-assembly-plugin.
        • However, not every module that hadoop-assemblies/src/main/resources/assemblies/hadoop-yarn-dist.xml packages is declared as a dependency of hadoop-yarn-project. As a result, when hadoop-yarn-project runs the maven-assembly-plugin, some of the required modules may not have been built yet, and the parallel build fails.

      Solution:

        • The fix is straightforward: collect every module referenced in hadoop-assemblies/src/main/resources/assemblies/hadoop-yarn-dist.xml and declare each of them as a dependency in the pom of hadoop-yarn-project.
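      To apply this fix, it helps to list the modules that the assembly descriptor references but the pom does not declare. The function below is a rough sketch, not the patch's own method. The name `missing_deps` is hypothetical, and it assumes the descriptor names modules as `org.apache.hadoop:<artifactId>` coordinates (for example inside `<include>` elements) and that each declared module appears as an `<artifactId>` element in the pom. Wildcard includes and modules referenced only by directory path are not detected, and because any `hadoop-*` artifactId in the pom (plugins included) counts as declared, a module can be missed; treat the output as a starting point for a manual check.

```shell
# Hypothetical helper: print artifactIds referenced in an assembly descriptor
# (as org.apache.hadoop:<artifactId>) that never appear as <artifactId> in a pom.
missing_deps() {
  descriptor="$1"   # e.g. hadoop-assemblies/src/main/resources/assemblies/hadoop-yarn-dist.xml
  pom="$2"          # e.g. hadoop-yarn-project/pom.xml
  referenced=$(mktemp)
  declared=$(mktemp)
  # Modules the descriptor packages, reduced to their artifactId.
  grep -o 'org\.apache\.hadoop:hadoop-[A-Za-z0-9.-]*' "$descriptor" \
    | cut -d: -f2 | sort -u > "$referenced"
  # Every hadoop-* artifactId mentioned anywhere in the pom.
  grep -o '<artifactId>hadoop-[^<]*</artifactId>' "$pom" \
    | sed -e 's/<artifactId>//' -e 's/<\/artifactId>//' | sort -u > "$declared"
  # Lines only in the first file: referenced but not declared.
  comm -23 "$referenced" "$declared"
  rm -f "$referenced" "$declared"
}
```

      For example, from the repository root: `missing_deps hadoop-assemblies/src/main/resources/assemblies/hadoop-yarn-dist.xml hadoop-yarn-project/pom.xml`.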

      Attachments

        1. patch11-HDFS-17287.diff
          5 kB
          caijialiang


            People

              jialiang caijialiang
              jialiang caijialiang
              Votes: 0
              Watchers: 4
