SPARK-45894

Hive table-level setting for hadoop.mapred.max.split.size


Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.5.0
    • Fix Version/s: 3.5.0
    • Component/s: SQL

    Description

      When scanning a Hive table, configuring the hadoop.mapred.max.split.size parameter can increase the parallelism of the scan stage and thereby reduce the running time.
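
      Today the value can only be set once for the whole session, for example through Spark's spark.hadoop.* configuration prefix. The Scala sketch below only illustrates that; the table names and the 64 MB figure are made-up examples, and how the value is honored ultimately depends on the underlying input format:

        import org.apache.spark.sql.SparkSession

        // One session-wide value: every Hive table scanned by this session,
        // large or small, is split against the same 64 MB limit (example value).
        val spark = SparkSession.builder()
          .appName("global-split-size-example")
          .enableHiveSupport()
          .config("spark.hadoop.mapred.max.split.size", (64L * 1024 * 1024).toString)
          .getOrCreate()

        // Both scan stages below inherit the single global value, so one stage can
        // end up with far more tasks than it needs while the other has too few.
        // (warehouse.big_fact_table / warehouse.small_dim_table are made-up names.)
        val joined = spark.table("warehouse.big_fact_table")
          .join(spark.table("warehouse.small_dim_table"), "key")
        joined.count()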

      However, when a large table and a small table appear in the same query, a single global hadoop.mapred.max.split.size value fits neither of them: some stages run a very large number of tasks while others run very few. To keep this balanced, the hadoop.mapred.max.split.size parameter should be configurable separately for each Hive table.
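
      A per-table setting does not exist yet; the sketch below only illustrates the intended shape, under the assumption that it would be exposed as a table property read by the Hive scan. Neither the property key nor its handling is part of Spark today:

        // Hypothetical illustration only: pin a smaller max split size on the large
        // table, so its scan stage gets more tasks while the small table keeps the
        // session default. Spark does not read this table property today.
        spark.sql("""
          ALTER TABLE warehouse.big_fact_table
          SET TBLPROPERTIES ('hadoop.mapred.max.split.size' = '33554432')
        """)  // 32 MB, applied (hypothetically) to this table's scan only

      With such a per-table value, the large table's scan could be split more aggressively while the small table keeps the session default, so neither stage ends up with an extreme task count.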


    People

      • Assignee: Unassigned
      • Reporter: guihuawen
      • Votes: 0
      • Watchers: 1
