Spark / SPARK-29506

Use dynamicPartitionOverwrite in FileCommitProtocol when inserting into a Hive table


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 3.0.0
    • Fix Version/s: None
    • Component/s: SQL
    • Labels: None

    Description

      When running INSERT OVERWRITE into a Hive table, enable dynamicPartitionOverwrite when initializing the FileCommitProtocol.

      HadoopMapReduceCommitProtocol uses FileOutputCommitter to commit job output files.

      FileOutputCommitter recursively calls FileSystem.listStatus on the directories inside each partition and commits the job output as individual leaf files.
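      To make that cost concrete, here is a simplified model of the behavior described above, written against the local filesystem with Python's standard library. It is not Hadoop's FileOutputCommitter, which handles more cases (for example, its v1 and v2 commit algorithms). The helpers write_task_output and commit_leaf_files and the dt=<p> partition layout are hypothetical: the model lists every directory recursively and moves each leaf file with its own rename, counting both operations.

```python
import os
import tempfile


def write_task_output(root, partitions, files_per_partition):
    """Hypothetical helper: create root/dt=<p>/part-<n> files for each partition."""
    for p in range(partitions):
        part_dir = os.path.join(root, f"dt={p}")
        os.makedirs(part_dir, exist_ok=True)
        for n in range(files_per_partition):
            with open(os.path.join(part_dir, f"part-{n:05d}"), "w") as fh:
                fh.write("row\n")


def commit_leaf_files(src_dir, dest_dir, stats):
    """Merge src_dir into dest_dir by recursing into every directory.

    Each directory visited costs one listing (a stand-in for
    FileSystem.listStatus) and each leaf file costs one rename.
    """
    stats["listings"] += 1
    os.makedirs(dest_dir, exist_ok=True)
    # os.listdir returns a list, so moving entries during the loop is safe.
    for name in sorted(os.listdir(src_dir)):
        src = os.path.join(src_dir, name)
        dest = os.path.join(dest_dir, name)
        if os.path.isdir(src):
            commit_leaf_files(src, dest, stats)  # descend into the partition
        else:
            os.replace(src, dest)  # commit this single leaf file
            stats["renames"] += 1


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        task_output = os.path.join(tmp, "_temporary", "task-0")
        table_dir = os.path.join(tmp, "table")
        write_task_output(task_output, partitions=3, files_per_partition=4)
        stats = {"listings": 0, "renames": 0}
        commit_leaf_files(task_output, table_dir, stats)
        print(stats)  # {'listings': 4, 'renames': 12}
```

      With P partitions of F files each, this model performs P + 1 listings and P × F renames, so its commit cost grows with the number of files rather than the number of partitions.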

      This is inefficient when dynamically overwriting many partitions and files.

      When dynamicPartitionOverwrite is enabled, HadoopMapReduceCommitProtocol writes task output to a staging directory and, at job commit, moves the written partition directories into place instead of committing individual leaf files.
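      A minimal sketch of that directory-level commit follows, again as a local-filesystem model rather than Spark's code. It assumes that tasks have already written into a staging directory under the table (the .staging-job-1 name is hypothetical) and that the job knows which partition paths it wrote (the hypothetical written_partitions argument). Each written partition replaces the old one with a single rename, and partitions the job did not write are left untouched, which is the dynamic overwrite semantics.

```python
import os
import shutil
import tempfile


def write_files(directory, names, content):
    """Hypothetical helper: create each named file in directory."""
    os.makedirs(directory, exist_ok=True)
    for name in names:
        with open(os.path.join(directory, name), "w") as fh:
            fh.write(content)


def commit_written_partitions(staging_dir, table_dir, written_partitions):
    """Move each written partition directory from staging into the table.

    Returns the number of renames, which equals the number of written
    partitions no matter how many files each one holds. Partitions that
    the job did not write are never touched.
    """
    renames = 0
    for part in sorted(written_partitions):
        staged = os.path.join(staging_dir, part)
        final = os.path.join(table_dir, part)
        if os.path.exists(final):
            shutil.rmtree(final)  # overwrite only this partition's old data
        os.makedirs(os.path.dirname(final), exist_ok=True)
        os.rename(staged, final)  # one rename for the whole directory
        renames += 1
    shutil.rmtree(staging_dir)  # clean up what is left of the staging tree
    return renames


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        table = os.path.join(tmp, "table")
        for p in ("dt=0", "dt=1", "dt=2"):
            write_files(os.path.join(table, p), ["old-00000"], "old\n")
        # Hypothetical staging location; the job rewrites dt=1 and adds dt=3.
        staging = os.path.join(table, ".staging-job-1")
        new_files = [f"part-{n:05d}" for n in range(4)]
        for p in ("dt=1", "dt=3"):
            write_files(os.path.join(staging, p), new_files, "new\n")
        print(commit_written_partitions(staging, table, {"dt=1", "dt=3"}))  # 2
        print(sorted(os.listdir(table)))  # ['dt=0', 'dt=1', 'dt=2', 'dt=3']
```

      Compared with committing leaf files, this job commit does no recursive listing of the written data and performs one rename per written partition, regardless of how many files each partition contains.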


    People

      Assignee: L. C. Hsieh (viirya)
      Reporter: L. C. Hsieh (viirya)
      Votes: 0
      Watchers: 1
