Details
Type: Improvement
Status: Resolved
Priority: Major
Resolution: Won't Fix
Affects Version/s: 3.0.0
Fix Version/s: None
None
Description
For INSERT OVERWRITE into a Hive table, enable dynamicPartitionOverwrite when initializing the FileCommitProtocol.
HadoopMapReduceCommitProtocol uses FileOutputCommitter to commit job output files.
FileOutputCommitter repeatedly calls FileSystem.listStatus on the directories inside each partition, recursively, and commits the job's output leaf files.
This is inefficient when dynamically overwriting many partitions and files.
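The leaf-file commit pattern described above can be illustrated with a simplified model. This is a minimal Python sketch under stated assumptions, not Hadoop's FileOutputCommitter:

- `commit_leaf_files` and `make_staging` are hypothetical helpers.
- The `dt=N/part-NNNNN` layout is an assumed example.
- Local `os.listdir` calls stand in for FileSystem.listStatus, and local file moves stand in for the committer's renames.

The model counts one listing per directory and one move per output file. The cost therefore grows with the total number of files as well as the number of partitions.

```python
import os
import shutil
import tempfile


def make_staging(root, partitions, files_per_partition):
    """Create a hypothetical staging tree such as root/dt=0/part-00000."""
    for p in range(partitions):
        part_dir = os.path.join(root, f"dt={p}")
        os.makedirs(part_dir)
        for f in range(files_per_partition):
            with open(os.path.join(part_dir, f"part-{f:05d}"), "w") as fh:
                fh.write("row\n")


def commit_leaf_files(src_dir, dst_dir):
    """Simplified leaf-file commit: recursively list src_dir and move every
    leaf file to the same relative path under dst_dir.
    Returns (directory listings, file moves)."""
    listings = 0
    moves = 0
    pending = [""]  # relative directories that still need to be listed
    while pending:
        rel = pending.pop()
        listings += 1  # stands in for one listStatus call per directory
        for name in os.listdir(os.path.join(src_dir, rel)):
            child = os.path.join(rel, name)
            src_path = os.path.join(src_dir, child)
            if os.path.isdir(src_path):
                pending.append(child)  # recurse into the subdirectory later
            else:
                target = os.path.join(dst_dir, child)
                os.makedirs(os.path.dirname(target), exist_ok=True)
                shutil.move(src_path, target)  # one move per leaf file
                moves += 1
    return listings, moves


with tempfile.TemporaryDirectory() as tmp:
    staging = os.path.join(tmp, "staging")
    table = os.path.join(tmp, "table")
    make_staging(staging, partitions=3, files_per_partition=4)
    result = commit_leaf_files(staging, table)
    committed = sum(len(files) for _, _, files in os.walk(table))
    print(result)  # (4, 12)
```

With 3 partitions of 4 files each, the model lists 4 directories (the root plus one per partition) and performs 12 individual file moves.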
With dynamicPartitionOverwrite enabled, HadoopMapReduceCommitProtocol writes output to a staging directory and commits the written partition directories instead of the individual leaf files.
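The partition-directory commit can be sketched in a similarly simplified way. This Python sketch is an approximation under stated assumptions, not Spark's implementation:

- `commit_partition_dirs` and `write_files` are hypothetical helpers.
- A local `os.rename` stands in for FileSystem.rename.
- The `dt=N` partition names are assumed for illustration.

For each written partition, the sketch deletes the existing partition directory and then moves the staged directory into place with a single rename.

```python
import os
import shutil
import tempfile


def write_files(directory, names):
    """Create a directory (if needed) containing small files with the given names."""
    os.makedirs(directory, exist_ok=True)
    for name in names:
        with open(os.path.join(directory, name), "w") as fh:
            fh.write("row\n")


def commit_partition_dirs(staging_dir, table_dir, partitions):
    """Replace each written partition under table_dir with its staged
    directory, using one directory rename per partition.
    Returns the number of renames performed."""
    renames = 0
    for part in partitions:
        final = os.path.join(table_dir, part)
        if os.path.exists(final):
            shutil.rmtree(final)  # overwrite: drop the old partition contents
        os.makedirs(os.path.dirname(final), exist_ok=True)
        os.rename(os.path.join(staging_dir, part), final)  # whole directory at once
        renames += 1
    return renames


with tempfile.TemporaryDirectory() as tmp:
    staging = os.path.join(tmp, "staging")
    table = os.path.join(tmp, "table")
    write_files(os.path.join(table, "dt=0"), ["old-file"])  # pre-existing partition
    written = [f"dt={p}" for p in range(3)]
    for part in written:
        write_files(os.path.join(staging, part), [f"part-{f:05d}" for f in range(4)])
    renames = commit_partition_dirs(staging, table, written)
    dt0_files = sorted(os.listdir(os.path.join(table, "dt=0")))
    staging_left = os.listdir(staging)
    print(renames)  # 3
```

In this sketch the number of renames equals the number of written partitions, regardless of how many files each partition contains. No recursive listing of the staged files is needed.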