Details
Type: Improvement
Status: Resolved
Priority: Major
Resolution: Won't Fix
Description
FileTablespace::commitOutputData has the following problems:
First, it is too long and complex because it handles many different cases in a single method. We need to refactor it into several small, well-defined methods.
Second, FileSystem::listStatus is used heavily while committing output data. Listing partitioned directories is especially expensive on S3, and the same overhead appears on HDFS with large partitioned tables. We need to minimize its usage.
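One possible way to reduce listing, sketched here only as an illustration and not as the approach taken in the actual code: record each partition directory when it is written, then commit by moving exactly those directories instead of listing the staging tree to discover them. The class and method names (PartitionCommitSketch, recordPartition, commit) are hypothetical, and java.nio.file stands in for the Hadoop FileSystem API so that the sketch is self-contained.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: commit partitioned output without listing directories.
class PartitionCommitSketch {
    // Relative partition directories (e.g. "year=2015/month=01"),
    // recorded at write time so they never have to be discovered by listing.
    private final Set<String> writtenPartitions = new TreeSet<>();

    // Assumed to be called by a writer once it has produced a partition directory.
    public void recordPartition(String relativePartitionPath) {
        writtenPartitions.add(relativePartitionPath);
    }

    // Moves each recorded partition from the staging directory to the final
    // directory. No directory listing is performed at any point.
    public List<Path> commit(Path stagingDir, Path finalDir) throws IOException {
        List<Path> committed = new ArrayList<>();
        for (String partition : writtenPartitions) {
            Path source = stagingDir.resolve(partition);
            Path target = finalDir.resolve(partition);
            // Make sure the parent of the target partition exists.
            Files.createDirectories(target.getParent());
            // A rename within one file system moves the directory as a whole.
            Files.move(source, target);
            committed.add(target);
        }
        return committed;
    }
}
```

Keeping the commit step in its own small method also fits the first point: each case that commitOutputData handles today could get a similarly narrow method. This sketch assumes that a target partition does not already exist; overwrite and append cases would need their own handling.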