Details
- Type: Improvement
- Status: Open
- Priority: Major
- Resolution: Unresolved
Description
After an insert, Hive gathers statistics for the ACID table. This becomes expensive over time because of the growing number of delta folders and the cost of scanning them.
public static List<FileStatus> getAcidFilesForStats(
    Table table, Path dir, Configuration jc, FileSystem fs) throws IOException {
  ...
  Directory acidInfo = AcidUtils.getAcidState(fs, dir, jc, idList, null, false, hdfsDirSnapshots);
  ...
  // ... plus other calls
  ...
}
Runtime keeps increasing as more deltas are generated.
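To illustrate the growth pattern, here is a minimal standalone sketch. It does not use Hive or HDFS APIs: it uses java.nio on a local temp directory, and the class DeltaScanSketch, its methods, and the assumption of one delta directory with a single bucket file per insert are all hypothetical simplifications. It only shows that a stats pass which lists every delta directory does work proportional to the number of deltas.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not Hive code.
public class DeltaScanSketch {

    /** Result of one simulated stats pass: data files found and directory listings issued. */
    static final class ScanResult {
        final List<Path> files = new ArrayList<>();
        int listings = 0;
    }

    /** Lists the table directory, then every delta_* directory under it. */
    static ScanResult scanForStats(Path tableDir) throws IOException {
        ScanResult result = new ScanResult();
        result.listings++; // one listing for the table directory itself
        try (DirectoryStream<Path> deltas = Files.newDirectoryStream(tableDir, "delta_*")) {
            for (Path delta : deltas) {
                if (!Files.isDirectory(delta)) {
                    continue;
                }
                result.listings++; // one more listing per delta directory
                try (DirectoryStream<Path> files = Files.newDirectoryStream(delta)) {
                    for (Path f : files) {
                        result.files.add(f);
                    }
                }
            }
        }
        return result;
    }

    /** Simulates one insert: creates a new delta directory holding a single bucket file. */
    static void simulateInsert(Path tableDir, int writeId) throws IOException {
        Path delta = Files.createDirectory(
            tableDir.resolve(String.format("delta_%07d_%07d", writeId, writeId)));
        Files.createFile(delta.resolve("bucket_00000"));
    }

    public static void main(String[] args) throws IOException {
        Path tableDir = Files.createTempDirectory("acid_table");
        for (int writeId = 1; writeId <= 5; writeId++) {
            simulateInsert(tableDir, writeId);
            ScanResult r = scanForStats(tableDir);
            System.out.println("after insert " + writeId
                + ": listings=" + r.listings + ", files=" + r.files.size());
        }
    }
}
```

In this sketch, the pass after the N-th insert issues N + 1 directory listings, so the per-insert stats cost grows linearly until the deltas are compacted or the scan is avoided.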
Issue Links
- is fixed by HIVE-23791 Optimize ACID stats generation (Closed)