Details
Type: Improvement
Status: Resolved
Priority: Minor
Resolution: Won't Fix
Description
HBASE-10958 added a flush before bulk load so that the region obtains the latest sequence ID, which prevents data loss during replay. The setting 'hbase.mapreduce.bulkload.assign.sequenceNumbers' is meant to control whether that flush happens, but SecureBulkLoadManager always passes true for the assignSeqId argument, so the setting has no effect there. If we bulk load frequently, every load triggers a flush and we end up with a lot of small files. Can we make 'hbase.mapreduce.bulkload.assign.sequenceNumbers' take effect in SecureBulkLoadManager? With the setting disabled, -1 is passed as the sequence ID and we won't lose data.
SecureBulkLoadManager.java, in secureBulkLoadHFiles:
    // assignSeqId (second argument) is hard-coded to true
    return region.bulkLoadHFiles(familyPaths, true,
        new SecureBulkLoadListener(fs, bulkToken, conf),
        request.getCopyFile(), clusterIds, request.getReplicate());
HRegion.java:
    public Map<byte[], List<Path>> bulkLoadHFiles(Collection<Pair<byte[], String>> familyPaths,
        boolean assignSeqId, BulkLoadListener bulkLoadListener, boolean copyFile,
        List<String> clusterIds, boolean replicate)
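The requested change could be sketched roughly as follows. This is a self-contained illustration, not HBase code: the class AssignSeqIdSketch, the helpers shouldAssignSeqId and resolveSeqId, and the use of a plain Map in place of the Hadoop Configuration object are all assumptions made for this example. The default of true when the key is unset is also an assumption. Only the property name and the -1 sequence ID come from the description above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; none of these names exist in HBase.
class AssignSeqIdSketch {
    // Property name taken from the issue description.
    static final String ASSIGN_SEQ_IDS =
        "hbase.mapreduce.bulkload.assign.sequenceNumbers";

    // Stand-in for reading a boolean from the configuration.
    // Assumption: the flag defaults to true when it is not set.
    static boolean shouldAssignSeqId(Map<String, String> conf) {
        String value = conf.get(ASSIGN_SEQ_IDS);
        return value == null || Boolean.parseBoolean(value);
    }

    // Stand-in for the decision made during bulk load: use the sequence ID
    // produced by a flush when assignSeqId is true, otherwise pass -1.
    static long resolveSeqId(boolean assignSeqId, long flushedSeqId) {
        return assignSeqId ? flushedSeqId : -1L;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(ASSIGN_SEQ_IDS, "false");
        // Instead of a hard-coded true, the flag would come from the config.
        boolean assignSeqId = shouldAssignSeqId(conf);
        System.out.println(resolveSeqId(assignSeqId, 42L));
    }
}
```

In SecureBulkLoadManager, the idea would be to pass the value read from the configuration as the second argument of region.bulkLoadHFiles instead of the literal true.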