Details
Type: Improvement
Status: Open
Priority: Critical
Resolution: Unresolved
Description
VerifyReplication is too slow on a geo-replicated cluster. Digging into the code, I found that the input scanner caching for requests to the target cluster defaults to 1.
This default should be set to a better value, or the option should at least be mentioned in the usage output:
-Dhbase.mapreduce.scan.cachedrows=100
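As a rough illustration of why a caching value of 1 hurts across regions, the sketch below estimates the number of scanner round trips for a given caching value. It assumes each scanner RPC returns up to `caching` rows and ignores scanner open/close calls and response size limits. The class name, row count, and round-trip time are hypothetical and chosen only for the example.

```java
public class ScanRoundTrips {
    // Number of scanner RPCs needed to read `rows` rows when each RPC
    // returns at most `caching` rows (simplified model, see assumptions above).
    public static long roundTrips(long rows, int caching) {
        if (caching < 1) {
            throw new IllegalArgumentException("caching must be >= 1");
        }
        return (rows + caching - 1) / caching; // ceiling division
    }

    // Rough time spent waiting on the network: round trips multiplied by RTT.
    public static long networkWaitMillis(long rows, int caching, long rttMillis) {
        return roundTrips(rows, caching) * rttMillis;
    }

    public static void main(String[] args) {
        long rows = 1_000_000L; // hypothetical number of rows to verify
        long rttMillis = 50L;   // hypothetical cross-region round-trip time
        for (int caching : new int[] {1, 100}) {
            System.out.printf("caching=%d: %d round trips, about %d s of network wait%n",
                    caching,
                    roundTrips(rows, caching),
                    networkWaitMillis(rows, caching, rttMillis) / 1000);
        }
    }
}
```

With these made-up numbers, caching=1 needs 1,000,000 round trips and caching=100 needs 10,000, so the network wait shrinks by roughly the caching factor.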
TableInputFormat.java
public static final String SCAN_CACHEDROWS = "hbase.mapreduce.scan.cachedrows";
VerifyReplication.java
Configuration conf = context.getConfiguration();
final Scan scan = new Scan();
scan.setCaching(conf.getInt(TableInputFormat.SCAN_CACHEDROWS, 1));
If you agree, I will add the following line to the printUsage method:
VerifyReplication.java
System.err.println("For performance consider the following option, Input scanner caching for source to target cluster request\n" + "-Dhbase.mapreduce.scan.cachedrows=100");