Details
Type: Improvement
Status: Resolved
Priority: Minor
Resolution: Fixed
Hadoop Flags: Reviewed
Description
Bytes.toStringBinary is quite expensive because it uses String.format. String.format seems like overkill for this purpose, and I could make the function up to 45 times faster by replacing that part with simpler hand-written code.
This is probably a non-issue for the HBase server, as the function is not used in performance-sensitive contexts, but I figured it wouldn't hurt to make it faster since it's widely used in built-in tools (Shell, HFilePrettyPrinter with the -p option, etc.) and it can be used in clients.
Background:
We have an HBase monitoring tool that periodically collects information about the regions, and it calls Bytes.toStringBinary along the way to make some of that information suitable for display. Profiling revealed that a large portion of the processing time was spent in String.format.
Micro-benchmark:
byte[] bytes = new byte[256];
for (int i = 0; i < bytes.length; ++i) {
  // Mixture of printable and non-printable characters.
  // Maximal performance gain (45x) is observed when the array is solely
  // composed of non-printable characters.
  bytes[i] = (byte) i;
}
long started = System.nanoTime();
for (int i = 0; i < 1000000; ++i) {
  Bytes.toStringBinary(bytes);
}
System.out.println(TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - started));
- Without the patch: 134176 ms
- With the patch: 3890 ms
I made sure that the new version returns the same value as before and simplified the check for non-printable characters.
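For illustration, here is a minimal sketch of the kind of change described above. It is not the attached patch. The class name ToStringBinarySketch, the HEX_CHARS_UPPER table, and both method names are hypothetical. The sketch also assumes that "printable" means ASCII 0x20 through 0x7E excluding the backslash, and that non-printable bytes are rendered as \xNN with uppercase hex digits.

```java
class ToStringBinarySketch {
    // Lookup table for hex digits; avoids String.format for each escaped byte.
    private static final char[] HEX_CHARS_UPPER = "0123456789ABCDEF".toCharArray();

    // Hand-written variant: appends two hex digits directly.
    static String toStringBinary(byte[] b) {
        StringBuilder result = new StringBuilder(b.length);
        for (byte value : b) {
            int ch = value & 0xFF;
            // Assumed printable range: space through '~', except the backslash,
            // which is escaped so the output stays unambiguous.
            if (ch >= ' ' && ch <= '~' && ch != '\\') {
                result.append((char) ch);
            } else {
                result.append("\\x");
                result.append(HEX_CHARS_UPPER[ch >> 4]);
                result.append(HEX_CHARS_UPPER[ch & 0x0F]);
            }
        }
        return result.toString();
    }

    // Reference variant using String.format, kept only to compare results.
    static String toStringBinaryWithFormat(byte[] b) {
        StringBuilder result = new StringBuilder(b.length);
        for (byte value : b) {
            int ch = value & 0xFF;
            if (ch >= ' ' && ch <= '~' && ch != '\\') {
                result.append((char) ch);
            } else {
                result.append(String.format("\\x%02X", ch));
            }
        }
        return result.toString();
    }

    public static void main(String[] args) {
        byte[] sample = {'a', 0x00, (byte) 0xFF, '\\'};
        System.out.println(toStringBinary(sample)); // prints a\x00\xFF\x5C
    }
}
```

Comparing the two variants over all 256 byte values is a cheap way to check that the faster version returns the same string as the String.format version.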