Details
Type: Improvement
Status: Patch Available
Priority: Major
Resolution: Unresolved
Affects Version/s: 1.2.1
Fix Version/s: None
Component/s: None
Issue detected using Beeline with the HBase Phoenix thin driver and a result set with many columns.
Description
Beeline performs poorly with the table output format when both of the following conditions hold for the same result set:
- The result set has a large number of columns.
- The driver in use has a slow implementation of DatabaseMetaData.getPrimaryKeys.
For example, testing with the HBase Phoenix thin driver showed that a query returning ~100 columns (e.g. select * from system.catalog;) takes ~30 seconds with the table output format but only ~2 seconds with the CSV output format.
The slowdown comes from how primary keys are detected. The Rows implementation currently makes a metadata call for every column to determine whether that column is a primary key, for display purposes. I propose optimizing this so that a metadata call is made only once for each unique table among the result set's columns.
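A minimal sketch of the proposed per-table caching, assuming primary-key status is looked up through DatabaseMetaData.getPrimaryKeys keyed by (catalog, schema, table). The class PrimaryKeyCache, the PrimaryKeyLookup interface and the primaryKeyFlags method are hypothetical names for illustration; this is not Beeline's actual Rows code. The lookup is abstracted behind an interface so the caching logic can be exercised without a live driver.

```java
import java.sql.DatabaseMetaData;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: caches primary-key column names per table so the
// (possibly slow) metadata call runs once per table, not once per column.
public class PrimaryKeyCache {

    // Hypothetical abstraction over DatabaseMetaData.getPrimaryKeys.
    @FunctionalInterface
    public interface PrimaryKeyLookup {
        Set<String> primaryKeyColumns(String catalog, String schema, String table) throws SQLException;
    }

    // Adapter that reads COLUMN_NAME from the standard getPrimaryKeys result set.
    public static PrimaryKeyLookup fromMetaData(DatabaseMetaData md) {
        return (catalog, schema, table) -> {
            Set<String> cols = new HashSet<>();
            try (ResultSet rs = md.getPrimaryKeys(catalog, schema, table)) {
                while (rs.next()) {
                    cols.add(rs.getString("COLUMN_NAME"));
                }
            }
            return cols;
        };
    }

    private final PrimaryKeyLookup lookup;
    // Key is (catalog, schema, table); Arrays.asList tolerates null catalog/schema.
    private final Map<List<String>, Set<String>> cache = new HashMap<>();

    public PrimaryKeyCache(PrimaryKeyLookup lookup) {
        this.lookup = lookup;
    }

    public boolean isPrimaryKey(String catalog, String schema, String table, String column)
            throws SQLException {
        List<String> key = Arrays.asList(catalog, schema, table);
        Set<String> pks = cache.get(key);
        if (pks == null) {
            // Only the first column seen from this table triggers a metadata call.
            pks = lookup.primaryKeyColumns(catalog, schema, table);
            cache.put(key, pks);
        }
        return pks.contains(column);
    }

    // Computes one primary-key flag per result set column (index 0 = column 1).
    public boolean[] primaryKeyFlags(ResultSetMetaData rsmd) throws SQLException {
        int n = rsmd.getColumnCount();
        boolean[] flags = new boolean[n];
        for (int i = 1; i <= n; i++) {
            String table = rsmd.getTableName(i);
            if (table == null || table.isEmpty()) {
                continue; // e.g. computed expressions with no source table
            }
            flags[i - 1] = isPrimaryKey(rsmd.getCatalogName(i), rsmd.getSchemaName(i),
                    table, rsmd.getColumnName(i));
        }
        return flags;
    }
}
```

With this approach, a ~100-column result set drawn from a single table such as system.catalog would issue one getPrimaryKeys call instead of one per column. The sketch assumes the driver reports table and column names consistently between ResultSetMetaData and getPrimaryKeys; drivers that differ in name casing would need normalization.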