Details
- Type: Bug
- Status: Closed
- Priority: Minor
- Resolution: Fixed
Description
I came across an issue when working on HIVE-22674.
The SerDe used for processing binary data tries to auto-detect whether the data is Base64-encoded. It uses org.apache.commons.codec.binary.Base64#isArrayByteBase64, which has two issues:
- It is slow: the array is scanned once to check whether it is compatible and then again to process the data, so every value is examined twice.
- More importantly, the check is too permissive. Its Javadoc reads: "Tests a given byte array to see if it contains only valid characters within the Base64 alphabet. Currently the method treats whitespace as valid."
The qtest for this feature uses full sentences (which include spaces) as its binary data, so the SerDe concludes that the data is Base64 and returns an incorrect size estimate.
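To illustrate the false positive, here is a minimal sketch that uses only the JDK. The looksLikeBase64 method is a hypothetical stand-in that approximates the documented behavior of the commons-codec check (alphabet characters plus whitespace are accepted); it is not the library's actual code. The decodesAsBase64 method uses java.util.Base64 purely for comparison.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class Base64DetectionSketch {

    /** True if b is in the standard Base64 alphabet or is the '=' pad. */
    static boolean isBase64Char(byte b) {
        return (b >= 'A' && b <= 'Z') || (b >= 'a' && b <= 'z')
            || (b >= '0' && b <= '9') || b == '+' || b == '/' || b == '=';
    }

    static boolean isWhitespace(byte b) {
        return b == ' ' || b == '\n' || b == '\r' || b == '\t';
    }

    /** Permissive check (assumed approximation): alphabet characters or whitespace only. */
    static boolean looksLikeBase64(byte[] data) {
        for (byte b : data) {
            if (!isBase64Char(b) && !isWhitespace(b)) {
                return false;
            }
        }
        return true;
    }

    /** Stricter check: attempt a real decode with the JDK's basic decoder. */
    static boolean decodesAsBase64(byte[] data) {
        try {
            Base64.getDecoder().decode(data);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] sentence = "this is plain text not Base64".getBytes(StandardCharsets.US_ASCII);
        System.out.println("permissive: " + looksLikeBase64(sentence));
        System.out.println("strict:     " + decodesAsBase64(sentence));
    }
}
```

Even the stricter decode is only a heuristic: a short plain-text value such as "abcd" is also valid Base64, so no content-based check can tell the two apart reliably.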
The SerDe should not auto-detect Base64 data at all; instead, Base64 decoding should be enabled explicitly with a table property.
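One way the opt-in could look is sketched below. The class name and the property key example.binary.base64 are hypothetical and chosen only for illustration; the ticket does not name the actual property. The sketch assumes the reader receives the table properties as a java.util.Properties object.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Properties;

class BinaryFieldReaderSketch {

    // Hypothetical property key, for illustration only.
    static final String BASE64_PROP = "example.binary.base64";

    private final boolean base64Enabled;

    BinaryFieldReaderSketch(Properties tableProps) {
        // Decoding is off unless the table explicitly asks for it.
        this.base64Enabled =
            Boolean.parseBoolean(tableProps.getProperty(BASE64_PROP, "false"));
    }

    /** Returns the decoded bytes when Base64 is enabled, otherwise the raw bytes unchanged. */
    byte[] read(byte[] raw) {
        return base64Enabled ? Base64.getDecoder().decode(raw) : raw;
    }

    public static void main(String[] args) {
        Properties plain = new Properties();
        Properties encoded = new Properties();
        encoded.setProperty(BASE64_PROP, "true");

        byte[] sentence = "a full sentence".getBytes(StandardCharsets.US_ASCII);
        byte[] b64 = "aGVsbG8=".getBytes(StandardCharsets.US_ASCII);

        System.out.println(new String(new BinaryFieldReaderSketch(plain).read(sentence), StandardCharsets.US_ASCII));
        System.out.println(new String(new BinaryFieldReaderSketch(encoded).read(b64), StandardCharsets.US_ASCII));
    }
}
```

With an explicit flag, each value is processed once, and plain text that happens to fit the Base64 alphabet is never decoded by mistake.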
Issue Links
- is related to: HIVE-22674 Replace Base64 in serde Package (Closed)
- links to