Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Versions: Impala 2.2.4, Impala 2.3.0
Description
Impala cannot evaluate Java-based UDFs that return a standard java.lang type such as Float or String. For example, the text-mining UDF at https://github.com/rueedlinger/hive-udf/ provides a function 'Distance' that calculates the Levenshtein distance and returns a Float.
[xxx.yyy.zzz.com:21000] > CREATE FUNCTION levdistance(STRING,STRING,STRING) RETURNS FLOAT LOCATION 'hdfs://datalab-uatb/project/p11/inc/private/jars/hive-udf-textmining-1.0-SNAPSHOT-jar-with-dependencies.jar' SYMBOL='ch.yax.hive.udf.text.Distance';
Query: create FUNCTION levdistance(STRING,STRING,STRING) RETURNS FLOAT LOCATION 'hdfs://datalab-uatb/project/p11/inc/private/jars/hive-udf-textmining-1.0-SNAPSHOT-jar-with-dependencies.jar' SYMBOL='ch.yax.hive.udf.text.Distance'
Fetched 0 row(s) in 0.77s
[xxx.yyy.zzz.com:21000] > select levdistance('L','test','testing');
Query: select levdistance('L','test','testing')
+---------------------------------------------+
| default.levdistance('l', 'test', 'testing') |
+---------------------------------------------+
| NULL                                        |
+---------------------------------------------+
WARNINGS: UDF WARNING: Hive UDF path=hdfs://datalab-uatb/project/p11/inc/private/jars/hive-udf-textmining-1.0-SNAPSHOT-jar-with-dependencies.jar class=ch.yax.hive.udf.text.Distance failed due to: ImpalaRuntimeException: UDF::evaluate() ran into a problem.
CAUSED BY: ClassCastException: java.lang.Float cannot be cast to org.apache.hadoop.io.FloatWritable
The only workaround is to rewrite the UDF so that it returns Hadoop Writable types (for example, FloatWritable instead of Float). This is not ideal: rewriting the UDF is time-consuming and requires extra resources, whereas the original UDF should work unchanged, as it does in Hive.
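
The failure in the warning above can be illustrated with a small, self-contained sketch. It is not Impala's actual code. The nested FloatWritable class is a hypothetical stand-in for org.apache.hadoop.io.FloatWritable so that the example runs without Hadoop on the classpath, and evaluate(), unsafeCast() and toWritable() are names invented for illustration. The sketch assumes the caller blindly casts the UDF result to a Writable, which is what the ClassCastException suggests, and shows one possible way a caller could instead convert a java.lang.Float result.

```java
final class WritableCastSketch {
    // Hypothetical stand-in for org.apache.hadoop.io.FloatWritable.
    static final class FloatWritable {
        private final float value;
        FloatWritable(float value) { this.value = value; }
        float get() { return value; }
    }

    // Stand-in for a Hive UDF evaluate() that returns java.lang.Float:
    // the Levenshtein distance between two strings (two-row dynamic programming).
    static Float evaluate(String a, String b) {
        int[] prev = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            int[] curr = new int[b.length() + 1];
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            prev = curr;
        }
        return (float) prev[b.length()];
    }

    // Assumed failure mode: the caller expects every result to be a Writable.
    static FloatWritable unsafeCast(Object result) {
        return (FloatWritable) result; // throws ClassCastException for a java.lang.Float
    }

    // One possible conversion: inspect the runtime type and wrap plain Java values.
    static FloatWritable toWritable(Object result) {
        if (result == null) {
            return null;
        }
        if (result instanceof FloatWritable) {
            return (FloatWritable) result;
        }
        if (result instanceof Float) {
            return new FloatWritable((Float) result);
        }
        throw new IllegalArgumentException("Unsupported return type: " + result.getClass());
    }

    public static void main(String[] args) {
        Object result = evaluate("test", "testing");
        try {
            unsafeCast(result);
        } catch (ClassCastException e) {
            System.out.println("Blind cast failed: " + e.getMessage());
        }
        System.out.println("Converted result: " + toWritable(result).get());
    }
}
```

With the conversion in place, the same evaluate() result that breaks the blind cast is wrapped and returned as a distance of 3.0 for 'test' and 'testing', without any change to the UDF itself.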