Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-20771

LazyBinarySerDe fails on empty structs.

    XMLWordPrintableJSON

Details

    Description

      CREATE TABLE cvaliente.structtest AS
      SELECT named_struct();
      
      SHOW CREATE TABLE cvaliente.structtest;
      
      SELECT * FROM cvaliente.structtest ORDER BY rand();
      

      The resulting schema is:

      CREATE TABLE `cvaliente.structtest`(
        `_c0` struct<>)
      ROW FORMAT SERDE 
        'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' 
      STORED AS INPUTFORMAT 
        'org.apache.hadoop.mapred.TextInputFormat' 
      OUTPUTFORMAT 
        'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
      LOCATION
        'hdfs://nameservice1/user/cvaliente/cvaliente/structtest2'
      TBLPROPERTIES (
        'COLUMN_STATS_ACCURATE'='true', 
        'numFiles'='1', 	  
        'numRows'='1', 
        'rawDataSize'='0', 
        'totalSize'='1', 	  
        'transient_lastDdlTime'='1539781607');
      

      Between the MAP and REDUCE phase hive serializes to LazyBinaryStruct and when trying to read the same object back the SELECT query above fails:

      2018-10-17 14:32:02,298 [FATAL] [TezChild] |tez.ReduceRecordSource|: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{"reducesinkkey0":0.13508293503238622},"value":{"_col0":{}}}
      	at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:338)
      	at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:259)
      	at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:169)
      	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:164)
      	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:139)
      	at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370)
      	at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
      	at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
      	at java.security.AccessController.doPrivileged(Native Method)
      	at javax.security.auth.Subject.doAs(Subject.java:422)
      	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917)
      	at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
      	at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
      	at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      	at java.lang.Thread.run(Thread.java:748)
      Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Error evaluating VALUE._col0
      	at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:82)
      	at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:329)
      	... 17 more
      Caused by: java.lang.RuntimeException: length should be positive!
      	at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryNonPrimitive.init(LazyBinaryNonPrimitive.java:54)
      	at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.init(LazyBinaryStruct.java:95)
      	at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.uncheckedGetField(LazyBinaryStruct.java:264)
      	at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.getField(LazyBinaryStruct.java:201)
      	at org.apache.hadoop.hive.serde2.lazybinary.objectinspector.LazyBinaryStructObjectInspector.getStructFieldData(LazyBinaryStructObjectInspector.java:64)
      	at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator._evaluate(ExprNodeColumnEvaluator.java:98)
      	at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:77)
      	at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:65)
      	at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:77)
      	... 18 more
      

      this is because the LazyBinaryNonPrimitive doesn't allow for empty structs in https://github.com/apache/hive/blob/master/serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinaryNonPrimitive.java#L53

      Attachments

        1. HIVE-20771.patch
          3 kB
          Clemens Valiente

        Activity

          People

            cvaliente Clemens Valiente
            cvaliente Clemens Valiente
            Votes:
            1 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Time Tracking

                Estimated:
                Original Estimate - Not Specified
                Not Specified
                Remaining:
                Remaining Estimate - 0h
                0h
                Logged:
                Time Spent - 1h 50m
                1h 50m