Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-20990

ORC case when/if with coalesce wrong results or case: java.lang.AssertionError: Output column number expected to be 0 when isRepeating

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Open
    • Major
    • Resolution: Unresolved
    • 3.0.0, 3.1.3
    • None
    • Vectorization
      • Hive 3.0.0 from hdp 3.0.1.
      • centos 7
      • 5 datanodes, 2 masters

       

    Description

      Run this to replicate:

      drop table if exists ds;
      create table ds stored as orc as select
        inline(array(
          struct('gmail.com'), -- when branch of the case statement
          struct('apache.org') -- else branch of the case statement
        )) as (domain)
      ;
      
      select
        case
          when domain='gmail.com' then 'gmail'
          else coalesce(domain, 'other')
        end as domaingroup
      from ds
      group by 1 -- useless (datawise) for this example, but triggers the bug.
      ;
      

      Exceptions out with:  

      java.lang.RuntimeException: java.lang.AssertionError: Output column number expected to be 0 when isRepeating
      

       

      Full exception is shown below.

      Of interest:

      • if the case is removed (eg. replaced only by the else clause), the query works,
      • if only the else clause from the case matches, the query works,
      • replacing the case by a bunch of nested if does not change anything,
      • removing the group by and the query works,
      • replacing the table (ds) by a CTE and the query works.

      Workaround, at the cost of performance: 

      set hive.vectorized.execution.enabled = false

       

      Full exception:

      ERROR : FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, vertexId=vertex_1543326624484_8258_45_00, diagnostics=[Task failed, taskId=task_1543326624484_8258_45_00_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( failure ) : java.lang.RuntimeException: java.lang.AssertionError: Output column number expected to be 0 when isRepeating
      	at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:101)
      	at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:76)
      	at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:419)
      	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:267)
      	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
      	at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
      	at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
      	at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
      	at java.security.AccessController.doPrivileged(Native Method)
      	at javax.security.auth.Subject.doAs(Subject.java:422)
      	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
      	at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
      	at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
      	at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
      	at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
      	at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
      	at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      	at java.lang.Thread.run(Thread.java:745)
      Caused by: java.lang.AssertionError: Output column number expected to be 0 when isRepeating
      	at org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector.setElement(BytesColumnVector.java:492)
      	at org.apache.hadoop.hive.ql.exec.vector.expressions.IfExprColumnCondExpr.evaluate(IfExprColumnCondExpr.java:117)
      	at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:146)
      	at org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:965)
      	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:938)
      	at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:125)
      	at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.deliverVectorizedRowBatch(VectorMapOperator.java:812)
      	at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:845)
      	at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:92)
      	... 19 more
      , errorMessage=Cannot recover from this error:java.lang.RuntimeException: java.lang.AssertionError: Output column number expected to be 0 when isRepeating
      	at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:101)
      	at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:76)
      	at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:419)
      	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:267)
      	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
      	at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
      	at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
      	at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
      	at java.security.AccessController.doPrivileged(Native Method)
      	at javax.security.auth.Subject.doAs(Subject.java:422)
      	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
      	at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
      	at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
      	at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
      	at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
      	at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
      	at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      	at java.lang.Thread.run(Thread.java:745)
      Caused by: java.lang.AssertionError: Output column number expected to be 0 when isRepeating
      	at org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector.setElement(BytesColumnVector.java:492)
      	at org.apache.hadoop.hive.ql.exec.vector.expressions.IfExprColumnCondExpr.evaluate(IfExprColumnCondExpr.java:117)
      	at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:146)
      	at org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:965)
      	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:938)
      	at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:125)
      	at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.deliverVectorizedRowBatch(VectorMapOperator.java:812)
      	at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:845)
      	at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:92)
      	... 19 more
      ]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:0, Vertex vertex_1543326624484_8258_45_00 [Map 1] killed/failed due to:OWN_TASK_FAILURE]Vertex killed, vertexName=Reducer 2, vertexId=vertex_1543326624484_8258_45_01, diagnostics=[Vertex received Kill while in RUNNING state., Vertex did not succeed due to OTHER_VERTEX_FAILURE, failedTasks:0 killedTasks:2, Vertex vertex_1543326624484_8258_45_01 [Reducer 2] killed/failed due to:OTHER_VERTEX_FAILURE]DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:1
      

      Attachments

        Issue Links

          Activity

            People

              Unassigned Unassigned
              guillaume@lomig.net Guillaume
              Votes:
              0 Vote for this issue
              Watchers:
              9 Start watching this issue

              Dates

                Created:
                Updated: