HIVE-10885: With vectorization enabled, join operation involving interval_day_time fails

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.2.0
    • Fix Version/s: 1.2.1
    • Component/s: None
    • Labels: None

    Description

      When vectorization is enabled, a join operation involving the interval_day_time type fails with the following error:

      Status: Failed
      Vertex failed, vertexName=Map 2, vertexId=vertex_1432858236614_0247_1_01, diagnostics=[Task failed, taskId=task_1432858236614_0247_1_01_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: Map operator initialization failed
      	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171)
      	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
      	at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:337)
      	at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
      	at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
      	at java.security.AccessController.doPrivileged(Native Method)
      	at javax.security.auth.Subject.doAs(Subject.java:415)
      	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
      	at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
      	at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
      	at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      	at java.lang.Thread.run(Thread.java:745)
      Caused by: java.lang.RuntimeException: Map operator initialization failed
      	at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:229)
      	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:147)
      	... 14 more
      Caused by: java.lang.RuntimeException: Cannot allocate vector copy row for interval_day_time
      	at org.apache.hadoop.hive.ql.exec.vector.VectorCopyRow.init(VectorCopyRow.java:213)
      	at org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinCommonOperator.initializeOp(VectorMapJoinCommonOperator.java:581)
      	at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:362)
      	at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:481)
      	at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:438)
      	at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375)
      	at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:214)
      	... 15 more
      ], TaskAttempt 1 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: Map operator initialization failed
      	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171)
      	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
      	at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:337)
      	at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
      	at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
      	at java.security.AccessController.doPrivileged(Native Method)
      	at javax.security.auth.Subject.doAs(Subject.java:415)
      	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
      	at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
      	at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
      	at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      	at java.lang.Thread.run(Thread.java:745)
      Caused by: java.lang.RuntimeException: Map operator initialization failed
      	at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:229)
      	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:147)
      	... 14 more
      Caused by: java.lang.RuntimeException: Cannot allocate vector copy row for interval_day_time
      	at org.apache.hadoop.hive.ql.exec.vector.VectorCopyRow.init(VectorCopyRow.java:213)
      	at org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinCommonOperator.initializeOp(VectorMapJoinCommonOperator.java:581)
      	at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:362)
      	at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:481)
      	at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:438)
      	at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375)
      	at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:214)
      	... 15 more
      ], TaskAttempt 2 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: Map operator initialization failed
      	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171)
      	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
      	at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:337)
      	at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
      	at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
      	at java.security.AccessController.doPrivileged(Native Method)
      	at javax.security.auth.Subject.doAs(Subject.java:415)
      	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
      	at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
      	at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
      	at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      	at java.lang.Thread.run(Thread.java:745)
      Caused by: java.lang.RuntimeException: Map operator initialization failed
      	at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:229)
      	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:147)
      	... 14 more
      Caused by: java.lang.RuntimeException: Cannot allocate vector copy row for interval_day_time
      	at org.apache.hadoop.hive.ql.exec.vector.VectorCopyRow.init(VectorCopyRow.java:213)
      	at org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinCommonOperator.initializeOp(VectorMapJoinCommonOperator.java:581)
      	at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:362)
      	at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:481)
      	at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:438)
      	at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375)
      	at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:214)
      	... 15 more
      ], TaskAttempt 3 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: Map operator initialization failed
      	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171)
      	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
      	at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:337)
      	at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
      	at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
      	at java.security.AccessController.doPrivileged(Native Method)
      	at javax.security.auth.Subject.doAs(Subject.java:415)
      	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
      	at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
      	at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
      	at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      	at java.lang.Thread.run(Thread.java:745)
      Caused by: java.lang.RuntimeException: Map operator initialization failed
      	at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:229)
      	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:147)
      	... 14 more
      Caused by: java.lang.RuntimeException: Cannot allocate vector copy row for interval_day_time
      	at org.apache.hadoop.hive.ql.exec.vector.VectorCopyRow.init(VectorCopyRow.java:213)
      	at org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinCommonOperator.initializeOp(VectorMapJoinCommonOperator.java:581)
      	at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:362)
      	at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:481)
      	at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:438)
      	at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375)
      	at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:214)
      	... 15 more
      

      Query run:

      select
         v1.s,
         v2.s,
         v1.intrvl1 
      from
         ( select
            s,
            (cast(dt as date) - cast(ts as date)) as intrvl1 
         from
            vectortab10korc ) v1 
      join
         (
            select
               s ,
               (cast(dt as date) - cast(ts as date)) as intrvl2 
            from
               vectorparttab10korc 
         ) v2 
            on v1.intrvl1 = v2.intrvl2 
            and v1.s = v2.s;
      

      Explain plan:

      OK
      STAGE DEPENDENCIES:
        Stage-1 is a root stage
        Stage-0 depends on stages: Stage-1
      
      STAGE PLANS:
        Stage: Stage-1
          Tez
            Edges:
              Map 2 <- Map 1 (BROADCAST_EDGE)
            DagName: hrt_qa_20150601024305_7745bc8f-169f-45c6-8856-7391eef0d819:3
            Vertices:
              Map 1 
                  Map Operator Tree:
                      TableScan
                        alias: vectortab10korc
                        filterExpr: s is not null (type: boolean)
                        Statistics: Num rows: 10000 Data size: 4597592 Basic stats: COMPLETE Column stats: PARTIAL
                        Filter Operator
                          predicate: s is not null (type: boolean)
                          Statistics: Num rows: 10000 Data size: 1340000 Basic stats: COMPLETE Column stats: PARTIAL
                          Select Operator
                            expressions: s (type: string), (dt - CAST( ts AS DATE)) (type: interval_day_time)
                            outputColumnNames: _col0, _col1
                            Statistics: Num rows: 10000 Data size: 940000 Basic stats: COMPLETE Column stats: PARTIAL
                            Filter Operator
                              predicate: _col1 is not null (type: boolean)
                              Statistics: Num rows: 10000 Data size: 940000 Basic stats: COMPLETE Column stats: PARTIAL
                              Reduce Output Operator
                                key expressions: _col1 (type: interval_day_time), _col0 (type: string)
                                sort order: ++
                                Map-reduce partition columns: _col1 (type: interval_day_time), _col0 (type: string)
                                Statistics: Num rows: 10000 Data size: 940000 Basic stats: COMPLETE Column stats: PARTIAL
                              Select Operator
                                expressions: _col0 (type: string)
                                outputColumnNames: _col0
                                Statistics: Num rows: 10000 Data size: 940000 Basic stats: COMPLETE Column stats: PARTIAL
                                Group By Operator
                                  keys: _col0 (type: string)
                                  mode: hash
                                  outputColumnNames: _col0
                                  Statistics: Num rows: 5000 Data size: 470000 Basic stats: COMPLETE Column stats: PARTIAL
                                  Dynamic Partitioning Event Operator
                                    Target Input: vectorparttab10korc
                                    Partition key expr: s
                                    Statistics: Num rows: 5000 Data size: 470000 Basic stats: COMPLETE Column stats: PARTIAL
                                    Target column: s
                                    Target Vertex: Map 2
                  Execution mode: vectorized
              Map 2 
                  Map Operator Tree:
                      TableScan
                        alias: vectorparttab10korc
                        filterExpr: s is not null (type: boolean)
                        Statistics: Num rows: 10000 Data size: 3656191 Basic stats: COMPLETE Column stats: PARTIAL
                        Select Operator
                          expressions: s (type: string), (dt - CAST( ts AS DATE)) (type: interval_day_time)
                          outputColumnNames: _col0, _col1
                          Statistics: Num rows: 10000 Data size: 1840000 Basic stats: COMPLETE Column stats: PARTIAL
                          Filter Operator
                            predicate: _col1 is not null (type: boolean)
                            Statistics: Num rows: 10000 Data size: 1840000 Basic stats: COMPLETE Column stats: PARTIAL
                            Map Join Operator
                              condition map:
                                   Inner Join 0 to 1
                              keys:
                                0 _col1 (type: interval_day_time), _col0 (type: string)
                                1 _col1 (type: interval_day_time), _col0 (type: string)
                              outputColumnNames: _col0, _col1, _col2
                              input vertices:
                                0 Map 1
                              Statistics: Num rows: 344 Data size: 95632 Basic stats: COMPLETE Column stats: PARTIAL
                              HybridGraceHashJoin: true
                              Select Operator
                                expressions: _col0 (type: string), _col2 (type: string), _col1 (type: interval_day_time)
                                outputColumnNames: _col0, _col1, _col2
                                Statistics: Num rows: 344 Data size: 95632 Basic stats: COMPLETE Column stats: PARTIAL
                                File Output Operator
                                  compressed: false
                                  Statistics: Num rows: 344 Data size: 95632 Basic stats: COMPLETE Column stats: PARTIAL
                                  table:
                                      input format: org.apache.hadoop.mapred.TextInputFormat
                                      output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                                      serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                  Execution mode: vectorized
      
        Stage: Stage-0
          Fetch Operator
            limit: -1
            Processor Tree:
              ListSink
      
      Time taken: 0.402 seconds, Fetched: 91 row(s)
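
Since the failure is only reported with vectorization on, a possible session-level workaround on affected versions is to turn vectorized execution off before running the join. This is a minimal sketch and an assumption, not something verified in this report: `hive.vectorized.execution.enabled` is the standard Hive setting, but whether the query then completes on a given cluster has not been checked here.

```sql
-- Assumption: disabling vectorization avoids the VectorCopyRow
-- initialization path shown in the stack trace above.
set hive.vectorized.execution.enabled=false;

-- Re-run EXPLAIN on the failing query; the "Execution mode: vectorized"
-- lines should no longer appear under Map 1 and Map 2.
explain
select v1.s, v2.s, v1.intrvl1
from
   ( select s, (cast(dt as date) - cast(ts as date)) as intrvl1
     from vectortab10korc ) v1
join
   ( select s, (cast(dt as date) - cast(ts as date)) as intrvl2
     from vectorparttab10korc ) v2
   on v1.intrvl1 = v2.intrvl2
   and v1.s = v2.s;
```

The setting applies only to the current session, so other queries keep the benefit of vectorized execution.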
      

      Attachments

        1. HIVE-10885.01.patch
          349 kB
          Matt McCline
        2. HIVE-10885.02.patch
          353 kB
          Matt McCline
        3. HIVE-10885.03.patch
          353 kB
          Matt McCline

          People

            Assignee: Matt McCline (mmccline)
            Reporter: Jagruti Varia (jvaria)