Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-25243

Llap external client - Handle nested values when the parent struct is null

    XMLWordPrintableJSON

Details

    Description

      Consider the following table in text format -

      +-------------------------------+
      |              c8               |
      +-------------------------------+
      | NULL                          |
      | {"r":null,"s":null,"t":null}  |
      | {"r":"a","s":9,"t":2.2}       |
      +-------------------------------+
      

      When we query above table via llap external client, it throws following exception -

      Caused by: java.lang.NullPointerException: src
              at io.netty.util.internal.ObjectUtil.checkNotNull(ObjectUtil.java:33)
              at io.netty.buffer.UnsafeByteBufUtil.setBytes(UnsafeByteBufUtil.java:537)
              at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:199)
              at io.netty.buffer.WrappedByteBuf.setBytes(WrappedByteBuf.java:486)
              at io.netty.buffer.UnsafeDirectLittleEndian.setBytes(UnsafeDirectLittleEndian.java:34)
              at io.netty.buffer.ArrowBuf.setBytes(ArrowBuf.java:933)
              at org.apache.arrow.vector.BaseVariableWidthVector.setBytes(BaseVariableWidthVector.java:1191)
              at org.apache.arrow.vector.BaseVariableWidthVector.setSafe(BaseVariableWidthVector.java:1026)
              at org.apache.hadoop.hive.ql.io.arrow.Serializer.lambda$static$15(Serializer.java:834)
              at org.apache.hadoop.hive.ql.io.arrow.Serializer.writeGeneric(Serializer.java:777)
              at org.apache.hadoop.hive.ql.io.arrow.Serializer.writePrimitive(Serializer.java:581)
              at org.apache.hadoop.hive.ql.io.arrow.Serializer.write(Serializer.java:290)
              at org.apache.hadoop.hive.ql.io.arrow.Serializer.writeStruct(Serializer.java:359)
              at org.apache.hadoop.hive.ql.io.arrow.Serializer.write(Serializer.java:296)
              at org.apache.hadoop.hive.ql.io.arrow.Serializer.serializeBatch(Serializer.java:213)
              at org.apache.hadoop.hive.ql.exec.vector.filesink.VectorFileSinkArrowOperator.process(VectorFileSinkArrowOperator.java:135)
      

      Created a test to repro it -

      /**
       * TestMiniLlapVectorArrowWithLlapIODisabled - turns off llap io while testing LLAP external client flow.
       * The aim of turning off LLAP IO is -
       * when we create table through this test, LLAP caches them and returns the same
       * when we do a read query, due to this we miss some code paths which may have been hit otherwise.
       */
      public class TestMiniLlapVectorArrowWithLlapIODisabled extends BaseJdbcWithMiniLlap {
      
        @BeforeClass
        public static void beforeTest() throws Exception {
          HiveConf conf = defaultConf();
          conf.setBoolVar(ConfVars.LLAP_OUTPUT_FORMAT_ARROW, true);
          conf.setBoolVar(ConfVars.HIVE_VECTORIZATION_FILESINK_ARROW_NATIVE_ENABLED, true);
          conf.set(ConfVars.LLAP_IO_ENABLED.varname, "false");
          BaseJdbcWithMiniLlap.beforeTest(conf);
        }
      
        @Override
        protected InputFormat<NullWritable, Row> getInputFormat() {
          //For unit testing, no harm in hard-coding allocator ceiling to LONG.MAX_VALUE
          return new LlapArrowRowInputFormat(Long.MAX_VALUE);
        }
      
        @Test
        public void testNullsInStructFields() throws Exception {
          createDataTypesTable("datatypes");
          RowCollector2 rowCollector = new RowCollector2();
          // c8 struct<r:string,s:int,t:double>
          String query = "select c8 from datatypes";
          int rowCount = processQuery(query, 1, rowCollector);
          assertEquals(3, rowCount);
        }
      }
      

      Cause - As we see in the table above, first row of the table is NULL, and correspondingly we get structVector.isNull[i]=true in arrow serializer but we don't get isNull[i]=true for the fields of struct. And later the code goes for setting such fields in arrow vector and we see above exception.

      Attachments

        Issue Links

          Activity

            People

              ShubhamChaurasia Shubham Chaurasia
              ShubhamChaurasia Shubham Chaurasia
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated:
                  Original Estimate - Not Specified
                  Not Specified
                  Remaining:
                  Remaining Estimate - 0h
                  0h
                  Logged:
                  Time Spent - 1.5h
                  1.5h