  1. Hive
  2. HIVE-25443

Arrow SerDe cannot serialize/deserialize complex data types when there are more than 1024 values


Details

    Description

      Complex data types such as MAP and STRUCT cannot be serialized/deserialized using Arrow SerDe when they hold more than 1024 values. This happens because the ColumnVector is always initialized with a size of 1024.

      Issue #1 : https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/arrow/ArrowColumnarBatchSerDe.java#L213

      Issue #2 : https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/arrow/ArrowColumnarBatchSerDe.java#L215
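
      The failure mode described above can be illustrated without Hive on the classpath. The following is only a minimal sketch under stated assumptions: SketchVector and VectorSizingSketch are hypothetical stand-ins, not Hive's ColumnVector or the Arrow SerDe code, and the only detail taken from this issue is the default size of 1024. It shows why writing 1025 values into a default-sized backing array fails, and how sizing the vector to the actual value count first avoids the problem. Whether the real fix takes this shape is not stated in the issue.

```java
import java.util.Arrays;

// Hypothetical stand-in for a column vector backed by a fixed-size array.
// NOT Hive's ColumnVector; only the default size of 1024 comes from the issue.
final class SketchVector {
  static final int DEFAULT_SIZE = 1024;
  long[] values;

  SketchVector() {
    this(DEFAULT_SIZE);
  }

  SketchVector(int size) {
    values = new long[size];
  }

  // Grow the backing array when it cannot hold `size` values,
  // optionally copying the existing contents into the new array.
  void ensureSize(int size, boolean preserveData) {
    if (size <= values.length) {
      return;
    }
    values = preserveData ? Arrays.copyOf(values, size) : new long[size];
  }
}

// Hypothetical helper that mimics writing a list of booleans into a vector.
final class VectorSizingSketch {
  // Writes into a default-sized vector; throws
  // ArrayIndexOutOfBoundsException once input.length exceeds 1024.
  static SketchVector writeWithDefaultSize(boolean[] input) {
    SketchVector vector = new SketchVector();
    for (int i = 0; i < input.length; i++) {
      vector.values[i] = input[i] ? 1L : 0L;
    }
    return vector;
  }

  // Sizes the vector to the actual number of values before writing.
  static SketchVector writeWithEnsureSize(boolean[] input) {
    SketchVector vector = new SketchVector();
    vector.ensureSize(input.length, false);
    for (int i = 0; i < input.length; i++) {
      vector.values[i] = input[i] ? 1L : 0L;
    }
    return vector;
  }
}
```

      With a 1025-element input, writeWithDefaultSize fails on the last element, while writeWithEnsureSize completes and leaves a backing array of length 1025.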

      Sample unit test in TestArrowColumnarBatchSerDe that reproduces the failure:

      @Test
      public void testListBooleanWithMoreThan1024Values() throws SerDeException {
        String[][] schema = {
            {"boolean_list", "array<boolean>"},
        };

        Object[][] rows = new Object[1025][1];
        for (int i = 0; i < 1025; i++) {
          rows[i][0] = new BooleanWritable(true);
        }

        initAndSerializeAndDeserialize(schema, toList(rows));
      }
        
      


            People

              srahman Syed Shameerur Rahman


                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 40m