Hive / HIVE-21328

Call To Hadoop Text getBytes() Without Call to getLength()

Details

    • Type: Bug
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.2.0, 4.0.0
    • Fix Version/s: None
    • Component/s: Query Planning
    • Labels: None

    Description

      I'm not sure if there is actually a bug, but this looks highly suspect:

        public Object set(final Object o, final Text text) {
          return new BytesWritable(text == null ? null : text.getBytes());
        }
      

      https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/primitive/ParquetStringInspector.java#L104-L106

      A Text object has two components: an internal byte array and a length that says how many of those bytes are currently valid. The two are independent; for example, a quick "reset" on the Text object simply sets the internal length counter to zero and leaves the array untouched. Because this code calls getBytes() without consulting getLength(), it may be reading stale bytes past the end of the current value.
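
To make the concern concrete, here is a minimal, self-contained sketch. It does not use Hadoop; `MiniText` is a hypothetical stand-in that only mimics the "reusable backing array plus length counter" behaviour described above, and `unsafeCopy` / `safeCopy` are illustrative names, not Hive code. Which approach the attached patch actually takes is not shown here.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class TextLengthSketch {

    /** Hypothetical stand-in for Text: a reusable buffer plus a valid-length counter. */
    static final class MiniText {
        private byte[] bytes = new byte[0];
        private int length = 0;

        void set(String s) {
            byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
            if (utf8.length > bytes.length) {
                bytes = new byte[utf8.length]; // grow only when needed, never shrink
            }
            System.arraycopy(utf8, 0, bytes, 0, utf8.length);
            length = utf8.length;
        }

        byte[] getBytes() { return bytes; }  // whole backing array, may hold stale data
        int getLength() { return length; }   // number of valid bytes
    }

    /** Mirrors the suspect pattern: hands out the backing array and ignores the length. */
    static byte[] unsafeCopy(MiniText text) {
        return text == null ? null : text.getBytes();
    }

    /** Copies only the valid prefix of the backing array. */
    static byte[] safeCopy(MiniText text) {
        return text == null ? null : Arrays.copyOf(text.getBytes(), text.getLength());
    }

    public static void main(String[] args) {
        MiniText text = new MiniText();
        text.set("hello world");
        text.set("hi"); // reuses the 11-byte buffer; only the first 2 bytes are valid

        System.out.println(new String(unsafeCopy(text), StandardCharsets.UTF_8)); // hillo world
        System.out.println(new String(safeCopy(text), StandardCharsets.UTF_8));   // hi
    }
}
```

With a real Text, the equivalent of `safeCopy` would presumably be `Arrays.copyOf(text.getBytes(), text.getLength())` or, where the Hadoop version in use provides it, `text.copyBytes()`.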

      Attachments

        1. HIVE-21328.1.patch
          2 kB
          David Mollitor
        2. HIVE-21328.1.patch
          2 kB
          David Mollitor

          People

            Assignee: David Mollitor (belugabehr)
            Reporter: David Mollitor (belugabehr)
            Votes: 0
            Watchers: 1

            Dates

              Created:
              Updated: