IMPALA-12783

Nested struct with varlen data crashes

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Component: Backend

    Description

      If a struct ("main") is inside an array and contains two child structs ("s1" and "s2") that both contain strings (or other varlen data), Impala crashes when the struct is re-materialised (for example in a sort with limit) if codegen is enabled.

      To reproduce:

      In Hive:

      create table nested (arr ARRAY<STRUCT<s1: STRUCT<str1: STRING>, s2: STRUCT<str2: STRING>>>) stored as parquet;
      insert into nested values (array( named_struct("s1", named_struct("str1", "A string that is long"), "s2", named_struct("str2", "Another string that is long") )));

      In Impala:

      select 1, arr from nested order by 1 limit 1;

      This seems to be because the codegen'd code, when checking whether the strings ("str1" and "str2" in the example) are NULL, incorrectly calculates the offset of the null indicator byte from the memory address of their containing struct instead of from the beginning of the "master tuple", which in this case is the item tuple of the array.

      Note that the null indicators of the struct members are at the end of the tuple containing the struct (recursively), i.e. the master tuple.
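
The offset mix-up described above can be illustrated with a small, self-contained sketch. This is not Impala's actual code: the `NullIndicator` type, the function names, the 16-byte slot size, and the 33-byte tuple layout are all hypothetical and chosen only to show why applying a master-tuple-relative offset to a struct's address reads the wrong byte.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified null indicator for one slot: which byte holds the
// null bit and which bit within that byte. The byte offset is meant to be
// measured from the start of the master tuple.
struct NullIndicator {
  std::size_t byte_offset;
  std::uint8_t bit_mask;
};

// Assumed toy layout of the array's item tuple (the master tuple):
//   bytes  0..15  slot for s1.str1
//   bytes 16..31  slot for s2.str2
//   byte  32      null indicator byte for the struct members
constexpr std::size_t kS1Offset = 0;
constexpr std::size_t kS2Offset = 16;
constexpr std::size_t kNullByteOffset = 32;
constexpr std::size_t kTupleSize = 33;

// Intended behaviour: locate the null byte relative to the master tuple.
inline bool IsNull(const std::uint8_t* master_tuple, NullIndicator ind) {
  return (master_tuple[ind.byte_offset] & ind.bit_mask) != 0;
}

// Faulty behaviour as described in this issue: the same offset is applied to
// the address of the containing struct instead of the master tuple.
inline bool IsNullFromStructBase(const std::uint8_t* struct_base,
                                 NullIndicator ind) {
  return (struct_base[ind.byte_offset] & ind.bit_mask) != 0;
}

// Builds a 64-byte buffer whose first kTupleSize bytes are the tuple, with all
// null bits clear. The remaining bytes are set to 0xFF to stand in for
// unrelated neighbouring memory, so the faulty read stays inside the buffer.
inline std::vector<std::uint8_t> MakeDemoBuffer() {
  std::vector<std::uint8_t> buf(64, 0xFF);
  std::fill(buf.begin(), buf.begin() + kTupleSize, std::uint8_t{0});
  return buf;
}
```

In this toy layout, checking "str2" with `IsNull(buf.data(), {kNullByteOffset, 0x02})` reads byte 32 of the tuple, whereas `IsNullFromStructBase(buf.data() + kS2Offset, {kNullByteOffset, 0x02})` reads byte 48, which lies past the end of the tuple. In a real tuple that read would touch arbitrary memory. The first struct happens to start at the tuple base here, so both functions agree for "str1"; that is a property of this sketch only.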

            People

              daniel.becker Daniel Becker