Details
-
Bug
-
Status: Resolved
-
Major
-
Resolution: Fixed
Description
If a struct ("main") is inside an array and contains two child structs ("s1" and "s2") that both contain strings (or other varlen data), Impala crashes when the struct is re-materialised (for example in a sort with limit) if codegen is enabled.
To reproduce:
In Hive:
create table nested (arr ARRAY<STRUCT<s1: STRUCT<str1: STRING>, s2: STRUCT<str2: STRING>>>) stored as parquet;
insert into nested values (array(named_struct("s1", named_struct("str1", "A string that is long"), "s2", named_struct("str2", "Another string that is long"))));
In Impala:
select 1, arr from nested order by 1 limit 1;
This seems to be because the codegen'd code, when checking whether the strings ("str1" and "str2" in the example) are NULL, incorrectly calculates the offset of the null indicator byte from the memory address of their containing struct rather than from the beginning of the "master tuple", which in this case is the item tuple of the array.
Note that the null indicators of the struct members are at the end of the tuple containing the struct (recursively), i.e. the master tuple.
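The offset mix-up described above can be illustrated with a small standalone sketch. This is not Impala's code: the layout (16-byte string slots, null indicator byte at offset 32, a 64-byte backing buffer) and all names such as NullIndicator, IsNullFromTuple and IsNullFromStruct are hypothetical and chosen only to show how the same null-indicator offset gives different results depending on which base address it is applied to.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical null-indicator descriptor: byte offset (relative to the
// start of the master tuple) plus a bit mask within that byte.
struct NullIndicator {
  std::size_t byte_offset;
  std::uint8_t bit_mask;
};

// Made-up layout of the array's item tuple (the "master tuple"):
//   [0, 16)  main.s1.str1
//   [16, 32) main.s2.str2
//   [32]     null indicator byte for all nested slots
constexpr std::size_t kS1Offset = 0;
constexpr std::size_t kS2Offset = 16;
constexpr std::size_t kTupleSize = 33;
constexpr NullIndicator kStr1Null{32, 0x01};
constexpr NullIndicator kStr2Null{32, 0x02};

// Intended behaviour: the offset is applied to the master tuple's address.
bool IsNullFromTuple(const std::uint8_t* tuple, NullIndicator ind) {
  return (tuple[ind.byte_offset] & ind.bit_mask) != 0;
}

// Behaviour described in this issue: the offset is applied to the address
// of the containing struct, so the read lands on an unrelated byte.
bool IsNullFromStruct(const std::uint8_t* tuple, std::size_t struct_offset,
                      NullIndicator ind) {
  return (tuple[struct_offset + ind.byte_offset] & ind.bit_mask) != 0;
}

// Builds a zeroed 64-byte buffer holding one tuple with str2 marked NULL.
// The buffer is larger than the tuple only so that the wrong read stays in
// bounds for this demonstration.
std::vector<std::uint8_t> MakeExampleTuple() {
  std::vector<std::uint8_t> buf(64, 0);
  assert(kStr2Null.byte_offset < kTupleSize);
  buf[kStr2Null.byte_offset] |= kStr2Null.bit_mask;
  return buf;
}
```

With this layout, IsNullFromTuple reports str2 as NULL, while IsNullFromStruct reads the byte at 16 + 32 = 48, which lies past the end of the 33-byte tuple, and reports it as non-NULL. For s1, which happens to sit at offset 0 in this made-up layout, both calculations agree.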
Issue Links
- is related to IMPALA-12781 ARRAY<STRUCT<s: STRING> crashes in top-n (Resolved)