IMPALA-11719: Inconsistency in printing NULL values


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: Impala 4.2.0
    • Component/s: Backend
    • Labels: None

    Description

      In Impala, null values at the top level or inside collections are printed as "NULL":

      select int_array from functional_parquet.complextypestbl;
      +------------------------+
      | int_array              |
      +------------------------+
      | [-1]                   |
      | [1,2,3]                |
      | [NULL,1,2,NULL,3,NULL] |
      | []                     |
      | NULL                   |
      | NULL                   |
      | NULL                   |
      | NULL                   |
      +------------------------+

      Null values inside a struct are printed as "null":

      select small_struct from functional_parquet.complextypes_structs;
      +------------------------------------+
      | small_struct                       |
      +------------------------------------+
      | NULL                               |
      | {"i":19191,"s":"small_struct_str"} |
      | {"i":98765,"s":null}               |
      | {"i":null,"s":"str"}               |
      | {"i":98765,"s":"abcde f"}          |
      | {"i":null,"s":null}                |
      +------------------------------------+

      In Hive the situation is a bit different: "NULL" is used only for top-level values, and "null" is printed in both collections and structs.

      select int_array from functional_parquet.complextypestbl;
      +-------------------------+
      |        int_array        |
      +-------------------------+
      | [-1]                    |
      | [1,2,3]                 |
      | [null,1,2,null,3,null]  |
      | []                      |
      | NULL                    |
      | NULL                    |
      | NULL                    |
      | NULL                    |
      +-------------------------+
      select small_struct from functional_parquet.complextypes_structs;
      +-------------------------------------+
      |            small_struct             |
      +-------------------------------------+
      | NULL                                |
      | {"i":19191,"s":"small_struct_str"}  |
      | {"i":98765,"s":null}                |
      | {"i":null,"s":"str"}                |
      | {"i":98765,"s":"abcde f"}           |
      | {"i":null,"s":null}                 |
      +-------------------------------------+

      Officially we print collections and structs in JSON form. In JSON the relevant keyword is "null".
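
To illustrate the JSON point, here is a minimal sketch in Python (not Impala code) that feeds the printed values from the tables above to a standard JSON parser. The helper name `is_valid_json` is made up for this example. It suggests that the array as Impala currently prints it would be rejected by a strict JSON parser, while the Hive-style array and the struct rows would be accepted.

```python
import json


def is_valid_json(text: str) -> bool:
    """Return True if `text` parses as JSON with Python's standard parser."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


# Values copied from the result tables in this description.
impala_array = "[NULL,1,2,NULL,3,NULL]"   # Impala: "NULL" inside a collection
hive_array = "[null,1,2,null,3,null]"     # Hive: "null" inside a collection
struct_row = '{"i":98765,"s":null}'       # both engines: "null" inside a struct

print(is_valid_json(impala_array))  # False
print(is_valid_json(hive_array))    # True
print(is_valid_json(struct_row))    # True
```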

      We should decide how to handle this inconsistency. The options are:

      1. Have a uniform NULL representation everywhere: top level, collections and structs
        • either "NULL" or "null" everywhere
      2. Have "NULL" on the top level and "null" in collections and structs, like Hive
      3. Leave everything as it is now: "NULL" at the top level and in collections, "null" in structs.
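
The three options can be compared side by side with a small Python model. This is only a sketch for discussion, not the Impala backend implementation: the names `NullPolicy`, `null_text` and `render` are hypothetical, Python lists stand in for collections and dicts for structs, and top-level strings are assumed to be printed unquoted.

```python
import json
from enum import Enum


class NullPolicy(Enum):
    UNIFORM_UPPER = "option 1: NULL everywhere"
    UNIFORM_LOWER = "option 1: null everywhere"
    HIVE_LIKE = "option 2: NULL at top level, null when nested"
    CURRENT = "option 3: NULL at top level and in collections, null in structs"


def null_text(policy: NullPolicy, context: str) -> str:
    """Pick the null spelling; context is "top", "collection" or "struct"."""
    if policy is NullPolicy.UNIFORM_UPPER:
        return "NULL"
    if policy is NullPolicy.UNIFORM_LOWER:
        return "null"
    if policy is NullPolicy.HIVE_LIKE:
        return "NULL" if context == "top" else "null"
    # NullPolicy.CURRENT: only struct fields use the lowercase form.
    return "null" if context == "struct" else "NULL"


def render(value, policy: NullPolicy, context: str = "top") -> str:
    """Render a value the way a result cell would be printed under `policy`."""
    if value is None:
        return null_text(policy, context)
    if isinstance(value, list):  # stands in for a collection
        items = (render(v, policy, "collection") for v in value)
        return "[" + ",".join(items) + "]"
    if isinstance(value, dict):  # stands in for a struct
        fields = (f'{json.dumps(k)}:{render(v, policy, "struct")}'
                  for k, v in value.items())
        return "{" + ",".join(fields) + "}"
    if isinstance(value, str) and context == "top":
        return value  # assumption: top-level strings are printed unquoted
    return json.dumps(value)


row_array = [None, 1, 2, None, 3, None]
row_struct = {"i": None, "s": "str"}

for policy in NullPolicy:
    print(policy.name, render(None, policy),
          render(row_array, policy), render(row_struct, policy))
```

Under `CURRENT` this reproduces the Impala output shown above, and under `HIVE_LIKE` it reproduces the Hive output. Of the four variants, only the uniform "NULL" variant prints structs that are not valid JSON.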

          People

            daniel.becker Daniel Becker