Details

    • Type: Sub-task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Impala 4.3.0
    • None
    • Infrastructure
    • None
    • ghx-label-12

    Description

      Python 3 makes a clear distinction between bytes and strings (Unicode). To handle this correctly, every place that handles text needs to be explicit about whether it is working on Unicode strings or on bytes.

      The typical way to fix this for text is a "Unicode sandwich": input is decoded to Unicode as early as possible, output is encoded to bytes as late as possible, and all internal code works on Unicode strings.
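
      A minimal sketch of the sandwich in plain Python 3, assuming UTF-8 at both edges. The helper names (decode_input, transform, encode_output, process) are hypothetical and not taken from Impala's code; transform stands in for any internal logic.

```python
# "Unicode sandwich" sketch: bytes -> str at the input edge, str-only logic
# in the middle, str -> bytes at the output edge. UTF-8 is an assumption.

def decode_input(raw: bytes, encoding: str = "utf-8") -> str:
    """Input edge: turn bytes into str as early as possible."""
    return raw.decode(encoding)


def transform(text: str) -> str:
    """Internal logic: only ever sees str, never bytes."""
    return text.strip().upper()


def encode_output(text: str, encoding: str = "utf-8") -> bytes:
    """Output edge: turn str back into bytes as late as possible."""
    return text.encode(encoding)


def process(raw: bytes) -> bytes:
    """Full pipeline: decode once, work on str, encode once."""
    return encode_output(transform(decode_input(raw)))
```

      For example, process(b"  caf\xc3\xa9 \n") returns b"CAF\xc3\x89", the UTF-8 encoding of "CAFÉ". Because only the two edge functions touch bytes, a bytes/str mix-up in the middle shows up as a TypeError rather than as silently wrong output.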

      Some parts of our code deal with bytes directly (e.g., tests/util/get_parquet_metadata.py has code that reads the raw bytes of a Parquet file). Almost everything else should deal with Unicode strings.
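
      Code that legitimately works on bytes has to compare against bytes literals in Python 3. The sketch below is not the code in get_parquet_metadata.py; it is a hypothetical helper that relies only on the Parquet file layout: a file begins and ends with the 4-byte magic "PAR1", and the 4 bytes before the trailing magic hold the footer length as a little-endian integer.

```python
import struct

# Parquet files start and end with the 4-byte magic "PAR1". In Python 3 this
# must be a bytes literal: comparing bytes to "PAR1" (a str) is always False.
PARQUET_MAGIC = b"PAR1"


def parquet_footer_length(data: bytes) -> int:
    """Return the footer (file metadata) length of an in-memory Parquet file.

    Hypothetical helper. Assumed layout: magic, ..., footer,
    4-byte little-endian footer length, magic.
    """
    if len(data) < 12:  # leading magic + length field + trailing magic
        raise ValueError("too short to be a Parquet file")
    if data[:4] != PARQUET_MAGIC or data[-4:] != PARQUET_MAGIC:
        raise ValueError("missing PAR1 magic bytes")
    # Slicing bytes yields bytes; indexing yields an int (data[0] == 80, not "P").
    (footer_len,) = struct.unpack("<I", data[-8:-4])
    return footer_len
```

      The input would typically come from a file opened in binary mode (open(path, "rb")), so no decoding happens anywhere on this path.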

      This is also a good time to fix warnings about the unicode() builtin and basestring, neither of which exists in Python 3.
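
      One common pattern for code that must run on both Python 2 and 3 is a small compatibility alias, sketched below. The names text_type, string_types, and is_string are hypothetical here (libraries such as six offer six.text_type and six.string_types for the same purpose); the issue does not say whether Impala uses a shim, a library, or simply drops Python 2.

```python
import sys

# Hypothetical compatibility aliases. Python 3 removed the `unicode` and
# `basestring` builtins; `str` now covers both roles for text.
if sys.version_info[0] >= 3:
    text_type = str
    string_types = (str,)
else:  # Python 2 only: these names exist there and nowhere else
    text_type = unicode  # noqa: F821
    string_types = (basestring,)  # noqa: F821


def is_string(value):
    """Return True if value is a text string.

    Under Python 3 this is False for bytes. Under Python 2, basestring
    also matches byte strings, so the two versions differ on that input.
    """
    return isinstance(value, string_types)
```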

    People

      Assignee: Unassigned
      Reporter: Joe McDonnell
      Votes: 0
      Watchers: 2