  1. Hive
  2. HIVE-26639

ConstantVectorExpression and ExplainTask shouldn't rely on default charset

Details

    Description

      In HS2 (and other components) we rely on UTF-8 encoding, so whenever strings are stored as bytes, we store the UTF-8-encoded bytes. Some Java APIs fall back to the system default charset in different ways, which can produce incorrectly encoded data when the system default is anything other than UTF-8. This patch fixes two code paths:

      1. ConstantVectorExpression
      In my case, this:

      LOG.info("default charset name: " + java.nio.charset.Charset.defaultCharset().name());
      LOG.info("getBytes() = " + ((String) constantValue).getBytes());
      LOG.info("getBytes(StandardCharsets.UTF_8) = " + ((String) constantValue).getBytes(StandardCharsets.UTF_8));
      

      led to:

      default charset name: US-ASCII
      getBytes() = [B@73dcffb0
      getBytes(StandardCharsets.UTF_8) = [B@2ead0b9c
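
Note that `[B@73dcffb0` and `[B@2ead0b9c` are array identity strings, so the log above confirms the default charset but does not show the byte content. A minimal sketch of how the two encodings could be compared directly follows. The class `ByteDump` and its `dump` helper are hypothetical names, not part of Hive, and `US_ASCII` is passed explicitly to stand in for a JVM whose default charset is US-ASCII.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper for illustration only; not Hive code.
class ByteDump {
    // Render the byte content of the encoded string rather than the array identity.
    static String dump(String s, Charset charset) {
        return Arrays.toString(s.getBytes(charset));
    }

    public static void main(String[] args) {
        // Unicode escapes keep the example independent of the source file encoding.
        String constantValue = "\u5317\u4EAC"; // the filter value from the query in this report

        // Simulates getBytes() on a JVM whose default charset is US-ASCII:
        // each unmappable character becomes a single '?' (byte 63).
        System.out.println(dump(constantValue, StandardCharsets.US_ASCII)); // [63, 63]

        // Explicit UTF-8: three bytes per character for these code points.
        System.out.println(dump(constantValue, StandardCharsets.UTF_8)); // [-27, -116, -105, -28, -70, -84]
    }
}
```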
      

      On the customer side, queries returned wrong results when the filter contained non-ASCII characters (which UTF-8 can represent):

      SELECT b FROM default.rlv_test1 where b='北京';
      ....
      ??
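
The `??` result is consistent with a constant that was encoded with a non-UTF-8 charset and later read back as UTF-8. The following is a simplified sketch of that round trip, not the actual ConstantVectorExpression code. `RoundTrip` and `roundTrip` are hypothetical names, and the explicit `US_ASCII` argument again stands in for the default charset seen in the log.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not Hive code.
class RoundTrip {
    // Encode with the given charset, then decode as UTF-8,
    // which is what downstream consumers of the stored bytes expect.
    static String roundTrip(String s, Charset encodeWith) {
        byte[] stored = s.getBytes(encodeWith);
        return new String(stored, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String constantValue = "\u5317\u4EAC";

        // Encoding with US-ASCII loses the characters: the result is "??".
        String broken = roundTrip(constantValue, StandardCharsets.US_ASCII);
        System.out.println(broken.equals("??")); // true

        // Encoding with an explicit UTF-8 charset preserves the value.
        String intact = roundTrip(constantValue, StandardCharsets.UTF_8);
        System.out.println(intact.equals(constantValue)); // true
    }
}
```

Passing the charset explicitly, as in `getBytes(StandardCharsets.UTF_8)`, makes the result independent of the JVM's default charset.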
      

      2. Explain
      Similarly, EXPLAIN output was written to a PrintStream that used a different encoding, leading to a plan like:

      	            Map Operator Tree:
      	                TableScan
      	                  alias: test_table
      	                  filterExpr: (b = '??') (type: boolean)
      	                  Statistics: Num rows: 2 Data size: 352 Basic stats: COMPLETE Column stats: COMPLETE
      	                  Filter Operator
      	                    predicate: (b = '??') (type: boolean)
      	                    Statistics: Num rows: 2 Data size: 352 Basic stats: COMPLETE Column stats: COMPLETE
      	                    Select Operator
      	                      expressions: a (type: int), '??' (type: string), c (type: string)
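
A minimal sketch of the PrintStream behaviour, assuming the stream is created without an explicit encoding in the failing case. `ExplainEncoding` and `render` are hypothetical names, not ExplainTask code. The three-argument `PrintStream(OutputStream, boolean, String)` constructor is used because it is also available on Java 8.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not Hive code.
class ExplainEncoding {
    // Print one plan line through a PrintStream that uses the named encoding,
    // then decode the captured bytes as UTF-8.
    static String render(String line, String encoding) throws UnsupportedEncodingException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        PrintStream out = new PrintStream(buffer, true, encoding);
        out.print(line);
        out.flush();
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String line = "filterExpr: (b = '\u5317\u4EAC') (type: boolean)";

        // A US-ASCII stream replaces each unmappable character with '?'.
        String ascii = render(line, "US-ASCII");
        System.out.println(ascii.equals("filterExpr: (b = '??') (type: boolean)")); // true

        // A stream created with an explicit UTF-8 encoding keeps the literal intact.
        String utf8 = render(line, "UTF-8");
        System.out.println(utf8.equals(line)); // true
    }
}
```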
      

            People

              abstractdog László Bodor