Uploaded image for project: 'Apache Drill'
  1. Apache Drill
  2. DRILL-7493

convert_fromJSON and unicode

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Open
    • Major
    • Resolution: Unresolved
    • 1.16.0
    • None
    • Functions - Drill
    • None

    Description

      transform a json string (with \uxxxx char) into json struct

      apache drill> SELECT x_str, convert_fromJSON(x_str) AS x_array 
      FROM (SELECT '["test=\u0014=test"]' x_str);
      +----------------------+----------------------+
      |        x_str         |       x_array        |
      +----------------------+----------------------+
      | ["test=\u0014=test"] | ["test=\u0014=test"] |
      +----------------------+----------------------+
      

      Use json struct :

      apache drill> SELECT x_str
      , x_array
      , x_array[0] AS x_array0 
      FROM(SELECT x_str, convert_fromJSON(x_str) AS x_array
      FROM (SELECT '["test=\u0014=test"]' x_str));
      +----------------------+----------------------+-------------+
      |        x_str         |       x_array        |  x_array0   |
      +----------------------+----------------------+-------------+
      | ["test=\u0014=test"] | ["test=\u0014=test"] | test==test |
      +----------------------+----------------------+-------------+
      

      Note that the char \u0014 is interpreted in x_array0

      if using split function on x_array0, an array is built with non interpreted \uxxxx

      apache drill> SELECT x_str
      , x_array
      , x_array[0] AS x_array0
      , split(x_array[0],',') AS x_array0_split 
      FROM(SELECT x_str, convert_fromJSON(x_str) AS x_array 
      FROM (SELECT '["test=\u0014=test"]' x_str));
      +----------------------+----------------------+-------------+----------------------+
      |        x_str         |       x_array        |  x_array0   |    x_array0_split    |
      +----------------------+----------------------+-------------+----------------------+
      | ["test=\u0014=test"] | ["test=\u0014=test"] | test==test | ["test=\u0014=test"] |
      +----------------------+----------------------+-------------+----------------------+
      

      It's not possible to use convert_fromJSON on the interpreted \uxxxx

      SELECT x_str
      , x_array
      , x_array[0] AS x_array0
      , split(x_array[0],',') AS x_array0_split
      , convert_fromJSON('["' || x_array[0] || '"]') AS convertJSONerror 
      FROM(SELECT x_str, convert_fromJSON(x_str) AS x_array 
      FROM (SELECT '["test=\u0014=test"]' x_str));
      Error: DATA_READ ERROR: Illegal unquoted character ((CTRL-CHAR, code 20)): has to be escaped using backslash to be included in string value
       at [Source: (org.apache.drill.exec.vector.complex.fn.DrillBufInputStream); line: 1, column: 9]
      

      don't work although the string is the same as the origin but \uxxxx is unfortunatly interpreted

      Attachments

        Activity

          People

            Unassigned Unassigned
            benj641 benj
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

              Created:
              Updated: