  1. Hive
  2. HIVE-26958

JsonSerDe data corruption when scalar type is a json object


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: File Formats
    • Labels: None

    Description

       

      JsonSerDe uses the Jackson JsonParser.getText method to decode scalar values from JSON strings. The problem is that this Jackson method converts any token to text, including START_OBJECT ('{'). This means that when a scalar field is actually a JSON object, JsonSerDe will process the open curly bracket as the value for BOOLEAN, DECIMAL, CHAR, VARCHAR, and VARBINARY. It then continues processing the fields inside the JSON object as if they were part of the outer JSON object. When the closing curly bracket is encountered it pops a level, which can end parsing early. This bug results in corrupted data for the following JSON:

       

      { "boolean_field" : {}, "other_field" : 99 } 
        => [boolean_field=false, other_field=null]
      
      
      { "boolean_field" : { "other_field" : 42 }, "other_field" : 99 }
        => [boolean_field=false, other_field=42]
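
      To make the two results above easier to follow, here is a minimal, self-contained sketch of the failure mode. It does not use Jackson or the real JsonSerDe code: the Kind enum, the Tok class, and the parse method are hypothetical stand-ins that only mimic the behavior described in this report (a boolean decoded from the token text without checking the token kind, and a field loop that stops at the first END_OBJECT).

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FlatParseSketch {
    // Hypothetical stand-in for a JSON token stream; not the Jackson JsonToken API.
    enum Kind { START_OBJECT, END_OBJECT, FIELD_NAME, VALUE_NUMBER }

    static final class Tok {
        final Kind kind;
        final String text; // what a getText()-style call would return for this token
        Tok(Kind kind, String text) { this.kind = kind; this.text = text; }
    }

    static Tok start() { return new Tok(Kind.START_OBJECT, "{"); }
    static Tok end() { return new Tok(Kind.END_OBJECT, "}"); }
    static Tok field(String name) { return new Tok(Kind.FIELD_NAME, name); }
    static Tok number(int n) { return new Tok(Kind.VALUE_NUMBER, Integer.toString(n)); }

    // Mimics the described loop: read name/value pairs until the first END_OBJECT,
    // decoding the boolean from the token text regardless of the token kind.
    static Map<String, Object> parse(List<Tok> tokens) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("boolean_field", null);
        row.put("other_field", null);
        int i = 1; // skip the outer START_OBJECT
        while (i < tokens.size() && tokens.get(i).kind != Kind.END_OBJECT) {
            String name = tokens.get(i++).text; // field name
            Tok value = tokens.get(i++);        // whatever token follows the name
            if (name.equals("boolean_field")) {
                // "{" is not "true", so the object silently becomes false
                row.put(name, Boolean.parseBoolean(value.text));
            } else {
                row.put(name, Integer.parseInt(value.text));
            }
        }
        return row;
    }

    // { "boolean_field" : {}, "other_field" : 99 }
    static List<Tok> firstExample() {
        return Arrays.asList(start(), field("boolean_field"), start(), end(),
                field("other_field"), number(99), end());
    }

    // { "boolean_field" : { "other_field" : 42 }, "other_field" : 99 }
    static List<Tok> secondExample() {
        return Arrays.asList(start(), field("boolean_field"), start(),
                field("other_field"), number(42), end(),
                field("other_field"), number(99), end());
    }

    public static void main(String[] args) {
        System.out.println(parse(firstExample()));  // {boolean_field=false, other_field=null}
        System.out.println(parse(secondExample())); // {boolean_field=false, other_field=42}
    }
}
```

      In both runs the inner closing bracket is taken for the end of the record, so the outer other_field value of 99 is never read.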

       

      BTW, when a JSON array is passed instead of an object, you get an error because the array does not contain fields, which the code checks for.

      I think this case should result in an error, like the one you get when a JSON array is the field value for a scalar. If so, the fix is to make sure the value token is a scalar for non-complex types in extractCurrentField, with something like this:

      if (!hcatFieldSchema.isComplex() && !valueToken.isScalarValue()) {
          throw new IOException(type + " value must be a scalar json value");
      } 
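
      As a rough illustration of how that guard would behave, here is a standalone sketch. It is not the actual extractCurrentField code: the Kind enum and the decodeBoolean helper are hypothetical, and Kind.isScalarValue only imitates the idea of Jackson's JsonToken.isScalarValue (everything except object and array starts counts as scalar in this simplified enum).

```java
import java.io.IOException;

public class ScalarCheckSketch {
    // Hypothetical, simplified token kinds; not the Jackson JsonToken enum.
    enum Kind {
        START_OBJECT, START_ARRAY, VALUE_STRING, VALUE_NUMBER,
        VALUE_TRUE, VALUE_FALSE, VALUE_NULL;

        // In this simplified enum, only structural tokens are non-scalar.
        boolean isScalarValue() {
            return this != START_OBJECT && this != START_ARRAY;
        }
    }

    // Hypothetical helper: reject structural tokens before decoding the text.
    static Boolean decodeBoolean(Kind valueToken, String text) throws IOException {
        if (!valueToken.isScalarValue()) {
            throw new IOException("boolean value must be a scalar json value");
        }
        return valueToken == Kind.VALUE_NULL ? null : Boolean.valueOf(text);
    }

    public static void main(String[] args) throws IOException {
        System.out.println(decodeBoolean(Kind.VALUE_TRUE, "true")); // true
        try {
            decodeBoolean(Kind.START_OBJECT, "{");
        } catch (IOException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

      With a check like this, the object case fails fast instead of silently decoding to false, which would line it up with the array case.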

       

       


          People

            Assignee: Unassigned
            Reporter: Dain Sundstrom (dain)
            Votes: 0
            Watchers: 1
