  1. Apache Arrow
  2. ARROW-17998

[Java] Support for textual JSON schema representation (was: JSON representation of pojo.Schema is incompatible with flatbuffers JSON generated via C++ API)


Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 6.0.1
    • Fix Version/s: None
    • Component/s: Format, Java

    Description

      I have a JSON representation of an arrow::Schema, generated from its flatbuffers form in C++:

       

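      #include <string>
      #include "flatbuffers/idl.h"   // flatbuffers::Parser, GenerateTextFromTable
      #include "flatbuffers/util.h"  // flatbuffers::LoadFile

      // SchemaToJson is an illustrative name; schemaBytes points to a serialized flatbuf Schema table.
      std::string SchemaToJson(const void* schemaBytes) {
        // Load the text of Arrow's Schema.fbs definition.
        std::string fbsSchemaFile;
        if (!flatbuffers::LoadFile("/path/to/Schema.fbs", false, &fbsSchemaFile)) return {};

        // Parse the definition so the parser knows the table layout.
        flatbuffers::Parser parser;
        if (!parser.Parse(fbsSchemaFile.c_str())) return {};

        // Render the binary table as flatbuffers-style JSON text.
        std::string json;
        flatbuffers::GenerateTextFromTable(parser, schemaBytes, "org.apache.arrow.flatbuf.Schema", &json);
        return json;
      }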

       

      When I try to read this JSON in Java and create a pojo.Schema:

       

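      String json; // Flatbuffers-style JSON produced by the C++ code above, read from a file.
      Schema schema = Schema.fromJSON(json); // org.apache.arrow.vector.types.pojo.Schema; this call fails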

       

       

      It fails because the JSON produced by the flatbuffers text generator and the JSON read and written by Java's Jackson bindings differ:

       

      C++ Schema Flatbuffers JSON example:

      {
        fields: [
          {
            name: "cc_call_center_sk",
            type_type: "Int",
            type: {
              bitWidth: 32,
              is_signed: true
            },
            children: [
      
            ],
            custom_metadata: [
              {
                key: "metadata",
                value: "some_metadata"
              }
            ]
          },
        ],
        custom_metadata: [
          {
            key: "metadata",
            value: "some_metadata"
          }
        ]
      }

      Java Schema JSON example:

      {
        "fields" : [ {
          "name" : "cc_call_center_sk",
          "nullable" : true,
          "type" : {
            "name" : "int",
            "bitWidth" : 32,
            "isSigned" : true
          },
          "children" : [ ],
          "metadata" : [ {
            "value" : "some_metadata",
            "key" : "metadata"
          } ]
        } ],
        "metadata" : [ {
          "value" : "some_metadata",
          "key" : "metadata"
        } ]
      } 

      The type id is declared differently:

      C++ flatbuffers writes it as a separate `type_type` field next to `type`.

      Java writes it as a `name` field inside `type`.
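
      To make the type id difference concrete, here is a minimal sketch of folding `type_type` into `type` as a `name` entry. It assumes the JSON has already been parsed into nested `Map` objects by some JSON parser (parsing is not shown). `TypeIdSketch` and `foldTypeId` are hypothetical names, not Arrow APIs, and lowercasing the id is an assumption based only on the single `Int` / `int` pair in the examples above. Other differences visible in those examples, such as `is_signed` versus `isSigned`, are not handled.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Arrow.
final class TypeIdSketch {

    // Returns a copy of a flatbuffers-style field in which "type_type"
    // has been moved into the "type" object as its "name" entry.
    static Map<String, Object> foldTypeId(Map<String, Object> field) {
        Map<String, Object> out = new LinkedHashMap<>(field);
        Object typeId = out.remove("type_type");
        if (typeId == null) {
            return out; // no flatbuffers-style type id, nothing to fold
        }

        Map<String, Object> type = new LinkedHashMap<>();
        // Assumption: the Java name is the lowercased flatbuffers id ("Int" -> "int").
        type.put("name", typeId.toString().toLowerCase(Locale.ROOT));

        // Keep the existing type parameters (e.g. bitWidth) after the name.
        Object oldType = out.get("type");
        if (oldType instanceof Map) {
            for (Map.Entry<?, ?> e : ((Map<?, ?>) oldType).entrySet()) {
                type.put(String.valueOf(e.getKey()), e.getValue());
            }
        }
        out.put("type", type);
        return out;
    }
}
```

      The input map is left unmodified, so the original flatbuffers-style tree can still be inspected after the conversion.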

       

      The metadata field is also named differently:

      C++ flatbuffers uses `custom_metadata`.

      Java uses `metadata`.
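
      The metadata naming difference applies at both the schema level and the field level, so a conversion has to walk the whole tree. A minimal sketch, again assuming the JSON is already parsed into nested `Map` and `List` objects; `MetadataKeySketch` and `renameMetadata` are hypothetical names, not Arrow APIs:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Arrow.
final class MetadataKeySketch {

    // Returns a deep copy of the tree in which every object key
    // "custom_metadata" is renamed to "metadata".
    static Object renameMetadata(Object node) {
        if (node instanceof Map) {
            Map<String, Object> out = new LinkedHashMap<>();
            for (Map.Entry<?, ?> e : ((Map<?, ?>) node).entrySet()) {
                String key = String.valueOf(e.getKey());
                if ("custom_metadata".equals(key)) {
                    key = "metadata";
                }
                out.put(key, renameMetadata(e.getValue()));
            }
            return out;
        }
        if (node instanceof List) {
            List<Object> out = new ArrayList<>();
            for (Object item : (List<?>) node) {
                out.add(renameMetadata(item));
            }
            return out;
        }
        return node; // strings, numbers, booleans, null are kept as-is
    }
}
```

      Only object keys are renamed. A metadata entry whose `key` or `value` string happens to be "custom_metadata" is left untouched, because those are values rather than keys.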

       

      This makes it impossible to reuse the JSON representation produced by Java in C++, and vice versa.

      The same issue probably exists in other language implementations.


            People

              Assignee: Unassigned
              Reporter: Pavel Kovalenko (jokser)