Apache Drill / DRILL-4710

Document Drill's JSON processing rules

Details

    • Improvement
    • Status: Open
    • Minor
    • Resolution: Unresolved
    • None
    • Future
    • Documentation
    • None

    Description

      One of Drill's key benefits is the ability to query JSON-formatted data. Much great work has been done in this area, but unless you happen to be a Drill developer, the details of exactly how Drill handles various JSON formats are hard to find.

      We should document how Drill handles JSON for each kind of query:

      • SELECT * (schema inferred)
      • SELECT a, b, c (schema implied by query)

      And various JSON structures:

      • Top-level structure: a list of maps. Can we handle an array of maps? A list of scalars?
      • Changes of the top-level map structure across rows.
        • New field appears later in the file. (Was {a: 1, b: "s"}, now is {a: 1, b: "s", c: 10}.)
        • Fields disappear later in the file
        • Fields change type
        • Start of file has many nulls for a field, later in file has non-null values.
      • How Drill handles array fields
        • Array field is null: { a: [10, 20] }, { a: null }
        • Array contains nulls: { a: [10, null, 20] }
        • Array contains single scalar type (number or string)
        • Array contains multiple scalar types (number and string)
        • Array contains structured types (array, map)
      • How Drill handles nested maps
        • Explicit select: a, b.c, b.d: {a: 1, b: {c: ..., d: ...}}
        • Implicit select: *
        • How data is delivered to Drill client
        • How data is delivered to JDBC/ODBC clients
      • Size issues
        • Very large records (what is max size?)
        • Very large strings
        • Very large arrays
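
The schema-change scenarios listed above (a field appears, a field disappears, a field changes type, a field is null before it carries values) can be located in sample data before testing them against Drill. The following is a minimal sketch, not Drill code: the names `json_type`, `describe_schema_changes`, and `SAMPLE` are hypothetical, it assumes one JSON object per line, and keys are quoted because Python's `json` module accepts only strict JSON. It says nothing about how Drill itself reacts to these changes.

```python
import json


def json_type(value):
    """Map a decoded JSON value to a coarse type name."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    return "map"


def describe_schema_changes(lines):
    """Compare each record's top-level fields with the previous record's."""
    changes = []
    previous = None
    for row, line in enumerate(lines, start=1):
        record = json.loads(line)
        current = {name: json_type(value) for name, value in record.items()}
        if previous is not None:
            for name in current.keys() - previous.keys():
                changes.append((row, name, "appeared"))
            for name in previous.keys() - current.keys():
                changes.append((row, name, "disappeared"))
            for name in current.keys() & previous.keys():
                if current[name] != previous[name]:
                    changes.append((row, name, f"{previous[name]} -> {current[name]}"))
        previous = current
    return sorted(changes)


# Hypothetical sample records modeled on the scenarios in the list above.
SAMPLE = [
    '{"a": 1, "b": "s"}',
    '{"a": 1, "b": "s", "c": 10}',
    '{"a": "one", "c": null}',
    '{"a": [10, null, 20], "c": 10}',
]

for change in describe_schema_changes(SAMPLE):
    print(change)
```

Each reported tuple is (row number, field name, what changed), which gives a checklist of rows whose handling the documentation would need to describe.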

      Naming

      • Support for case-sensitive names: { a: 1, A: "foo" }

      The above is legal JSON, but causes problems with the case-insensitive naming rules of Drill.

      The documentation should also include any other detailed information not covered by the above list.
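
The naming problem can be shown outside Drill. The sketch below parses a record whose keys differ only by case and reports which names would collide under case-insensitive matching. The function name `case_insensitive_collisions` is hypothetical, and lowercasing is only an assumed stand-in for a case-insensitive rule; Drill's actual name-matching logic may differ.

```python
import json
from collections import defaultdict


def case_insensitive_collisions(record):
    """Group field names that differ only by case (assumed rule: lowercase folding)."""
    groups = defaultdict(list)
    for name in record:
        groups[name.lower()].append(name)
    # Keep only the folded names that more than one original name maps to.
    return {folded: names for folded, names in groups.items() if len(names) > 1}


# Strict JSON requires quoted keys; "a" and "A" are distinct fields here.
record = json.loads('{"a": 1, "A": "foo", "b": 2}')
print(case_insensitive_collisions(record))
```

A standard JSON parser keeps both fields, so any ambiguity arises only once names are compared without regard to case.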

      People

      • Assignee: Unassigned
      • Reporter: Paul Rogers (paul-rogers)