  Pig / PIG-5400

OrcStorage dropping struct(tuple) when it only holds a single field inside a Bag(array)


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.18.0
    • Component/s: impl
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      A user asked me about an inconsistent schema they were seeing when storing data with OrcStorage. Sample code:

       
      A = load 'input.txt' as (a0:long); 
      B = GROUP A by a0; 
      STORE B into 'filename' using OrcStorage(); 
      

      Pig's schema for B: {group: long, A: bag{tuple(a0: long)}}

      Expected ORC schema: struct<group:bigint,A:array<struct<a0:bigint>>>
      Actual ORC schema:   struct<group:bigint,A:array<bigint>>

      This only happens when a tuple contains a single field.

      The current schema, which drops the struct (tuple) layer, saves space. However, it would be nice to have an option to keep that extra struct (tuple) layer for users who expect the tuple's schema to evolve, for example by gaining more fields in the future.
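      To make the mapping concrete, here is a small language-neutral model of the conversion, written in Python purely for illustration. It is a sketch under stated assumptions, not Pig's actual OrcStorage implementation: Pig schemas are modelled as nested Python tuples, only a few scalar types are mapped, and the keep_single_field_tuple flag is a hypothetical stand-in for the requested option, not a real OrcStorage parameter. Because ORC struct fields are always named, the kept struct uses the tuple's field alias (a0) as its field name.

```python
# Illustrative model only (NOT Pig's OrcStorage code).
# Assumed encoding of Pig schemas as nested Python tuples:
#   ("long",)                              -> scalar
#   ("tuple", [(name, schema), ...])       -> tuple with named fields
#   ("bag", ("tuple", [...]))              -> bag; in Pig a bag always holds tuples

# Partial scalar mapping, limited to a few common Pig types.
PIG_TO_ORC_SCALAR = {
    "int": "int",
    "long": "bigint",
    "double": "double",
    "chararray": "string",
}


def to_orc(schema, keep_single_field_tuple=False):
    """Return an ORC type string for a modelled Pig schema.

    keep_single_field_tuple is a hypothetical flag: when False, a bag of
    one-field tuples is unwrapped to array<scalar> (the behaviour reported
    in this issue); when True, the struct layer is kept.
    """
    kind = schema[0]
    if kind == "tuple":
        fields = ",".join(
            f"{name}:{to_orc(sub, keep_single_field_tuple)}"
            for name, sub in schema[1]
        )
        return f"struct<{fields}>"
    if kind == "bag":
        inner_tuple = schema[1]
        inner_fields = inner_tuple[1]
        if len(inner_fields) == 1 and not keep_single_field_tuple:
            # Current behaviour: drop the struct around the single field.
            return f"array<{to_orc(inner_fields[0][1], keep_single_field_tuple)}>"
        return f"array<{to_orc(inner_tuple, keep_single_field_tuple)}>"
    return PIG_TO_ORC_SCALAR[kind]


# B = GROUP A by a0;  ->  {group: long, A: bag{tuple(a0: long)}}
B = ("tuple", [
    ("group", ("long",)),
    ("A", ("bag", ("tuple", [("a0", ("long",))]))),
])

print(to_orc(B))                               # struct<group:bigint,A:array<bigint>>
print(to_orc(B, keep_single_field_tuple=True)) # struct<group:bigint,A:array<struct<a0:bigint>>>
```

      With the flag off, the one-field tuple is unwrapped and the result matches the "Actual" schema above; with it on, the struct layer survives, so a new field can later be added inside that struct rather than changing the array's element type.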

      Attachments

        1. pig-5400-v02.patch
          14 kB
          Koji Noguchi
        2. pig-5400-v01.patch
          14 kB
          Koji Noguchi


          People

            Assignee: Koji Noguchi (knoguchi)
            Reporter: Koji Noguchi (knoguchi)
            Votes: 0
            Watchers: 2
