  1. Hive
  2. HIVE-51

Generate and accept JSON as the input-output format from mappers and reducers

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Query Processor
    • Labels: None

      Description

      set mapred.data.format=JSON;
      ....
      MAP USING 'python filter.py'
      ....;

      With this setting, filter.py would receive each record as a JSON-formatted dictionary of the columns instead of a tab-delimited string, for example:

      { "column1": "value1", "column2": [1, 2, 3] }

      The script would in turn emit its output as JSON.
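A minimal sketch of what filter.py might look like under this proposal. It assumes one JSON object per input line on stdin and one per output line on stdout; the issue does not specify the framing, and the `keep` predicate is purely hypothetical.

```python
import json
import sys


def keep(record):
    """Hypothetical filter: keep rows whose column2 list is non-empty."""
    return bool(record.get("column2"))


def run(lines, out):
    """Read one JSON object per line and write the kept records back as JSON."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)  # columns arrive as a dict, already typed
        if keep(record):
            out.write(json.dumps(record) + "\n")


# When used as filter.py, the entry point would be: run(sys.stdin, sys.stdout)
```

The script never splits on tabs or names its input columns; both come from the JSON itself.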

      The conversion should happen locally so that JSON is never transmitted back and forth over the network: it would be generated on the fly on the mapper node from the standard format, and the script's output would be converted back to that format (tab-delimited, I assume).
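To illustrate the node-local conversion, here is a rough sketch of the two directions. It is not Hive's implementation: the function names are made up for this example, the column list is assumed to be known from the query, and every value is treated as a string because the tab-delimited form carries no type information.

```python
import json


def row_to_json(columns, line):
    """Turn one tab-delimited row into a JSON object keyed by column name."""
    values = line.rstrip("\n").split("\t")
    return json.dumps(dict(zip(columns, values)))


def json_to_row(columns, text):
    """Turn the script's JSON output back into a tab-delimited row."""
    record = json.loads(text)
    return "\t".join(str(record[name]) for name in columns)
```

Both functions would run on the same node as the script, so only the tab-delimited form crosses the network.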

      This seems like the simplest way to encode type information in the input to mappers. It would also remove the need for silly boilerplate code that takes a list of expected input column names, splits each line of the input stream, and builds a dictionary of {column name: value} on every record.
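For comparison, the kind of boilerplate a script needs today with tab-delimited input looks roughly like this. The column names and types are hypothetical; the point is that the script author has to keep them in sync with the query by hand.

```python
# Hypothetical schema, maintained by hand alongside the query.
EXPECTED_COLUMNS = [("user_id", int), ("country", str)]


def parse_record(line):
    """Split a tab-delimited line and build a {column name: value} dict."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(EXPECTED_COLUMNS):
        raise ValueError(
            "expected %d columns, got %d" % (len(EXPECTED_COLUMNS), len(values))
        )
    # Types must be restored manually because the input is plain text.
    return {name: cast(value) for (name, cast), value in zip(EXPECTED_COLUMNS, values)}
```

With JSON input, both the column names and the value types would arrive with each record.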

      Output schemas (USING '' AS ...) might also become redundant with this in place, but I'm not sure whether that is doable.

              People

              • Assignee: Unassigned
              • Reporter: indigoviolet Venky Iyer
              • Votes: 1
              • Watchers: 7
