Uploaded image for project: 'Pig'
  1. Pig
  2. PIG-4901

To use Multistorage for each Group

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Closed
    • Minor
    • Resolution: Fixed
    • 0.11.1, 0.16.0
    • 0.17.0
    • piggybank
    • None
    • Hadoop 1.2.0

    • Reviewed

    Description

      I am trying to group my data and store in hdfs with a folder for each 'name' and subfolders for each 'YearMonth' under each name folder.

      Input:
      (Date) (name) (col3) (col4)
      2015-02-02 abc y z
      2016-01-02 xyz i j
      2015-03-02 abc f b
      2015-02-06 abc y z
      2016-03-02 xyz a q

      Expected out in hdfs:
      abc folder
      ->201502 subfolder
      2015-02-02 abc y z
      2015-02-06 abc y z
      ->201503 subfolder
      2015-03-02 abc f b
      xyz folder
      ->201601
      2016-01-02 xyz i j
      ->201603
      2016-03-02 xyz a q

      I am not sure of how to use the Multistorage option on Name column after grouping the tuples by date.

      Any help is appreciated.

      Attachments

        1. PIG-4901.2.patch
          16 kB
          Ádám Szita
        2. PIG-4901.patch
          16 kB
          Ádám Szita

        Activity

          People

            szita Ádám Szita
            dreddy Divya
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: