Uploaded image for project: 'Pig'
  1. Pig
  2. PIG-4943

Schema issue while storing multiple pig outputs using CSVExcelStorage

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Open
    • Major
    • Resolution: Unresolved
    • 0.14.0
    • None
    • piggybank
    • None

    Description

      I have a script which stores 2 relations with different schema using CSVExcelStorage.

      The issue which i see is that the script picks up the last store function and takes the schema in that and puts it for all store functions , overriding the previous store schemas.

      My Sample Script Looks like this :--

      =============================================================

      masterInput = load 'hbase://xyz' using org.apache.pig.backend.hadoop.hbase.HBaseStorage(
      'f:a,f:b,f:c,f:d')
      as (a,b,c,d);

      input2 = foreach masterInput
      generate
      a,b;

      input3 = foreach masterInput
      generate
      c,d;

      store input2 into '/dir/ab'
      using org.apache.pig.piggybank.storage.CSVExcelStorage('\t','YES_MULTILINE', 'UNIX', 'WRITE_OUTPUT_HEADER');

      store input3 into '/dir/cd'
      using org.apache.pig.piggybank.storage.CSVExcelStorage('\t','YES_MULTILINE', 'UNIX', 'WRITE_OUTPUT_HEADER');

      =============================================================
      Where a,b,c,d are my headers in my source file

      Expected Output :

      file 1
      a b
      10 20
      file 2
      c d
      30 40

      Actual Output :

      file 1
      c d
      10 20
      file 2
      c d
      30 40

      Attachments

        Activity

          People

            Unassigned Unassigned
            ssasidharan sarath
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated: