Pig / PIG-3646

LoadFunc cannot get a hold of the associated user defined schema


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.12.0
    • Fix Version/s: None
    • Component/s: data
    • Labels: None

    Description

      Described on the mailing list here: http://www.mail-archive.com/user%40pig.apache.org/msg09009.html

      A Pig LoadFunc cannot get hold of the schema declared for it in the script. For example, in the following script:

      A = LOAD 'pig/tupleartists' USING MyStorage() AS (name:chararray, links:(url:chararray, picture:chararray));
      B = FOREACH A GENERATE name, links.url;
      DUMP B;
      

      MyStorage cannot get hold of the declared schema (name:chararray, links ...), even when it implements LoadPushDown#pushProjection(), because that method is called only when the optimizer applies a transformation (PlanOptimizer/ColumnMapKeyPrune).

      One can inspect a POStore, but even then the information obtained is incomplete: the schema is partial, and the fields referenced in the FOREACH are dereferenced, so links.url is returned as url.
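
      To illustrate why the dereferenced name loses information, the sketch below models the schema declared in the script as a small tree and resolves a dotted projection path against it. It is plain Java with no Pig dependency; the Field and SchemaPaths classes are hypothetical illustrations, not Pig's ResourceSchema API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of a declared Pig schema; not Pig's own ResourceSchema class.
final class Field {
    final String name;
    final String type;            // e.g. "chararray" or "tuple"
    final List<Field> children;   // non-empty only for tuple fields

    Field(String name, String type, List<Field> children) {
        this.name = name;
        this.type = type;
        this.children = children;
    }

    static Field leaf(String name, String type) {
        return new Field(name, type, List.of());
    }

    static Field tuple(String name, Field... children) {
        return new Field(name, "tuple", Arrays.asList(children));
    }
}

final class SchemaPaths {
    // Resolves a dotted path such as "links.url" to the position of each step
    // (e.g. [1, 0]); returns null if a step names no field at that nesting level.
    static List<Integer> resolve(List<Field> fields, String dottedPath) {
        List<Integer> positions = new ArrayList<>();
        List<Field> level = fields;
        for (String step : dottedPath.split("\\.")) {
            int index = -1;
            for (int i = 0; i < level.size(); i++) {
                if (level.get(i).name.equals(step)) {
                    index = i;
                    break;
                }
            }
            if (index < 0) {
                return null; // unknown name at this level
            }
            positions.add(index);
            level = level.get(index).children;
        }
        return positions;
    }

    // The schema declared in the script:
    // (name:chararray, links:(url:chararray, picture:chararray))
    static List<Field> declared() {
        return List.of(
            Field.leaf("name", "chararray"),
            Field.tuple("links",
                Field.leaf("url", "chararray"),
                Field.leaf("picture", "chararray")));
    }
}
```

      Against SchemaPaths.declared(), resolve(schema, "links.url") returns [1, 0] and resolve(schema, "name") returns [0], while the bare "url" returns null, because url exists only inside the links tuple.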

      The purpose of this issue is to allow a LoadFunc implementation to get access to its schema declaration as specified in the script.
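
      One possible shape for the requested capability is a callback through which Pig would hand a loader the schema from the AS clause. Pig 0.12.0 offers no such hook: the DeclaredSchemaAware interface, its setDeclaredSchema method, and the ExampleStorage class below are hypothetical, and the schema is reduced to a list of top-level field names to keep the sketch self-contained.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical hook: Pig 0.12.0 does not define this interface.
interface DeclaredSchemaAware {
    // Would receive the top-level field names declared in the AS clause.
    void setDeclaredSchema(List<String> topLevelFields);
}

// Hypothetical loader that uses the declared schema to decide which
// top-level columns it must read for a given set of projected paths.
final class ExampleStorage implements DeclaredSchemaAware {
    private List<String> declared = Collections.emptyList();

    @Override
    public void setDeclaredSchema(List<String> topLevelFields) {
        this.declared = new ArrayList<>(topLevelFields);
    }

    // Maps projected paths such as "links.url" to the index of the top-level
    // column containing them; paths whose root is not declared are skipped.
    List<Integer> requiredColumns(List<String> projectedPaths) {
        List<Integer> required = new ArrayList<>();
        for (String path : projectedPaths) {
            String root = path.split("\\.", 2)[0];
            int index = declared.indexOf(root);
            if (index >= 0 && !required.contains(index)) {
                required.add(index);
            }
        }
        Collections.sort(required);
        return required;
    }
}
```

      With the declared fields (name, links), the projections name and links.url map to columns [0, 1], whereas the dereferenced url alone maps to nothing, which is the gap this issue describes.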

      Thanks!


          People

            Assignee: Unassigned
            Reporter: Costin Leau (costin.leau)
            Votes: 1
            Watchers: 4
