Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
0.12.0
None
None
Description
Described on the mailing list here: http://www.mail-archive.com/user%40pig.apache.org/msg09009.html
A Pig LoadFunc cannot get a hold of its associated schema. For example, in the following script:
A = LOAD 'pig/tupleartists' USING MyStorage() AS (name: chararray, links: tuple(url:chararray, picture:chararray));
B = FOREACH A GENERATE name, links.url;
DUMP B;
MyStorage cannot get hold of (name:chararray, links ... even when LoadPushDown#pushProjection() is implemented, because that method is called only when a transformation occurs (see PlanOptimizer/ColumnMapKeyPrune).
One can look into a POStore, but even then the information obtained is incomplete: the schema is partial and the fields mentioned in FOREACH are dereferenced, so links.url is returned as url.
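To make the information loss concrete, here is a small toy model in plain Java. It does not use Pig's classes: Field, find, project, render, declaredSchema and storeSideView are hypothetical names invented for this illustration, and the projection logic is an assumption that only mimics the dereferencing described above, not Pig's actual planner behaviour.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SchemaLossSketch {

    /** Hypothetical stand-in for a schema field; NOT Pig's FieldSchema. */
    static final class Field {
        final String name;
        final String type;           // e.g. "chararray" or "tuple"
        final List<Field> children;  // nested fields of a tuple, else empty

        Field(String name, String type, Field... children) {
            this.name = name;
            this.type = type;
            this.children = Arrays.asList(children);
        }
    }

    /** Looks up a field by name within one nesting level. */
    static Field find(List<Field> scope, String name) {
        for (Field f : scope) {
            if (f.name.equals(name)) {
                return f;
            }
        }
        throw new IllegalArgumentException("no such field: " + name);
    }

    /** Resolves a dotted path such as "links.url" and keeps only the leaf field. */
    static Field project(List<Field> declared, String path) {
        Field current = null;
        List<Field> scope = declared;
        for (String part : path.split("\\.")) {
            current = find(scope, part);
            scope = current.children;
        }
        return current;
    }

    /** Renders fields roughly the way a schema is written in a script. */
    static String render(List<Field> fields) {
        StringBuilder sb = new StringBuilder("(");
        for (int i = 0; i < fields.size(); i++) {
            Field f = fields.get(i);
            if (i > 0) {
                sb.append(", ");
            }
            sb.append(f.name).append(":");
            sb.append(f.children.isEmpty() ? f.type : render(f.children));
        }
        return sb.append(")").toString();
    }

    /** The schema as declared in the AS clause of the script. */
    static List<Field> declaredSchema() {
        List<Field> declared = new ArrayList<Field>();
        declared.add(new Field("name", "chararray"));
        declared.add(new Field("links", "tuple",
                new Field("url", "chararray"),
                new Field("picture", "chararray")));
        return declared;
    }

    /** What remains after FOREACH A GENERATE name, links.url in this toy model. */
    static List<Field> storeSideView() {
        List<Field> declared = declaredSchema();
        List<Field> projected = new ArrayList<Field>();
        projected.add(project(declared, "name"));
        projected.add(project(declared, "links.url"));
        return projected;
    }

    public static void main(String[] args) {
        System.out.println("declared:   " + render(declaredSchema()));
        System.out.println("store side: " + render(storeSideView()));
    }
}
```

In this sketch the store-side view no longer mentions links or picture, so the original declaration cannot be rebuilt from it. That is the gap this issue asks to close.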
The purpose of this issue is to allow a LoadFunc implementation to get access to its schema declaration as specified in the script.
Thanks!