  1. Pig
  2. PIG-1284

Pig UDFs lack an XMLLoader; plan to add an XMLLoader


Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.7.0
    • Fix Version/s: 0.7.0
    • None
    • Initial version 1.0
    • XMLLoader

    Description

      Hi All,

      We are planning to add an XMLLoader UDF to the piggybank repository.

      Here is the proposal, with the user docs:

      XMLLoader is a load function for XML files. It implements the LoadFunc
      interface, which is used to parse records from a dataset. It takes an XML
      tag name (xmlTag) as its argument and uses it to split the input dataset
      into multiple records, one per element with that tag.

      For example, if the input XML (input.xml) looks like this:
      <configuration>
      <property>
      <name> foobar </name>
      <value> barfoo </value>
      </property>
      <ignoreProperty>
      <name> foo </name>
      </ignoreProperty>
      <property>
      <name> justname </name>
      </property>
      </configuration>

      And your Pig script looks like this:

      -- load the jar file
      register loader.jar;
      -- load the dataset using XMLLoader
      -- A is a bag of tuples; each tuple contains one atom, doc (see the output below)
      A = load '/user/aloks/pig/input.xml' using loader.XMLLoader('property') as (doc:chararray);
      -- dump the result
      dump A;

      Then you will get the following output:

      (<property>
      <name> foobar </name>
      <value> barfoo </value>
      </property>)
      (<property>
      <name> justname </name>
      </property>)

      Each () delimits one record.
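
      The splitting behaviour described above can be sketched outside Pig. The
      following Python snippet is only an illustration of the idea, not the code
      in the attached patch (the actual XMLLoader is a LoadFunc in piggybank).
      The function name split_xml_records is hypothetical, and the sketch
      assumes that elements with the given tag are not nested inside each other
      and are not self-closing.

```python
import re


def split_xml_records(text, tag):
    """Return every <tag>...</tag> block found in text, in document order.

    Hypothetical helper for illustration only; it is not part of Pig.
    """
    # After "<tag" we require either ">" or whitespace followed by attributes,
    # so "property" does not match a longer name such as "<propertyList>".
    # The non-greedy ".*?" stops at the first matching closing tag.
    pattern = re.compile(
        r"<{0}(?:\s[^>]*)?>.*?</{0}\s*>".format(re.escape(tag)),
        re.DOTALL,
    )
    return [match.group(0) for match in pattern.finditer(text)]


# The sample input.xml from the description above.
sample = """<configuration>
<property>
<name> foobar </name>
<value> barfoo </value>
</property>
<ignoreProperty>
<name> foo </name>
</ignoreProperty>
<property>
<name> justname </name>
</property>
</configuration>"""

records = split_xml_records(sample, "property")
for doc in records:
    # Mimic the way dump prints one tuple per record.
    print("(" + doc + ")")
```

      Run against the sample input, this prints the same two records as the
      dump output above; the <ignoreProperty> element is skipped because its
      tag name does not match 'property'.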

      Attachments

        1. pigudf_xmlLoader.patch
          24 kB
          Alok Singh
        2. pigudf_xmlLoader.patch
          24 kB
          Alok Singh


          People

            Assignee: Alok Singh (aloknsingh)
            Reporter: Alok Singh (aloknsingh)
            Votes: 0
            Watchers: 1

            Dates

              Created:
              Updated:
              Resolved:

              Time Tracking

                Estimated: 168h
                Remaining: 168h
                Logged: Not Specified