Hadoop Common / HADOOP-19091

Add support for Tez to MagicS3GuardCommitter


Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.3.6
    • Fix Version/s: None
    • Component/s: fs/s3
    • Labels: None
    • Environment: Pig 17/Hive 3.1.3 with Hadoop 3.3.3 on AWS EMR 6-12.0

    Description

      The MagicS3GuardCommitter assumes that the JobID seen by a task is the same as the JobID seen by the job's application master when it writes and reads the .pendingset file. This assumption does not hold when running with Tez, which creates slightly different JobIDs for tasks and for the application master.
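
      To illustrate the failure mode described above, here is a minimal sketch of what happens when a writer and a reader derive a .pendingset location from different JobIDs. It does not use the real committer code: the path layout, the class and method names, and the JobID values are all hypothetical and chosen only to show the mismatch.

```java
import java.util.ArrayList;
import java.util.List;

public class PendingSetMismatchSketch {

    // Hypothetical layout: <dest>/__magic/<jobId>/. The real committer's
    // directory structure may differ; only the dependence on jobId matters here.
    public static String pendingSetDir(String dest, String jobId) {
        return dest + "/__magic/" + jobId;
    }

    // Where a task would save its .pendingset file, given the JobID it sees.
    public static String pendingSetFile(String dest, String jobId, String taskAttemptId) {
        return pendingSetDir(dest, jobId) + "/" + taskAttemptId + ".pendingset";
    }

    // What job commit would find, given the JobID the application master sees.
    // 'store' stands in for the object keys present in the bucket.
    public static List<String> listPendingSets(List<String> store, String dest, String jobId) {
        String prefix = pendingSetDir(dest, jobId) + "/";
        List<String> found = new ArrayList<>();
        for (String key : store) {
            if (key.startsWith(prefix) && key.endsWith(".pendingset")) {
                found.add(key);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        String dest = "s3a://example-bucket/output";   // placeholder destination
        String taskJobId = "job_0001_task_view";       // made-up ID as seen by the task
        String amJobId = "job_0001_am_view";           // made-up ID as seen by the AM

        List<String> store = new ArrayList<>();
        store.add(pendingSetFile(dest, taskJobId, "attempt_0"));

        // Same JobID on both sides: the file is found.
        System.out.println(listPendingSets(store, dest, taskJobId).size());
        // Different JobID on the reading side: nothing is found.
        System.out.println(listPendingSets(store, dest, amJobId).size());
    }
}
```

      In this sketch the task's file exists, but a reader that builds its lookup path from a different JobID never sees it, which is the shape of the problem reported with Tez.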

       

      While the MagicS3GuardCommitter is intended only for MRv2, it mostly works fine through an MRv1 wrapper when Hive or Pig (with some minor changes to Hive) runs in MR mode. The issue only appears when queries run with the Tez execution engine. I can upload a patch to Hive 3.1 to reproduce this error on EMR if needed.

       

      Fixing this will probably require work in both Tez and Hadoop. I wanted to start a discussion here so we can figure out exactly how to go about it.

      Attachments

        1. 0001-AWS-Hive-Changes.patch
          13 kB
          Venkatasubrahmanian Narayanan
        2. 0002-HIVE-27698-Backport-of-HIVE-22398-Remove-legacy-code.patch
          17 kB
          Venkatasubrahmanian Narayanan
        3. HADOOP-19091-HIVE-WIP.patch
          70 kB
          Venkatasubrahmanian Narayanan

            People

              Assignee: Venkatasubrahmanian Narayanan (vnarayanan7)
              Reporter: Venkatasubrahmanian Narayanan (vnarayanan7)