  1. Hive
  2. HIVE-14269 Performance optimizations for data on S3
  3. HIVE-16295

Add support for using Hadoop's S3A OutputCommitter


Details

    • Type: Sub-task
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved

    Description

      Hive does not integrate with Hadoop's OutputCommitter. Instead, it uses a NullOutputCommitter and implements its own commit logic, which is spread across FileSinkOperator, MoveTask, and Hive.

      The Hadoop community is building an OutputCommitter that integrates with S3Guard and performs a safe, coordinated commit of data on S3 inside individual tasks (HADOOP-13786). If Hive can integrate with this new OutputCommitter, Hive-on-S3 would gain the following benefits:

      • Data is only written once; directly committing data at a task level means no renames are necessary
      • The commit is done safely, in a coordinated manner; duplicate tasks (from task retries or speculative execution) should not step on each other
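
      The two properties listed above can be illustrated with a toy, in-memory model of a task-level commit. This is only a sketch under stated assumptions: it is not the Hadoop OutputCommitter interface, not the S3A committer from HADOOP-13786, and not the code in the attached patches. The class TaskCommitSketch and its methods (write, commitTask, abortTask, output) are hypothetical names invented for illustration. Each task attempt stages its output privately, and only the first attempt to commit for a given task has its output published, so the data is made visible once and a duplicate attempt cannot overwrite it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Toy, in-memory model of a task-level commit protocol.
 * NOT the Hadoop OutputCommitter or S3A committer API; all names are hypothetical.
 */
public class TaskCommitSketch {
    // Output staged by each task attempt, keyed by attempt ID; not visible to readers.
    private final Map<String, List<String>> pending = new HashMap<>();
    // Task ID -> ID of the attempt that won the commit for that task.
    private final Map<String, String> committedAttempt = new HashMap<>();
    // Final, visible output, published at most once per task.
    private final Map<String, List<String>> visible = new HashMap<>();

    /** A task attempt stages a record; nothing becomes visible yet. */
    public synchronized void write(String attemptId, String record) {
        pending.computeIfAbsent(attemptId, k -> new ArrayList<>()).add(record);
    }

    /**
     * Commits one attempt of a task. Only the first attempt to commit wins;
     * the staged output of any later attempt of the same task is discarded.
     */
    public synchronized boolean commitTask(String taskId, String attemptId) {
        List<String> staged = pending.remove(attemptId);
        if (staged == null || committedAttempt.containsKey(taskId)) {
            return false; // nothing staged, or another attempt already committed
        }
        committedAttempt.put(taskId, attemptId);
        visible.put(taskId, staged); // publish in place: no copy, no rename
        return true;
    }

    /** Aborts an attempt, dropping its staged output. */
    public synchronized void abortTask(String attemptId) {
        pending.remove(attemptId);
    }

    /** Returns the committed output of a task, or an empty list if none. */
    public synchronized List<String> output(String taskId) {
        return visible.getOrDefault(taskId, Collections.<String>emptyList());
    }

    public static void main(String[] args) {
        TaskCommitSketch job = new TaskCommitSketch();
        job.write("task0_attempt0", "row-a");
        job.write("task0_attempt1", "row-a"); // speculative duplicate of the same task
        System.out.println(job.commitTask("task0", "task0_attempt0")); // true
        System.out.println(job.commitTask("task0", "task0_attempt1")); // false
        System.out.println(job.output("task0")); // [row-a]
    }
}
```

      In this sketch the second attempt's commit is rejected rather than merged, which is one simple way to keep duplicate attempts from interfering with each other. A real committer on S3 has to achieve the same effect against an object store, which this model does not attempt to represent.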

      Attachments

        1. HIVE-16295.1.WIP.patch
          44 kB
          Sahil Takiar
        2. HIVE-16295.2.WIP.patch
          44 kB
          Sahil Takiar
        3. HIVE-16295.3.WIP.patch
          45 kB
          Sahil Takiar
        4. HIVE-16295.4.patch
          110 kB
          Sahil Takiar
        5. HIVE-16295.5.patch
          118 kB
          Sahil Takiar
        6. HIVE-16295.6.patch
          124 kB
          Sahil Takiar
        7. HIVE-16295.7.patch
          124 kB
          Sahil Takiar
        8. HIVE-16295.8.patch
          133 kB
          Sahil Takiar
        9. HIVE-16295.9.patch
          134 kB
          Sahil Takiar


          People

            Assignee: Unassigned
            Reporter: Sahil Takiar (stakiar)
