Hive / HIVE-18931

Race condition when ensuring a partition exists often causes AlreadyExistsException for the partition


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.2.1
    • Fix Version/s: None
    • Component/s: Clients, HCatalog
    • Labels: None

    Description

      Hiya! I'm using Apache Beam's HCatalogIO to store data in Hive. As part of HCatOutputFormatWriter#commit(), partitions are registered in FileOutputCommitterContainer#registerPartitions(). This method checks whether the partitions exist and creates them if needed. With parallel processes this fails much of the time because, as far as I understand, another process creates the partition between the check and the creation. This causes an AlreadyExistsException to be thrown for the partition, which bubbles up to the API consumer.
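
      The check-then-create interleaving described above can be replayed deterministically. This is only a sketch: InMemoryStore and CheckThenCreateRace are hypothetical stand-ins written for illustration, not Hive or HCatalog classes, and an IllegalStateException stands in for the metastore's AlreadyExistsException.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory stand-in for a metastore; NOT Hive's API.
class InMemoryStore {
    private final Set<String> partitions = ConcurrentHashMap.newKeySet();

    boolean exists(String name) {
        return partitions.contains(name);
    }

    // Creating a partition that is already present is rejected.
    void create(String name) {
        if (!partitions.add(name)) {
            throw new IllegalStateException("partition already exists: " + name);
        }
    }
}

class CheckThenCreateRace {
    // Replays the problematic ordering: both writers check before either creates.
    // Returns true if the second writer's create is rejected.
    static boolean secondWriterFails(InMemoryStore store, String partition) {
        boolean writerASawIt = store.exists(partition);
        boolean writerBSawIt = store.exists(partition);

        if (!writerASawIt) {
            store.create(partition); // writer A creates the partition first
        }
        try {
            if (!writerBSawIt) {
                store.create(partition); // writer B acts on a stale check
            }
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // prints "true": the second writer fails on a fresh store
        System.out.println(secondWriterFails(new InMemoryStore(), "ds=2018-03-12"));
    }
}
```

      The existence check is not wrong on its own; the failure comes from the gap between the check and the create, during which another writer can act.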

      Relevant logic: https://github.com/apache/hive/blob/release-1.2.1/hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/FileOutputCommitterContainer.java#L898-L920. Logic is also present on master.

      Am I missing something, and if not, is it an acceptable solution to swallow the AlreadyExistsException when adding the partitions?
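
      For discussion, the proposed fix could look roughly like the sketch below. Everything in it is hypothetical and self-contained: PartitionStore, PartitionAlreadyExistsException, and TolerantRegistration are illustrative stand-ins, not the actual HCatalog code or metastore client. It also assumes that a partition created by another writer is equivalent to the one this writer wanted, which would need to be confirmed for the real code path.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the metastore's AlreadyExistsException.
class PartitionAlreadyExistsException extends Exception {
    PartitionAlreadyExistsException(String message) {
        super(message);
    }
}

// Hypothetical in-memory stand-in for a metastore; NOT Hive's API.
class PartitionStore {
    private final Set<String> partitions = ConcurrentHashMap.newKeySet();

    boolean exists(String name) {
        return partitions.contains(name);
    }

    void add(String name) throws PartitionAlreadyExistsException {
        if (!partitions.add(name)) {
            throw new PartitionAlreadyExistsException(name);
        }
    }

    int size() {
        return partitions.size();
    }
}

class TolerantRegistration {
    // Ensures the partition exists, tolerating a concurrent creation.
    static void ensurePartition(PartitionStore store, String name) {
        if (store.exists(name)) {
            return;
        }
        try {
            store.add(name);
        } catch (PartitionAlreadyExistsException e) {
            // Another writer created the partition between the check and the
            // add. The desired end state (partition exists) already holds.
        }
    }

    // Starts several writers at once and returns how many of them failed.
    static int runConcurrently(PartitionStore store, int writers, String name) {
        CountDownLatch start = new CountDownLatch(1);
        AtomicInteger failures = new AtomicInteger();
        Thread[] threads = new Thread[writers];
        for (int i = 0; i < writers; i++) {
            threads[i] = new Thread(() -> {
                try {
                    start.await();
                    ensurePartition(store, name);
                } catch (Exception e) {
                    failures.incrementAndGet();
                }
            });
            threads[i].start();
        }
        start.countDown();
        try {
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return failures.get();
    }
}
```

      With this shape, only the duplicate-partition case is swallowed; any other failure while adding a partition would still propagate to the caller.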

          People

            Assignee: Unassigned
            Reporter: Reinier Kip (rjkip)