Hive / HIVE-13756

Map failure attempts to delete reducer _temporary directory on multi-query pig query

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.2.1, 2.0.0
    • Fix Version/s: 2.2.0
    • Component/s: HCatalog
    • Labels: None

    Description

      A Pig script run with multi-query enabled can produce erroneous results if any map task fails. The affected script reads the source data, writes it as-is into TABLE_A, and also performs a group-by on the data whose result is written into TABLE_B. This results in a single MR job that writes the map output to a scratch directory relative to TABLE_A and the reducer output to a scratch directory relative to TABLE_B.

      If one or more maps fail, the failure handling deletes the attempt data relative to TABLE_A, but it also deletes the _temporary directory relative to TABLE_B. This has the unintended side effect of preventing subsequent maps from committing their data. As a result, maps that completed successfully before the first map failure have their data committed as expected, while maps that finish afterwards do not, leaving an incomplete result set.
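
      The sequence described above can be pictured with a small toy model. This is only a sketch, not HCatalog or Hadoop code: the function name run_job, the buggy_cleanup flag, and the path strings are hypothetical, and the model assumes (following the description) that a map can commit only while both scratch directories still exist and that failed maps are not retried.

```python
def run_job(map_outcomes, buggy_cleanup=True):
    """Return the ids of map tasks whose output ends up committed.

    map_outcomes: booleans in completion order (True = the map succeeded).
    buggy_cleanup: if True, a failed map also removes TABLE_B/_temporary,
                   which is the behaviour reported in this issue.
    """
    # Hypothetical scratch locations, modelled as a set of existing paths.
    scratch_a = "TABLE_A/_temporary"
    scratch_b = "TABLE_B/_temporary"
    existing = {scratch_a, scratch_b}
    committed = []

    for task_id, succeeded in enumerate(map_outcomes):
        attempt_dir = f"{scratch_a}/attempt_{task_id}"
        existing.add(attempt_dir)

        if not succeeded:
            # Expected cleanup: drop the failed attempt's data under TABLE_A.
            existing.discard(attempt_dir)
            if buggy_cleanup:
                # Unintended cleanup: the reducer scratch directory goes too.
                existing.discard(scratch_b)
            continue

        # Assumption: committing requires both scratch directories to exist.
        if scratch_a in existing and scratch_b in existing:
            committed.append(task_id)

    return committed


if __name__ == "__main__":
    outcomes = [True, True, False, True, True]
    print("with the bug:   ", run_job(outcomes, buggy_cleanup=True))
    print("without the bug:", run_job(outcomes, buggy_cleanup=False))
```

      With the third map failing, the model commits only the two maps that finished before the failure when the extra deletion is enabled, and all four successful maps when it is disabled, which mirrors the incomplete result set reported here.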

      Attachments

        1. HIVE-13756-branch-1.patch
          2 kB
          Chris Drome
        2. HIVE-13756.patch
          2 kB
          Chris Drome
        3. HIVE-13756.1-branch-1.patch
          2 kB
          Chris Drome
        4. HIVE-13756.1.patch
          2 kB
          Chris Drome

          People

            Assignee: Chris Drome (cdrome)
            Reporter: Chris Drome (cdrome)
            Votes: 0
            Watchers: 3
