Hive / HIVE-27494

Deduplicate task results generated by multiple branches of UNION ALL


Details

    Description

      HIVE-23891 added the ability to deduplicate task results under the directory

      <table-dir>/<staging-dir>/_tmp.-ext-10000/<dynamic-partition-dir>/HIVE_UNION_SUBDIR_1,

      but it does not apply the same deduplication to the output directory of the other UNION ALL branch in the same query:

      <table-dir>/<staging-dir>/_tmp.-ext-10000/<dynamic-partition-dir>/HIVE_UNION_SUBDIR_2.

      As a result, users can still get duplicated data when a Tez task runs multiple attempts.
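The sketch below illustrates the gap. Deduplication has to run in every HIVE_UNION_SUBDIR_N directory, not only in HIVE_UNION_SUBDIR_1. It is not Hive's implementation and rests on several assumptions:

- Output files are assumed to be named `<taskId>_<attemptId>` (e.g. `000000_0`, `000000_1`).
- The hypothetical policy is to keep the file with the highest attempt id.
- The helper names `dedup_attempts` and `dedup_union_output` are invented for illustration.

```python
import re

# Hypothetical attempt-file naming: "<taskId>_<attemptId>", e.g. "000000_1".
ATTEMPT_FILE = re.compile(r"^(\d+)_(\d+)$")
UNION_SUBDIR_PREFIX = "HIVE_UNION_SUBDIR_"


def dedup_attempts(files):
    """Keep one file per task id (here: the highest attempt id).

    Names that do not match the attempt pattern are kept unchanged.
    """
    best = {}    # task id -> (attempt id, file name)
    others = []
    for name in files:
        m = ATTEMPT_FILE.match(name)
        if m is None:
            others.append(name)
            continue
        task, attempt = m.group(1), int(m.group(2))
        if task not in best or attempt > best[task][0]:
            best[task] = (attempt, name)
    return sorted(others + [name for _, name in best.values()])


def dedup_union_output(partition_dir):
    """Deduplicate every HIVE_UNION_SUBDIR_N under one dynamic-partition dir.

    partition_dir maps a subdirectory name to the file names it contains.
    Deduplicating only HIVE_UNION_SUBDIR_1 would leave duplicate attempts
    in HIVE_UNION_SUBDIR_2, which is the problem described above.
    """
    return {
        subdir: dedup_attempts(files)
        if subdir.startswith(UNION_SUBDIR_PREFIX)
        else sorted(files)
        for subdir, files in partition_dir.items()
    }


if __name__ == "__main__":
    partition = {
        "HIVE_UNION_SUBDIR_1": ["000000_0", "000000_1", "000001_0"],
        "HIVE_UNION_SUBDIR_2": ["000000_0", "000000_1"],
    }
    print(dedup_union_output(partition))
    # {'HIVE_UNION_SUBDIR_1': ['000000_1', '000001_0'], 'HIVE_UNION_SUBDIR_2': ['000000_1']}
```

Selecting directories by prefix means a query with any number of UNION ALL branches gets the same treatment. Hard-coding only the first subdirectory would miss the rest.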

    Attachments

      1. explain.output (17 kB, Zhihua Deng)
      2. ddl.q (0.5 kB, Zhihua Deng)


    People

      Assignee: Zhihua Deng (dengzh)
      Reporter: Zhihua Deng (dengzh)
      Votes: 0
      Watchers: 2
