Hive / HIVE-25561

Killed task should not commit file.

Details

    Description

      With the Tez engine in our cluster, I found some duplicate lines, especially when Tez speculation is enabled. In the partition directory, both 000002_0 and 000002_1 exist.
      This is a very low-probability event. HIVE-10429 fixed some bugs related to interrupts, but some exceptions are still not caught.

      In our cluster, the task receives SIGTERM, then ClientFinalizer (a Hadoop class) is called and the HDFS client is closed. This raises an exception, but abort may not be set to true, so the killed task can still commit its file.
      removeTempOrDuplicateFiles may then fail to remove the duplicate because of the inconsistency, so the duplicate file is retained.
      (Note: the Driver first lists the directory, then the Task commits its file, then the Driver removes duplicate files. This ordering is the inconsistent case.)
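
      The ordering in the note above can be reproduced with a small in-memory simulation. This is only a sketch in Python, not Hive code: PartitionDir, TaskAttempt, remove_duplicates and run are hypothetical names, and the deduplication rule (keep one file per task id, judged only from the listing) is a simplification of what removeTempOrDuplicateFiles does. It assumes the killed attempt is 000002_0 and that the only difference between the two runs is whether the caught exception sets abort.

```python
from dataclasses import dataclass, field


@dataclass
class PartitionDir:
    """In-memory stand-in for a partition directory (hypothetical)."""
    files: set = field(default_factory=set)

    def list(self):
        # Snapshot of the directory at the moment of the call.
        return set(self.files)


class TaskAttempt:
    """Hypothetical task attempt; commits a file named <task_id>_<attempt>."""

    def __init__(self, task_id, attempt, set_abort_on_error):
        self.name = f"{task_id}_{attempt}"
        self.set_abort_on_error = set_abort_on_error
        self.abort = False

    def on_kill(self):
        # Models SIGTERM: the filesystem client is closed underneath the
        # task and the resulting exception is caught inside the task.
        try:
            raise IOError("Filesystem closed")
        except IOError:
            if self.set_abort_on_error:
                self.abort = True  # behaviour the issue title asks for

    def close(self, partition):
        # Commit only when the attempt was not aborted.
        if not self.abort:
            partition.files.add(self.name)


def remove_duplicates(partition, listing):
    """Keep one file per task id, deciding only from the given listing."""
    kept = {}
    for name in sorted(listing):
        task_id = name.rsplit("_", 1)[0]
        if task_id in kept:
            partition.files.discard(name)
        else:
            kept[task_id] = name


def run(set_abort_on_error):
    partition = PartitionDir()
    winner = TaskAttempt("000002", 1, set_abort_on_error)
    killed = TaskAttempt("000002", 0, set_abort_on_error)

    winner.close(partition)                # successful attempt commits
    listing = partition.list()             # 1. driver lists the directory
    killed.on_kill()                       #    the other attempt is killed
    killed.close(partition)                # 2. killed attempt reaches commit
    remove_duplicates(partition, listing)  # 3. driver dedups from stale listing
    return sorted(partition.files)
```

      In this sketch, run(False) leaves both 000002_0 and 000002_1 in the directory because the late commit is invisible to the stale listing, while run(True) leaves only 000002_1 because the killed attempt never commits.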

            People

              zhengchenyu Chenyu Zheng
              Votes: 0
              Watchers: 3

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 1h 50m