Hive / HIVE-20254

CheckNonCombinablePathCallable is buggy


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • 1.1.0
    • 2.3.0
    • None
    • None

    Description

      CombineHiveInputFormat lets users exclude some of their inputs from being combined (by implementing AvoidSplitCombination).

      We spotted a problem with this when our query read a large number of partitions (more than 100). When there are more than 100 input paths, the combinability check runs in parallel:

      • the input path array is divided into chunks of at most 100 paths each
      • each chunk is submitted to a CheckNonCombinablePathCallable
      • each CheckNonCombinablePathCallable returns the set of indices of the paths that must not be combined
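      The three steps above can be sketched in plain Java as follows. This is a simplified, hypothetical model, not Hive's CombineHiveInputFormat code: paths are plain Strings, a `Predicate<String>` stands in for the AvoidSplitCombination check, and the names `ChunkedCombineCheck`, `CheckChunk` and `findNonCombinable` as well as the thread-pool size are illustrative assumptions. The key point is that each chunk task knows its start offset and reports absolute indices into the full path list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Predicate;

// Hypothetical, simplified model of the parallel combinability check;
// not Hive's actual CombineHiveInputFormat code.
public class ChunkedCombineCheck {
    // At most 100 paths per chunk, as described in the issue.
    static final int CHUNK_SIZE = 100;

    // Hypothetical stand-in for CheckNonCombinablePathCallable: checks
    // paths[start, end) and reports ABSOLUTE indices into the full path list.
    static final class CheckChunk implements Callable<Set<Integer>> {
        private final List<String> paths;
        private final int start;
        private final int end; // exclusive
        private final Predicate<String> skipCombine;

        CheckChunk(List<String> paths, int start, int end, Predicate<String> skipCombine) {
            this.paths = paths;
            this.start = start;
            this.end = end;
            this.skipCombine = skipCombine;
        }

        @Override
        public Set<Integer> call() {
            Set<Integer> result = new TreeSet<>();
            for (int i = start; i < end; i++) {
                if (skipCombine.test(paths.get(i))) {
                    result.add(i); // absolute index; adding (i - start) would reproduce the bug
                }
            }
            return result;
        }
    }

    // Splits the paths into chunks, checks the chunks in parallel and merges the results.
    static Set<Integer> findNonCombinable(List<String> paths, Predicate<String> skipCombine, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Set<Integer>>> futures = new ArrayList<>();
            for (int start = 0; start < paths.size(); start += CHUNK_SIZE) {
                int end = Math.min(start + CHUNK_SIZE, paths.size());
                futures.add(pool.submit(new CheckChunk(paths, start, end, skipCombine)));
            }
            Set<Integer> all = new TreeSet<>();
            for (Future<Set<Integer>> future : futures) {
                all.addAll(future.get());
            }
            return all;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> paths = new ArrayList<>();
        for (int i = 0; i < 250; i++) {
            paths.add("/warehouse/t/part=" + i);
        }
        // Hypothetical rule: only part=7 and part=150 must not be combined.
        Predicate<String> rule = p -> p.endsWith("=7") || p.endsWith("=150");
        System.out.println(findNonCombinable(paths, rule, 4)); // prints [7, 150]
    }
}
```

      Passing the start offset into each task is the simplest way to keep indices absolute; an alternative would be for the caller to add each chunk's offset when merging the results.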

      The problem is that CheckNonCombinablePathCallable returns relative indices (positions within its chunk) instead of absolute indices (positions in the full input path array). The returned indices are therefore always smaller than 100, so none of the paths beyond the first 100 in the array are ever excluded from combination.
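      The relative-versus-absolute mix-up can be made concrete with a small hypothetical demo (class and method names are illustrative, not Hive's). It checks only the second chunk (positions 100 to 199) of 250 paths, with a made-up rule that only `part=150` must not be combined. The buggy variant reports 50; assuming the caller indexes the full path array directly with the returned values, as the report implies, that index names `part=50` instead of `part=150`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical demo of relative vs absolute chunk indices; not Hive code.
public class ChunkIndexDemo {
    // Buggy variant, as reported: the index is relative to the chunk start.
    static List<Integer> relativeIndices(List<String> paths, int start, int end, Predicate<String> skip) {
        List<Integer> out = new ArrayList<>();
        for (int i = start; i < end; i++) {
            if (skip.test(paths.get(i))) {
                out.add(i - start);
            }
        }
        return out;
    }

    // Fixed variant: the index is the position in the full path list.
    static List<Integer> absoluteIndices(List<String> paths, int start, int end, Predicate<String> skip) {
        List<Integer> out = new ArrayList<>();
        for (int i = start; i < end; i++) {
            if (skip.test(paths.get(i))) {
                out.add(i);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> paths = new ArrayList<>();
        for (int i = 0; i < 250; i++) {
            paths.add("part=" + i);
        }
        // Hypothetical rule: only part=150 must not be combined.
        Predicate<String> skip = p -> p.equals("part=150");

        // The second chunk covers positions 100..199.
        List<Integer> relative = relativeIndices(paths, 100, 200, skip);
        List<Integer> absolute = absoluteIndices(paths, 100, 200, skip);
        System.out.println(relative);                   // prints [50]
        System.out.println(absolute);                   // prints [150]
        // Read as an absolute index, the relative result names the wrong path.
        System.out.println(paths.get(relative.get(0))); // prints part=50
    }
}
```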

    People

      Assignee: Unassigned
      Reporter: Qinghui Xu (q.xu)