  1. Hive
  2. HIVE-20738 Enable Delete Event filtering in VectorizedOrcAcidRowBatchReader
  3. HIVE-20664

Potential ArrayIndexOutOfBoundsException in VectorizedOrcAcidRowBatchReader.findMinMaxKeys

Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.0.0-alpha-1
    • Component/s: Transactions
    • Labels: None
    • n/a

    Description

      ekoifman, could you please confirm if my understanding is correct and if so, review the fix?

      In the method VectorizedOrcAcidRowBatchReader.findMinMaxKeys, the code snippet that identifies the first and last stripe indices in the current split could result in an ArrayIndexOutOfBoundsException if a complete split is within the same stripe:

          for(int i = 0; i < stripes.size(); i++) {
            StripeInformation stripe = stripes.get(i);
            long stripeEnd = stripe.getOffset() + stripe.getLength();
            if(firstStripeIndex == -1 && stripe.getOffset() >= splitStart) {
              firstStripeIndex = i;
            }
            if(lastStripeIndex == -1 && splitEnd <= stripeEnd &&
                stripes.get(firstStripeIndex).getOffset() <= stripe.getOffset() ) {
              //the last condition is for when both splitStart and splitEnd are in
              // the same stripe
              lastStripeIndex = i;
            }
          }
      

      Consider an example with two stripes, covering offsets 0-500 and 500-1000, where splitStart is 600 and splitEnd is 800.

      In the first iteration of the loop, stripe.getOffset() is 0 and stripeEnd is 500. Neither if condition is met, so firstStripeIndex and lastStripeIndex both remain -1.

      In the second iteration of the loop, stripe.getOffset() is 500 and stripeEnd is 1000. The first if condition is not met because the stripe's offset (500) is not greater than or equal to splitStart (600). In the second if statement, however, splitEnd (800) is <= stripeEnd (1000), so evaluation proceeds to the last condition, stripes.get(firstStripeIndex).getOffset() <= stripe.getOffset(). This throws an ArrayIndexOutOfBoundsException because firstStripeIndex is still -1.
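      The walkthrough above can be checked with a small standalone sketch. It copies the loop from the description into a hypothetical class, StripeIndexRepro, and uses a hypothetical Stripe class as a stand-in for ORC's StripeInformation (only offset and length are modelled). Neither name comes from Hive. The exact exception type depends on the List implementation and Java version: a list created with Arrays.asList throws ArrayIndexOutOfBoundsException for index -1, while other lists may throw its parent type, IndexOutOfBoundsException.

```java
import java.util.List;

public class StripeIndexRepro {

  /** Hypothetical stand-in for ORC's StripeInformation; not the real interface. */
  static final class Stripe {
    private final long offset;
    private final long length;

    Stripe(long offset, long length) {
      this.offset = offset;
      this.length = length;
    }

    long getOffset() { return offset; }
    long getLength() { return length; }
  }

  /**
   * Same loop as quoted in the description, unchanged apart from whitespace.
   * Returns {firstStripeIndex, lastStripeIndex}.
   */
  static int[] findStripeRange(List<Stripe> stripes, long splitStart, long splitEnd) {
    int firstStripeIndex = -1;
    int lastStripeIndex = -1;
    for (int i = 0; i < stripes.size(); i++) {
      Stripe stripe = stripes.get(i);
      long stripeEnd = stripe.getOffset() + stripe.getLength();
      if (firstStripeIndex == -1 && stripe.getOffset() >= splitStart) {
        firstStripeIndex = i;
      }
      // When firstStripeIndex is still -1 here, stripes.get(-1) is evaluated
      // and an index-out-of-bounds exception is thrown.
      if (lastStripeIndex == -1 && splitEnd <= stripeEnd &&
          stripes.get(firstStripeIndex).getOffset() <= stripe.getOffset()) {
        lastStripeIndex = i;
      }
    }
    return new int[] {firstStripeIndex, lastStripeIndex};
  }
}
```

      Calling findStripeRange with stripes (offset 0, length 500) and (offset 500, length 500), splitStart 600 and splitEnd 800 reaches stripes.get(-1) in the second iteration. A split that starts on a stripe boundary, such as splitStart 0 and splitEnd 1000, completes normally.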

      I'm not sure if this scenario is possible at all, hence logging this as a low priority issue. Perhaps block based split generation using BISplitStrategy could trigger this?

      Attachments

        1. HIVE-20664.patch
          2 kB
          Saurabh Seth
        2. HIVE-20664.2.patch
          2 kB
          Saurabh Seth

            People

              Assignee: saurabhseth Saurabh Seth
              Reporter: saurabhseth Saurabh Seth