ORC-593

Allow row-level Skipping


Details

    • Improvement
    • Status: Closed
    • Major
    • Resolution: Duplicate
    • Patch, Important

    Description

      Currently, ORC supports filtering at the file, stripe, and row-group levels.

      There is an ongoing effort (ORC-577) to add finer-grained row-level filtering using filter predicates supplied through Reader.Options.

      However, there are still cases where a framework implementing TreeReader wants to skip particular rows without using predicates (e.g., by simply supplying the indexes of the rows to be skipped), in order to avoid expensive decoding of types such as DecimalColumnVector or Decimal64ColumnVector.

      In this ticket I propose extending the TreeReader abstract class with an additional nextVector method:

      abstract void nextVector(ColumnVector previous,
                               boolean[] isNull,
                               boolean[] skipRows,
                               final int batchSize);

      Subclasses implementing this method can then use the existing skipRows method to avoid expensive decoding where needed, based on the skipRows array argument.
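
      To illustrate the idea, here is a minimal, self-contained sketch. It does not use the real ORC classes: SkipRowsSketch, its decode method, and the decodeCalls counter are hypothetical stand-ins, the isNull argument is omitted for brevity, and it assumes that skipRows[i] == true means "do not decode row i". Runs of consecutive skipped rows are coalesced into a single call to the (stand-in) skipRows method.

```java
import java.util.Arrays;

/** Toy stand-in for a column reader; NOT the real ORC TreeReader. */
final class SkipRowsSketch {
    private final long[] encoded;  // pretend these are undecoded values in the stream
    private int position = 0;      // index of the next value in the stream
    int decodeCalls = 0;           // how many values were actually decoded

    SkipRowsSketch(long[] encoded) {
        this.encoded = encoded;
    }

    /** Hypothetical stand-in for an expensive decode (e.g. building a decimal). */
    private long decode(long raw) {
        decodeCalls++;
        return raw * 10;
    }

    /** Stand-in for the existing skipRows method: advance without decoding. */
    void skipRows(long items) {
        position += (int) items;
    }

    /**
     * Sketch of the proposed overload. Assumption: rows with
     * skipRows[i] == true are skipped in the stream and their slots
     * in the output vector are left untouched.
     */
    void nextVector(long[] vector, boolean[] skipRows, int batchSize) {
        int i = 0;
        while (i < batchSize) {
            if (skipRows[i]) {
                // Coalesce a run of consecutive skipped rows into one skip call.
                int runStart = i;
                while (i < batchSize && skipRows[i]) {
                    i++;
                }
                this.skipRows(i - runStart);
            } else {
                vector[i] = decode(encoded[position++]);
                i++;
            }
        }
    }

    public static void main(String[] args) {
        SkipRowsSketch reader = new SkipRowsSketch(new long[] {1, 2, 3, 4, 5, 6});
        long[] vector = new long[6];
        boolean[] skip = {false, true, true, false, true, false};
        reader.nextVector(vector, skip, 6);
        System.out.println(Arrays.toString(vector));  // [10, 0, 0, 40, 0, 60]
        System.out.println(reader.decodeCalls);       // 3
    }
}
```

      In this sketch only three of the six values are decoded. A caller would have to ignore the output slots of skipped rows, since they keep whatever value they held before the call.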

            People

              pgaref Panagiotis Garefalakis
              Votes: 0
              Watchers: 3

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 0.5h