  1. Hive
  2. HIVE-10171

Create a storage-api module

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.1.0
    • Component/s: None
    • Labels: None

    Description

      To support high-performance file formats, I'd like to propose that we move the minimal set of classes required to integrate with Hive into a new module named "storage-api". This module will include VectorizedRowBatch, the various ColumnVector classes, and the SARG (SearchArgument) classes. It will form the start of an API that high-performance storage formats can use to integrate with Hive. Both ORC and Parquet can use the new API to support vectorization and SARGs without performance-destroying shims.
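
As a rough illustration of why a column-oriented batch suits this kind of API, the following is a minimal, self-contained sketch. The `LongColumn` and `RowBatch` classes are hypothetical, simplified stand-ins written for this example. They are not the real VectorizedRowBatch or ColumnVector classes, and `sumAtLeast` is an assumed helper that only mimics the effect of a pushed-down comparison predicate.

```java
// Hypothetical stand-ins for illustration only; NOT the Hive storage-api classes.
class StorageApiSketch {

    // One column of long values, stored as a flat array for a whole batch.
    static final class LongColumn {
        final long[] vector;     // one value per row
        final boolean[] isNull;  // true where the row's value is null
        boolean noNulls = true;  // when true, readers may skip null checks

        LongColumn(int capacity) {
            vector = new long[capacity];
            isNull = new boolean[capacity];
        }
    }

    // A batch of rows held column by column.
    static final class RowBatch {
        final LongColumn[] cols;
        int size; // number of valid rows currently in the batch

        RowBatch(int numCols, int capacity) {
            cols = new LongColumn[numCols];
            for (int i = 0; i < numCols; i++) {
                cols[i] = new LongColumn(capacity);
            }
        }
    }

    // Sums the non-null values of one column that are >= min.
    // The tight loop over a primitive array is the point of the batch layout.
    static long sumAtLeast(RowBatch batch, int col, long min) {
        LongColumn c = batch.cols[col];
        long sum = 0;
        for (int r = 0; r < batch.size; r++) {
            if (!c.noNulls && c.isNull[r]) {
                continue; // skip null rows
            }
            if (c.vector[r] >= min) {
                sum += c.vector[r];
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        RowBatch batch = new RowBatch(1, 4);
        long[] values = {5, 1, 9, 3};
        for (int i = 0; i < values.length; i++) {
            batch.cols[0].vector[i] = values[i];
        }
        batch.size = values.length;

        // Mark the third row (value 9) as null.
        batch.cols[0].isNull[2] = true;
        batch.cols[0].noNulls = false;

        System.out.println(sumAtLeast(batch, 0, 3)); // prints 8 (5 + 3)
    }
}
```

A storage format that fills such arrays directly can hand whole batches to the query engine, so no per-row conversion layer is needed between the two.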

        Issue Links

          1. Remove hardcoded Parquet references from SearchArgumentImpl | Sub-task | Closed | Owen O'Malley
          2. Remove the dependence from ErrorMsg to HiveUtils | Sub-task | Closed | Owen O'Malley
          3. Remove use of PerfLogger from Orc | Sub-task | Closed | Owen O'Malley
          4. Remove dependencies on NumericHistogram and NumDistinctValueEstimator from JavaDataModel | Sub-task | Closed | Owen O'Malley
          5. Simplify the test for vectorized input | Sub-task | Resolved | Owen O'Malley
          6. Remove dependence on VectorizedBatchUtil from VectorizedOrcAcidRowReader | Sub-task | Resolved | Owen O'Malley
          7. Refactor the SearchArgumentFactory to remove the dependence on ExprNodeGenericFuncDesc | Sub-task | Closed | Owen O'Malley
          8. Modify VectorizedRowBatch.toString() to not depend on VectorExpressionWriter | Sub-task | Closed | Owen O'Malley
          9. Remove use of ErrorMsg in Orc's RunLengthIntegerReaderV2 | Sub-task | Closed | Owen O'Malley
          10. Remove dependence from ORC's WriterImpl to OrcInputFormat | Sub-task | Resolved | Owen O'Malley
          11. Move OrcRecordUpdater.getAcidEventFields to RecordReaderFactory | Sub-task | Closed | Owen O'Malley
          12. In DateWritable remove the use of LazyBinaryUtils | Sub-task | Closed | Owen O'Malley
          13. Remove dependency on HiveConf from Orc reader & writer | Sub-task | Closed | Owen O'Malley
          14. Clean up dependencies in HiveDecimalWritable | Sub-task | Closed | Owen O'Malley
          15. Remove getWritableObject from ColumnVectorBatch | Sub-task | Closed | Owen O'Malley
          16. Move OrcFile.OrcTableProperties from OrcFile into OrcConf. | Sub-task | Closed | Owen O'Malley
          17. Move ORC table properties from OrcFile to OrcOutputFormat | Sub-task | Closed | Owen O'Malley
          18. Move SearchArgument and VectorizedRowBatch classes to storage-api. | Sub-task | Closed | Owen O'Malley
          19. Create vectorized types for complex types | Sub-task | Closed | Owen O'Malley
          20. Create vectorized write method | Sub-task | Closed | Owen O'Malley
          21. Create ORC module | Sub-task | Closed | Owen O'Malley
          22. Create row-by-row shims for the write path | Sub-task | Closed | Owen O'Malley
          23. Create vectorized readers for the complex types | Sub-task | Closed | Owen O'Malley
          24. Create shims for the row by row read path that is backed by VectorizedRowBatch | Sub-task | Closed | Owen O'Malley
          25. Remove row by row reader. | Sub-task | Resolved | Owen O'Malley
          26. Push TypeDescription in to the ReaderImpl and RecordReaderImpl | Sub-task | Resolved | Unassigned

            People

              Assignee: Owen O'Malley (omalley)
              Reporter: Owen O'Malley (omalley)
              Votes: 1
              Watchers: 13
