Hive / HIVE-22583

LLAP cache always misses with non-vectorized serde readers such as OpenCSV

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.0.0-alpha-1
    • Component/s: llap
    • Labels: None

    Description

      Although the LLAP cache stores the data of tables that do not use the LazySimple serde after the first read, the stored data is never used by subsequent queries, causing a full cache miss and a re-read each time.

      The root cause is that SerdeEncodedDataReader#cacheFileData does not create an entry for the root/struct column of the table. This entry is only created when a vectorized reader is used (e.g. LazySimpleSerde's LazySimpleDeserializeRead), in which case SerdeEncodedDataReader#processAsyncCacheData takes care of it.

      This can be reproduced either by using a custom serde such as OpenCSV, or by using LazySimpleSerde with hive.llap.io.encode.vector.serde.enabled turned off.
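
To illustrate why a missing root/struct column entry can turn every later read into a full miss, here is a toy model in Java. It is only a sketch under stated assumptions and is not Hive code: the class RootColumnCacheSketch, its methods, the use of index 0 for the root column, and the rule that a hit requires an entry for every column including the root are all hypothetical and chosen purely for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model only: none of these names are real Hive classes or methods. */
public class RootColumnCacheSketch {

    /** Assumption: index 0 stands for the root/struct column, 1..n for data columns. */
    static final int ROOT_COLUMN = 0;

    /** Models a caching path that stores data columns but no root column entry. */
    static Map<Integer, byte[]> cacheWithoutRoot(int dataColumns) {
        Map<Integer, byte[]> cache = new HashMap<>();
        for (int i = 1; i <= dataColumns; i++) {
            cache.put(i, new byte[] {(byte) i});
        }
        return cache;
    }

    /** Models a caching path that additionally stores an entry for the root column. */
    static Map<Integer, byte[]> cacheWithRoot(int dataColumns) {
        Map<Integer, byte[]> cache = cacheWithoutRoot(dataColumns);
        cache.put(ROOT_COLUMN, new byte[0]);
        return cache;
    }

    /** Assumed lookup rule: a hit needs the root column and every data column. */
    static boolean isHit(Map<Integer, byte[]> cache, int dataColumns) {
        for (int i = ROOT_COLUMN; i <= dataColumns; i++) {
            if (!cache.containsKey(i)) {
                return false; // one missing entry makes the whole lookup a miss
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("without root entry: hit=" + isHit(cacheWithoutRoot(3), 3));
        System.out.println("with root entry:    hit=" + isHit(cacheWithRoot(3), 3));
    }
}
```

In this model the data columns are cached in both cases, yet only the variant with a root entry can ever report a hit, which mirrors the symptom described above: data is stored after the first read but never reused.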

      Attachments

        1. HIVE-22583.3.patch
          5 kB
          Ádám Szita
        2. HIVE-22583.2.patch
          7 kB
          Ádám Szita
        3. HIVE-22583.1.patch
          7 kB
          Ádám Szita
        4. HIVE-22583.0.patch
          6 kB
          Ádám Szita

          People

            Assignee: Ádám Szita (szita)
            Reporter: Ádám Szita (szita)
            Votes: 0
            Watchers: 3
