IMPALA-12905

Implement disk-based tuple caching

Details

    • Type: Task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Impala 4.4.0
    • Fix Version/s: Impala 4.4.0
    • Component/s: Backend
    • Labels: None

    Description

      The TupleCacheNode caches tuples so that later, equivalent queries can reuse them. This task tracks implementing a version that serializes tuples and stores them as files on local disk.

      This will have a few parts:

      1. A TupleCacheMgr keeps track of which entries exist in the cache and evicts entries as needed to make space for new ones. It will be configured through startup flags that specify the cache directory, size, and eviction policy.
      2. The TupleCacheNode will ask the TupleCacheMgr whether an entry is available. If it is, the node reads the associated tuple cache file and returns the RowBatches. If it is not, the node reads RowBatches from its child and stores them in a new file in the cache.
      3. The TupleReader / TupleWriter implement serialization / deserialization of RowBatches to and from a local file. They reuse the existing serialization used for KRPC.
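
The three parts above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Impala's implementation: `ToyTupleCache`, `GetOrCompute`, `WriteBatches`, `ReadBatches`, and `Batch` are hypothetical names, the constructor arguments stand in for the startup flags, LRU is only one possible eviction policy, and the length-prefixed file format is a placeholder for the KRPC RowBatch serialization mentioned in the description.

```cpp
// Hypothetical sketch; no name or signature here is taken from Impala's source.
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <fstream>
#include <functional>
#include <list>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Stand-in for a serialized RowBatch: an opaque byte string.
using Batch = std::string;

// Writer half: stores each batch as an 8-byte length followed by its bytes
// (assumed format, chosen only to keep the example short).
inline void WriteBatches(const std::string& path, const std::vector<Batch>& batches) {
  std::ofstream out(path, std::ios::binary | std::ios::trunc);
  for (const Batch& b : batches) {
    uint64_t len = b.size();
    out.write(reinterpret_cast<const char*>(&len), sizeof(len));
    out.write(b.data(), static_cast<std::streamsize>(len));
  }
}

// Reader half: reads records until the file ends or a record is truncated.
inline std::vector<Batch> ReadBatches(const std::string& path) {
  std::ifstream in(path, std::ios::binary);
  std::vector<Batch> batches;
  uint64_t len = 0;
  while (in.read(reinterpret_cast<char*>(&len), sizeof(len))) {
    Batch b(len, '\0');
    if (!in.read(&b[0], static_cast<std::streamsize>(len))) break;
    batches.push_back(std::move(b));
  }
  return batches;
}

// Toy cache manager plus the node's hit/miss flow, with LRU eviction by bytes.
class ToyTupleCache {
 public:
  ToyTupleCache(std::string dir, uint64_t capacity_bytes)
      : dir_(std::move(dir)), capacity_(capacity_bytes) {}

  bool Contains(const std::string& key) const { return index_.count(key) > 0; }

  // Hit: read the cached file. Miss: pull batches from 'child', store them, return them.
  std::vector<Batch> GetOrCompute(const std::string& key,
                                  const std::function<std::vector<Batch>()>& child) {
    auto it = index_.find(key);
    if (it != index_.end()) {
      lru_.splice(lru_.begin(), lru_, it->second);  // mark as most recently used
      return ReadBatches(PathFor(key));
    }
    std::vector<Batch> batches = child();
    uint64_t bytes = 0;
    for (const Batch& b : batches) bytes += sizeof(uint64_t) + b.size();
    if (bytes > capacity_) return batches;  // can never fit: pass through uncached
    while (used_ + bytes > capacity_) EvictOldest();
    WriteBatches(PathFor(key), batches);
    lru_.push_front({key, bytes});
    index_[key] = lru_.begin();
    used_ += bytes;
    return batches;
  }

 private:
  struct Entry { std::string key; uint64_t bytes; };

  std::string PathFor(const std::string& key) const { return dir_ + "/" + key + ".cache"; }

  // Deletes the least recently used entry's file and bookkeeping.
  void EvictOldest() {
    assert(!lru_.empty());
    const Entry& victim = lru_.back();
    std::remove(PathFor(victim.key).c_str());
    used_ -= victim.bytes;
    index_.erase(victim.key);
    lru_.pop_back();
  }

  std::string dir_;
  uint64_t capacity_;
  uint64_t used_ = 0;
  std::list<Entry> lru_;  // front = most recently used
  std::unordered_map<std::string, std::list<Entry>::iterator> index_;
};
```

In this sketch, a caller constructs `ToyTupleCache` with a directory and a byte budget, then calls `GetOrCompute` with a cache key and a callback that produces the child's batches. The callback runs only on a miss. Keeping the index in memory and the payload on disk mirrors the split the description draws between the manager and the cache files, but the real eviction policy and file layout may differ.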

          People

            Assignee: Joe McDonnell (joemcdonnell)
            Reporter: Joe McDonnell (joemcdonnell)