Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Later

    Description

      The cache in the prototype has a number of limitations.
      1) Having 16-32-..MB chunks that each hold many logical units of caching can result in undesirable priority phenomena. Priority tracking is needed for every such unit, with some form of priority-splitting "compaction".
      I have a design for that which never blocks readers...
      2) Something like a buddy allocator could also be used instead of fixed-size blocks.
      3) The cache needs tighter integration with file formats, since we abandoned the intermediate format and are planning to make the unit of caching much smaller (RG, not stripe). For example, ORC can decompress data directly into a large buffer, then pass the logical boundaries on to ChunkPool.
      4) For the same reason (having so many cached objects), one might consider making the cache format-specific and/or hierarchical, since requesting 1000s of objects may be suboptimal. For example, a TPCDS stripe has ~430 RGs; with just a few columns that is a lot of objects to request. It is much easier if the RGs are all sequential and can be returned together when sargs didn't do a lot of filtering.
      5) Minor issues, such as not reusing allocated buffers after they are evicted and allocating again instead, etc.
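
      To make item 2 concrete, here is a minimal sketch of a buddy allocator that hands out offsets inside one large arena instead of fixed-size blocks. It is an illustration only, not the design referred to in this ticket: the class name BuddyAllocator, its methods, and the sizes used are all hypothetical, and it uses a plain lock, so it does not address the non-blocking-reader requirement from item 1.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

/** Hypothetical sketch: buddy allocation of offsets within a single arena. */
class BuddyAllocator {
    private final int minBlock;  // smallest block size in bytes (power of two)
    private final int maxOrder;  // arena size is minBlock << maxOrder
    // freeLists.get(k) holds the offsets of free blocks of size minBlock << k.
    private final List<TreeSet<Integer>> freeLists = new ArrayList<>();
    // Remembers the order of each live allocation so free() needs only the offset.
    private final Map<Integer, Integer> allocated = new HashMap<>();

    BuddyAllocator(int minBlock, int maxOrder) {
        if (minBlock <= 0 || Integer.bitCount(minBlock) != 1) {
            throw new IllegalArgumentException("minBlock must be a positive power of two");
        }
        // Keep the arena size within a positive int (assumption: int offsets suffice).
        if (maxOrder < 0 || maxOrder > 30 - Integer.numberOfTrailingZeros(minBlock)) {
            throw new IllegalArgumentException("arena too large for int offsets");
        }
        this.minBlock = minBlock;
        this.maxOrder = maxOrder;
        for (int k = 0; k <= maxOrder; k++) {
            freeLists.add(new TreeSet<>());
        }
        freeLists.get(maxOrder).add(0); // the whole arena starts as one free block
    }

    /** Returns the offset of a block of at least size bytes, or -1 if none fits. */
    synchronized int allocate(int size) {
        if (size <= 0) return -1;
        int order = 0;
        while (order <= maxOrder && (minBlock << order) < size) order++;
        if (order > maxOrder) return -1;

        // Find the smallest free block that is large enough.
        int k = order;
        while (k <= maxOrder && freeLists.get(k).isEmpty()) k++;
        if (k > maxOrder) return -1;

        int offset = freeLists.get(k).pollFirst();
        // Split repeatedly, returning the upper half of each split to the free lists.
        while (k > order) {
            k--;
            freeLists.get(k).add(offset + (minBlock << k));
        }
        allocated.put(offset, order);
        return offset;
    }

    /** Frees a block and merges it with its buddy for as long as the buddy is free. */
    synchronized void free(int offset) {
        Integer order = allocated.remove(offset);
        if (order == null) {
            throw new IllegalArgumentException("offset not allocated: " + offset);
        }
        int k = order;
        while (k < maxOrder) {
            int buddy = offset ^ (minBlock << k); // buddy differs in exactly one bit
            if (!freeLists.get(k).remove(buddy)) break;
            offset = Math.min(offset, buddy);
            k++;
        }
        freeLists.get(k).add(offset);
    }
}
```

      In a real cache the returned offsets would presumably be turned into slices of a large buffer, and a freed block is returned to the free lists rather than released, which also touches on the buffer reuse mentioned in item 5. Requests are rounded up to a power of two, so some internal fragmentation is the price paid for cheap splitting and merging.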

          People

            Assignee: Unassigned
            Reporter: Sergey Shelukhin (sershe)