Spark / SPARK-1669

SQLContext.cacheTable() should be idempotent

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.0.0
    • Fix Version/s: 1.0.1, 1.1.0
    • Component/s: SQL

    Description

      Calling cacheTable() on a table t multiple times causes t to be cached multiple times. These semantics differ from RDD.cache(), which is idempotent.

      We can tell whether a table is already cached by checking:

      1. whether the underlying logical plan of the table matches the pattern Subquery(_, SparkLogicalPlan(inMem @ InMemoryColumnarTableScan(_, _)))
      2. whether inMem.cachedColumnBuffers.getStorageLevel.useMemory is true
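
      The two checks above can be sketched as a single pattern match. This is only a minimal, self-contained illustration, not the actual patch: the case classes below are simplified stand-ins for the Spark SQL classes named in the description (the real ones have different fields), and ColumnBuffers, ExistingRdd, CacheCheck, isCached and the cacheTable helper here are hypothetical names introduced for the example.

```scala
// Simplified stand-ins for the Spark SQL classes named in the description.
// They are NOT the real Spark definitions; they only mirror the shapes
// needed to illustrate the check.
case class StorageLevel(useMemory: Boolean)

// Hypothetical stand-in for the RDD that holds the cached column buffers.
case class ColumnBuffers(getStorageLevel: StorageLevel)

sealed trait SparkPlan
case class InMemoryColumnarTableScan(
    attributes: Seq[String],
    cachedColumnBuffers: ColumnBuffers) extends SparkPlan
// Hypothetical stand-in for a physical plan that is not cached.
case class ExistingRdd(attributes: Seq[String]) extends SparkPlan

sealed trait LogicalPlan
case class SparkLogicalPlan(alreadyPlanned: SparkPlan) extends LogicalPlan
case class Subquery(alias: String, child: LogicalPlan) extends LogicalPlan

object CacheCheck {
  // Check 1: the plan has the shape Subquery -> SparkLogicalPlan -> in-memory scan.
  // Check 2: the buffers behind that scan are actually stored in memory.
  def isCached(plan: LogicalPlan): Boolean = plan match {
    case Subquery(_, SparkLogicalPlan(inMem @ InMemoryColumnarTableScan(_, _))) =>
      inMem.cachedColumnBuffers.getStorageLevel.useMemory
    case _ => false
  }

  // Assumed idempotent variant: an already cached plan is returned unchanged,
  // so calling this repeatedly never wraps the table in a second scan.
  def cacheTable(plan: LogicalPlan): LogicalPlan = plan match {
    case cached if isCached(cached) => cached
    case Subquery(alias, SparkLogicalPlan(ExistingRdd(attrs))) =>
      val buffers = ColumnBuffers(StorageLevel(useMemory = true))
      Subquery(alias, SparkLogicalPlan(InMemoryColumnarTableScan(attrs, buffers)))
    case other => other // plan shapes this sketch does not model are left alone
  }
}
```

      With these definitions, applying CacheCheck.cacheTable to its own result returns an equal plan, which is the idempotent behaviour the issue asks for.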

          People

            Cheng Lian