Description
Registering a temp table and caching it leaks memory.
The following simple code reproduces the issue:
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val sparkConf = new SparkConf().setAppName("LeakTest")
val sparkContext = new SparkContext(sparkConf)
val sqlContext = new SQLContext(sparkContext)
val tableName = "tmp"
val jsonrdd = sparkContext.textFile("""sample.json""")
var loopCount = 1L
// Register, cache, query, and uncache the same table repeatedly
while (true) {
  sqlContext.jsonRDD(jsonrdd).registerTempTable(tableName)
  sqlContext.cacheTable(tableName)
  println("L: " + loopCount + " R:" + sqlContext.sql("""select count(*) from tmp""").count())
  sqlContext.uncacheTable(tableName)
  loopCount += 1
}
The cause is that InMemoryRelation and InMemoryColumnarTableScan use accumulators (InMemoryRelation.batchStats, InMemoryColumnarTableScan.readPartitions, InMemoryColumnarTableScan.readBatches) to collect information from partitions or for testing. These accumulators register themselves in a static map, Accumulators.originals, and are never cleaned up.
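The leak pattern described above can be illustrated with a small standalone sketch. This is not Spark's code: ToyRegistry, ToyAccumulator, and LeakSketch are hypothetical names invented for illustration, and the sketch only assumes what the description states, namely that each accumulator adds itself to a static map and nothing removes it afterwards.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a global accumulator registry (not Spark's actual code).
object ToyRegistry {
  // Static map with strong references: entries stay reachable until explicitly removed.
  val originals = mutable.Map[Long, ToyAccumulator]()
  private var lastId = 0L

  def newId(): Long = synchronized { lastId += 1; lastId }
  def register(a: ToyAccumulator): Unit = synchronized { originals(a.id) = a }
}

// Hypothetical accumulator that registers itself on construction.
class ToyAccumulator(var value: Long) {
  val id: Long = ToyRegistry.newId()
  ToyRegistry.register(this)
}

object LeakSketch {
  // Simulates `iterations` cache/uncache cycles, each creating three accumulators
  // (one per accumulator named in the description). Returns how many entries
  // the registry gained.
  def simulate(iterations: Int): Int = {
    val before = ToyRegistry.originals.size
    for (_ <- 1 to iterations) {
      // The local references are dropped after each cycle, but the registry
      // still holds every instance, so none can be garbage collected.
      Seq.fill(3)(new ToyAccumulator(0L))
    }
    ToyRegistry.originals.size - before
  }
}
```

In this sketch the registry grows by three entries per cycle and never shrinks, which mirrors the unbounded growth the reproduction loop would trigger. A fix along these lines would need either an explicit removal step when a table is uncached or a registry that does not hold strong references; which approach fits Spark is not stated in this report.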