Details
- Type: Bug
- Status: Resolved
- Priority: Minor
- Resolution: Duplicate
- Affects Version/s: 2.3.0
- Fix Version/s: None
- Component/s: None
- Environment: Zeppelin in EMR
Description
I'm persisting a DataFrame in Zeppelin, which has dynamic allocation enabled, to get a sense of how much memory the DataFrame takes up. After I note the size, I unpersist the DataFrame, but for some reason YARN does not release the executors that were added to Zeppelin. If I skip the persist and unpersist steps, the added executors are removed about a minute after the paragraphs complete. Looking at the Storage tab in the Spark UI for the Zeppelin job, I don't see anything cached. I do not want to lower spark.dynamicAllocation.cachedExecutorIdleTimeout, because executors that still hold cached data should not be released; however, executors that previously held cached data and no longer do should be released.
Steps to reproduce:
- Enable dynamic allocation
- Set spark.dynamicAllocation.executorIdleTimeout to 60s
- Set spark.dynamicAllocation.cachedExecutorIdleTimeout to infinity
- Load a dataset, persist it, run a count on the persisted dataset, unpersist the persisted dataset
- Wait a couple of minutes
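The configuration in the steps above can be sketched as a spark-defaults.conf fragment. The property names are Spark's standard dynamic allocation settings; note that "infinity" is already the documented default for cachedExecutorIdleTimeout, and that dynamic allocation on YARN also assumes the external shuffle service is enabled:

```properties
# Dynamic allocation settings used to reproduce the issue
spark.dynamicAllocation.enabled                    true
# Idle executors with no cached blocks should be released after 60s
spark.dynamicAllocation.executorIdleTimeout        60s
# Executors holding cached blocks are never released (the default)
spark.dynamicAllocation.cachedExecutorIdleTimeout  infinity
# Required for dynamic allocation on YARN
spark.shuffle.service.enabled                      true
```

With these settings, an executor that cached a block and was later fully unpersisted should fall back under the 60s executorIdleTimeout.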
Expected behaviour:
All executors are released, since they no longer hold any cached data
Observed behaviour:
No executors are released
Issue Links
- duplicates SPARK-20286: dynamicAllocation.executorIdleTimeout is ignored after unpersist (Resolved)