Details
-
Bug
-
Status: Resolved
-
Major
-
Resolution: Fixed
-
2.1.0
Description
The driver appears to use a ton of memory in certain cases to store the task metrics updated block status'. For instance I had a user reading data form hive and caching it. The # of tasks to read was around 62,000, they were using 1000 executors and it ended up caching a couple TB's of data. The driver kept running out of memory.
I investigated and it looks like there was 5GB of a 10GB heap being used up by the TaskMetrics._updatedBlockStatuses because there are a lot of blocks.
The updatedBlockStatuses was already removed from the task end event under SPARK-20084. I don't see anything else that seems to be using this. Anybody know if I missed something?
If its not being used we should remove it, otherwise we need to figure out a better way of doing it so it doesn't use so much memory.
Attachments
Issue Links
- is related to
-
SPARK-20970 Deprecate TaskMetrics._updatedBlockStatuses
- Resolved
- links to