Details
- Bug
- Status: Resolved
- Minor
- Resolution: Incomplete
- 2.4.4
- None
Description
In my streaming application, ``spark.streaming.concurrentJobs`` is set to 50, which is used as the size of the underlying thread pool. I automatically create and alter tables and views at runtime. To do that, I invoke ``create ... if not exists`` operations on the driver on each batch invocation. I noticed that most of the batch time is spent on the driver rather than on the executors. I took a thread dump and found that most of the threads are blocked on SessionCatalog, waiting for a lock.
The existing implementation of SessionCatalog uses a single lock, which almost all of its methods use to guard the ``currentDb`` and ``tempViews`` variables. I propose to improve the locking behaviour of SessionCatalog by:
- Employing a ReadWriteLock, which allows read operations to execute concurrently.
- Replacing synchronized with the corresponding read or write lock.
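A minimal sketch of the proposed locking scheme, written in Java against ``java.util.concurrent.locks.ReentrantReadWriteLock``. This is not Spark code: the class ``CatalogState``, its methods, and the use of plain strings for view definitions are hypothetical stand-ins for the state that SessionCatalog guards. Readers share the read lock, and only mutations take the exclusive write lock.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for the state guarded inside SessionCatalog (not Spark code).
class CatalogState {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private String currentDb;
    // View name -> view definition; a String is an assumption made for brevity.
    private final Map<String, String> tempViews = new HashMap<>();

    CatalogState(String initialDb) {
        this.currentDb = initialDb;
    }

    // Read path: many threads may hold the read lock at the same time.
    String getCurrentDb() {
        lock.readLock().lock();
        try {
            return currentDb;
        } finally {
            lock.readLock().unlock();
        }
    }

    // Write path: exclusive, blocks readers and other writers.
    void setCurrentDb(String db) {
        lock.writeLock().lock();
        try {
            currentDb = db;
        } finally {
            lock.writeLock().unlock();
        }
    }

    Optional<String> getTempView(String name) {
        lock.readLock().lock();
        try {
            return Optional.ofNullable(tempViews.get(name));
        } finally {
            lock.readLock().unlock();
        }
    }

    // "Create if not exists": the common case (view already present) needs only the read lock.
    // Returns true only if this call created the view.
    boolean createTempViewIfNotExists(String name, String definition) {
        lock.readLock().lock();
        try {
            if (tempViews.containsKey(name)) {
                return false;
            }
        } finally {
            lock.readLock().unlock();
        }
        // A read lock cannot be upgraded, so it is released above and the
        // existence check is repeated under the write lock.
        lock.writeLock().lock();
        try {
            return tempViews.putIfAbsent(name, definition) == null;
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

With this shape, repeated ``create ... if not exists`` calls for an object that already exists would contend only on the shared read lock, which is the case described in this report. Whether the real SessionCatalog methods separate this cleanly into read and write paths is something the sketch does not establish.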
It may also be possible to go even further and strip the locks for ``currentDb`` and ``tempViews``, but I'm not sure whether that is feasible from an implementation point of view. Perhaps someone can help me with this.