Spark / SPARK-29550

Enhance locking in session catalog


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Incomplete
    • Affects Version/s: 2.4.4
    • Fix Version/s: None
    • Component/s: SQL

    Description

      In my streaming application, ``spark.streaming.concurrentJobs`` is set to 50, which is used as the size of the underlying thread pool. I automatically create and alter tables and views at runtime; to do that, I invoke ``create ... if not exists`` operations on the driver in each batch. I noticed that most of the batch time was spent on the driver rather than on the executors. A thread dump showed that most of the threads were blocked in SessionCatalog, waiting for a lock.

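      The existing implementation of SessionCatalog uses a single lock (acquired via ``synchronized``) that almost all of its methods take in order to guard the ``currentDb`` and ``tempViews`` variables. As a result, even read-only calls from different threads are serialized behind one another. I propose to enhance the locking behaviour of SessionCatalog by: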

      1. Using a ReadWriteLock, which allows read operations to execute concurrently.
      2. Replacing ``synchronized`` with the corresponding read or write lock.
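A minimal sketch of the proposal, assuming Java's ``java.util.concurrent.locks.ReentrantReadWriteLock``. The class ``RwCatalog``, its methods, and the string-valued views are hypothetical simplifications for illustration; they are not Spark's SessionCatalog API (which is written in Scala).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical, heavily simplified stand-in for SessionCatalog state.
// Views are stored as plain strings here; this is not Spark's actual class or API.
class RwCatalog {
    private final ReadWriteLock lock = new ReentrantReadWriteLock();
    private String currentDb = "default";                          // guarded by lock
    private final Map<String, String> tempViews = new HashMap<>(); // guarded by lock

    // Readers share the read lock, so concurrent lookups no longer serialize.
    String getCurrentDatabase() {
        lock.readLock().lock();
        try {
            return currentDb;
        } finally {
            lock.readLock().unlock();
        }
    }

    // Writers take the exclusive write lock, which waits until no reader holds the lock.
    void setCurrentDatabase(String db) {
        lock.writeLock().lock();
        try {
            currentDb = db;
        } finally {
            lock.writeLock().unlock();
        }
    }

    String getTempView(String name) {
        lock.readLock().lock();
        try {
            return tempViews.get(name);
        } finally {
            lock.readLock().unlock();
        }
    }

    // "create ... if not exists": the common case (view already exists) only needs
    // the shared read lock; the write lock is taken only when the view may be missing.
    boolean createTempViewIfNotExists(String name, String plan) {
        lock.readLock().lock();
        try {
            if (tempViews.containsKey(name)) {
                return false;
            }
        } finally {
            // ReentrantReadWriteLock cannot upgrade read -> write, so release first.
            lock.readLock().unlock();
        }
        lock.writeLock().lock();
        try {
            // Re-check: another thread may have created the view between the two locks.
            return tempViews.putIfAbsent(name, plan) == null;
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

With many concurrent batch jobs that mostly check whether objects already exist, most calls would share the read lock; writes still serialize, so the benefit depends on the read/write mix.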

      It may also be possible to go further and use separate (striped) locks for ``currentDb`` and ``tempViews``, but I'm not sure whether the implementation allows it. Help with this would be welcome.
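Splitting the single lock could look roughly like this, assuming ``currentDb`` and ``tempViews`` are independent of each other. ``SplitLockCatalog`` and its methods are hypothetical; each field gets its own monitor object, so a thread switching databases no longer blocks a thread creating a temporary view.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of lock splitting: one lock per independently guarded field.
class SplitLockCatalog {
    private final Object dbLock = new Object();    // guards currentDb only
    private final Object viewsLock = new Object(); // guards tempViews only
    private String currentDb = "default";
    private final Map<String, String> tempViews = new HashMap<>();

    String getCurrentDatabase() {
        synchronized (dbLock) {
            return currentDb;
        }
    }

    void setCurrentDatabase(String db) {
        synchronized (dbLock) {
            currentDb = db;
        }
    }

    // Contends only with other temp-view operations, not with database switches.
    boolean createTempViewIfNotExists(String name, String plan) {
        synchronized (viewsLock) {
            return tempViews.putIfAbsent(name, plan) == null;
        }
    }

    String getTempView(String name) {
        synchronized (viewsLock) {
            return tempViews.get(name);
        }
    }
}
```

Any operation that must read or update both fields consistently would still need to acquire both locks, always in the same order, to avoid deadlock.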


            People

              Assignee: Unassigned
              Reporter: Nikita Gorbachevski (choojoyq)
              Votes: 0
              Watchers: 1
