Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-47361 Improve JDBC data sources
  3. SPARK-46747

Too Many Shared Locks due to PostgresDialect.getTableExistsQuery - LIMIT 1

    XMLWordPrintableJSON

Details

    • Sub-task
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • 2.0.0, 2.0.1, 2.0.2, 2.1.0, 2.1.1, 2.1.2, 2.1.3, 2.2.0, 2.2.1, 2.2.2, 2.2.3, 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.3.4, 2.4.0, 2.4.1, 2.4.2, 2.4.3, 2.4.4, 2.4.5, 2.4.6, 2.4.7, 2.4.8, 3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.1.0, 3.1.1, 3.1.2, 3.2.0, 3.1.3, 3.2.1, 3.3.0, 3.2.2, 3.3.1, 3.2.3, 3.2.4, 3.3.3, 3.4.2, 3.3.2, 3.4.0, 3.4.1, 3.5.0, 3.3.4
    • 4.0.0, 3.5.1, 3.4.3
    • SQL
    • Important

    Description

      Background:
      PostgresDialect.getTableExistsQuery is using LIMIT 1 query to check the table existence in the database by overriding the default JdbcDialect.getTableExistsQuery which has WHERE 1 = 0.

      Issue:
      Due to LIMIT 1 query pattern, we are seeing high number of shared locks in the PostgreSQL installations where there are many partitions under a table that's being written to. Hence resorting to the default JdbcDialect which does WHERE 1 = 0 is proven to be more optimal as it doesn't scan any of the partitions and effectively checks for table existence.

      The SELECT 1 FROM table LIMIT 1 query can indeed be heavier in certain scenarios, especially with partitioned tables or tables with a lot of data, as it may take shared locks on all partitions or involve more planner and execution time to determine the quickest way to get a single row.

      On the other hand, SELECT 1 FROM table WHERE 1=0 doesn't actually try to read any data due to the always false WHERE condition. This makes it a lighter operation, as it typically only involves checking the table's metadata to validate the table's existence without taking locks on the table's data or partitions.

      So, considering performance and minimizing locks, SELECT 1 FROM table WHERE 1=0 would be a better choice if we're strictly looking to check for a table's existence and want to avoid potentially heavier operations like taking shared locks on partitions.

      Attachments

        Issue Links

          Activity

            People

              yao Kent Yao
              bbellam Bala Bellam
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: