IMPALA-7224

UpdateCatalogMetrics very slow when there are many tables

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: Impala 2.13.0, Impala 3.1.0
    • Component/s: Catalog
    • Labels: None

    Description

      impalad calls UpdateCatalogMetrics after every statement that it treats as DDL. This includes statements such as USE, SHOW TABLES, and DESCRIBE, which don't change the number of tables in the catalog and therefore probably don't need to update the metrics. That aside, even when the metrics do need to be updated, the implementation is very slow. It calls getTableNames on each database, which (a) creates an array of all the table names, (b) sorts that array, and (c) encodes and decodes the whole array through Thrift. This is very expensive: in a use case with approximately 8M tables, each such call takes 10-12 seconds of CPU time, most of it spent sorting and encoding. All that's really needed is a count of tables, which could be fetched directly.
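
      The difference between the two approaches can be sketched as follows. This is a simplified Python model, not Impala's actual code: the Db class, get_table_names, num_tables, and both counting functions are hypothetical stand-ins, and the Thrift encode/decode step is not modeled at all. It only illustrates that a count can be read from the size of the underlying collection without materializing or sorting the names.

```python
from typing import Dict, List


class Db:
    """Hypothetical stand-in for a catalog database and its table entries."""

    def __init__(self, name: str, table_names: List[str]) -> None:
        self.name = name
        # Map from table name to a placeholder table object.
        self._tables: Dict[str, object] = {t: object() for t in table_names}

    def get_table_names(self) -> List[str]:
        # Models the costly path: copy every name into a list and sort it.
        return sorted(self._tables)

    def num_tables(self) -> int:
        # Models the cheap path: read the map size, no copy and no sort.
        return len(self._tables)


def count_tables_slow(dbs: List[Db]) -> int:
    # Builds and sorts a full name list per database just to take its length.
    return sum(len(db.get_table_names()) for db in dbs)


def count_tables_fast(dbs: List[Db]) -> int:
    # Asks each database for its table count directly.
    return sum(db.num_tables() for db in dbs)
```

      Both functions return the same total; the second does work proportional to the number of databases rather than the number of tables.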

          People

            Assignee: Todd Lipcon (tlipcon)
            Reporter: Todd Lipcon (tlipcon)
            Votes: 0
            Watchers: 3