IMPALA-12933

Catalogd should set eventTypeSkipList when fetching specific events for a table

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: Impala 4.4.0
    • Component/s: Catalog
    • Labels: None

    Description

      There are several places where catalogd fetches all events of a specific type on a table. For example, in TableLoader#load(), if the table has an old createEventId, catalogd fetches all CREATE_TABLE events on the table after that createEventId.

      Fetching the list of events is expensive because the filtering is done on the client side, i.e. catalogd fetches all events and filters them locally by event type and table name:
      https://github.com/apache/impala/blob/148888e3ed4f97292499b2e6ee8d5a756dc648d9/fe/src/main/java/org/apache/impala/catalog/TableLoader.java#L98-L102
      https://github.com/apache/impala/blob/b7ddbcad0dd6accb559a3f391a897a8c442d1728/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java#L336
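
      The client-side filtering described above can be pictured with a minimal sketch. It is not the Impala code behind the links: the `Event` class, the `filterLocally` helper, and the case-insensitive name matching are hypothetical stand-ins chosen for illustration. The point is that every fetched event has to reach the client before any of it can be discarded.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of client-side event filtering; not Impala's actual classes.
final class ClientSideFilterSketch {
  // Minimal stand-in for an HMS notification event (assumed fields only).
  static final class Event {
    final long id;
    final String type;
    final String db;
    final String table;

    Event(long id, String type, String db, String table) {
      this.id = id;
      this.type = type;
      this.db = db;
      this.table = table;
    }
  }

  // Keeps only events after afterEventId that match the wanted type and table.
  // The whole 'fetched' list has already been transferred at this point, which
  // is where the cost comes from when HMS holds many events.
  static List<Event> filterLocally(List<Event> fetched, long afterEventId,
      String wantedType, String db, String table) {
    List<Event> matched = new ArrayList<>();
    for (Event e : fetched) {
      if (e.id > afterEventId
          && e.type.equals(wantedType)
          && e.db.equalsIgnoreCase(db)
          && e.table.equalsIgnoreCase(table)) {
        matched.add(e);
      }
    }
    return matched;
  }
}
```

      In this model the work grows with the total number of events after the given ID, not with the number of matching events.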

      This can take hours if there are many events (e.g. 1M) in HMS. However, NotificationEventRequest can specify an eventTypeSkipList, so catalogd can have the event-type filtering done on the HMS side. On newer Hive versions that include HIVE-27499, catalogd can also specify the table name in the request (IMPALA-12607).
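
      One way to derive a skip list for "fetch only events of type X" is to list every other known event type. The sketch below assumes this approach and is not the actual patch: `SkipListSketch`, `KNOWN_EVENT_TYPES`, and `skipListExcept` are hypothetical names, and the type list is an illustrative subset rather than the full set HMS can emit.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper that builds an eventTypeSkipList for a single wanted type.
final class SkipListSketch {
  // Illustrative subset of HMS event type names. A real skip list would need to
  // cover every type the metastore can emit, otherwise unlisted types still
  // come back and must be filtered on the client.
  static final List<String> KNOWN_EVENT_TYPES = Arrays.asList(
      "CREATE_TABLE", "ALTER_TABLE", "DROP_TABLE",
      "ADD_PARTITION", "ALTER_PARTITION", "DROP_PARTITION", "INSERT");

  // Returns all known event types except the wanted one.
  static List<String> skipListExcept(String wantedType) {
    List<String> skip = new ArrayList<>();
    for (String type : KNOWN_EVENT_TYPES) {
      if (!type.equals(wantedType)) {
        skip.add(type);
      }
    }
    return skip;
  }
}
```

      The returned list would then be set as the eventTypeSkipList of the NotificationEventRequest, so that HMS drops the unwanted types before sending the response. A client-side check on event type is still a reasonable safeguard in case the skip list is incomplete.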

      This Jira focuses on specifying the eventTypeSkipList when fetching events of a particular type on a table.

            People

              Assignee: Quanlong Huang (stigahuang)
              Reporter: Quanlong Huang (stigahuang)