Details
- Bug
- Status: Resolved
- Critical
- Resolution: Fixed
Description
There are several places where catalogd fetches all events of a specific type on a table. For example, in TableLoader#load(), if the table has an old createEventId, catalogd fetches all CREATE_TABLE events on the table after that createEventId.
Fetching the list of events is expensive since the filtering is done on the client side, i.e. catalogd fetches all events and filters them locally based on the event type and table name:
https://github.com/apache/impala/blob/148888e3ed4f97292499b2e6ee8d5a756dc648d9/fe/src/main/java/org/apache/impala/catalog/TableLoader.java#L98-L102
https://github.com/apache/impala/blob/b7ddbcad0dd6accb559a3f391a897a8c442d1728/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java#L336
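A rough illustration of the client-side filtering described above. This is only a sketch: `Event` and `filterLocally` are hypothetical stand-ins, not the actual Impala or HMS classes. It shows that every event has to be transferred and inspected even when only a few match.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ClientSideFilterSketch {
  // Hypothetical stand-in for an HMS notification event (not the real class).
  static final class Event {
    final long id;
    final String type;
    final String db;
    final String table;

    Event(long id, String type, String db, String table) {
      this.id = id;
      this.type = type;
      this.db = db;
      this.table = table;
    }
  }

  // Keeps only events newer than afterEventId that match the wanted type and table.
  // The whole 'fetched' list has already been pulled from HMS at this point,
  // so the cost grows with the total number of events, not the number of matches.
  static List<Event> filterLocally(List<Event> fetched, long afterEventId,
      String wantedType, String db, String table) {
    List<Event> result = new ArrayList<>();
    for (Event e : fetched) {
      if (e.id > afterEventId
          && e.type.equals(wantedType)
          && e.db.equalsIgnoreCase(db)
          && e.table.equalsIgnoreCase(table)) {
        result.add(e);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<Event> fetched = Arrays.asList(
        new Event(10, "OPEN_TXN", "", ""),
        new Event(11, "CREATE_TABLE", "db1", "t1"),
        new Event(12, "ALTER_TABLE", "db1", "t1"),
        new Event(13, "CREATE_TABLE", "db1", "t2"));
    List<Event> matches = filterLocally(fetched, 5, "CREATE_TABLE", "db1", "t1");
    System.out.println("fetched=" + fetched.size() + " kept=" + matches.size());
  }
}
```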
This could take hours if there are lots of events (e.g. 1M) in HMS. In fact, NotificationEventRequest can specify an eventTypeSkipList, so catalogd can have the event-type filtering done on the HMS side. On higher Hive versions that have HIVE-27499, catalogd can also specify the table name in the request (IMPALA-12607).
This Jira focuses on specifying the eventTypeSkipList when fetching events of a particular type on a table.
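Since the request takes a skip list rather than a positive filter, fetching a single event type means skipping every other type. A minimal sketch of that inversion follows, under these assumptions: `KNOWN_EVENT_TYPES` is an illustrative, non-exhaustive subset of HMS event type names, and `skipListFor` is a hypothetical helper, not the code of the actual fix. The resulting list is what would be set as the eventTypeSkipList on the NotificationEventRequest.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class EventTypeSkipListSketch {
  // Illustrative subset of HMS event type names (assumption: not exhaustive).
  static final List<String> KNOWN_EVENT_TYPES = Arrays.asList(
      "CREATE_DATABASE", "ALTER_DATABASE", "DROP_DATABASE",
      "CREATE_TABLE", "ALTER_TABLE", "DROP_TABLE",
      "ADD_PARTITION", "ALTER_PARTITION", "DROP_PARTITION",
      "INSERT", "OPEN_TXN", "COMMIT_TXN", "ABORT_TXN", "ALLOC_WRITE_ID");

  // Returns every known event type except the wanted one, so that the
  // server-side skip list leaves only events of wantedType in the response.
  static List<String> skipListFor(String wantedType) {
    List<String> skipList = new ArrayList<>();
    for (String type : KNOWN_EVENT_TYPES) {
      if (!type.equals(wantedType)) {
        skipList.add(type);
      }
    }
    return skipList;
  }

  public static void main(String[] args) {
    // The list printed here would be passed as the request's eventTypeSkipList.
    System.out.println(skipListFor("CREATE_TABLE"));
  }
}
```

One caveat of this approach: an event type missing from the known list is not skipped, so the caller would still need to check the type of each returned event.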
Issue Links
- is related to: HIVE-28146 Add positive event type filter to the HMS notification fetch API (Open)
- relates to: IMPALA-12399 Pass eventTypeSkipList with OPEN_TXN in NotificationEventRequest to avoid receiving OPEN_TXN events from HMS (Resolved)