Details
-
Bug
-
Status: Open
-
Major
-
Resolution: Unresolved
-
None
-
None
-
None
-
ghx-label-1
Description
I saw test failures due to the desired catalog update timeout to be applied on the impalad:
-- 2024-02-22 07:40:09,898 INFO MainThread: Started query 2b40be0cac7fd641:609759ce00000000 -- 2024-02-22 07:41:10,139 INFO MainThread: Waiting until events processor syncs to event id:40299 -- 2024-02-22 07:41:10,618 DEBUG MainThread: Metric last-synced-event-id has been increased to 40294 but has not yet reached the desired value: 40299 -- 2024-02-22 07:41:10,724 DEBUG MainThread: Metric last-synced-event-id has reached the desired value: 40299 -- 2024-02-22 07:41:11,008 DEBUG MainThread: Found 3 impalad/1 statestored/1 catalogd process(es) -- 2024-02-22 07:41:11,046 INFO MainThread: Getting metric: catalog.curr-version from ip-172-31-56-39:25000 -- 2024-02-22 07:41:11,053 INFO MainThread: Sleeping 1s before next retry. -- 2024-02-22 07:41:12,054 INFO MainThread: Getting metric: catalog.curr-version from ip-172-31-56-39:25000 -- 2024-02-22 07:41:12,061 INFO MainThread: Sleeping 1s before next retry. -- 2024-02-22 07:41:13,062 INFO MainThread: Getting metric: catalog.curr-version from ip-172-31-56-39:25000 -- 2024-02-22 07:41:13,067 INFO MainThread: Sleeping 1s before next retry. -- 2024-02-22 07:41:14,067 INFO MainThread: Getting metric: catalog.curr-version from ip-172-31-56-39:25000 -- 2024-02-22 07:41:14,075 INFO MainThread: Sleeping 1s before next retry. -- 2024-02-22 07:41:15,077 INFO MainThread: Getting metric: catalog.curr-version from ip-172-31-56-39:25000 -- 2024-02-22 07:41:15,082 INFO MainThread: Sleeping 1s before next retry. -- 2024-02-22 07:41:16,082 INFO MainThread: Getting metric: catalog.curr-version from ip-172-31-56-39:25000 -- 2024-02-22 07:41:16,086 INFO MainThread: Sleeping 1s before next retry. -- 2024-02-22 07:41:17,087 INFO MainThread: Getting metric: catalog.curr-version from ip-172-31-56-39:25000 -- 2024-02-22 07:41:17,096 INFO MainThread: Sleeping 1s before next retry. -- 2024-02-22 07:41:18,097 INFO MainThread: Getting metric: catalog.curr-version from ip-172-31-56-39:25000 -- 2024-02-22 07:41:18,113 INFO MainThread: Sleeping 1s before next retry. -- 2024-02-22 07:41:19,114 INFO MainThread: Getting metric: catalog.curr-version from ip-172-31-56-39:25000 -- 2024-02-22 07:41:19,120 INFO MainThread: Sleeping 1s before next retry. -- 2024-02-22 07:41:20,122 INFO MainThread: Getting metric: catalog.curr-version from ip-172-31-56-39:25000 -- 2024-02-22 07:41:20,129 INFO MainThread: Sleeping 1s before next retry. -- 2024-02-22 07:41:21,130 INFO MainThread: Metric catalog.curr-version did not reach value 8467 in 10s. Failing...
Unfortunately, the current value of "catalog.curr-version" is not logged so we don't know whether impalad is hanging or moving forward slowly.
The logs come from wait_for_metric_value(). When allow_greater=True, the current values are not logged:
# if allow_greater is True we wait until the metric value becomes >= the expected # value. if allow_greater: if value >= expected_value: LOG.info("Metric '%s' has reached desired value: %s" % (metric_name, value)) return value elif value == expected_value: LOG.info("Metric '%s' has reached desired value: %s" % (metric_name, value)) return value else: LOG.info("Waiting for metric value '%s'%s%s. Current value: %s" % (metric_name, '>=' if allow_greater else '=', expected_value, value)) LOG.info("Sleeping %ds before next retry." % interval) sleep(interval)