Description
An Impala instance (which recently adopted the Kudu Kerberos implementation) ran into a temporary DNS outage. The user had configured Kerberos with a very short ticket lifetime (30 minutes). During the couple of hours in which DNS was down, the renewal thread quickly racked up many renewal failures, leading to a very long backoff time (up to 5 hours eventually). Even after DNS recovered, the Impala process still failed to communicate with other nodes because its TGT had expired. In some cases the renewal thread didn't wake up for more than 3 hours after DNS recovered. This is a rather bad user experience, so it may be worth considering a configurable upper bound on the exponential backoff when ticket renewal fails. At a minimum, it would help to log the backoff time to make the issue easier to diagnose.
W0822 23:15:35.960669 10964 init.cc:188] Kerberos reacquire error: : Runtime error: Reacquire error: unable to login from keytab: Cannot contact any KDC for realm '---redacted---'
W0822 23:19:21.016465 10964 init.cc:188] Kerberos reacquire error: : Runtime error: Reacquire error: unable to login from keytab: Cannot contact any KDC for realm '---redacted---'
W0822 23:25:48.059895 10964 init.cc:188] Kerberos reacquire error: : Runtime error: Reacquire error: unable to login from keytab: Cannot contact any KDC for realm '---redacted---'
W0822 23:38:14.100435 10964 init.cc:188] Kerberos reacquire error: : Runtime error: Reacquire error: unable to login from keytab: Cannot contact any KDC for realm '---redacted---'
W0822 23:59:26.152209 10964 init.cc:188] Kerberos reacquire error: : Runtime error: Reacquire error: unable to login from keytab: Cannot contact any KDC for realm '---redacted---'
W0823 00:42:28.194363 10964 init.cc:188] Kerberos reacquire error: : Runtime error: Reacquire error: unable to login from keytab: Cannot contact any KDC for realm ''---redacted---'
W0823 01:58:41.240950 10964 init.cc:188] Kerberos reacquire error: : Runtime error: Reacquire error: unable to login from keytab: Cannot contact any KDC for realm ''---redacted---'
W0823 03:28:54.285295 10964 init.cc:188] Kerberos reacquire error: : Runtime error: Reacquire error: unable to login from keytab: Cannot contact any KDC for realm ''---redacted---'
W0823 08:42:57.335754 10964 init.cc:188] Kerberos reacquire error: : Runtime error: Reacquire error: unable to login from keytab: Cannot contact any KDC for realm '---redacted---'
I0823 13:58:11.337008 10964 init.cc:283] Successfully reacquired a new kerberos TGT
I0823 14:08:46.362918 10964 init.cc:283] Successfully reacquired a new kerberos TGT
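In the log above, the gap between reacquire attempts grows from under 4 minutes to more than 5 hours. The proposed upper bound could look roughly like the sketch below. This is only an illustration, not the existing code in init.cc: the function name CappedBackoffSeconds, the parameter names, and the plain doubling scheme are all assumptions, and the real renewal thread may compute its delay differently (for example with jitter).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of Kudu): returns the number of seconds to
// wait before the next reacquire attempt after `num_failures` consecutive
// failures. The delay doubles with every failure, starting from `base_secs`,
// and never exceeds `max_secs`, which would come from a configurable flag.
int64_t CappedBackoffSeconds(int num_failures, int64_t base_secs,
                             int64_t max_secs) {
  assert(num_failures >= 1);
  assert(base_secs > 0);
  assert(max_secs >= base_secs);

  int64_t delay = base_secs;
  for (int i = 1; i < num_failures; ++i) {
    // Stop before doubling would exceed the cap; comparing against
    // max_secs / 2 also avoids signed overflow for very large counts.
    if (delay > max_secs / 2) {
      return max_secs;
    }
    delay *= 2;
  }
  return delay;
}
```

With an assumed base of 60 seconds and a cap of 300 seconds, the delays would be 60, 120, 240, and then 300 for every later failure, so the thread would retry within 5 minutes of the KDC becoming reachable again. Logging the returned value next to the "Kerberos reacquire error" warning would cover the second suggestion in the description.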