Details
-
Bug
-
Status: In Progress
-
Critical
-
Resolution: Unresolved
-
None
-
None
-
None
-
Hive-2.3.
Description
We found lots of orphan/old/leftover lock entries inside HIVE_LOCKS. There are more than 120k locks in HIVE_LOCKS of MySQL database. We also checked the top 3 tables which are related to the existing locks:
mysql> select HL_DB,HL_TABLE, count(*) from HIVE_LOCKS group by 1,2 order by 3 desc limit 10; +-----------+------------------------------+----------+ | HL_DB | HL_TABLE | count(*) | +-----------+------------------------------+----------+ | db1 | table1 | 66984 | | db1 | table2 | 33208 | | db1 | table3 | 9315 | …
For table “db1. table1”, here are 3 Hive sessions related, and each of the Hive session is waiting for 22328 read locks. This is because this table “db1. table1” is a huge partition table, and it has more than 200k child partitions. I am guessing each of Hive session was trying to do a full table scan on it. I group-by based on column HL_LAST_HEARTBEAT instead, here is the list:
mysql> select cast(FROM_UNIXTIME(HL_LAST_HEARTBEAT/1000) as date) as dt,count(*) as cnt from HIVE_LOCKS
-> group by 1 order by 1;
+------------+--------+
| dt | cnt |
+------------+--------+
| 1969-12-31 | 2 |
| 2019-05-20 | 10 |
| 2019-05-21 | 3 |
| 2019-05-23 | 5 |
| 2019-05-24 | 2 |
| 2019-05-25 | 1 |
| 2019-05-29 | 7 |
| 2019-05-30 | 2 |
| 2019-06-11 | 13 |
| 2019-06-28 | 3 |
| 2019-07-02 | 2 |
| 2019-07-04 | 5 |
| 2019-07-09 | 1 |
| 2019-07-15 | 2 |
| 2019-07-16 | 1 |
| 2019-07-18 | 2 |
| 2019-07-20 | 3 |
| 2019-07-29 | 5 |
| 2019-07-30 | 9 |
| 2019-07-31 | 7 |
| 2019-08-02 | 2 |
| 2019-08-06 | 5 |
| 2019-08-07 | 17 |
| 2019-08-08 | 8 |
| 2019-08-09 | 5 |
| 2019-08-21 | 1 |
| 2019-08-22 | 20 |
| 2019-08-23 | 1 |
| 2019-08-26 | 5 |
| 2019-08-27 | 98 |
| 2019-08-28 | 3 |
| 2019-08-29 | 1 |
| 2019-09-02 | 3 |
| 2019-09-04 | 3 |
| 2019-09-05 | 105 |
| 2019-09-06 | 3 |
| 2019-09-07 | 2 |
| 2019-09-09 | 6 |
| 2019-09-12 | 9 |
| 2019-09-13 | 1 |
| 2019-09-17 | 1 |
| 2019-09-24 | 3 |
| 2019-09-26 | 6 |
| 2019-09-27 | 4 |
| 2019-09-30 | 1 |
| 2019-10-01 | 2 |
| 2019-10-03 | 9 |
| 2019-10-04 | 2 |
| 2019-10-06 | 1 |
| 2019-10-08 | 1 |
| 2019-10-09 | 1 |
| 2019-10-10 | 6 |
| 2019-10-11 | 1 |
| 2019-10-16 | 13 |
| 2019-10-17 | 1 |
| 2019-10-18 | 2 |
| 2019-10-19 | 2 |
| 2019-10-21 | 10 |
| 2019-10-22 | 6 |
| 2019-10-28 | 2 |
| 2019-10-29 | 4 |
| 2019-10-30 | 2 |
| 2019-10-31 | 2 |
| 2019-11-05 | 2 |
| 2019-11-06 | 2 |
| 2019-11-11 | 1 |
| 2019-11-13 | 1 |
| 2019-11-14 | 1 |
| 2019-11-21 | 4 |
| 2019-11-26 | 1 |
| 2019-11-27 | 1 |
| 2019-12-05 | 4 |
| 2019-12-06 | 2 |
| 2019-12-12 | 1 |
| 2019-12-14 | 1 |
| 2019-12-15 | 3 |
| 2019-12-16 | 1 |
| 2019-12-17 | 1 |
| 2019-12-18 | 1 |
| 2019-12-19 | 2 |
| 2019-12-20 | 2 |
| 2019-12-23 | 1 |
| 2019-12-27 | 1 |
| 2020-01-07 | 1 |
| 2020-01-08 | 14 |
| 2020-01-09 | 2 |
| 2020-01-12 | 372 |
| 2020-01-14 | 2 |
| 2020-01-15 | 1 |
| 2020-01-20 | 11 |
| 2020-01-21 | 119253 |
| 2020-01-23 | 113 |
| 2020-01-24 | 4 |
| 2020-01-25 | 536 |
| 2020-01-26 | 2132 |
| 2020-01-27 | 396 |
| 2020-01-28 | 1 |
| 2020-01-29 | 3 |
| 2020-01-30 | 11 |
| 2020-01-31 | 11 |
| 2020-02-03 | 2 |
| 2020-02-04 | 4 |
| 2020-02-05 | 5 |
| 2020-02-06 | 8 |
| 2020-02-10 | 32 |
| 2020-02-11 | 15 |
| 2020-02-12 | 14 |
| 2020-02-13 | 1 |
| 2020-02-14 | 92 |
+------------+--------+
109 rows in set (0.16 sec)
However most of the entries have HL_ACQUIRED_AT=NULL in HIVE_LOCKS:
mysql> SELECT COUNT(*) FROM HIVE_LOCKS WHERE HL_ACQUIRED_AT is null;
+----------+
| COUNT(*) |
+----------+
| 123437 |
+----------+
1 row in set (0.04 sec)
mysql> SELECT COUNT(*) FROM HIVE_LOCKS WHERE HL_ACQUIRED_AT is not null;
+----------+
| COUNT(*) |
+----------+
| 97 |
+----------+
1 row in set (0.04 sec)
Hot-fix is to remove orphan/lost locks from HIVE_LOCKS, but this does not solve the problem in the future.