
IMPALA-12788: HBaseTable still gets loaded even if HBase is down


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: Impala 4.0.0, Impala 3.4.0, Impala 3.4.1, Impala 4.1.0, Impala 4.2.0, Impala 4.1.1, Impala 4.1.2, Impala 4.3.0
    • Fix Version/s: Impala 4.4.0
    • Component/s: Catalog
    • Labels: ghx-label-8

    Description

      This was identified by an internal S3 build that doesn't launch HBase. Some tests still run queries on HBase tables, e.g. TestDdlStatements::test_alter_set_column_stats, but they don't fail even if the table can't be loaded correctly. Catalogd logs show that the connection failure to HBase is ignored:

      I0203 14:12:33.687620 20673 TableLoadingMgr.java:71] Loading metadata for table: functional_hbase.alltypes
      I0203 14:12:33.687674 24282 TableLoader.java:76] Loading metadata for: functional_hbase.alltypes (background load)
      I0203 14:12:33.687706 20673 TableLoadingMgr.java:73] Remaining items in queue: 0. Loads in progress: 1
      I0203 14:12:33.690941 26564 JniCatalog.java:257] execDdl request: DROP_DATABASE test_compute_stats_9c95c5d8 issued by jenkins
      I0203 14:12:33.691668 24282 Table.java:218] createEventId_ for table: functional_hbase.alltypes set to: -1
      ......
      W0203 14:13:06.941573  1978 ReadOnlyZKClient.java:193] 0x65bc7c50 to localhost:2181 failed for get of /hbase/hbaseid, code = CONNECTIONLOSS, retries = 30, give up
      W0203 14:13:06.947460 24282 ConnectionImplementation.java:641] Retrieve cluster id failed
      Java exception follows:
      java.util.concurrent.ExecutionException: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/hbaseid
              at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
              at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
              at org.apache.hadoop.hbase.client.ConnectionImplementation.retrieveClusterId(ConnectionImplementation.java:639)
              at org.apache.hadoop.hbase.client.ConnectionImplementation.<init>(ConnectionImplementation.java:325)
              at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
              at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
              at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
              at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
              at org.apache.hadoop.hbase.client.ConnectionFactory.lambda$createConnection$0(ConnectionFactory.java:231)
              at java.security.AccessController.doPrivileged(Native Method)
              at javax.security.auth.Subject.doAs(Subject.java:422)
              at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
              at org.apache.hadoop.hbase.security.User$SecureHadoopUser.runAs(User.java:325)
              at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:230)
              at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:130)
              at org.apache.impala.catalog.FeHBaseTable$Util$ConnectionHolder.getConnection(FeHBaseTable.java:722)
              at org.apache.impala.catalog.FeHBaseTable$Util.getHBaseTable(FeHBaseTable.java:126)
              at org.apache.impala.catalog.HBaseTable.load(HBaseTable.java:112)
              at org.apache.impala.catalog.TableLoader.load(TableLoader.java:144)
              at org.apache.impala.catalog.TableLoadingMgr$2.call(TableLoadingMgr.java:245)
              at org.apache.impala.catalog.TableLoadingMgr$2.call(TableLoadingMgr.java:242)
              at java.util.concurrent.FutureTask.run(FutureTask.java:266)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
              at java.lang.Thread.run(Thread.java:748)
      Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/hbaseid
              at org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
              at org.apache.zookeeper.KeeperException.create(KeeperException.java:54)
              at org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient$ZKTask$1.exec(ReadOnlyZKClient.java:195)
              at org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient.run(ReadOnlyZKClient.java:340)
              ... 1 more
      I0203 14:13:07.058998 24282 TableLoader.java:175] Loaded metadata for: functional_hbase.alltypes (33371ms)
      I0203 14:13:07.866829 21368 catalog-server.cc:403] A catalog update with 9 entries is assembled. Catalog version: 6192 Last sent catalog version: 6181
      I0203 14:13:07.870369 21344 catalog-server.cc:816] Collected update: 1:TABLE:functional_hbase.alltypes, version=6193, original size=3855, compressed size=1471
      I0203 14:13:07.872047 21344 catalog-server.cc:816] Collected update: 1:CATALOG_SERVICE_ID, version=6193, original size=60, compressed size=58
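
      The "Loaded metadata" line right after the stack trace matches the HBase client's behavior: ConnectionImplementation only logs a WARN when it can't retrieve the cluster id from ZooKeeper and proceeds with a default id, so ConnectionFactory.createConnection() still returns a Connection object. A minimal sketch of that behavior (hypothetical class name; the quorum address is taken from the logs above, and HBase 2.x client semantics are assumed):

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.hbase.HBaseConfiguration;
      import org.apache.hadoop.hbase.client.Connection;
      import org.apache.hadoop.hbase.client.ConnectionFactory;

      public class HBaseDownRepro {
        public static void main(String[] args) throws Exception {
          Configuration conf = HBaseConfiguration.create();
          // ZooKeeper quorum from the logs above; nothing listens on port 2181
          // when HBase isn't launched.
          conf.set("hbase.zookeeper.quorum", "localhost");
          // createConnection() retries the cluster-id lookup (the ReadOnlyZKClient
          // WARNs above, up to retries = 30), logs "Retrieve cluster id failed",
          // and then returns a Connection instead of throwing.
          try (Connection conn = ConnectionFactory.createConnection(conf)) {
            System.out.println("Connection created although HBase is down: " + !conn.isClosed());
          }
        }
      }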

      This is problematic: impalad thinks the table was loaded correctly and will try to load it again when applying the catalog update, which can block the statestore subscriber thread for a long time. Other DDL queries are then blocked as well, since they can't acquire the catalog update lock.
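
      As a simplified, self-contained model of the contention (the lock and method names here are assumptions for illustration, not Impala's actual code): the subscriber thread holds a catalog update lock while the slow HBase load runs inside Table.fromThrift(), and DDLs queue behind it:

      import java.util.concurrent.locks.ReentrantLock;

      public class CatalogUpdateContention {
        static final ReentrantLock catalogUpdateLock = new ReentrantLock();

        // Models the statestore subscriber thread applying a catalog topic update.
        static void applyCatalogTopicUpdate() throws InterruptedException {
          catalogUpdateLock.lock();
          try {
            // Stand-in for HBaseTable.loadFromThrift(): ~30s of ZooKeeper
            // connection retries (retries = 30 in the logs) under the lock.
            Thread.sleep(30_000);
          } finally {
            catalogUpdateLock.unlock();
          }
        }

        // Models a DDL / LOAD DATA query that needs the same lock.
        static void runDdl() {
          catalogUpdateLock.lock();
          try {
            System.out.println("DDL proceeds only after the HBase retries give up");
          } finally {
            catalogUpdateLock.unlock();
          }
        }

        public static void main(String[] args) throws Exception {
          Thread subscriber = new Thread(() -> {
            try {
              applyCatalogTopicUpdate();
            } catch (InterruptedException ignored) {
            }
          });
          subscriber.start();
          Thread.sleep(100); // let the subscriber take the lock first
          runDdl();          // blocks here for the whole retry window
          subscriber.join();
        }
      }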

      We've seen TestAsyncLoadData.test_async_load time out on S3 (IMPALA-11285), and this is the cause.

      Here are logs showing that impalad is blocked applying the catalog update of the HBase table:

      I0203 14:13:09.359010  3636 Frontend.java:1917] db4f57572baab787:ebdb853600000000] Analyzing query: load data inpath '/test-warehouse/test_load_staging_beeswax_True'           into table test_async_load_898a2f19.test_load_nopart_beeswax_True db: functional
      ...
      I0203 14:13:42.188225  4881 ClientCnxn.java:1246] Socket error occurred: localhost/0:0:0:0:0:0:0:1:2181: Connection refused
      W0203 14:13:42.288529  4880 ReadOnlyZKClient.java:189] 0x43325be0 to localhost:2181 failed for get of /hbase/hbaseid, code = CONNECTIONLOSS, retries = 29
      I0203 14:13:43.288617  4881 ClientCnxn.java:1111] Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
      I0203 14:13:43.288892  4881 ClientCnxn.java:1246] Socket error occurred: localhost/127.0.0.1:2181: Connection refused
      W0203 14:13:43.389173  4880 ReadOnlyZKClient.java:189] 0x43325be0 to localhost:2181 failed for get of /hbase/hbaseid, code = CONNECTIONLOSS, retries = 30
      I0203 14:13:44.389231  4881 ClientCnxn.java:1111] Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
      I0203 14:13:44.389554  4881 ClientCnxn.java:1246] Socket error occurred: localhost/127.0.0.1:2181: Connection refused
      W0203 14:13:44.489856  4880 ReadOnlyZKClient.java:193] 0x43325be0 to localhost:2181 failed for get of /hbase/hbaseid, code = CONNECTIONLOSS, retries = 30, give up
      W0203 14:13:44.500921 22023 ConnectionImplementation.java:641] Retrieve cluster id failed
      Java exception follows:
      java.util.concurrent.ExecutionException: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/hbaseid
              at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
              at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
              at org.apache.hadoop.hbase.client.ConnectionImplementation.retrieveClusterId(ConnectionImplementation.java:639)
              at org.apache.hadoop.hbase.client.ConnectionImplementation.<init>(ConnectionImplementation.java:325)
              at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
              at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
              at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
              at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
              at org.apache.hadoop.hbase.client.ConnectionFactory.lambda$createConnection$0(ConnectionFactory.java:231)
              at java.security.AccessController.doPrivileged(Native Method)
              at javax.security.auth.Subject.doAs(Subject.java:422)
              at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
              at org.apache.hadoop.hbase.security.User$SecureHadoopUser.runAs(User.java:325)
              at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:230)
              at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:130)
              at org.apache.impala.catalog.FeHBaseTable$Util$ConnectionHolder.getConnection(FeHBaseTable.java:722)
              at org.apache.impala.catalog.FeHBaseTable$Util.getHBaseTable(FeHBaseTable.java:126)
              at org.apache.impala.catalog.HBaseTable.loadFromThrift(HBaseTable.java:139)
              at org.apache.impala.catalog.Table.fromThrift(Table.java:538)
              at org.apache.impala.catalog.ImpaladCatalog.addTable(ImpaladCatalog.java:474)
              at org.apache.impala.catalog.ImpaladCatalog.addCatalogObject(ImpaladCatalog.java:329)
              at org.apache.impala.catalog.ImpaladCatalog.updateCatalog(ImpaladCatalog.java:258)
              at org.apache.impala.service.FeCatalogManager$CatalogdImpl.updateCatalogCache(FeCatalogManager.java:114)
              at org.apache.impala.service.Frontend.updateCatalogCache(Frontend.java:513)
              at org.apache.impala.service.JniFrontend.updateCatalogCache(JniFrontend.java:185)
      Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/hbaseid
              at org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
              at org.apache.zookeeper.KeeperException.create(KeeperException.java:54)
              at org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient$ZKTask$1.exec(ReadOnlyZKClient.java:195)
              at org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient.run(ReadOnlyZKClient.java:340)
              at java.lang.Thread.run(Thread.java:748)
      I0203 14:13:44.585079 22023 impala-server.cc:2060] Catalog topic update applied with version: 6193 new min catalog object version: 2
      
      ... // After this, the table test_load_nopart_beeswax_true from the LOAD DATA statement can be added
      I0203 14:13:44.586282  4723 ImpaladCatalog.java:228] db4f57572baab787:ebdb853600000000] Adding: TABLE:test_async_load_898a2f19.test_load_nopart_beeswax_true version: 6207 size: 3866 

      The bug is that loading an HBase table should fail when catalogd fails to connect to HBase.
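
      For reference, a minimal sketch of the intended behavior (simplified stand-in types; only TableLoadingException mirrors a real Impala class, org.apache.impala.catalog.TableLoadingException): the load path should actively verify connectivity and fail the load when HBase is unreachable, since, as shown above, merely creating the connection doesn't throw:

      import java.io.IOException;

      /**
       * Sketch only, not the actual Impala patch. The point: a connection
       * failure during load should surface as a TableLoadingException, so
       * catalogd marks the table as failed instead of broadcasting it to
       * impalads as if it had loaded successfully.
       */
      public class HBaseTableLoadSketch {
        /** Stand-in for org.apache.impala.catalog.TableLoadingException. */
        static class TableLoadingException extends Exception {
          TableLoadingException(String msg, Throwable cause) { super(msg, cause); }
        }

        /** Stand-in for FeHBaseTable.Util; assumed to probe HBase for real
         *  (e.g. an Admin RPC), since merely creating the connection does
         *  not fail when HBase is down. */
        interface HBaseUtil {
          void checkConnectivity(String tableName) throws IOException;
        }

        static void load(HBaseUtil util, String tableName) throws TableLoadingException {
          try {
            util.checkConnectivity(tableName);
            // ... load column families / row count estimates here ...
          } catch (IOException e) {
            throw new TableLoadingException(
                "Failed to load metadata for HBase table: " + tableName, e);
          }
        }
      }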

    People

      Assignee: Quanlong Huang (stigahuang)
      Reporter: Quanlong Huang (stigahuang)
      Votes: 0
      Watchers: 3
