HBase / HBASE-28105

NPE in QuotaCache if Table is dropped from cluster


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.4.17, 2.5.5
    • Fix Version/s: 2.6.0, 2.4.18, 2.5.6, 3.0.0-beta-1
    • Component/s: Quotas
    • Labels: None

    Description

      While running HBase 2.4.17, I encountered an NPE in the regionserver log.

      Reproduce

      Set up an HBase cluster: 1 HMaster, 2 RegionServers, Hadoop 2.10.2.

      Execute the following commands on the HMaster node using the hbase shell:

      create 'uuidd9efa97f93a442b686adae6d9f7bb2e9', {NAME => 'uuid099cbece77834a83a52bb0611c3ea080', VERSIONS => 3, COMPRESSION => 'NONE', BLOOMFILTER => 'ROWCOL', IN_MEMORY => 'false'}, {NAME => 'uuidbc1bea73952749329d7f025aab382c4e', VERSIONS => 2, COMPRESSION => 'GZ', BLOOMFILTER => 'ROW', IN_MEMORY => 'false'}, {NAME => 'uuidff292310d9dc450697af2bb25d9f3e98', VERSIONS => 2, COMPRESSION => 'GZ', BLOOMFILTER => 'NONE', IN_MEMORY => 'false'}, {NAME => 'uuid449de028da6b4d35be0f187ebec6c3be', VERSIONS => 2, COMPRESSION => 'GZ', BLOOMFILTER => 'ROW', IN_MEMORY => 'false'}, {NAME => 'uuidc0840c98f9d348a18f2d454c7a503b65', VERSIONS => 2, COMPRESSION => 'GZ', BLOOMFILTER => 'ROWCOL', IN_MEMORY => 'false'}
      create_namespace 'uuidec797633f5dd4ab9b96276135aeda9e2'
      create 'uuiddeb610fded9744889840ecd03dd18739', {NAME => 'uuid30a0f625ad454605908b60c932957ff0', VERSIONS => 1, COMPRESSION => 'GZ', BLOOMFILTER => 'ROW', IN_MEMORY => 'true'}
      incr 'uuidd9efa97f93a442b686adae6d9f7bb2e9', 'uuid46ddc3d3557e413e915e2393ae72c082', 'uuidbc1bea73952749329d7f025aab382c4e:JZycbUSpbDQmwgXinp', 1
      flush 'uuidd9efa97f93a442b686adae6d9f7bb2e9', 'uuid449de028da6b4d35be0f187ebec6c3be'
      drop 'uuiddeb610fded9744889840ecd03dd18739'
      put 'uuidd9efa97f93a442b686adae6d9f7bb2e9', 'uuidf4704cae4d1e4661bd7664d26eb6b31b', 'uuidbc1bea73952749329d7f025aab382c4e:JZycbUSpbDQmwgXinp', 'XlPpFGvSYfcEXWXgwARytlSeiaSuHJFqpirMmLduqGnpdXLlHJWBumraXiifQSvHqNHmTcyzLQIvuQrkujPghfdtRkhOkgKEJHsAuAiMMeWZjdTHNZqhkOdJBOzsRYUXKOCNKeSxEDWgnKgsFDHMtxdnKKudBuceOgYmCrdaPXMclKkZKCIEiFDcdoAEJGKXYVfOjb'
      disable 'uuidd9efa97f93a442b686adae6d9f7bb2e9'
      drop 'uuidd9efa97f93a442b686adae6d9f7bb2e9'
      create 'uuid9d05a5cb34e64910ac90675186e7d0d4', {NAME => 'uuid1ce512a5997b4efea3bdead2e7f723c3', VERSIONS => 2, COMPRESSION => 'NONE', BLOOMFILTER => 'ROWCOL', IN_MEMORY => 'true'}, {NAME => 'uuid0b1baaa4275e46b2a3a1d11d6540fc30', VERSIONS => 2, COMPRESSION => 'NONE', BLOOMFILTER => 'NONE', IN_MEMORY => 'true'}
      put 'uuid9d05a5cb34e64910ac90675186e7d0d4', 'uuid552e42ade4c14099a1d8643bea1616d4', 'uuid1ce512a5997b4efea3bdead2e7f723c3:l', 1
      drop 'uuid9d05a5cb34e64910ac90675186e7d0d4'

      The exception will be thrown on either RS1 or RS2:

      2023-09-19 20:29:28,268 INFO  [RS_OPEN_REGION-regionserver/hregion2:16020-2] handler.AssignRegionHandler: Opened uuid9d05a5cb34e64910ac90675186e7d0d4,,1695155367072.f59a0693a9469f9e1f131bf2aac1486d.
      2023-09-19 20:29:29,205 ERROR [regionserver/hregion2:16020.Chore.1] hbase.ScheduledChore: Caught error
      java.lang.NullPointerException
              at org.apache.hadoop.hbase.quotas.QuotaCache$QuotaRefresherChore.updateQuotaFactors(QuotaCache.java:378)
              at org.apache.hadoop.hbase.quotas.QuotaCache$QuotaRefresherChore.chore(QuotaCache.java:224)
              at org.apache.hadoop.hbase.ScheduledChore.run(ScheduledChore.java:158)
              at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
              at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
              at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
              at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
              at org.apache.hadoop.hbase.JitterScheduledThreadPoolExecutorImpl$JitteredRunnableScheduledFuture.run(JitterScheduledThreadPoolExecutorImpl.java:107)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
              at java.lang.Thread.run(Thread.java:750)

      Root Cause

      The NPE is thrown in updateQuotaFactors():

      private void updateQuotaFactors() {
        // Update machine quota factor
        ClusterMetrics clusterMetrics;
        try {
          clusterMetrics = rsServices.getConnection().getAdmin()
            .getClusterMetrics(EnumSet.of(Option.SERVERS_NAME, Option.TABLE_TO_REGIONS_COUNT));
        } catch (IOException e) {
          LOG.warn("Failed to get cluster metrics needed for updating quotas", e);
          return;
        }

        int rsSize = clusterMetrics.getServersName().size();
        if (rsSize != 0) {
          // TODO if use rs group, the cluster limit should be shared by the rs group
          machineQuotaFactor = 1.0 / rsSize;
        }

        Map<TableName, RegionStatesCount> tableRegionStatesCount =
          clusterMetrics.getTableRegionStatesCount();

        // Update table machine quota factors
        for (TableName tableName : tableQuotaCache.keySet()) {
          double factor = 1;
          try {
            // BUGGY LINE (QuotaCache.java:378)
            long regionSize = tableRegionStatesCount.get(tableName).getOpenRegions();
            if (regionSize == 0) {
              factor = 0;
            } else {
              int localRegionSize = rsServices.getRegions(tableName).size();
              factor = 1.0 * localRegionSize / regionSize;
            }
          } catch (IOException e) {
            LOG.warn("Get table regions failed: {}", tableName, e);
          }
          tableMachineQuotaFactors.put(tableName, factor);
        }
      }
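
      For intuition about the factors computed above, here is a standalone sketch of the same arithmetic (QuotaFactorDemo and its method names are illustrative, not HBase code):

```java
// Standalone illustration of the quota factor arithmetic in updateQuotaFactors().
// Nothing here touches HBase; the numbers match the 2-RS cluster from the
// reproduction steps.
public class QuotaFactorDemo {

  // machineQuotaFactor = 1 / number of live regionservers (unchanged if 0)
  static double machineQuotaFactor(int rsSize) {
    return rsSize != 0 ? 1.0 / rsSize : 1.0;
  }

  // table factor = regions of the table hosted locally / open regions cluster-wide
  static double tableMachineQuotaFactor(int localRegions, long openRegions) {
    return openRegions == 0 ? 0 : 1.0 * localRegions / openRegions;
  }

  public static void main(String[] args) {
    // With the 2-RS cluster from the reproduction steps:
    System.out.println(machineQuotaFactor(2));          // 0.5
    // A table with 4 open regions, 1 of them on this RS:
    System.out.println(tableMachineQuotaFactor(1, 4));  // 0.25
  }
}
```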

      At line 378, tableRegionStatesCount.get(tableName) returns null, so the chained call to getOpenRegions() throws the NPE.

      The tableName triggering the NPE is 'uuidd9efa97f93a442b686adae6d9f7bb2e9', a table that the user had disabled and dropped.

      long regionSize = tableRegionStatesCount.get(tableName).getOpenRegions(); 

      The root cause is that updateQuotaFactors() iterates over tableQuotaCache.keySet(), which may still contain a table that has already been dropped (a stale cache entry), while the freshly fetched cluster metrics no longer include that table.
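
      The failure mode can be reproduced with plain collections; this standalone sketch (names are illustrative, not HBase code) shows how a stale key plus a chained dereference produces the NPE:

```java
import java.util.HashMap;
import java.util.Map;

// Minimal standalone illustration of the failure mode: the quota cache still
// holds a key for a dropped table, but the freshly fetched region-count map no
// longer does, so get() returns null and the chained dereference throws
// NullPointerException.
public class StaleCacheNpeDemo {

  // Returns true iff the chained get(...)[...] dereference throws an NPE,
  // i.e. the table is missing from the fresh counts map.
  static boolean chainedGetThrows(Map<String, int[]> counts, String table) {
    try {
      int open = counts.get(table)[0]; // NPE here when get() returns null
      return open < 0; // unreachable for present keys with valid counts
    } catch (NullPointerException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    Map<String, int[]> tableRegionStatesCount = new HashMap<>();
    tableRegionStatesCount.put("liveTable", new int[] { 4 });
    // Stale cache entry for a table already dropped from the cluster:
    System.out.println(chainedGetThrows(tableRegionStatesCount,
      "uuidd9efa97f93a442b686adae6d9f7bb2e9")); // prints true
  }
}
```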

      Fix

      A possible fix is to check that the table still exists in the cluster before dereferencing its region count (a temporary solution). I have attached the full regionserver log and a simple fix.
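
      A sketch of such a check, simulated with plain maps instead of HBase types (the skip-on-missing policy is an assumption for illustration, not necessarily what the committed fix does):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hedged sketch of the suggested existence check: table names stand in as
// Strings and region counts as plain numbers. Tables missing from the fresh
// cluster metrics are skipped rather than dereferenced.
public class GuardedFactorUpdate {

  static Map<String, Double> updateFactors(Iterable<String> cachedTables,
      Map<String, Long> openRegionsByTable, Map<String, Integer> localRegionsByTable) {
    Map<String, Double> factors = new HashMap<>();
    for (String table : cachedTables) {
      Long openRegions = openRegionsByTable.get(table);
      if (openRegions == null) {
        // Table no longer exists in the fresh cluster metrics (dropped between
        // cache refreshes): skip it instead of dereferencing null.
        continue;
      }
      double factor = openRegions == 0 ? 0
        : 1.0 * localRegionsByTable.getOrDefault(table, 0) / openRegions;
      factors.put(table, factor);
    }
    return factors;
  }

  public static void main(String[] args) {
    // "dropped" is still cached but absent from the fresh metrics: no NPE.
    System.out.println(updateFactors(List.of("t1", "dropped"),
      Map.of("t1", 4L), Map.of("t1", 1)));
  }
}
```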

      This bug should also occur in 2.5.5, since the relevant code in QuotaCache is unchanged (though I only tested 2.4.17).


      Attachments

        1. hbase--regionserver-a0320910ca45.log
          92 kB
          Ke Han
        2. 0001-avoid-NPE.patch
          2 kB
          Ke Han


            People

              Assignee: Unassigned
              Reporter: Ke Han (kehan5800)
              Votes: 0
              Watchers: 4
