Uploaded image for project: 'HBase'
  1. HBase
  2. HBASE-22978 Online slow response log
  3. HBASE-23938

Replicate slow/large RPC calls to HDFS

    XMLWordPrintableJSON

Details

    • Sub-task
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • 3.0.0-alpha-1, 2.3.0, 1.7.0
    • 3.0.0-alpha-1, 2.3.0
    • None
    • None
    • Reviewed
    • Hide
      Config key: hbase.regionserver.slowlog.systable.enabled
      Default value: false

      This config can be enabled if hbase.regionserver.slowlog.buffer.enabled is already enabled. While hbase.regionserver.slowlog.buffer.enabled ensures that any slow/large RPC logs with complete details are written to ring buffer available at each RegionServer, hbase.regionserver.slowlog.systable.enabled would ensure that all such logs are also persisted in new system table hbase:slowlog.
      Operator can scan hbase:slowlog with filters to retrieve specific attribute matching records and this table would be useful to capture historical performance of slowness of RPC calls with detailed analysis.

      hbase:slowlog consists of single ColumnFamily info. info consists of multiple qualifiers similar to the attributes available to query as part of Admin API: get_slowlog_responses.

      One example of a row from hbase:slowlog scan result (Attached a sample screenshot in the Jira) :

       \x024\xC1\x06X\x81\xF6\xEC column=info:call_details, timestamp=2020-05-16T14:59:58.764Z, value=Scan(org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ScanRequest)
       \x024\xC1\x06X\x81\xF6\xEC column=info:client_address, timestamp=2020-05-16T14:59:58.764Z, value=172.20.10.2:57348
       \x024\xC1\x06X\x81\xF6\xEC column=info:method_name, timestamp=2020-05-16T14:59:58.764Z, value=Scan
       \x024\xC1\x06X\x81\xF6\xEC column=info:param, timestamp=2020-05-16T14:59:58.764Z, value=region { type: REGION_NAME value: "cluster_test,cccccccc,1589635796466.aa45e1571d533f5ed0bb31cdccaaf9cf." } scan { a
                                                                   ttribute { name: "_isolationlevel_" value: "\x5C000" } start_row: "cccccccc" time_range { from: 0 to: 9223372036854775807 } max_versions: 1 cache_blocks: true max_result_size: 2
                                                                   097152 caching: 2147483647 include_stop_row: false } number_of_rows: 2147483647 close_scanner: false client_handles_partials: true client_handles_heartbeats: true track_scan_met
                                                                   rics: false
       \x024\xC1\x06X\x81\xF6\xEC column=info:processing_time, timestamp=2020-05-16T14:59:58.764Z, value=24
       \x024\xC1\x06X\x81\xF6\xEC column=info:queue_time, timestamp=2020-05-16T14:59:58.764Z, value=0
       \x024\xC1\x06X\x81\xF6\xEC column=info:region_name, timestamp=2020-05-16T14:59:58.764Z, value=cluster_test,cccccccc,1589635796466.aa45e1571d533f5ed0bb31cdccaaf9cf.
       \x024\xC1\x06X\x81\xF6\xEC column=info:response_size, timestamp=2020-05-16T14:59:58.764Z, value=211227
       \x024\xC1\x06X\x81\xF6\xEC column=info:server_class, timestamp=2020-05-16T14:59:58.764Z, value=HRegionServer
       \x024\xC1\x06X\x81\xF6\xEC column=info:start_time, timestamp=2020-05-16T14:59:58.764Z, value=1589640743932
       \x024\xC1\x06X\x81\xF6\xEC column=info:type, timestamp=2020-05-16T14:59:58.764Z, value=ALL
       \x024\xC1\x06X\x81\xF6\xEC column=info:username, timestamp=2020-05-16T14:59:58.764Z, value=vjasani
      Show
      Config key: hbase.regionserver.slowlog.systable.enabled Default value: false This config can be enabled if hbase.regionserver.slowlog.buffer.enabled is already enabled. While hbase.regionserver.slowlog.buffer.enabled ensures that any slow/large RPC logs with complete details are written to ring buffer available at each RegionServer, hbase.regionserver.slowlog.systable.enabled would ensure that all such logs are also persisted in new system table hbase:slowlog. Operator can scan hbase:slowlog with filters to retrieve specific attribute matching records and this table would be useful to capture historical performance of slowness of RPC calls with detailed analysis. hbase:slowlog consists of single ColumnFamily info. info consists of multiple qualifiers similar to the attributes available to query as part of Admin API: get_slowlog_responses. One example of a row from hbase:slowlog scan result (Attached a sample screenshot in the Jira) :  \x024\xC1\x06X\x81\xF6\xEC column=info:call_details, timestamp=2020-05-16T14:59:58.764Z, value=Scan(org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ScanRequest)  \x024\xC1\x06X\x81\xF6\xEC column=info:client_address, timestamp=2020-05-16T14:59:58.764Z, value=172.20.10.2:57348  \x024\xC1\x06X\x81\xF6\xEC column=info:method_name, timestamp=2020-05-16T14:59:58.764Z, value=Scan  \x024\xC1\x06X\x81\xF6\xEC column=info:param, timestamp=2020-05-16T14:59:58.764Z, value=region { type: REGION_NAME value: "cluster_test,cccccccc,1589635796466.aa45e1571d533f5ed0bb31cdccaaf9cf." } scan { a                                                              ttribute { name: "_isolationlevel_" value: "\x5C000" } start_row: "cccccccc" time_range { from: 0 to: 9223372036854775807 } max_versions: 1 cache_blocks: true max_result_size: 2                                                              097152 caching: 2147483647 include_stop_row: false } number_of_rows: 2147483647 close_scanner: false client_handles_partials: true client_handles_heartbeats: true track_scan_met                                                              rics: false  \x024\xC1\x06X\x81\xF6\xEC column=info:processing_time, timestamp=2020-05-16T14:59:58.764Z, value=24  \x024\xC1\x06X\x81\xF6\xEC column=info:queue_time, timestamp=2020-05-16T14:59:58.764Z, value=0  \x024\xC1\x06X\x81\xF6\xEC column=info:region_name, timestamp=2020-05-16T14:59:58.764Z, value=cluster_test,cccccccc,1589635796466.aa45e1571d533f5ed0bb31cdccaaf9cf.  \x024\xC1\x06X\x81\xF6\xEC column=info:response_size, timestamp=2020-05-16T14:59:58.764Z, value=211227  \x024\xC1\x06X\x81\xF6\xEC column=info:server_class, timestamp=2020-05-16T14:59:58.764Z, value=HRegionServer  \x024\xC1\x06X\x81\xF6\xEC column=info:start_time, timestamp=2020-05-16T14:59:58.764Z, value=1589640743932  \x024\xC1\x06X\x81\xF6\xEC column=info:type, timestamp=2020-05-16T14:59:58.764Z, value=ALL  \x024\xC1\x06X\x81\xF6\xEC column=info:username, timestamp=2020-05-16T14:59:58.764Z, value=vjasani

    Description

      We should provide capability to replicate complete slow and large RPC logs to HDFS or create new system table in addition to Ring Buffer. This way we don't lose any of slow logs and operator can retrieve all the slow/large logs. Replicating logs to HDFS / creating new system table should be configurable.

      Attachments

        Issue Links

          Activity

            People

              vjasani Viraj Jasani
              vjasani Viraj Jasani
              Votes:
              0 Vote for this issue
              Watchers:
              10 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: