Hadoop HDFS / HDFS-17042

Add rpcCallSuccesses and OverallRpcProcessingTime to RpcMetrics for Namenode

Details

    • Reviewed

    Description

      We'd like to add two new types of metrics to the existing NN RpcMetrics/RpcDetailedMetrics. These two metrics can then be used as part of SLA/SLO for the HDFS service.

      • RpcCallSuccesses: the number of RPC requests that the NN processed successfully (e.g., the response carries the RpcStatus RpcStatusProto.SUCCESS). Together with RpcQueueNumOps (which refers to the total number of RPC requests), it lets us derive the RpcErrorRate for our NN as (RpcQueueNumOps - RpcCallSuccesses) / RpcQueueNumOps.
      • OverallRpcProcessingTime for each RPC method: the overall RPC processing time for each RPC method at the NN, covering the time from when a request arrives at the NN to when the response is sent back. We already emit processingTime for each RPC method in RpcDetailedMetrics today. We want to extend it to also emit overallRpcProcessingTime for each RPC method, which includes enqueueTime, queueTime, processingTime, responseTime, and handlerTime.
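
      A rough sketch of how the two metrics above could be consumed for an SLO. This is not the patch itself: the class RpcSloSketch and both method names are hypothetical, and treating overallRpcProcessingTime as the plain sum of the five listed components is an assumption (it presumes the components do not overlap).

```java
// Hypothetical illustration only; not part of Hadoop's RpcMetrics API.
public final class RpcSloSketch {

    private RpcSloSketch() {
    }

    /**
     * RpcErrorRate = (RpcQueueNumOps - RpcCallSuccesses) / RpcQueueNumOps.
     * Returns 0.0 when no requests were observed, to avoid dividing by zero
     * (that choice is an assumption of this sketch).
     */
    public static double rpcErrorRate(long rpcQueueNumOps, long rpcCallSuccesses) {
        if (rpcQueueNumOps <= 0) {
            return 0.0;
        }
        return (double) (rpcQueueNumOps - rpcCallSuccesses) / rpcQueueNumOps;
    }

    /**
     * Assumed composition of the overall time for a single call: the sum of
     * the five components named in the description. All arguments must use
     * the same time unit.
     */
    public static long overallRpcProcessingTime(long enqueueTime, long queueTime,
            long processingTime, long responseTime, long handlerTime) {
        return enqueueTime + queueTime + processingTime + responseTime + handlerTime;
    }

    public static void main(String[] args) {
        // Made-up counter values: 1000 requests, 990 of them successful.
        System.out.println("RpcErrorRate = " + rpcErrorRate(1000, 990));
        // Made-up per-call timings, all in the same unit.
        System.out.println("overall = " + overallRpcProcessingTime(1, 2, 3, 4, 5));
    }
}
```

      With counters scraped at two points in time, the same formula applied to the deltas would give the error rate for that interval rather than since NN startup.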

       


          People

            xinglin Xing Lin
