[ZOOKEEPER-4358] Latency metrics showing surprising results for a keberos-enabled cluster - ASF JIRA

XML

Word

Printable

JSON

Details

Type: Bug
Status: Resolved
Priority: Minor
Resolution: Not A Problem
Affects Version/s: 3.6.2
Fix Version/s: None
Component/s: metric system
Labels:
None

Description

Hi,

I'm trying to understand why the values of min/avg/max latency are showing surprising results. The graph below shows the max latency value of a particular node for last 7 days. The value increases gradually over time and it only ever decreases when the node gets restarted as if the metric value gets reset.

https://github.com/apache/zookeeper/blob/master/zookeeper-server/src/main/java/org/apache/zookeeper/server/ServerStats.java#L226

3 nodes
Keberos enabled
TGT ticket cashe enabled.

I believes the values of min/avg/max latency should show more realistic variations. It's very unlikely that the max latency value is expected to always increase while the node is running.

https://github.com/apache/zookeeper/blob/master/zookeeper-server/src/main/java/org/apache/zookeeper/server/ServerStats.java#L142

public void updateLatency(Request request, long currentTime) {
long latency = currentTime - request.createTime;
if (latency < 0) {
return;
}
requestLatency.addDataPoint(latency);
if (request.getHdr() != null) {
// Only quorum request should have header
ServerMetrics.getMetrics().UPDATE_LATENCY.add(latency);
} else {
// All read request should goes here
ServerMetrics.getMetrics().READ_LATENCY.add(latency);
}

The method called let me think that the max latency metric gets set if the current values happens to be lower. __

https://github.com/apache/zookeeper/blob/master/zookeeper-server/src/main/java/org/apache/zookeeper/server/metric/AvgMinMaxCounter.java#L51

private void setMax(long value) {
long current;
while (value > (current = max.get()) && !max.compareAndSet(current, value)) {
// no op
}
}

I put below a graph of a particular from a totally different cluster for last 2 days. The node has not been restarted and all the data is from the same process. We can see a more realistic variations of the max latency metric as it would normally.

Thanks for you time in advance,

Math

Attachments

- Sort By Name
- Sort By Date
- Ascending
- Descending

image-2021-08-27-16-37-50-112.png
27/Aug/21 15:37
65 kB
Mathieu Gaudin
image-2021-08-27-16-10-28-783.png
27/Aug/21 15:10
36 kB
Mathieu Gaudin

Activity

People

Assignee:: Unassigned

Reporter:: Mathieu Gaudin

Votes:: 0 Vote for this issue

Watchers:: 1 Start watching this issue

Dates

Created:: 27/Aug/21 15:43

Updated:: 08/Feb/23 09:20

Resolved:: 08/Feb/23 09:20