[IMPALA-9458] Improve runtime profile counters for slow IO from remote stores - ASF JIRA

Details

Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: Backend
Labels:
- observability

Description

Remote storage systems (e.g. cloud stores like S3 and ABFS) often have long tail latencies. Most I/O calls finish relatively quickly, but some may take significantly longer. This is an issue even for HDFS: hedged reads were developed to help mitigate its tail latencies, although no equivalent feature exists in the cloud storage connectors.

Currently, scan nodes just track the total amount of time spent reading data. It would be good to have a summary stats counter that tracks the min, avg, and max time spent reading data. This should at least allow us to identify when calls to remote storage services are taking longer than usual.
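
A minimal sketch of the proposed counter follows, assuming per-read durations are measured in nanoseconds. `ReadLatencyStats` and `TimedRead` are hypothetical names used for illustration only; this is not Impala's RuntimeProfile API, just a standalone example of tracking count, min, avg, and max alongside the existing total.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <limits>
#include <mutex>

// Hypothetical summary-stats counter for per-read I/O latency (nanoseconds).
// Standalone illustration; not Impala's actual RuntimeProfile counter.
class ReadLatencyStats {
 public:
  // Records the duration of one completed read call.
  void Update(int64_t elapsed_ns) {
    assert(elapsed_ns >= 0);  // a measured duration should never be negative
    std::lock_guard<std::mutex> lock(mu_);
    ++count_;
    total_ns_ += elapsed_ns;
    min_ns_ = std::min(min_ns_, elapsed_ns);
    max_ns_ = std::max(max_ns_, elapsed_ns);
  }

  int64_t count() const {
    std::lock_guard<std::mutex> lock(mu_);
    return count_;
  }
  // The accessors below return 0 when no reads have been recorded yet.
  int64_t min_ns() const {
    std::lock_guard<std::mutex> lock(mu_);
    return count_ == 0 ? 0 : min_ns_;
  }
  int64_t max_ns() const {
    std::lock_guard<std::mutex> lock(mu_);
    return max_ns_;
  }
  int64_t avg_ns() const {
    std::lock_guard<std::mutex> lock(mu_);
    return count_ == 0 ? 0 : total_ns_ / count_;
  }

 private:
  mutable std::mutex mu_;  // scan threads may update concurrently
  int64_t count_ = 0;
  int64_t total_ns_ = 0;   // the "total time reading" already tracked today
  int64_t min_ns_ = std::numeric_limits<int64_t>::max();
  int64_t max_ns_ = 0;
};

// Hypothetical helper: times one read callable and records its latency.
template <typename ReadFn>
void TimedRead(ReadLatencyStats* stats, ReadFn&& read) {
  auto start = std::chrono::steady_clock::now();
  read();
  auto end = std::chrono::steady_clock::now();
  stats->Update(
      std::chrono::duration_cast<std::chrono::nanoseconds>(end - start).count());
}
```

With reads of 100 ns, 300 ns, and 5000 ns, the total alone (5400 ns) hides the outlier, while the summary shows min 100, avg 1800, and max 5000. A max far above the average is the signal that some calls to the remote store took much longer than usual.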

Issue Links

relates to:
- IMPALA-8884 Track Read(), Open() and Write() operation time per disk queue (Resolved)
- IMPALA-9033 Log metrics for slow I/Os (Resolved)

People

Assignee: Unassigned
Reporter: Sahil Takiar
Votes: 0
Watchers: 4

Dates

Created: 04/Mar/20 02:26
Updated: 15/Jan/21 17:30