IMPALA-8544

Expose additional S3A / S3Guard metrics

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Backend
    • Labels:

      Description

      S3A / S3Guard internally collect several useful metrics that we should consider exposing to Impala users. The full list of statistics can be found in o.a.h.fs.s3a.Statistic. The stats include the number of S3 operations performed (put, get, etc.), invocation counts for various FileSystem methods, and stream statistics (bytes read, bytes written, etc.).

      Some interesting stats that stand out:

      • "stream_aborted": "Count of times the TCP stream was aborted" - the number of TCP connection aborts; a high value would indicate performance issues
      • "stream_read_exceptions": "Number of exceptions invoked on input streams" - incremented whenever an IOException is caught while reading (these exceptions don't always get propagated to Impala because they trigger a retry)
      • "store_io_throttled": "Requests throttled and retried" - looks like it tracks the number of times the fs retries an operation because the original request hit a throttling exception
      • "s3guard_metadatastore_retry": "S3Guard metadata store retry events" - looks like it tracks the number of times the fs retries S3Guard operations
      • "s3guard_metadatastore_throttled": "S3Guard metadata store throttled events" - similar to "store_io_throttled" but looks like it is specific to S3Guard

      We should consider how to expose these metrics via Impala logs / runtime profiles.
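
      As a rough illustration of the logging route, the sketch below picks the five counters listed above out of a name-to-value map and formats the non-zero ones as a single line. It assumes the statistics have already been copied into a Map<String, Long> (for example by iterating StorageStatistics#getLongStatistics); the class name S3AStatFilter, the "S3A stats:" output format, and the "some_other_counter" entry are hypothetical and not part of Impala or Hadoop.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

public class S3AStatFilter {
    // Counter names copied from the list in this issue.
    public static final List<String> INTERESTING = Arrays.asList(
        "stream_aborted",
        "stream_read_exceptions",
        "store_io_throttled",
        "s3guard_metadatastore_retry",
        "s3guard_metadatastore_throttled");

    /** Keeps only the interesting counters that are present and non-zero. */
    public static Map<String, Long> select(Map<String, Long> allStats) {
        Map<String, Long> out = new LinkedHashMap<>();
        for (String name : INTERESTING) {
            Long value = allStats.get(name);
            if (value != null && value.longValue() != 0L) {
                out.put(name, value);
            }
        }
        return out;
    }

    /** Renders the selected counters as one log-friendly line (hypothetical format). */
    public static String format(Map<String, Long> stats) {
        StringJoiner joiner = new StringJoiner(", ", "S3A stats: ", "");
        for (Map.Entry<String, Long> e : stats.entrySet()) {
            joiner.add(e.getKey() + "=" + e.getValue());
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        // Stand-in for values read from the file system's statistics.
        Map<String, Long> all = new HashMap<>();
        all.put("stream_aborted", 3L);
        all.put("store_io_throttled", 0L);
        all.put("some_other_counter", 10L); // hypothetical, filtered out
        System.out.println(format(select(all)));
    }
}
```

      Dropping zero-valued counters keeps the line short in the common case where nothing was throttled or retried; whether that is the right choice for runtime profiles is an open question.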

      There are a few options:

      • S3AFileSystem exposes StorageStatistics specific to S3A / S3Guard via the FileSystem#getStorageStatistics method. S3AStorageStatistics seems to include all the S3A / S3Guard metrics; however, I think the stats might be aggregated globally, which would make it hard to create per-query metrics
      • S3AInstrumentation exposes all the metrics as well, and looks like it is per-fs-instance, so it is not aggregated globally; S3AInstrumentation implements o.a.h.metrics2.MetricsSource, so perhaps it is exposed via some API (haven't looked into this yet)
      • S3AInputStream#toString dumps the statistics from o.a.h.fs.s3a.S3AInstrumentation.InputStreamStatistics and S3AFileSystem#toString dumps them all as well
      • S3AFileSystem updates the stats in o.a.h.fs.Statistics.StatisticsData as well (e.g. bytesRead, bytesWritten, etc.)

      Impala also has an hdfs-fs-cache, so hdfsFs objects get shared across threads, which likely means that even per-fs-instance stats would not map cleanly to a single query.
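
      Because the counters are cumulative and the fs object is shared, one possible approach is to snapshot the counters before and after a unit of work and report the difference. The sketch below shows only that snapshot-and-diff step on plain maps; the class name StatDelta and the sample values are hypothetical, and in practice the snapshots would have to be filled from whichever of the sources above is chosen. If other threads use the same cached fs object concurrently, the delta also includes their activity, so it is an upper bound and not an exact per-query figure.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class StatDelta {
    /**
     * Returns (after - before) for every counter whose value changed.
     * Counters missing from the "before" snapshot are treated as 0.
     */
    public static Map<String, Long> diff(Map<String, Long> before,
                                         Map<String, Long> after) {
        Map<String, Long> delta = new TreeMap<>();
        for (Map.Entry<String, Long> e : after.entrySet()) {
            long previous = before.getOrDefault(e.getKey(), 0L);
            long change = e.getValue() - previous;
            if (change != 0L) {
                delta.put(e.getKey(), change);
            }
        }
        return delta;
    }

    public static void main(String[] args) {
        // Snapshot taken before the work starts (sample values).
        Map<String, Long> before = new HashMap<>();
        before.put("stream_aborted", 2L);
        before.put("store_io_throttled", 5L);

        // Snapshot taken after the work finishes (sample values).
        Map<String, Long> after = new HashMap<>();
        after.put("stream_aborted", 2L);
        after.put("store_io_throttled", 9L);
        after.put("stream_read_exceptions", 1L);

        System.out.println(diff(before, after));
    }
}
```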

              People

              • Assignee:
                stakiar Sahil Takiar
                Reporter:
                stakiar Sahil Takiar
              • Votes:
                0
                Watchers:
                5
