HIVE-20202 describes how Hive added a web endpoint for online, in-production profiling based on async-profiler. The endpoint was added as a servlet to the httpserver and supports retrieval of flamegraphs compiled from the profiler trace. In addition to CPU, async-profiler (https://github.com/jvm-profiling-tools/async-profiler) can profile heap allocations, lock contention, and hardware performance counters.
The profiling overhead is low, and the profiler is safe to run in production. The async-profiler project has measured and documented its CPU and memory overheads in these issues: https://github.com/jvm-profiling-tools/async-profiler/issues/14 and https://github.com/jvm-profiling-tools/async-profiler/issues/131
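To illustrate how such an endpoint might drive async-profiler, here is a minimal Java sketch. It assumes an async-profiler 1.x install whose `profiler.sh` accepts `-e` (event), `-d` (duration in seconds), `-o svg`, and `-f` (output file) followed by the target PID; later releases changed the supported output formats, so check the version in use. The class `FlameGraphCommand`, its `build` method, and the script location are hypothetical and are not Hive's or HBase's actual code. The sketch only assembles the command line and rejects events other than `cpu` and `alloc`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper: builds a one-shot async-profiler command line. */
final class FlameGraphCommand {
    // Events this sketch allows; async-profiler itself supports more (e.g. lock contention).
    static final Set<String> ALLOWED_EVENTS = Set.of("cpu", "alloc");

    private FlameGraphCommand() {}

    /**
     * Returns the argv that profiles {@code pid} for {@code durationSeconds}
     * and writes an SVG flamegraph to {@code outputFile}.
     * Assumes async-profiler 1.x profiler.sh flag syntax.
     */
    static List<String> build(String profilerHome, long pid, String event,
                              int durationSeconds, String outputFile) {
        String ev = event.toLowerCase(Locale.ROOT);
        if (!ALLOWED_EVENTS.contains(ev)) {
            throw new IllegalArgumentException("unsupported event: " + event);
        }
        if (durationSeconds <= 0) {
            throw new IllegalArgumentException("duration must be positive");
        }
        List<String> cmd = new ArrayList<>();
        cmd.add(profilerHome + "/profiler.sh"); // hypothetical install layout
        cmd.add("-e");
        cmd.add(ev);
        cmd.add("-d");
        cmd.add(Integer.toString(durationSeconds));
        cmd.add("-o");
        cmd.add("svg");
        cmd.add("-f");
        cmd.add(outputFile);
        cmd.add(Long.toString(pid));
        return cmd;
    }
}
```

A handler could run the result with `new ProcessBuilder(cmd).inheritIO().start().waitFor()`, using `ProcessHandle.current().pid()` (Java 9+) as the target PID, and then stream the SVG file back in the HTTP response. Building the argv as a list rather than a shell string keeps request parameters from being interpreted by a shell.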
We have an httpserver-based servlet stack, so we can use HIVE-20202 as an implementation template for a similar feature for HBase daemons. Ideally, the feature would meet these requirements:
- Retrieve the flamegraph SVG generated from the latest profile trace.
- Enable and disable profiling online. (async-profiler does not use instrumentation-based profiling, so toggling it should not cause the code-generation-related performance problems of that approach, and it can be safely turned on and off under production load.)
- CPU profiling.
- Allocation profiling.
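The online enable/disable requirement could be sketched with async-profiler's `start` and `stop` actions plus a single piece of atomic state, so that concurrent HTTP requests cannot start two sessions at once. This is a hedged sketch: `ProfilingToggle`, `enable`, and `disable` are hypothetical names, and the `stop -o svg -f <file>` form again assumes an async-profiler 1.x `profiler.sh`. It only computes the commands to run; launching them is left to the caller.

```java
import java.util.List;
import java.util.Optional;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical controller: at most one async-profiler session per process. */
final class ProfilingToggle {
    private final String profilerScript; // assumed path to profiler.sh
    private final long pid;              // JVM being profiled
    // Active event ("cpu" or "alloc"), or null when profiling is off.
    private final AtomicReference<String> activeEvent = new AtomicReference<>();

    ProfilingToggle(String profilerScript, long pid) {
        this.profilerScript = profilerScript;
        this.pid = pid;
    }

    /** Returns the "start" argv if profiling was off, or empty if a session is already running. */
    Optional<List<String>> enable(String event) {
        if (!event.equals("cpu") && !event.equals("alloc")) {
            throw new IllegalArgumentException("unsupported event: " + event);
        }
        // compareAndSet lets exactly one concurrent enable request win.
        if (!activeEvent.compareAndSet(null, event)) {
            return Optional.empty();
        }
        return Optional.of(List.of(profilerScript, "start", "-e", event, Long.toString(pid)));
    }

    /** Returns the "stop" argv that dumps an SVG flamegraph, or empty if profiling was off. */
    Optional<List<String>> disable(String svgFile) {
        // getAndSet clears the state atomically so only one disable request emits "stop".
        if (activeEvent.getAndSet(null) == null) {
            return Optional.empty();
        }
        return Optional.of(List.of(profilerScript, "stop", "-o", "svg", "-f", svgFile, Long.toString(pid)));
    }

    boolean isEnabled() {
        return activeEvent.get() != null;
    }
}
```

A real servlet would also need to reset the state if launching `profiler.sh start` fails; otherwise the toggle would report profiling as enabled when it is not.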