Details
Type: Improvement
Status: Open
Priority: Minor
Resolution: Unresolved
Description
This task aims to print a new[1] HiveMetaStore audit log line in JSON format, similar to https://github.com/apache/hive/pull/1582 but extended to cover the `cmd` details as well.
- The existing audit log
```
HiveMetaStore.audit: ugi=xxx ip=xx.xx.xx.xx cmd=source:xx.xx.xx.xx get_table : db=xxx tbl=xxx
HiveMetaStore.audit: ugi=xxx ip=xx.xx.xx.xx cmd=source:xx.xx.xx.xx get_partition_with_auth : db=xx tbl=xx[xxx]
```
- The new audit log
```
HiveMetaStore.audit: {"ugi": "xxx", "ip": "xx.xx.xx.xx", "cmd": {"source": "xx.xx.xx.xx", "api": "get_table", "params": {"db": "xxx", "tbl": "xxx"}}}
HiveMetaStore.audit: {"ugi": "xxx", "ip": "xx.xx.xx.xx", "cmd": {"source": "xx.xx.xx.xx", "api": "get_partition_with_auth", "params": {"db": "xxx", "tbl": "xxx", "key": ["xxx"]}}}
```
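As a rough illustration of how such a line could be assembled, here is a minimal, dependency-free Java sketch. The class `AuditJsonSketch` and its methods `quote`, `toJson`, and `auditLine` are hypothetical names made up for this example; they are not part of HiveMetaStore, and a real patch would more likely reuse a JSON library already on the classpath. The field names (`ugi`, `ip`, `cmd`, `source`, `api`, `params`) are taken from the proposed format above.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, for illustration only; not an existing Hive class.
class AuditJsonSketch {

    // Quote a string and escape the characters JSON requires to be escaped.
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    // Serialize nested maps, lists, booleans, and integers; anything else becomes a JSON string.
    static String toJson(Object value) {
        if (value == null) return "null";
        if (value instanceof Map) {
            StringBuilder sb = new StringBuilder("{");
            boolean first = true;
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                if (!first) sb.append(", ");
                first = false;
                sb.append(quote(String.valueOf(e.getKey()))).append(": ").append(toJson(e.getValue()));
            }
            return sb.append("}").toString();
        }
        if (value instanceof List) {
            StringBuilder sb = new StringBuilder("[");
            boolean first = true;
            for (Object item : (List<?>) value) {
                if (!first) sb.append(", ");
                first = false;
                sb.append(toJson(item));
            }
            return sb.append("]").toString();
        }
        if (value instanceof Integer || value instanceof Long || value instanceof Boolean) {
            return value.toString();
        }
        return quote(value.toString());
    }

    // Build one audit line; LinkedHashMap keeps the fields in insertion order.
    static String auditLine(String ugi, String ip, String source, String api,
                            Map<String, Object> params) {
        Map<String, Object> cmd = new LinkedHashMap<>();
        cmd.put("source", source);
        cmd.put("api", api);
        cmd.put("params", params);
        Map<String, Object> record = new LinkedHashMap<>();
        record.put("ugi", ugi);
        record.put("ip", ip);
        record.put("cmd", cmd);
        return "HiveMetaStore.audit: " + toJson(record);
    }

    public static void main(String[] args) {
        Map<String, Object> params = new LinkedHashMap<>();
        params.put("db", "xxx");
        params.put("tbl", "xxx");
        params.put("key", Arrays.asList("xxx"));
        System.out.println(auditLine("xxx", "xx.xx.xx.xx", "xx.xx.xx.xx",
                "get_partition_with_auth", params));
    }
}
```

Keeping `params` as a nested object, rather than a pre-formatted string, is what lets downstream tooling filter on `db` or `tbl` without any custom parsing.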
----------------
For some context, we're tracking the usage of the shared Hive Metastore Service. The HiveMetaStore audit log is the raw data we rely on to understand the traffic along different dimensions: source (IP), API, database, table, etc.
Currently the audit log is a raw string without a standard format, especially the
extraLogInfo portion, which makes it harder to analyze.
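To make the difficulty concrete, here is a minimal Java sketch of what a consumer has to do today to pull fields out of the existing line. The class `LegacyAuditParser` and the regular expression are assumptions written against the two sample lines in this description only; other APIs may log a different shape that this pattern would not match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical consumer-side parser, for illustration only.
class LegacyAuditParser {

    // Matches: ugi=<u> ip=<ip> cmd=source:<src> <api> : <free-form remainder>
    // Assumes ugi, ip, and source contain no whitespace.
    private static final Pattern LINE = Pattern.compile(
            "ugi=(\\S+)\\s+ip=(\\S+)\\s+cmd=source:(\\S+)\\s+(\\w+)\\s*:\\s*(.*)$");

    // Returns {ugi, ip, source, api, remainder}, or null if the line does not match.
    static String[] parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new String[] {m.group(1), m.group(2), m.group(3), m.group(4), m.group(5)};
    }

    public static void main(String[] args) {
        String[] f = parse("HiveMetaStore.audit: ugi=xxx ip=xx.xx.xx.xx "
                + "cmd=source:xx.xx.xx.xx get_partition_with_auth : db=xx tbl=xx[xxx]");
        // The last field is still an unparsed string such as "db=xx tbl=xx[xxx]".
        System.out.println(String.join(" | ", f));
    }
}
```

Even when the prefix matches, the remainder (`db=xx tbl=xx[xxx]`) still needs per-API handling, which is the part a structured `params` object would remove.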
[1] Should we print an additional line instead of replacing the existing one, to avoid a breaking change?