  1. Hive
  2. HIVE-21752

Thread Safety and Memory Leaks in HCatRecordObjectInspectorFactory


Details

    • Patch

    Description

      Summary

      There are a couple of issues in HCatRecordObjectInspectorFactory[1] because it uses a static Java HashMap to cache objects:

      1. Java HashMap is not thread-safe. In a multithreaded server, two threads updating the cached ObjectInspectors at the same time can race and corrupt the map.
      2. The cache has no eviction policy, which can cause memory leaks. If users read many different schemas, the Hive server accumulates cached record and object inspectors and starts to see memory pressure.
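
For illustration, the caching pattern described in the two points above can be sketched as follows. This is a simplified, hypothetical stand-in and not the actual Hive source: the class name `UnboundedInspectorCache`, the `String` schema key, and the placeholder `Object` inspector are all assumptions made to keep the example self-contained.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the pattern described above; not the actual Hive class.
class UnboundedInspectorCache {
    // Static and never evicted: entries stay reachable for as long as the class is loaded.
    private static final Map<String, Object> CACHE = new HashMap<>();

    // The check-then-put sequence on a plain HashMap is neither atomic nor
    // safe when several threads call this method concurrently.
    static Object getInspector(String schema) {
        Object inspector = CACHE.get(schema);
        if (inspector == null) {
            inspector = new Object(); // placeholder for building an ObjectInspector
            CACHE.put(schema, inspector);
        }
        return inspector;
    }

    // Number of cached entries; only ever grows in this sketch.
    static int size() {
        return CACHE.size();
    }
}
```

Even when called from a single thread, every distinct schema adds an entry that is never removed. The HashMap Javadoc additionally states that the map must be synchronized externally when multiple threads access it and at least one of them modifies it structurally.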

      This patch proposes replacing the HashMap with a Guava cache, which provides both cache eviction and thread safety. Guava cache is already used in Hive's ObjectInspectorFactory[2], so this change is consistent with the rest of Hive.
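
A minimal sketch of what a Guava-backed replacement could look like is shown below. It is not the attached patch: the class name `BoundedInspectorCache`, the `String` key, the placeholder `Object` value, and the size and expiry limits are illustrative assumptions. It requires Guava on the classpath.

```java
import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; names and limits are assumptions, not taken from the patch.
class BoundedInspectorCache {
    // Guava caches are safe for concurrent use; the limits below bound memory growth.
    private static final Cache<String, Object> CACHE = CacheBuilder.newBuilder()
        .maximumSize(1024)                      // assumed cap on the number of entries
        .expireAfterAccess(5, TimeUnit.MINUTES) // assumed idle timeout
        .build();

    static Object getInspector(String schema) {
        try {
            // get(key, loader) returns the cached value or computes and stores it;
            // concurrent callers asking for the same key wait for a single load.
            return CACHE.get(schema, () -> new Object()); // placeholder inspector
        } catch (ExecutionException e) {
            throw new IllegalStateException("Failed to build inspector for " + schema, e);
        }
    }

    // Approximate number of entries currently held.
    static long size() {
        return CACHE.size();
    }
}
```

With a size bound in place, reading many distinct schemas evicts older entries instead of growing the cache without limit, and the check-then-put sequence is replaced by a single thread-safe call.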

      Attached is a patch that fixes this issue.

      References:

      1. https://github.com/apache/hive/blob/b58d50cb73a1f79a5d079e0a2c5ac33d2efc33a0/hcatalog/core/src/main/java/org/apache/hive/hcatalog/data/HCatRecordObjectInspectorFactory.java#L44-L47
      2. https://github.com/apache/hive/blob/b58d50cb73a1f79a5d079e0a2c5ac33d2efc33a0/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorFactory.java#L68-L87

       

      Review Board Link:

      Attachments

        1. HIVE-21752.patch
          3 kB
          Jalpan Randeri
        2. HIVE-21752.patch
          3 kB
          Jalpan Randeri

        Issue Links

          Activity

            People

              jalpan.randeri Jalpan Randeri
              jalpan.randeri Jalpan Randeri
              Votes: 0
              Watchers: 2

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated: 24h
                  Remaining: 24h
                  Logged: Not Specified