  1. Hive
  2. HIVE-20153

Count and Sum UDF consume more memory in Hive 2+


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.3.2
    • Fix Version/s: 4.0.0-alpha-1
    • Component/s: UDF
    • Labels: None

    Description

      While testing Hive 2, we noticed that queries with many count() and sum() aggregations run out of memory on the Hadoop side, whereas the same queries worked in Hive 1.

      For many queries we had to double the mapper memory setting (in our case, mapreduce.map.java.opts from -Xmx2000M to -Xmx4000M), which makes upgrading to Hive 2 harder.

      A heap dump shows that one of the main culprits is the field 'uniqueObjects' in GenericUDAFSum and GenericUDAFCount, which was added to support window functions.
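
      To illustrate why a per-buffer field like this can matter, here is a minimal Java sketch. It is not the Hive source and not the attached patch: the class names (AggBufferSketch, EagerCountBuffer, LazyCountBuffer) and the 'distinct' flag are hypothetical, and only the field name 'uniqueObjects' comes from the description above. It assumes one aggregation buffer is kept per group key, so a set allocated in every buffer is multiplied by the number of groups, while allocating it only when distinct tracking is requested avoids that cost for plain count() and sum().

```java
import java.util.HashSet;
import java.util.Set;

public class AggBufferSketch {

    // Hypothetical eager variant: every buffer allocates a set up front,
    // even when the query never needs distinct tracking.
    static final class EagerCountBuffer {
        long count;
        final Set<Object> uniqueObjects = new HashSet<>();

        void add(Object value) {
            if (value != null) {
                count++;
            }
        }
    }

    // Hypothetical lazy variant: the set is created only when distinct
    // tracking is requested and a non-null value actually arrives.
    static final class LazyCountBuffer {
        long count;
        Set<Object> uniqueObjects; // stays null unless distinct tracking is on
        private final boolean distinct;

        LazyCountBuffer(boolean distinct) {
            this.distinct = distinct;
        }

        void add(Object value) {
            if (value == null) {
                return;
            }
            if (distinct) {
                if (uniqueObjects == null) {
                    uniqueObjects = new HashSet<>();
                }
                if (!uniqueObjects.add(value)) {
                    return; // duplicate value, do not count it again
                }
            }
            count++;
        }
    }

    public static void main(String[] args) {
        int groups = 100_000; // assumed number of distinct group keys
        EagerCountBuffer[] eager = new EagerCountBuffer[groups];
        LazyCountBuffer[] lazy = new LazyCountBuffer[groups];
        int eagerSets = 0;
        int lazySets = 0;
        for (int i = 0; i < groups; i++) {
            eager[i] = new EagerCountBuffer();
            eager[i].add(i);
            lazy[i] = new LazyCountBuffer(false);
            lazy[i].add(i);
            if (eager[i].uniqueObjects != null) eagerSets++;
            if (lazy[i].uniqueObjects != null) lazySets++;
        }
        System.out.println("eager sets allocated: " + eagerSets);
        System.out.println("lazy sets allocated:  " + lazySets);
    }
}
```

      Both variants produce the same counts for non-distinct aggregation; the difference is only in how many set objects stay reachable on the heap, which is the kind of overhead a heap dump would surface.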

      Attachments

        1. Screen Shot 2018-07-12 at 6.41.28 PM.png
          49 kB
          Szehon Ho
        2. HIVE-20153.1.patch
          3 kB
          Aihua Xu

        Activity

          People

            Assignee: Aihua Xu
            Reporter: Szehon Ho
            Votes: 0
            Watchers: 6

            Dates

              Created:
              Updated:
              Resolved: