  1. Hive
  2. HIVE-19283

Query with several count(distinct ...) expressions gets stuck in the last reducer


Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.1.1
    • Fix Version/s: None
    • Component/s: CBO, Logical Optimizer
    • Labels: None

    Description

       The performance of a query with a single distinct count improved significantly with HIVE-10568. For example:

      select count(distinct elevenst_id)
      from 11st.log_table
      where part_dt between '20180101' and '20180131'
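
      For reference, the single-column case can be written by hand as a two-stage aggregation. The inner GROUP BY can deduplicate across many reducers, and the outer count then runs over far fewer rows. This is only a sketch of the idea, not the plan Hive actually generates; the alias t is made up for illustration.

      ```sql
      -- Hand-written two-stage form of count(distinct elevenst_id).
      -- Stage 1 (inner query): deduplicate ids with GROUP BY.
      -- Stage 2 (outer query): count the surviving ids.
      -- count(elevenst_id) skips the NULL group, matching count(distinct ...).
      select count(elevenst_id)
      from (
        select elevenst_id
        from 11st.log_table
        where part_dt between '20180101' and '20180131'
        group by elevenst_id
      ) t
      ```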

       

      However, queries with several distinct counts are still slow. Such a query starts with multiple mappers, but then gets stuck in a single final reducer.

      select
        count(distinct elevenst_id)
      , count(distinct member_id)
      , count(distinct user_id)
      , count(distinct action_id)
      , count(distinct other_id)
      from 11st.log_table
      where part_dt between '20180101' and '20180131'
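
      One possible workaround, until the optimizer handles this case, is to compute each distinct count in its own subquery and combine the single-row results with cross joins. The sketch below assumes each subquery is then planned like the single-column query above; this has not been verified on 2.1.1. The column aliases (elevenst_cnt and so on) and subquery aliases (a to e) are illustrative only.

      ```sql
      -- Sketch: one subquery per distinct count, each returning exactly one row,
      -- so the cross joins produce a single combined row.
      select a.elevenst_cnt, b.member_cnt, c.user_cnt, d.action_cnt, e.other_cnt
      from (select count(distinct elevenst_id) as elevenst_cnt
            from 11st.log_table
            where part_dt between '20180101' and '20180131') a
      cross join
           (select count(distinct member_id) as member_cnt
            from 11st.log_table
            where part_dt between '20180101' and '20180131') b
      cross join
           (select count(distinct user_id) as user_cnt
            from 11st.log_table
            where part_dt between '20180101' and '20180131') c
      cross join
           (select count(distinct action_id) as action_cnt
            from 11st.log_table
            where part_dt between '20180101' and '20180131') d
      cross join
           (select count(distinct other_id) as other_cnt
            from 11st.log_table
            where part_dt between '20180101' and '20180131') e
      ```

      The trade-off is that the partitions are scanned once per distinct column instead of once in total, so this only pays off when the single final reducer is the dominant cost.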

       

            People

              Assignee: Ashutosh Chauhan (ashutoshc)
              Reporter: Goun Na (goun)
              Votes: 0
              Watchers: 4
