Apache Drill / DRILL-7121

TPCH 4 takes longer when Statistics is disabled.

Details

    Description

      Here is TPC-H query 4, run at scale factor (sf) 100:

      select
        o.o_orderpriority,
        count(*) as order_count
      from
        orders o
      where
        o.o_orderdate >= date '1996-10-01'
        and o.o_orderdate < date '1996-10-01' + interval '3' month
        and exists (
          select
            *
          from
            lineitem l
          where
            l.l_orderkey = o.o_orderkey
            and l.l_commitdate < l.l_receiptdate
        )
      group by
        o.o_orderpriority
      order by
        o.o_orderpriority;
      

      The plan changes when statistics are disabled: a Hash Agg and a Broadcast Exchange are added. Together, these two operators expand the number of rows coming from the lineitem table from 137M to 9B. This forces the hash join to use 6 GB of memory instead of 30 MB.
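
      One way to compare the two plans is to toggle statistics for the session and run EXPLAIN PLAN FOR on the query each time. The sketch below assumes that the session option `planner.statistics.use` is the switch meant by "Statistics is disabled"; the report does not name the option, so treat that name as an assumption and check it against your Drill version.

      ```sql
      -- Assumption: planner.statistics.use controls whether the planner uses statistics.
      alter session set `planner.statistics.use` = false;

      -- Print the plan instead of running the query; look for the extra
      -- Hash Agg and Broadcast Exchange above the lineitem scan.
      explain plan for
      select
        o.o_orderpriority,
        count(*) as order_count
      from
        orders o
      where
        o.o_orderdate >= date '1996-10-01'
        and o.o_orderdate < date '1996-10-01' + interval '3' month
        and exists (
          select
            *
          from
            lineitem l
          where
            l.l_orderkey = o.o_orderkey
            and l.l_commitdate < l.l_receiptdate
        )
      group by
        o.o_orderpriority
      order by
        o.o_orderpriority;

      -- Re-enable statistics and repeat the EXPLAIN to get the other plan.
      alter session set `planner.statistics.use` = true;
      ```

      Diffing the two EXPLAIN outputs should show whether the Hash Agg and Broadcast Exchange appear only in the plan produced with statistics disabled.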

            People

              gparai Gautam Parai
              rhou Robert Hou