HIVE-28009

Shared work optimizer ignores schema merge setting in case of virtual column difference

Details

    Description

      set hive.optimize.shared.work.merge.ts.schema=false;
      
      create table t1(a int);
      
      explain
      WITH t AS (
        select BLOCK__OFFSET__INSIDE__FILE, INPUT__FILE__NAME, a from (
          select BLOCK__OFFSET__INSIDE__FILE, INPUT__FILE__NAME, a, row_number() OVER (partition by INPUT__FILE__NAME) rn from t1
          where a = 1
        ) q
        where rn=1
      )
      select BLOCK__OFFSET__INSIDE__FILE, INPUT__FILE__NAME, a from t1 where NOT (a = 1) AND INPUT__FILE__NAME IN (select INPUT__FILE__NAME from t)
      union all
      select * from t
      

      Before SharedWorkOptimizer:

      TS[0]-FIL[32]-SEL[2]-RS[14]-MERGEJOIN[42]-SEL[17]-UNION[27]-FS[29]
      TS[3]-FIL[34]-RS[5]-SEL[6]-PTF[7]-FIL[33]-SEL[8]-GBY[13]-RS[15]-MERGEJOIN[42]
      TS[18]-FIL[36]-RS[20]-SEL[21]-PTF[22]-FIL[35]-SEL[23]-UNION[27]
      

      After SharedWorkOptimizer:

      TS[0]-FIL[32]-SEL[2]-RS[14]-MERGEJOIN[42]-SEL[17]-UNION[27]-FS[29]
           -FIL[34]-RS[5]-SEL[6]-PTF[7]-FIL[33]-SEL[8]-GBY[13]-RS[15]-MERGEJOIN[42]
      TS[18]-FIL[36]-RS[20]-SEL[21]-PTF[22]-FIL[35]-SEL[23]-UNION[27]
      

      TS[3] and TS[18] are merged even though their schemas do not match and hive.optimize.shared.work.merge.ts.schema was set to false in the test. The virtual columns of the two scans differ:

      TS[3]: 0 = FILENAME
      TS[18]: 0 = BLOCKOFFSET,  FILENAME
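
      The expected behavior can be sketched as follows. This is only an illustration of the check the report implies, not Hive's actual SharedWorkOptimizer code: the class TableScanMergeCheck, the Scan type, and canMerge are hypothetical names, the regular column list is assumed to be just `a` for both scans, and all other merge preconditions (same table, filters, pruning) are deliberately left out.

```java
import java.util.Arrays;
import java.util.List;

public class TableScanMergeCheck {

    // Hypothetical stand-in for a table scan operator; only the fields
    // relevant to the schema comparison are modeled.
    static final class Scan {
        final String name;
        final List<String> neededColumns;
        final List<String> virtualColumns;

        Scan(String name, List<String> neededColumns, List<String> virtualColumns) {
            this.name = name;
            this.neededColumns = neededColumns;
            this.virtualColumns = virtualColumns;
        }
    }

    // Simplified assumption: with schema merging enabled, a schema difference
    // never blocks the merge; with it disabled, both the regular and the
    // virtual column lists must be identical.
    static boolean canMerge(Scan a, Scan b, boolean mergeTsSchema) {
        if (mergeTsSchema) {
            return true;
        }
        return a.neededColumns.equals(b.neededColumns)
            && a.virtualColumns.equals(b.virtualColumns);
    }

    public static void main(String[] args) {
        // Virtual columns taken from the report; neededColumns is an assumption.
        Scan ts3 = new Scan("TS[3]", Arrays.asList("a"), Arrays.asList("FILENAME"));
        Scan ts18 = new Scan("TS[18]", Arrays.asList("a"),
                Arrays.asList("BLOCKOFFSET", "FILENAME"));

        System.out.println(canMerge(ts3, ts18, false)); // prints "false"
        System.out.println(canMerge(ts3, ts18, true));  // prints "true"
    }
}
```

      Under these assumptions, the merge of the two scans should have been rejected with the setting turned off, because their virtual column lists differ.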
      

            People

              kkasa Krisztian Kasa