HIVE-14741: Incorrect results on boolean col when vectorization is ON

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.0.0, 2.1.0, 2.3.2
    • None
    • None

    Description

      I have attached the ORC part file on which the issue manifests. It has a single boolean column (many nulls and 231 true values, verified using the ORC file dump utility).

      1) Create an external table on the attached part file

      CREATE EXTERNAL TABLE bool_vect_issue (
      `bool_col` BOOLEAN)
      ROW FORMAT SERDE
      'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
      STORED AS INPUTFORMAT
      'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
      OUTPUTFORMAT
      'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
      LOCATION
      '<loc to which the part file is copied>';

      2) With vectorization enabled:
      set hive.vectorized.execution.enabled = true;
      select sum(if((bool_col) , 1, 0)) from bool_vect_issue;
      gives
      708206

      3) With vectorization disabled:
      set hive.vectorized.execution.enabled = false;
      select sum(if((bool_col) , 1, 0)) from bool_vect_issue;
      gives
      231

      The issue seems to have the same impact as HIVE-12435.
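      To narrow down which rows are being miscounted, a per-value breakdown may help. The queries below are only a diagnostic sketch against the same `bool_vect_issue` table; they are not part of the original report, their results are not known, and it is an assumption that the GROUP BY and WHERE paths behave differently from the `if(...)` expression under vectorization.

```sql
-- Diagnostic sketch (assumption: not run as part of the original report).
-- Run the whole block twice, once with the setting below set to true
-- and once with it set to false, and compare the outputs.
set hive.vectorized.execution.enabled = true;

-- Row counts per distinct value of bool_col (true, false, NULL).
select bool_col, count(*) as row_count
from bool_vect_issue
group by bool_col;

-- Cross-check that counts true rows through a filter instead of if(...).
select count(*) as true_rows
from bool_vect_issue
where bool_col = true;
```

      If the counts agree across both settings while the `sum(if(...))` result does not, that would point at the vectorized evaluation of the `if` expression over null values rather than at the ORC reader.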

      Attachments

        1. 000000_0
          2 kB
          Amruth Sampath

          People

            Assignee: Unassigned
            Reporter: Amruth Sampath (amrk7)
            Votes: 0
            Watchers: 2
