IMPALA-11345

Query failed when creating equal conjunction map for Parquet bloom filter

Details

    Description

      When querying a Hive table to which columns were added without 'cascade', Impala fails with an error like "Unable to find SchemaNode for path 'db.table.column' in the schema of file 'hdfs://xxx/path/to/parquet_file_before_add_column'." I inspected the Parquet file named in the error log and found that its schema is not compatible with the table metadata. The call stack is attached below; the path and table name are masked:

      I0609 18:04:25.970052 115413 status.cc:129] c94d0ab3fdf8f943:3203006100000002] Unable to find SchemaNode for path 'xxx_db.xxx_table.xxx_column' in the schema of file 'hdfs://xxx_nn/xxx_table_path/000000_0'.
          @           0xea543b  impala::Status::Status()
          @          0x1e3225c  impala::HdfsParquetScanner::CreateColIdx2EqConjunctMap()
          @          0x1e363ea  impala::HdfsParquetScanner::Open()
          @          0x19b40d0  impala::HdfsScanNodeBase::CreateAndOpenScannerHelper()
          @          0x1b5cbae  impala::HdfsScanNode::ProcessSplit()
          @          0x1b5e12a  impala::HdfsScanNode::ScannerThread()
          @          0x1b5e9c6  _ZN5boost6detail8function26void_function_obj_invoker0IZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS3_18ThreadResourcePoolEEUlvE_vE6invokeERNS1_15function_bufferE
          @          0x18eafa9  impala::Thread::SuperviseThread()
          @          0x18ee11a  boost::detail::thread_data<>::run()
          @          0x2385510  thread_proxy
          @     0x7fb5b0745162  start_thread
          @     0x7fb5ad21df6c  __clone

      The error may be related to IMPALA-10640. The Parquet Bloom filter requires that the right-hand values of the equality conjuncts match the schema of the file currently being scanned. The filter is unavailable if the column does not exist in all of the Parquet files scanned. I think we can disable the Parquet Bloom filter for this single query or scan node when such a situation is detected.
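
The idea proposed above (treat a column that is missing from a file as "no Bloom filter available" instead of failing the query) can be sketched as follows. This is a simplified, self-contained illustration and not Impala's actual code or the eventual fix: `FileSchema`, `EqConjunct`, and `BuildColIdxToEqConjuncts` are hypothetical names invented for this example, and conjunct values are modelled as plain strings.

```cpp
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration; these are not Impala's real classes.
struct FileSchema {
  std::vector<std::string> columns;  // column names present in one Parquet file

  // Returns the column's index in this file, or std::nullopt if the file
  // was written before the column was added to the table.
  std::optional<int> Resolve(const std::string& name) const {
    for (size_t i = 0; i < columns.size(); ++i) {
      if (columns[i] == name) return static_cast<int>(i);
    }
    return std::nullopt;
  }
};

struct EqConjunct {
  std::string column;  // column referenced by the equality predicate
  std::string value;   // literal the column is compared against
};

// Builds a map from file column index to the equality conjuncts on that column.
// A conjunct whose column is absent from the file is skipped rather than
// reported as an error, so Bloom filtering is simply not used for it here.
std::map<int, std::vector<EqConjunct>> BuildColIdxToEqConjuncts(
    const FileSchema& file, const std::vector<EqConjunct>& conjuncts) {
  std::map<int, std::vector<EqConjunct>> result;
  for (const EqConjunct& conjunct : conjuncts) {
    std::optional<int> idx = file.Resolve(conjunct.column);
    if (!idx.has_value()) continue;  // column missing in this file: skip it
    result[*idx].push_back(conjunct);
  }
  return result;
}
```

In this sketch, a file containing only `id` produces an empty map for the predicate on `name`, so the scan would fall back to its normal handling of missing columns, while a file containing both columns maps index 1 to the conjunct.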

      How to reproduce (using impala-shell):

      1. create table parquet_test (id INT) stored as parquet;
      2. insert into parquet_test values (1),(2),(3);
      3. alter table parquet_test add columns (name STRING);
      4. insert into parquet_test values (4, "James");
      5. select * from parquet_test where name in ("Lily");
      6. The error occurs.

          People

            Daniel Becker (daniel.becker)
            Yuchen Fan