IMPALA-9529

Predicates on nested types may not be applied correctly on a table masking view


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: Impala 3.4.0
    • None

    Description

      When column masking is enabled, predicates that reference both primitive and nested types are resolved to "multi-tuple predicates" (predicates that are not bound to any single tuple id). As a result, these predicates are not picked up correctly by the planner and are not applied.
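A minimal sketch of the failure mode, written as a hypothetical Python model rather than Impala's actual planner code. The names Predicate, ScanNode and assign_to_scan, and the tuple ids BASE_TUPLE and VIEW_TUPLE, are illustrative assumptions. In the model, a scan node only accepts predicates bound to a single tuple id that it materializes, so once masking makes `id` resolve to a separate view tuple, the OR predicate spans two tuple ids and is left unassigned.

```python
from dataclasses import dataclass, field

# Hypothetical, heavily simplified model of binding predicates to tuple ids.
# It is NOT Impala's planner code; it only illustrates why a predicate that
# spans two tuple ids can end up unassigned when there is no join node.

BASE_TUPLE = 0  # hypothetical tuple id of functional_parquet.complextypestbl t
VIEW_TUPLE = 1  # hypothetical tuple id of the masking view's output column `id`


@dataclass(frozen=True)
class Predicate:
    text: str
    tuple_ids: frozenset  # tuple ids of every slot the predicate references


@dataclass
class ScanNode:
    name: str
    tuple_ids: frozenset  # tuple ids materialized by this node
    conjuncts: list = field(default_factory=list)


def assign_to_scan(preds, scan):
    """Attach each single-tuple predicate whose tuple id the scan materializes.

    Returns the predicates that could not be assigned. In this model a
    multi-tuple predicate could only be placed at a join, and the query in
    this issue has no join, so such a predicate is never evaluated.
    """
    unassigned = []
    for p in preds:
        if len(p.tuple_ids) == 1 and p.tuple_ids <= scan.tuple_ids:
            scan.conjuncts.append(p)
        else:
            unassigned.append(p)
    return unassigned


# Without masking, `id` and `nested_struct.a` both resolve to the base tuple.
plain = Predicate("id = 100 OR nested_struct.a = 1", frozenset({BASE_TUPLE}))
# With masking, `id` resolves to the view tuple, so the predicate spans two.
masked = Predicate("id = 100 OR nested_struct.a = 1",
                   frozenset({VIEW_TUPLE, BASE_TUPLE}))

scan = ScanNode("00:SCAN HDFS", frozenset({BASE_TUPLE}))
print(assign_to_scan([plain], scan))                      # []
print([p.text for p in assign_to_scan([masked], scan)])   # ['id = 100 OR nested_struct.a = 1']
```

The point of the model is only the binding rule: a predicate whose referenced slots come from more than one tuple id does not qualify for a single scan node.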

      Reproduce
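      Create a column masking policy on the functional_parquet.complextypestbl table with the masking expression id => 100 * id. The following query then returns incorrect results: only the first row satisfies id = 100 or nested_struct.a = 1, yet all eight rows are returned: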

      select id, nested_struct.a from functional_parquet.complextypestbl t
      where id = 100 or nested_struct.a = 1;
      +-----+-----------------+
      | id  | nested_struct.a |
      +-----+-----------------+
      | 100 | 1               |
      | 200 | NULL            |
      | 300 | NULL            |
      | 400 | NULL            |
      | 500 | NULL            |
      | 600 | NULL            |
      | 700 | 7               |
      | 800 | -1              |
      +-----+-----------------+
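To make the expected result concrete, the sketch below applies SQL's three-valued logic to the eight (masked id, nested_struct.a) rows shown above. It assumes, as the analyzed query in the EXPLAIN output indicates, that id = 100 is compared against the masked value 100 * id. A WHERE clause keeps a row only when the predicate evaluates to TRUE, so rows where it is FALSE or NULL are filtered out. The helper names sql_eq, sql_or and where are illustrative.

```python
from typing import Optional

# Rows as returned above: (masked id, nested_struct.a); None stands for SQL NULL.
ROWS = [(100, 1), (200, None), (300, None), (400, None),
        (500, None), (600, None), (700, 7), (800, -1)]


def sql_eq(x: Optional[int], y: int) -> Optional[bool]:
    """SQL '=': the result is NULL if either operand is NULL."""
    return None if x is None else x == y


def sql_or(a: Optional[bool], b: Optional[bool]) -> Optional[bool]:
    """SQL three-valued OR: TRUE wins, otherwise NULL if any side is NULL."""
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False


def where(rows):
    """Keep only rows whose predicate is TRUE (FALSE and NULL are dropped)."""
    return [r for r in rows if sql_or(sql_eq(r[0], 100), sql_eq(r[1], 1)) is True]


print(where(ROWS))  # [(100, 1)]
```

Rows 200 through 600 evaluate to NULL (FALSE OR NULL) and rows 700 and 800 to FALSE, so a correct plan would return a single row.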
      

      Explaining the query shows that the predicate is not assigned: the 00:SCAN HDFS node has no predicates entry.

      Query: explain select id, nested_struct.a from functional_parquet.complextypestbl t
      where id = 100 or nested_struct.a = 1
      +---------------------------------------------------------------------------------------+
      | Explain String                                                                        |
      +---------------------------------------------------------------------------------------+
      | Max Per-Host Resource Reservation: Memory=16.00KB Threads=3                           |
      | Per-Host Resource Estimates: Memory=32MB                                              |
      | WARNING: The following tables are missing relevant table and/or column statistics.    |
      | functional_parquet.complextypestbl                                                    |
      | Analyzed query: SELECT id, nested_struct.a FROM (SELECT CAST(CAST(100 AS BIGINT)      |
      | * id AS BIGINT) id FROM functional_parquet.complextypestbl t) WHERE id =              |
      | CAST(100 AS BIGINT) OR nested_struct.a = CAST(1 AS INT)                               |
      |                                                                                       |
      | F01:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1                                 |
      | Per-Host Resources: mem-estimate=57.78KB mem-reservation=0B thread-reservation=1      |
      |   PLAN-ROOT SINK                                                                      |
      |   |  output exprs: CAST(CAST(100 AS BIGINT) * id AS BIGINT), nested_struct.a          |
      |   |  mem-estimate=0B mem-reservation=0B thread-reservation=0                          |
      |   |                                                                                   |
      |   01:EXCHANGE [UNPARTITIONED]                                                         |
      |      mem-estimate=57.78KB mem-reservation=0B thread-reservation=0                     |
      |      tuple-ids=0 row-size=12B cardinality=4.40K                                       |
      |      in pipelines: 00(GETNEXT)                                                        |
      |                                                                                       |
      | F00:PLAN FRAGMENT [RANDOM] hosts=2 instances=2                                        |
      | Per-Host Resources: mem-estimate=32.00MB mem-reservation=16.00KB thread-reservation=2 |
      |   DATASTREAM SINK [FRAGMENT=F01, EXCHANGE=01, UNPARTITIONED]                          |
      |   |  mem-estimate=0B mem-reservation=0B thread-reservation=0                          |
      |   00:SCAN HDFS [functional_parquet.complextypestbl t, RANDOM]                         |
      |      HDFS partitions=1/1 files=2 size=6.92KB                                          |
      |      stored statistics:                                                               |
      |        table: rows=unavailable size=unavailable                                       |
      |        columns missing stats: id                                                      |
      |      extrapolated-rows=disabled max-scan-range-rows=unavailable                       |
      |      mem-estimate=32.00MB mem-reservation=16.00KB thread-reservation=1                |
      |      tuple-ids=0 row-size=12B cardinality=4.40K                                       |
      |      in pipelines: 00(GETNEXT)                                                        |
      +---------------------------------------------------------------------------------------+
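One way to spot this symptom programmatically is to scan EXPLAIN text for the predicates: lines listed under plan nodes. The helper below is a hypothetical sketch, assuming Impala prints assigned conjuncts on a `predicates:` line beneath the node, and that nodes are labeled `NN:NODE NAME` as in the plan above. It maps each plan node to its listed predicates; on an excerpt of the plan above, every list is empty.

```python
import re

# Matches plan-node labels such as "00:SCAN HDFS" or "01:EXCHANGE".
NODE_RE = re.compile(r"^(\d+:[A-Z][A-Z ]*[A-Z])")


def predicates_by_node(explain_text):
    """Map each plan node label to the predicates listed under it."""
    nodes = {}
    current = None
    for raw in explain_text.splitlines():
        # Drop the table border and tree-drawing characters ("|") and padding.
        line = raw.strip(" |")
        m = NODE_RE.match(line)
        if m:
            current = m.group(1)
            nodes[current] = []
        elif current is not None and line.startswith("predicates:"):
            nodes[current].append(line[len("predicates:"):].strip())
    return nodes


# Excerpt of the EXPLAIN output above (some lines omitted).
EXPLAIN_EXCERPT = """\
|   01:EXCHANGE [UNPARTITIONED]                                    |
|      tuple-ids=0 row-size=12B cardinality=4.40K                  |
|   00:SCAN HDFS [functional_parquet.complextypestbl t, RANDOM]    |
|      HDFS partitions=1/1 files=2 size=6.92KB                     |
|      tuple-ids=0 row-size=12B cardinality=4.40K                  |
"""

print(predicates_by_node(EXPLAIN_EXCERPT))  # {'01:EXCHANGE': [], '00:SCAN HDFS': []}
```

Parsing text output is brittle; this is only a quick check under the stated format assumptions, not a substitute for inspecting the plan.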
      


            People

              Assignee: Quanlong Huang (stigahuang)
              Reporter: Quanlong Huang (stigahuang)
              Votes: 0
              Watchers: 2
