IMPALA-6625

Skip dictionary and collection conjunct assignment for non-Parquet scans.

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: Impala 2.9.0, Impala 2.10.0, Impala 2.11.0
    • Fix Version/s: Impala 2.13.0, Impala 3.1.0
    • Component/s: Frontend

    Description

      In HdfsScanNode.init() we try to assign dictionary and collection conjuncts even for non-Parquet scans. Such predicates only make sense for Parquet scans, so there is no point in collecting them for other scans.

      The current behavior is undesirable because:

      • init() can be substantially slower, because assigning dictionary filters may involve evaluating exprs in the backend (BE), which can be expensive
      • the explain plan of a non-Parquet scan may contain a "parquet dictionary predicates" section, which is confusing and misleading

      Relevant code snippet from HdfsScanNode:

      @Override
      public void init(Analyzer analyzer) throws ImpalaException {
        conjuncts_ = orderConjunctsByCost(conjuncts_);
        checkForSupportedFileFormats();

        assignCollectionConjuncts(analyzer);
        computeDictionaryFilterConjuncts(analyzer);

        // compute scan range locations with optional sampling
        Set<HdfsFileFormat> fileFormats = computeScanRangeLocations(analyzer);
        ...
        if (fileFormats.contains(HdfsFileFormat.PARQUET)) {
          // <--- assignment should go in here
          computeMinMaxTupleAndConjuncts(analyzer);
        }
        ...
      }
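
The proposed reordering can be illustrated with a small, self-contained sketch. It is not the actual Impala patch: `ScanInitSketch`, its `FileFormat` enum, and the idea of recording step names as strings are all hypothetical stand-ins, used only to show that the Parquet-only steps run after the file formats are known and only when Parquet is among them.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of the init() ordering discussed in this issue.
// It does not use any Impala classes; step names are recorded as strings.
public class ScanInitSketch {
  // Stand-in for HdfsFileFormat (assumption: only a few formats are modeled).
  enum FileFormat { PARQUET, TEXT, AVRO }

  // Returns the planning steps that would run, in order, for the given
  // set of file formats found in the scan.
  static List<String> init(Set<FileFormat> fileFormats) {
    List<String> steps = new ArrayList<>();
    steps.add("orderConjunctsByCost");
    steps.add("checkForSupportedFileFormats");
    // The file formats must be known before the Parquet check below,
    // so scan range computation comes first.
    steps.add("computeScanRangeLocations");
    if (fileFormats.contains(FileFormat.PARQUET)) {
      // Parquet-only work is skipped entirely for other formats.
      steps.add("assignCollectionConjuncts");
      steps.add("computeDictionaryFilterConjuncts");
      steps.add("computeMinMaxTupleAndConjuncts");
    }
    return steps;
  }

  public static void main(String[] args) {
    System.out.println(init(EnumSet.of(FileFormat.TEXT)));
    System.out.println(init(EnumSet.of(FileFormat.PARQUET, FileFormat.TEXT)));
  }
}
```

In this sketch a text-only scan never reaches the dictionary or collection conjunct steps, which corresponds to the two problems listed above: no extra expression evaluation during init(), and nothing Parquet-specific to print in the explain plan.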
      

            People

              poojanilangekar Pooja Nilangekar
              alex.behm Alexander Behm