Spark / SPARK-26084

AggregateExpression.references fails on unresolved expression trees

Details

    Description

      SPARK-18394 introduced a stable ordering in AttributeSet.toSeq that relies on expression IDs (PR-18959). That change overlooked the fact that AggregateExpression.references uses AttributeSet.toSeq as a shortcut. Unresolved attributes have no expression ID, so asking for one throws, and the net result is that AggregateExpression.references fails for unresolved aggregate functions. For example,

      org.apache.spark.sql.catalyst.expressions.aggregate.AggregateExpression(
        org.apache.spark.sql.catalyst.expressions.aggregate.Sum(('x + 'y).expr),
        mode = org.apache.spark.sql.catalyst.expressions.aggregate.Complete,
        isDistinct = false
      ).references
      

      fails with

      org.apache.spark.sql.catalyst.analysis.UnresolvedException: Invalid call to exprId on unresolved object, tree: 'y
      	at org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute.exprId(unresolved.scala:104)
      	at org.apache.spark.sql.catalyst.expressions.AttributeSet$$anonfun$toSeq$2.apply(AttributeSet.scala:128)
      	at org.apache.spark.sql.catalyst.expressions.AttributeSet$$anonfun$toSeq$2.apply(AttributeSet.scala:128)
      	at scala.math.Ordering$$anon$5.compare(Ordering.scala:122)
      	at java.util.TimSort.countRunAndMakeAscending(TimSort.java:355)
      	at java.util.TimSort.sort(TimSort.java:220)
      	at java.util.Arrays.sort(Arrays.java:1438)
      	at scala.collection.SeqLike$class.sorted(SeqLike.scala:648)
      	at scala.collection.AbstractSeq.sorted(Seq.scala:41)
      	at scala.collection.SeqLike$class.sortBy(SeqLike.scala:623)
      	at scala.collection.AbstractSeq.sortBy(Seq.scala:41)
      	at org.apache.spark.sql.catalyst.expressions.AttributeSet.toSeq(AttributeSet.scala:128)
      	at org.apache.spark.sql.catalyst.expressions.aggregate.AggregateExpression.references(interfaces.scala:201)
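
The trace shows the exception coming from the sortBy inside AttributeSet.toSeq. The following is a minimal, Spark-free sketch of that failure mode, not Spark's actual code: `Attr`, `Resolved`, `Unresolved`, `sortedToSeq` and `unordered` are hypothetical stand-ins invented for illustration, assuming only that an unresolved attribute throws when asked for its expression ID.

```scala
// Hypothetical stand-ins for Catalyst's attribute classes; NOT Spark's real API.
object ToSeqOrderingSketch {
  sealed trait Attr {
    def name: String
    def exprId: Long
  }

  // A resolved attribute carries an expression ID.
  final case class Resolved(name: String, exprId: Long) extends Attr

  // An unresolved attribute has no ID yet, so asking for one throws.
  final case class Unresolved(name: String) extends Attr {
    def exprId: Long =
      throw new UnsupportedOperationException(
        s"Invalid call to exprId on unresolved object: '$name")
  }

  // Models a toSeq that sorts by expression ID to obtain a stable order.
  // The sort key is evaluated for every compared element, resolved or not.
  def sortedToSeq(attrs: Set[Attr]): Seq[Attr] = attrs.toSeq.sortBy(_.exprId)

  // Models an accessor that only needs set membership and never touches exprId.
  def unordered(attrs: Set[Attr]): Set[Attr] = attrs
}
```

In this sketch, calling `sortedToSeq` on a set that contains both `Resolved("x", 1L)` and `Unresolved("y")` throws, while `unordered` on the same set simply returns both attributes.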
      

      The solution is to avoid calling toSeq, since ordering does not matter for references, and to simplify (and speed up) the implementation to something like:

      mode match {
        case Partial | Complete => aggregateFunction.references
        case PartialMerge | Final => AttributeSet(aggregateFunction.aggBufferAttributes)
      }
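
To make the shape of that proposal concrete, here is a small self-contained sketch of the same mode match on a toy model. It is an illustration under stated assumptions, not the actual patch: `AggFunction` and the string-based attributes are hypothetical, and only the mode names and the two branches are taken from the snippet above.

```scala
// Hypothetical, Spark-free model of the proposed logic.
// Names mirror the issue text but these are NOT Spark's classes.
object ReferencesSketch {
  sealed trait AggregateMode
  case object Partial extends AggregateMode
  case object PartialMerge extends AggregateMode
  case object Final extends AggregateMode
  case object Complete extends AggregateMode

  // Attributes are modeled as plain names; a leading ' marks an unresolved one.
  final case class AggFunction(
      references: Set[String],
      aggBufferAttributes: Seq[String])

  // No sorting and no toSeq: the result is a set, so unresolved
  // attributes are never asked for an expression ID.
  def references(fn: AggFunction, mode: AggregateMode): Set[String] = mode match {
    case Partial | Complete   => fn.references
    case PartialMerge | Final => fn.aggBufferAttributes.toSet
  }
}
```

With a toy function such as `AggFunction(Set("'x", "'y"), Seq("sum"))`, the Partial and Complete modes yield the input attributes and the PartialMerge and Final modes yield the buffer attributes, with no ordering step in either path.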
      

          People

            simeons Simeon Simeonov