[SPARK-16954] UDFs should allow output type to be specified in terms of the input type - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Incomplete
Affects Version/s: None
Fix Version/s: None
Component/s: SQL
Labels:
- SQL
- UDF
- bulk-closed
- types

Description

Consider an `array_compact` UDF that removes null values from an array. There is no easy way to implement this UDF generically, because code generation requires an explicit return type with a TypeTag.

The interesting observation here is that the output type of `array_compact` is the same as its input type. More generally, there is a broad class of UDFs, especially collection-oriented ones, whose output types are functions of their input types. In our Spark work we have found collection-manipulation UDFs to be very powerful for cleaning up data and for substantially improving performance, in particular by avoiding `explode` followed by `groupBy`. It would be nice if Spark made adding these kinds of UDFs very easy.
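To make the observation concrete, here is a minimal plain-Scala sketch (no Spark involved) of the function such a UDF would wrap. The object name `ArrayCompactSketch` is made up for illustration. The point is only that the signature `Seq[A] => Seq[A]` already states that the output type equals the input type, which is the information that gets lost when registering the UDF.

```scala
object ArrayCompactSketch {
  // Generic in the element type A: the result type is exactly the input type.
  // This is plain Scala; it is not registered with Spark in any way.
  def arrayCompact[A](xs: Seq[A]): Seq[A] = xs.filter(_ != null)

  def main(args: Array[String]): Unit = {
    val strings: Seq[String] = arrayCompact(Seq("a", null, "b"))
    val ints: Seq[java.lang.Integer] =
      arrayCompact(Seq[java.lang.Integer](Integer.valueOf(1), null, Integer.valueOf(2)))
    println(strings)
    println(ints)
  }
}
```

The same body works for any element type, so the only missing piece is a way to tell Spark "return type = type of argument 1".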

I won't go into possible ways to implement this under the covers, as there are many options, but I do want to point out that it is possible to communicate the right type information to Spark without changing the signature for UDF registration, by using placeholder types, e.g.:

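import org.apache.spark.sql.Row
import scala.reflect.runtime.universe.TypeTag

// Placeholder types: each one refers to the UDF argument at a given position
sealed trait UDFArgumentAtPosition
// Case classes need an explicit (possibly empty) parameter list to compile
case class ArgPos1() extends UDFArgumentAtPosition
case class ArgPos2() extends UDFArgumentAtPosition
// ...

// Placeholders for types derived from the argument at position ArgPos
case class Struct[ArgPos <: UDFArgumentAtPosition](value: Row)
case class ArrayElement[ArgPos <: UDFArgumentAtPosition, A : TypeTag](value: A)
case class MapKey[ArgPos <: UDFArgumentAtPosition, A : TypeTag](value: A)
case class MapValue[ArgPos <: UDFArgumentAtPosition, A : TypeTag](value: A)

// Functions are stubbed just to show compilation succeeds
// Return type ArgPos1 means "same type as the first argument"
def arrayCompact[A : TypeTag](xs: Seq[A]): ArgPos1 = null
// Return type ArrayElement[ArgPos1, A] means "element type of the first argument"
def arraySum[A : Numeric : TypeTag](xs: Seq[A]): ArrayElement[ArgPos1, A] = ArrayElement(implicitly[Numeric[A]].zero)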
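As a rough illustration of how such placeholders might be turned into a concrete output type once the input types are known, here is a self-contained sketch. Everything in it is hypothetical: `SqlType` and its cases are simplified stand-ins rather than Spark's `DataType` classes, and `SameAsArg`, `ElementOfArg`, `KeyOfArg`, `ValueOfArg` and `resolve` are made-up names mirroring `ArgPosN`, `ArrayElement`, `MapKey` and `MapValue` above. It is not a proposal for the actual implementation, and it assumes Scala 2.12 or later (right-biased `Either`).

```scala
object PlaceholderResolutionSketch {
  // Hypothetical, simplified stand-ins for SQL data types (not Spark's DataType)
  sealed trait SqlType
  case object IntType extends SqlType
  case object StringType extends SqlType
  final case class ArrayOf(element: SqlType) extends SqlType
  final case class MapOf(key: SqlType, value: SqlType) extends SqlType

  // Hypothetical placeholder descriptors; positions are 1-based like ArgPos1, ArgPos2
  sealed trait Placeholder
  final case class SameAsArg(pos: Int) extends Placeholder
  final case class ElementOfArg(pos: Int) extends Placeholder
  final case class KeyOfArg(pos: Int) extends Placeholder
  final case class ValueOfArg(pos: Int) extends Placeholder

  // Computes the output type as a function of the input types,
  // or returns an error message if the placeholder does not fit the arguments.
  def resolve(p: Placeholder, args: Seq[SqlType]): Either[String, SqlType] = {
    def arg(pos: Int): Either[String, SqlType] =
      args.lift(pos - 1).toRight(s"no argument at position $pos")

    p match {
      case SameAsArg(pos) => arg(pos)
      case ElementOfArg(pos) => arg(pos).flatMap {
        case ArrayOf(element) => Right(element)
        case other            => Left(s"argument $pos is not an array: $other")
      }
      case KeyOfArg(pos) => arg(pos).flatMap {
        case MapOf(key, _) => Right(key)
        case other         => Left(s"argument $pos is not a map: $other")
      }
      case ValueOfArg(pos) => arg(pos).flatMap {
        case MapOf(_, value) => Right(value)
        case other           => Left(s"argument $pos is not a map: $other")
      }
    }
  }
}
```

Under these assumptions, `array_compact` would be described by `SameAsArg(1)` and `array_sum` by `ElementOfArg(1)`, so `resolve(SameAsArg(1), Seq(ArrayOf(StringType)))` yields `Right(ArrayOf(StringType))`, while a placeholder that does not fit the arguments yields a `Left` with an error message.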

People

Assignee: Unassigned

Reporter: Simeon Simeonov

Votes: 0

Watchers: 1

Dates

Created: 08/Aug/16 17:30

Updated: 21/May/19 04:34

Resolved: 21/May/19 04:34