Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Cannot Reproduce
Affects Version/s: 2.3.1
Fix Version/s: None
Description
When I declare a UDF parameter as Boolean or Int, the resulting DataFrame can't be cached:
```scala
import org.apache.spark.sql.Column
import org.apache.spark.sql.expressions.UserDefinedFunction
import org.apache.spark.sql.functions.{lit, udf}

// Assumes an existing SparkSession named sparkSession (e.g. in spark-shell).
val empty = sparkSession.emptyDataFrame
val table = "table"

def test(customUDF: UserDefinedFunction, col: Column): Unit = {
  val df = empty.select(customUDF(col))
  df.cache()
  df.createOrReplaceTempView(table)
  println(sparkSession.catalog.isCached(table))
}

test(udf { _: String => 42 }, lit(""))     // true
test(udf { _: Any => 42 }, lit(""))        // true
test(udf { _: Int => 42 }, lit(42))        // false
test(udf { _: Boolean => 42 }, lit(false)) // false
```
Either that, or sparkSession.catalog.isCached returns misleading results.
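One way to tell these two explanations apart is to check whether the plan Spark actually executes for the view reads from the in-memory cache, independently of what isCached reports. The sketch below is a hypothetical diagnostic, not part of the original report: the object UdfCacheCheck and the helpers planUsesCache, run and check are illustrative names, and it relies on internal, unstable APIs (Dataset.queryExecution, QueryExecution.withCachedData, InMemoryRelation) whose details may differ between Spark versions. If isCached is false and the plan also contains no InMemoryRelation, the cache is genuinely not being used; if the plan does read from the cache, only isCached is wrong.

```scala
import org.apache.spark.sql.{Column, SparkSession}
import org.apache.spark.sql.execution.columnar.InMemoryRelation
import org.apache.spark.sql.expressions.UserDefinedFunction
import org.apache.spark.sql.functions.{lit, udf}

object UdfCacheCheck {
  // Hypothetical helper: true if the plan Spark would execute for the named
  // view (after cache substitution) reads from an InMemoryRelation.
  def planUsesCache(spark: SparkSession, name: String): Boolean =
    spark.table(name).queryExecution.withCachedData.collect {
      case r: InMemoryRelation => r
    }.nonEmpty

  // Runs the four cases from the report and returns
  // (label, catalog.isCached, planUsesCache) for each one.
  def run(spark: SparkSession): Seq[(String, Boolean, Boolean)] = {
    val empty = spark.emptyDataFrame
    val table = "table"

    def check(label: String, customUDF: UserDefinedFunction, col: Column): (String, Boolean, Boolean) = {
      val df = empty.select(customUDF(col))
      df.cache()
      df.createOrReplaceTempView(table)
      val result = (label, spark.catalog.isCached(table), planUsesCache(spark, table))
      df.unpersist() // drop the cache entry so the cases do not affect each other
      result
    }

    Seq(
      check("String", udf { _: String => 42 }, lit("")),
      check("Any", udf { _: Any => 42 }, lit("")),
      check("Int", udf { _: Int => 42 }, lit(42)),
      check("Boolean", udf { _: Boolean => 42 }, lit(false))
    )
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("udf-cache-check").getOrCreate()
    try {
      run(spark).foreach { case (label, isCached, usesCache) =>
        println(s"$label: catalog.isCached=$isCached, plan uses cache=$usesCache")
      }
    } finally {
      spark.stop()
    }
  }
}
```

Comparing the two columns per case shows whether the Int and Boolean results reflect a real cache miss or only a misreport from isCached.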