SPARK-29664: Column.getItem behavior is not consistent with Scala version

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0
    • Fix Version/s: 3.0.0
    • Component/s: PySpark
    • Labels: None

    Description

      In PySpark, Column.getItem behaves differently from its Scala counterpart.

      For example, in PySpark:

      from pyspark.sql.functions import col, create_map, lit

      df = spark.range(2)
      map_col = create_map(lit(0), lit(100), lit(1), lit(200))
      df.withColumn("mapped", map_col.getItem(col('id'))).show()
      # +---+------+
      # | id|mapped|
      # +---+------+
      # |  0|   100|
      # |  1|   200|
      # +---+------+
      

      In Scala:

      import org.apache.spark.sql.functions.{col, lit, map}

      val df = spark.range(2)
      val map_col = map(lit(0), lit(100), lit(1), lit(200))
      // The getItem call below results in the following exception, which is the right behavior:
      // java.lang.RuntimeException: Unsupported literal type class org.apache.spark.sql.Column id
      //  at org.apache.spark.sql.catalyst.expressions.Literal$.apply(literals.scala:78)
      //  at org.apache.spark.sql.Column.getItem(Column.scala:856)
      //  ... 49 elided
      df.withColumn("mapped", map_col.getItem(col("id"))).show
      
      
      // You have to use apply() to match PySpark's behavior.
      df.withColumn("mapped", map_col(col("id"))).show
      // +---+------+
      // | id|mapped|
      // +---+------+
      // |  0|   100|
      // |  1|   200|
      // +---+------+
      

      Looking at the code for the Scala implementation, PySpark's behavior is incorrect: the argument to getItem is wrapped in a `Literal`, so a Column key should be rejected rather than accepted.
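
      For reference, PySpark's getItem appears to simply delegate to Column.__getitem__ (the JVM `apply` call), which would explain the lenient behavior above. Below is a minimal sketch of the indexing form that getItem was effectively invoking; the local SparkSession setup is an assumption added to make the snippet self-contained:

      from pyspark.sql import SparkSession
      from pyspark.sql.functions import col, create_map, lit

      # Assumption: a local session; in the pyspark shell, `spark` already exists.
      spark = SparkSession.builder.master("local[1]").getOrCreate()
      df = spark.range(2)
      map_col = create_map(lit(0), lit(100), lit(1), lit(200))

      # The indexing operator maps to Column.apply on the JVM side, the same
      # lookup as the Scala apply() example above, which is why
      # getItem(col("id")) behaved identically on the affected versions.
      df.withColumn("mapped", map_col[col("id")]).show()
      # +---+------+
      # | id|mapped|
      # +---+------+
      # |  0|   100|
      # |  1|   200|
      # +---+------+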


          People

            Assignee: Terry Kim (imback82)
            Reporter: Terry Kim (imback82)
