Description
At the moment PySpark Column encodes the JVM response in its __repr__ method.
As a result, column names using only ASCII characters get a b prefix
>>> from pyspark.sql.functions import col
>>> col("abc")
Column<b'abc'>
while all others get an ugly escaped byte string
>>> col("wąż")
Column<b'w\xc4\x85\xc5\xbc'>
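A minimal standalone reproduction of the mechanism (an assumption based on the description above: the repr is built by interpolating the UTF-8 encoded name into a format string):

```python
name = "wąż"
encoded = name.encode("utf-8")  # bytes, as produced by the encoding step
# In Python 3, formatting a bytes object with %s falls back to its repr,
# which adds the b prefix and escapes every non-ASCII byte.
print("Column<%s>" % encoded)  # Column<b'w\xc4\x85\xc5\xbc'>
```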
This behaviour is inconsistent with other parts of the API, for example:
>>> spark.createDataFrame([], "`wąż` long")
DataFrame[wąż: bigint]
and Scala
scala> col("wąż")
res0: org.apache.spark.sql.Column = wąż
and R
> column("wąż")
Column wąż
The encoding was originally introduced in SPARK-5859, but it no longer seems to be required.
Desired behaviour
>>> col("wąż")
Column<'wąż'>
or
>>> col("wąż")
Column<wąż>
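The first desired variant can be sketched as follows. This is a simplified stand-in, not the actual PySpark implementation: the hypothetical Column class below simulates the JVM round-trip with a plain str, and the fix is simply to avoid re-encoding the name before formatting.

```python
class Column:
    """Toy stand-in for pyspark.sql.Column (illustration only)."""

    def __init__(self, name):
        # In PySpark the name comes back from the JVM; here it is
        # already a Python str, so no encoding step is needed.
        self._name = name

    def __repr__(self):
        # Interpolate the str directly instead of its encoded bytes,
        # so non-ASCII names render as-is.
        return "Column<'%s'>" % self._name

print(repr(Column("wąż")))  # Column<'wąż'>
```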