Description
At the moment PySpark Column encodes the JVM response in its __repr__ method.
As a result, column names using only ASCII characters get a b prefix
>>> from pyspark.sql.functions import col
>>> col("abc")
Column<b'abc'>
while all others get an ugly escaped byte string
>>> col("wąż")
Column<b'w\xc4\x85\xc5\xbc'>
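A minimal standalone reproduction of the mechanism (an assumption based on the description above: the repr is built by interpolating the UTF-8 encoded name into a format string):

```python
name = "wąż"
encoded = name.encode("utf-8")  # bytes, as produced by the encoding step
# In Python 3, formatting a bytes object with %s falls back to its repr,
# which adds the b prefix and escapes every non-ASCII byte.
print("Column<%s>" % encoded)  # Column<b'w\xc4\x85\xc5\xbc'>
```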
This behaviour is inconsistent with other parts of the API, for example:
>>> spark.createDataFrame([], "`wąż` long")
DataFrame[wąż: bigint]
and Scala
scala> col("wąż")
res0: org.apache.spark.sql.Column = wąż
and R
> column("wąż")
Column wąż
The encoding was originally introduced in SPARK-5859, but it no longer seems to be required.
Desired behaviour
>>> col("wąż")
Column<'wąż'>
or
>>> col("wąż")
Column<wąż>
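The first desired variant can be sketched as follows. This is a simplified stand-in, not the actual PySpark implementation: the hypothetical Column class below simulates the JVM round-trip with a plain str, and the fix is simply to avoid re-encoding the name before formatting.

```python
class Column:
    """Toy stand-in for pyspark.sql.Column (illustration only)."""

    def __init__(self, name):
        # In PySpark the name comes back from the JVM; here it is
        # already a Python str, so no encoding step is needed.
        self._name = name

    def __repr__(self):
        # Interpolate the str directly instead of its encoded bytes,
        # so non-ASCII names render as-is.
        return "Column<'%s'>" % self._name

print(repr(Column("wąż")))  # Column<'wąż'>
```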