Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Duplicate
Affects Version/s: 2.4.0
Fix Version/s: None
Environment:
emr-5.20.0
PySpark 2.4.0
Python 2.7.15
Description
Executing the following:
import pyspark.sql.types

# `spark` is assumed to be an existing SparkSession (e.g. the pyspark shell).
# The schema declares B first, then A.
my_schema = pyspark.sql.types.StructType([
    pyspark.sql.types.StructField("B", pyspark.sql.types.StringType(), True),
    pyspark.sql.types.StructField("A", pyspark.sql.types.StringType(), True)
])
spark.createDataFrame(
    spark.sparkContext.parallelize([pyspark.sql.Row(A="1", B="2")]),
    my_schema
).collect()
should produce this:
[Row(A="1", B="2")]
or this:
[Row(B='2', A='1')]
but produces this instead:
[Row(B=u'1', A=u'2')]
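Judging by the linked issues, the likely cause is that Row(A="1", B="2") stores its fields sorted by name in this PySpark version, and the values are then matched to the schema by position rather than by name. The snippet below is a minimal plain-Python sketch of that mismatch and of one possible workaround: reordering the values to follow the schema's field order before the schema is applied. It does not call Spark; the helper values_in_schema_order and the variable names are hypothetical and only illustrate the idea.

```python
def values_in_schema_order(record, field_names):
    """Return the record's values as a tuple ordered like field_names."""
    return tuple(record[name] for name in field_names)


# Field order of my_schema in the report: B first, then A.
schema_field_names = ["B", "A"]

# Assumption: Row(A="1", B="2") keeps its fields sorted by name, i.e. (A, B).
# That is emulated here with a sorted list of (name, value) pairs.
row_as_pairs = sorted({"A": "1", "B": "2"}.items())

# Positional matching: the first value goes to the first schema field (B),
# which reproduces the swapped result from the report.
positional = tuple(value for _, value in row_as_pairs)

# Name-based matching: look each schema field up by name instead.
by_name = values_in_schema_order(dict(row_as_pairs), schema_field_names)

print(dict(zip(schema_field_names, positional)))  # B gets '1', A gets '2'
print(dict(zip(schema_field_names, by_name)))     # B gets '2', A gets '1'
```

In a real job the same idea would presumably be applied with something like rdd.map(lambda r: values_in_schema_order(r.asDict(), my_schema.names)) before calling createDataFrame; that line is an untested assumption and has not been checked on emr-5.20.0.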
Issue Links
- duplicates SPARK-22232 Row objects in pyspark created using the `Row(**kwars)` syntax do not get serialized/deserialized properly (Resolved)
- relates to SPARK-29748 Remove sorting of fields in PySpark SQL Row creation (Resolved)
- links to