[SPARK-27712] createDataFrame() reorders row - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Duplicate
Affects Version/s: 2.4.0
Fix Version/s: None
Component/s: PySpark
Labels: correctness
Environment: emr-5.20.0, PySpark 2.4.0, Python 2.7.15

Description

Executing the following:

from pyspark.sql import Row
from pyspark.sql.types import StringType, StructField, StructType

# The schema lists B before A; `spark` is an already-created SparkSession.
my_schema = StructType([
    StructField("B", StringType(), True),
    StructField("A", StringType(), True),
])

rows = spark.sparkContext.parallelize([Row(A="1", B="2")])
spark.createDataFrame(rows, my_schema).collect()

should produce this:

[Row(A="1", B="2")]

or this:

[Row(B='2', A='1')]

but instead produces this, with the values of A and B swapped:

[Row(B=u'1', A=u'2')]
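
The swap is consistent with two behaviors working together: in PySpark 2.4, Row(**kwargs) stores its fields sorted by name (the sorting that SPARK-29748 later removed), and the values then appear to be matched to the schema by position rather than by name. The following is a plain-Python sketch of that interaction, not PySpark code. The helper names (sorted_kwargs_row, apply_schema_by_position, to_schema_order) are hypothetical and exist only for illustration.

```python
def sorted_kwargs_row(**fields):
    """Mimic the assumed 2.4 behavior: keyword fields are stored sorted by name."""
    names = sorted(fields)
    return names, tuple(fields[name] for name in names)


def apply_schema_by_position(values, schema_names):
    """Pair values with schema field names purely by position."""
    return dict(zip(schema_names, values))


def to_schema_order(record, schema_names):
    """Possible workaround: build a plain tuple ordered like the schema."""
    return tuple(record[name] for name in schema_names)


schema_names = ["B", "A"]  # same field order as my_schema in the report

# Fields come out sorted as (A, B), so the values are ("1", "2").
names, values = sorted_kwargs_row(A="1", B="2")
mismatched = apply_schema_by_position(values, schema_names)
print(mismatched)  # {'B': '1', 'A': '2'}, the swap described in the report

# Reordering by name before applying the schema keeps values with their fields.
ordered = to_schema_order({"A": "1", "B": "2"}, schema_names)
fixed = apply_schema_by_position(ordered, schema_names)
print(fixed)  # {'B': '2', 'A': '1'}
```

Under the same assumptions, passing plain tuples that are already in schema order to parallelize() instead of keyword-built Row objects would be one way to sidestep the mismatch on affected versions.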

Issue Links

duplicates: SPARK-22232 Row objects in pyspark created using the `Row(**kwars)` syntax do not get serialized/deserialized properly (Resolved)
relates to: SPARK-29748 Remove sorting of fields in PySpark SQL Row creation (Resolved)
links to: GitHub Pull Request #24614

People

Assignee: Unassigned
Reporter: Tim Ludwinski
Votes: 0
Watchers: 2

Dates

Created: 14/May/19 21:00
Updated: 04/Nov/19 21:42
Resolved: 17/May/19 21:46