Description
PySpark's createDataFrame currently supports creating a Spark DataFrame from a pandas DataFrame, using Arrow if enabled. This could be extended to accept a pyarrow.Table, which has the added benefit of efficiently handling columns with nested struct types.
It is possible to convert a pyarrow.Table with nested columns into a pandas.DataFrame, but the nested values become Python dictionaries, which is not a performant way to parallelize the data.
Time/Date columns would need to be handled specially, since PySpark currently uses pandas to convert Arrow data of these types to the required Spark internal format.
This follows from a mailing list discussion at http://apache-spark-user-list.1001560.n3.nabble.com/question-about-pyarrow-Table-to-pyspark-DataFrame-conversion-td36110.html