Description
In PySpark, when an RDD of LabeledPoints is converted to a DataFrame using toDF(), the features column of the returned DataFrame has type "null" instead of VectorUDT.
To reproduce:
from pyspark.mllib.util import MLUtils
rdd = MLUtils.loadLibSVMFile(sc, "data/mllib/sample_libsvm_data.txt")
df = rdd.toDF()
Examine rdd and df to see:
>>> df
DataFrame[features: null, label: double]
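The symptom can be checked mechanically. The following is a minimal sketch rather than part of the original report: the helper name null_typed_columns is hypothetical, and it parses the printed DataFrame text instead of calling Spark, so it assumes a flat schema whose fields print as "name: type" separated by ", ".

```python
import re


def null_typed_columns(df_repr):
    """Return the names of columns whose type is printed as 'null'.

    Hypothetical helper: it only parses text shaped like
    'DataFrame[name: type, name: type]' and does not call Spark.
    """
    match = re.fullmatch(r"DataFrame\[(.*)\]", df_repr.strip())
    if match is None:
        raise ValueError("not a DataFrame repr: %r" % df_repr)
    body = match.group(1)
    if not body:
        return []
    names = []
    for field in body.split(", "):
        # Split on the last ': ' so the type is always the final piece.
        name, _, dtype = field.rpartition(": ")
        if dtype == "null":
            names.append(name)
    return names


# The output reported in this issue.
observed = "DataFrame[features: null, label: double]"
print(null_typed_columns(observed))
```

In a PySpark shell the same check could presumably be applied to repr(df). An empty result would mean that no column was given the "null" type.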