[SPARK-6121] Python DataFrame type inference for LabeledPoint gets wrong type - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Critical
Resolution: Fixed
Affects Version/s: 1.3.0
Fix Version/s: 1.3.0
Component/s: MLlib, PySpark, SQL
Labels: None
Target Version/s: 1.3.0

Description

In PySpark, when an RDD of LabeledPoints is converted to a DataFrame using toDF(), the features column of the returned DataFrame is inferred as type "null" instead of VectorUDT.

To reproduce:

# Run in the pyspark shell, where sc is the predefined SparkContext.
from pyspark.mllib.util import MLUtils
rdd = MLUtils.loadLibSVMFile(sc, "data/mllib/sample_libsvm_data.txt")  # RDD of LabeledPoint
df = rdd.toDF()

Examine rdd and df to see that the features column comes back as null:

>>> df
DataFrame[features: null, label: double]
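
The symptom can also be checked programmatically rather than by reading the repr. The following is a minimal sketch, not part of PySpark: null_typed_columns is a hypothetical helper, and it assumes its input is a list of (column name, type string) pairs, the shape that df.dtypes is expected to return. The sample pairs are written out by hand to mirror the output above.

```python
def null_typed_columns(dtypes):
    """Return the names of columns whose inferred type is 'null'.

    Hypothetical helper (not a PySpark API). `dtypes` is assumed to be a
    list of (column_name, type_string) pairs, as df.dtypes would provide.
    """
    return [name for name, type_string in dtypes if type_string == "null"]


# Pairs copied by hand from the repr in this report, not produced by Spark.
reported = [("features", "null"), ("label", "double")]
print(null_typed_columns(reported))
```

In the pyspark shell, the same check would be null_typed_columns(df.dtypes); an empty list would mean no column was inferred as null.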

Issue Links

links to

[Github] Pull Request #4858 (mengxr)

People

Assignee: Xiangrui Meng
Reporter: Joseph K. Bradley
Votes: 0
Watchers: 4

Dates

Created: 02/Mar/15 22:35
Updated: 03/Mar/15 01:14
Resolved: 03/Mar/15 01:14