SPARK-6121: Python DataFrame type inference for LabeledPoint gets wrong type

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.3.0
    • Fix Version/s: 1.3.0
    • Component/s: MLlib, PySpark, SQL
    • Labels: None

    Description

      In PySpark, when an RDD of LabeledPoints is converted to a DataFrame using toDF(), the features column of the returned DataFrame is reported with type "null" instead of VectorUDT.

      To reproduce:

      # Run in the PySpark shell, where sc is the predefined SparkContext.
      from pyspark.mllib.util import MLUtils
      rdd = MLUtils.loadLibSVMFile(sc, "data/mllib/sample_libsvm_data.txt")
      df = rdd.toDF()
      

      Examine rdd and df to see:

      >>> df
      DataFrame[features: null, label: double]
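
      The symptom can also be checked programmatically rather than by eye. The sketch below is illustrative only: null_typed_columns is a hypothetical helper, not part of PySpark, and it assumes its input is a list of (column name, type string) pairs of the kind df.dtypes is expected to return. The sample pairs are copied from the output shown in this report, so the snippet runs without a Spark session.

```python
def null_typed_columns(dtypes):
    """Return the names of columns whose reported type string is "null".

    `dtypes` is assumed to be a list of (column_name, type_string) pairs,
    e.g. the value of df.dtypes in a live PySpark session.
    """
    return [name for name, type_name in dtypes if type_name == "null"]


# Pairs mirroring the schema reported in this issue (hypothetical stand-in
# for df.dtypes so the example runs without Spark).
observed = [("features", "null"), ("label", "double")]

print(null_typed_columns(observed))  # ['features']
```

      In a live session, passing df.dtypes to the helper should return an empty list once the features column is reported with its vector type.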
      

          People

            Assignee: mengxr Xiangrui Meng
            Reporter: josephkb Joseph K. Bradley