Details
- Type: Bug
- Status: Resolved
- Priority: Minor
- Resolution: Incomplete
- Affects Version: 2.3.1
- Fix Version: None
Description
A Dataset can store an array of vectors (`Dataset[Seq[Vector]]`) but not a single vector (`Dataset[Vector]`), because no encoder is available for `Vector`. This is inconsistent.
To reproduce:
{code:scala}
import org.apache.spark.sql.Row
import org.apache.spark.ml.linalg.{Vectors, DenseVector, Vector}
import org.apache.spark.ml.linalg.SQLDataTypes.VectorType
import org.apache.spark.sql.types._
import spark.implicits._

val rdd = sc.parallelize(Seq(Row(Seq(Vectors.dense(Array(1.0, 2.0)).toSparse))))
val arrayOfVectorsDS = spark
  .createDataFrame(rowRDD = rdd, schema = new StructType(Array(StructField(name = "value", dataType = ArrayType(elementType = VectorType)))))
  .as[Seq[Vector]]
// val vectorsDS = arrayOfVectorsDS.flatMap(a => a)
arrayOfVectorsDS.show
{code}
If the commented-out `flatMap` line is uncommented, this code throws the well-known error:

{code}
error: Unable to find encoder for type stored in a Dataset. Primitive types (Int, String, etc) and Product types (case classes) are supported by importing spark.implicits._ Support for serializing other types will be added in future releases.
{code}
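A possible workaround (a sketch, not part of the original report) is to supply an explicit encoder for `Vector` yourself, e.g. a Kryo-based one via `Encoders.kryo`, so that `flatMap` can produce a `Dataset[Vector]`. The object and method names below are illustrative only:

```scala
import org.apache.spark.sql.{Encoder, Encoders, Row, SparkSession}
import org.apache.spark.sql.types.{ArrayType, StructField, StructType}
import org.apache.spark.ml.linalg.{Vector, Vectors}
import org.apache.spark.ml.linalg.SQLDataTypes.VectorType

// Hypothetical helper object, not from the original report.
object VectorEncoderWorkaround {
  // Flattens a Dataset[Seq[Vector]] into individual vectors by supplying
  // an explicit encoder for Vector.
  def flattenVectors(spark: SparkSession): Array[Vector] = {
    import spark.implicits._

    // Same reproduction data as the report: one row holding a Seq of one vector.
    val rdd = spark.sparkContext.parallelize(
      Seq(Row(Seq(Vectors.dense(1.0, 2.0).toSparse))))
    val schema = StructType(Array(StructField("value", ArrayType(VectorType))))
    val arrayOfVectorsDS = spark.createDataFrame(rdd, schema).as[Seq[Vector]]

    // spark.implicits._ provides no Encoder[Vector]; Encoders.kryo stores each
    // Vector as opaque binary, which is enough to make flatMap compile and run
    // (at the cost of losing the columnar VectorUDT representation).
    implicit val vectorEncoder: Encoder[Vector] = Encoders.kryo[Vector]
    arrayOfVectorsDS.flatMap(a => a).collect()
  }
}
```

The trade-off is that a Kryo-encoded column is a binary blob, so it cannot be inspected with SQL functions the way a `VectorUDT` column can.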