Description
If a PySpark user wants to convert MLlib sparse/dense vectors in a DataFrame into dense arrays, the efficient approach is to do the conversion in the JVM. However, that requires the PySpark user to write Scala code and register it as a UDF, which is often infeasible for a pure Python project.
What we can do is to predefine those converters in Scala and expose them in PySpark, e.g.:
from pyspark.ml.functions import vector_to_dense_array
df.select(vector_to_dense_array(col("features")))
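For reference, the conversion such a helper would perform on the Python side can be sketched in plain Python. MLlib sparse vectors are stored as (size, indices, values); the function name below is illustrative, not an actual MLlib API:

```python
def sparse_to_dense(size, indices, values):
    """Expand a sparse vector (size, indices, values) into a dense list.

    Illustrative sketch only; the proposed converter would do this in the JVM
    rather than in Python, avoiding per-row serialization overhead.
    """
    dense = [0.0] * size
    for i, v in zip(indices, values):
        dense[i] = v
    return dense

# sparse_to_dense(4, [1, 3], [2.0, 5.0]) -> [0.0, 2.0, 0.0, 5.0]
```

Doing this per row in a Python UDF is what makes the pure-Python route slow; predefining the converter in Scala keeps the loop inside the JVM.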
cc: weichenxu123