Since many functions are only implemented in spark.mllib, it is often necessary to convert DataFrames of spark.ml vectors to spark.mllib distributed matrix formats. The first step, converting the spark.ml vectors to the spark.mllib equivalent, is straightforward. However, to the best of my knowledge it's not possible to convert the resulting DataFrame to a RowMatrix without using a python lambda function, which can have a significant performance hit. In my recent use case, SVD took 3.5m using the Scala API, but 12m using Python.
To get around this performance hit, I propose adding a constructor to the Pyspark RowMatrix class that accepts a DataFrame with a single column of spark.mllib vectors. I'd be happy to add an equivalent API for IndexedRowMatrix if there is demand.