Description
When computing the SVD of a RowMatrix, the rows of the returned matrix U are in the wrong order, so the original matrix cannot be reconstructed from the returned factors.
Consider the following code:
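    import numpy as np
    from pyspark.mllib.linalg.distributed import CoordinateMatrix, MatrixEntry

    # ctx is assumed to be an existing SparkContext
    x_np = np.random.random((14, 3))  # the size matters, it works for smaller sizes
    x = ctx.parallelize(x_np).zipWithIndex().map(
        lambda r: [MatrixEntry(r[1], i, r[0][i]) for i in range(len(r[0]))])
    x = CoordinateMatrix(x.flatMap(lambda x: x))
    x_inv = matrix_inverse(x)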
with
    from pyspark.mllib.linalg import DenseMatrix

    def matrix_inverse(matrix: CoordinateMatrix) -> DenseMatrix:
        mtrx = matrix.toRowMatrix()
        # do the SVD
        svd = matrix.toRowMatrix().computeSVD(k=mtrx.numCols(), computeU=True, rCond=1e-15)
        s_inv = 1 / svd.s
        # collect the original matrix locally (first block only)
        mtrx_orig = matrix.toBlockMatrix().blocks.first()[1].toArray()
        # recompute U locally as A @ V @ S^-1
        u_dense = mtrx_orig @ (svd.V.toArray() * s_inv[np.newaxis, :])
        # pseudoinverse V @ S^-1 @ U^T
        cov_inv = np.matmul(svd.V.toArray(), np.multiply(s_inv[:, np.newaxis], u_dense.T))
        # U as returned by Spark, for comparison with u_dense
        u_from_spark = np.array(svd.U.rows.map(lambda x: x.toArray()).collect())
        # return inverse as dense matrix
        return DenseMatrix(numRows=cov_inv.shape[0], numCols=cov_inv.shape[1],
                           values=cov_inv.ravel(order="F"))
Here u_dense, computed locally as A @ V @ S^-1, is the correct U, but it differs from the U returned by Spark (u_from_spark). In particular, the U from Spark does not yield the correct pseudoinverse, and U @ S @ V.T does not reproduce the input matrix.
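The two checks mentioned above (reconstruction and pseudoinverse) can be sketched locally in NumPy. This is only an illustration: check_svd is a hypothetical helper, np.linalg.svd stands in for Spark's computeSVD, and reversing the rows of U is just one way to simulate a wrong row order.

```python
import numpy as np

def check_svd(a, u, s, v, tol=1e-10):
    """Return (reconstructs, pinv_ok) for candidate factors a = u @ diag(s) @ v.T."""
    # U @ S @ V.T should reproduce the input matrix
    reconstructed = (u * s) @ v.T
    # V @ S^-1 @ U.T should equal the Moore-Penrose pseudoinverse
    pinv = v @ ((1.0 / s)[:, np.newaxis] * u.T)
    return (np.allclose(reconstructed, a, atol=tol),
            np.allclose(pinv, np.linalg.pinv(a), atol=tol))

rng = np.random.default_rng(0)
a = rng.random((14, 3))
u, s, vh = np.linalg.svd(a, full_matrices=False)

# Correct factors pass both checks
ok_factors = check_svd(a, u, s, vh.T)
# The same U with its rows in a different order fails both checks
shuffled = check_svd(a, u[::-1], s, vh.T)
print(ok_factors, shuffled)
```

Applying the same helper to u_from_spark, svd.s and svd.V (converted to NumPy arrays) would show whether Spark's factors pass either check.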
With the following input matrix x
I get the following u_dense
but the following u_from_spark
On careful inspection, u_from_spark appears to contain the same rows as u_dense, but in a different order.
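One way to confirm that two versions of U differ only in row order is to match each row of one to its nearest row in the other. The sketch below uses NumPy only; find_row_permutation is a hypothetical helper, and the shuffled matrix is simulated rather than taken from Spark. It assumes both matrices use the same column signs, which holds for u_dense because it is derived from the same V.

```python
import numpy as np

def find_row_permutation(u_ref, u_other):
    """Return perm such that u_other[perm] is row-wise closest to u_ref."""
    # Pairwise Euclidean distances between rows of u_ref and rows of u_other
    dists = np.linalg.norm(
        u_ref[:, np.newaxis, :] - u_other[np.newaxis, :, :], axis=2)
    return dists.argmin(axis=1)

rng = np.random.default_rng(1)
a = rng.random((14, 3))
u_ref, s, vh = np.linalg.svd(a, full_matrices=False)

# Simulate a U whose rows come back in a different order
true_order = rng.permutation(14)
u_other = u_ref[true_order]

perm = find_row_permutation(u_ref, u_other)
# A pure reordering uses every row index exactly once
is_permutation = np.array_equal(np.sort(perm), np.arange(14))
restored = u_other[perm]
print(is_permutation, np.allclose(restored, u_ref))
```

If the recovered indices form a valid permutation and the reordered matrix matches the reference, the two matrices differ only in row order.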