  1. Spark
  2. SPARK-40920

SVD: matrix U has wrong row order


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.3.0
    • Fix Version/s: None
    • Component/s: MLlib, PySpark
    • Labels: None
    • Environment: Python 3.10, multi-core machine, no cluster

    Description

      When computing the SVD of a RowMatrix, the returned matrix U has its rows in the wrong order, so the original matrix cannot be reconstructed from the returned factors.

       

      Consider the following code:

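      import numpy as np
      from pyspark.mllib.linalg.distributed import CoordinateMatrix, MatrixEntry

      # ctx is an existing SparkContext
      x_np = np.random.random((14, 3))  # the size matters: smaller matrices work
      # one MatrixEntry(row, col, value) per element, keyed by the original row index
      x = ctx.parallelize(x_np).zipWithIndex().map(
          lambda r: [MatrixEntry(r[1], i, r[0][i]) for i in range(len(r[0]))])
      x = CoordinateMatrix(x.flatMap(lambda entries: entries))
      x_inv = matrix_inverse(x)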

      where matrix_inverse is defined as:

      from pyspark.mllib.linalg import DenseMatrix

      def matrix_inverse(matrix: CoordinateMatrix) -> DenseMatrix:
          mtrx = matrix.toRowMatrix()
          # full-rank SVD, with U requested explicitly
          svd = mtrx.computeSVD(k=mtrx.numCols(), computeU=True, rCond=1e-15)
          v = svd.V.toArray()

          s_inv = 1 / svd.s.toArray()  # reciprocal singular values
          # local copy of the input; assumes the matrix fits in a single block
          mtrx_orig = matrix.toBlockMatrix().blocks.first()[1].toArray()
          # U reconstructed locally as X @ V @ S^-1
          u_dense = mtrx_orig @ (v * s_inv[np.newaxis, :])
          # pseudoinverse V @ S^-1 @ U.T
          cov_inv = np.matmul(v, np.multiply(s_inv[:, np.newaxis], u_dense.T))
          # U as returned by Spark, collected for comparison with u_dense
          u_from_spark = np.array(svd.U.rows.map(lambda row: row.toArray()).collect())
          return DenseMatrix(numRows=cov_inv.shape[0], numCols=cov_inv.shape[1],
                             values=cov_inv.ravel(order="F"))  # column-major values

      Here, u_dense is the correct U, but it differs from the U returned by Spark (u_from_spark). In particular, Spark's U does not yield the correct pseudoinverse, and U @ S @ V.T does not reproduce the input matrix.
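
      For reference, the identities being relied on can be checked locally with NumPy alone. This is only a sketch of the expected behaviour: it does not use Spark, and the seed and variable names are chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed, chosen only for illustration
x = rng.random((14, 3))          # same shape as the example input

# Thin SVD: x = U @ diag(s) @ V.T, with U of shape (14, 3)
u, s, vt = np.linalg.svd(x, full_matrices=False)
v = vt.T

# Reconstruction check: U @ S @ V.T should reproduce x
x_rebuilt = u @ np.diag(s) @ v.T
reconstruction_ok = np.allclose(x_rebuilt, x)

# Pseudoinverse built from the factors: V @ S^-1 @ U.T
x_pinv = v @ np.diag(1.0 / s) @ u.T
pinv_ok = np.allclose(x_pinv, np.linalg.pinv(x))

# U can also be recovered from x, V and s, which is how u_dense is computed
u_dense = x @ (v * (1.0 / s)[np.newaxis, :])
u_matches = np.allclose(u_dense, u)
```

      All three checks hold for a correct decomposition of a full-rank matrix. The report is that the second and third fail when U is taken from Spark.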

       

      The input matrix x, the resulting u_dense, and the resulting u_from_spark are shown in the attached screenshots.

       

      On careful inspection, it seems that the row order is wrong.
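
      One way to test the row-order hypothesis is to search for a permutation that maps the rows of one matrix onto the rows of the other. The sketch below uses NumPy only. The helper find_row_permutation is hypothetical (it is not part of Spark or of the code above), and it compares absolute values so that a sign flip of a singular vector does not hide a match.

```python
import numpy as np

def find_row_permutation(u_expected, u_actual, atol=1e-8):
    """Return perm with |u_actual[i]| ~ |u_expected[perm[i]]|, or None if no such permutation exists."""
    a = np.abs(u_expected)
    b = np.abs(u_actual)
    # dist[i, j] = distance between row i of u_actual and row j of u_expected
    dist = np.linalg.norm(b[:, np.newaxis, :] - a[np.newaxis, :, :], axis=2)
    perm = dist.argmin(axis=1)
    # every expected row must be used exactly once, and every match must be tight
    is_permutation = len(set(perm.tolist())) == len(perm)
    is_close = bool(np.all(dist[np.arange(len(perm)), perm] <= atol))
    return perm if (is_permutation and is_close) else None

# Demonstration on synthetic data: shuffle the rows of a known U and recover the shuffle
rng = np.random.default_rng(1)
u, _, _ = np.linalg.svd(rng.random((14, 3)), full_matrices=False)
shuffle = rng.permutation(14)
u_shuffled = u[shuffle]

perm = find_row_permutation(u, u_shuffled)
u_restored = u_shuffled[np.argsort(perm)]   # undo the permutation
```

      Calling find_row_permutation(u_dense, u_from_spark) on the matrices from the report would return a permutation if the two differ only in row order, and None otherwise. If a permutation is found, it may be worth keeping explicit row indices, for example via CoordinateMatrix.toIndexedRowMatrix(), since a RowMatrix carries no row indices.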

       

      Attachments

        1. image-2022-10-26-13-59-13-425.png
          49 kB
          Leonard Papenmeier
        2. image-2022-10-26-13-59-04-608.png
          48 kB
          Leonard Papenmeier
        3. image-2022-10-26-13-58-52-998.png
          48 kB
          Leonard Papenmeier

          People

            Assignee: Unassigned
            Reporter: Leonard Papenmeier (LeoIV)