Description
Comparing Spark's computeCovariance function in RowMatrix (with DenseVector rows) against NumPy's cov function
reveals two problems. The results are below:
1) The Spark function computeCovariance in RowMatrix is not accurate.
Input data:
1.0,2.0,3.0,4.0,5.0
2.0,3.0,1.0,2.0,6.0
Numpy function cov result:
[[2.5 1.75]
[ 1.75 3.7 ]]
RowMatrix function computeCovariance result:
2.5 1.75
1.75 3.700000000000001
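The NumPy side of case 1 can be reproduced with the short sketch below. The variable names are illustrative, and the Spark value is copied from the output above rather than recomputed here.

```python
import numpy as np

# The two input series from case 1.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 3.0, 1.0, 2.0, 6.0])

# np.cov treats each argument as one variable and normalizes by (n - 1).
cov = np.cov(x, y)
print(cov)

# Value reported by RowMatrix.computeCovariance for the [1][1] entry (copied from above).
spark_value = 3.700000000000001

# Absolute difference between the two results for that entry.
diff = abs(spark_value - cov[1, 1])
print(diff)
```

For this input the two results agree to within a few units in the last place of a double.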
2) For some inputs, the result is badly wrong: the values are off by many orders of magnitude, and one diagonal entry (a variance) is negative.
Generate the input data as follows:
import numpy as np
data1 = np.random.normal(loc=100000, scale=0.000009, size=10000000)
data2 = np.random.normal(loc=200000, scale=0.000002, size=10000000)
Numpy function cov result:
[[ 8.10536442e-11 -4.35439574e-15]
[ -4.35439574e-15 3.99928264e-12]]
RowMatrix function computeCovariance result:
-0.0027484893798828125 0.001491546630859375
0.001491546630859375 8.087158203125E-4
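One possible explanation for case 2, which is an assumption and not something this report confirms about Spark's implementation, is catastrophic cancellation in a one-pass formula of the form (X^T X - n * mean * mean^T) / (n - 1). When the mean is large and the spread is tiny, both terms are huge and nearly equal, so their difference is dominated by rounding error. The sketch below compares that one-pass formula with a two-pass formula that centers the data first. It uses a smaller sample size and a fixed seed for speed and repeatability; both are choices made for this sketch, not part of the original test.

```python
import numpy as np

# Assumption: smaller sample and fixed seed than in the report, for a quick repeatable run.
rng = np.random.default_rng(0)
n = 100_000
data1 = rng.normal(loc=100000, scale=0.000009, size=n)
data2 = rng.normal(loc=200000, scale=0.000002, size=n)
X = np.column_stack([data1, data2])  # shape (n, 2), one observation per row

mean = X.mean(axis=0)

# One-pass style: Gramian minus the scaled outer product of the means.
# Both terms are around 1e15 here, so their difference loses almost all precision.
gram = X.T @ X
one_pass = (gram - n * np.outer(mean, mean)) / (n - 1)

# Two-pass style: subtract the mean first, then form the products.
centered = X - mean
two_pass = (centered.T @ centered) / (n - 1)

print("one-pass:\n", one_pass)
print("two-pass:\n", two_pass)
print("np.cov:\n", np.cov(X, rowvar=False))
```

In this sketch the two-pass result tracks np.cov, while the one-pass result can be off by orders of magnitude or even negative on the diagonal, which resembles the RowMatrix output above.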