Description
I would like to find out whether there is a bug in the RandomForest implementation. The issue is that, with a fixed seed, I get different results from other people in my class when running on the same data.
I am calculating the RMSE metric like this:
    from pyspark.ml.evaluation import RegressionEvaluator
    from pyspark.ml.regression import RandomForestRegressor

    # Split the data 70/30 with a fixed seed.
    (trainingData, testData) = data.randomSplit([0.7, 0.3], 313)

    rfr = RandomForestRegressor(labelCol="labels", featuresCol="features",
                                maxDepth=5, numTrees=3, seed=313)

    # Assumed step: fit on the training split, predict on the test split.
    model = rfr.fit(trainingData)
    predictions = model.transform(testData)

    evaluator = RegressionEvaluator(labelCol="labels", predictionCol="prediction",
                                    metricName="rmse")
    rmse = evaluator.evaluate(predictions)
    print("RMSE = %g " % rmse)
I am setting the seed explicitly. With seed = 50, and with other seeds, I get exactly the same RMSE as the other people in my class. With seed = 313 I get a different value. What could be the issue here?
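One way to narrow this down is to check whether the train/test split itself matches between machines before looking at the forest. As far as I understand, randomSplit samples within each partition, so the same seed can select different rows if the DataFrame is partitioned or ordered differently. The sketch below is only a diagnostic under that assumption. The helper name split_fingerprint is hypothetical (it is not part of PySpark), and it only calls the DataFrame methods randomSplit, count, collect and rdd.getNumPartitions.

```python
import hashlib


def split_fingerprint(df, weights=(0.7, 0.3), seed=313):
    """Summarise a randomSplit so two people can compare it without sharing data.

    `df` is expected to be a pyspark.sql.DataFrame. This helper is a
    hypothetical diagnostic, not a PySpark API.
    """
    train, test = df.randomSplit(list(weights), seed)
    # Sort the row representations so the digest does not depend on row order.
    rows = sorted(repr(r) for r in test.collect())
    digest = hashlib.sha256("\n".join(rows).encode("utf-8")).hexdigest()
    return {
        "num_partitions": df.rdd.getNumPartitions(),
        "train_count": train.count(),
        "test_count": test.count(),
        "test_digest": digest[:16],  # shortened for easy comparison
    }
```

Calling split_fingerprint(data, seed=313) on each machine and comparing the dictionaries should show whether the splits already differ. If num_partitions or test_digest differ, the RMSE difference most likely comes from the split rather than from RandomForestRegressor. Note that collect() pulls the test split to the driver, so this is only suitable for small data sets.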