Description
To show an example of MulticlassClassificationEvaluator generating numerical output that does not coincide with the expected values, consider the following code:
from pyspark.ml.classification import LinearSVC
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.evaluation import MulticlassClassificationEvaluator

train_data = [(0, 1.0, 2.0, 3.0), (1, 4.0, 5.0, 6.0), (0, 7.0, 8.0, 9.0)]
valid_data = [(1, 2.0, 3.0, 4.0), (0, 5.0, 6.0, 7.0), (1, 8.0, 9.0, 10.0)]
schema = ["label", "feature1", "feature2", "feature3"]
train = spark.createDataFrame(train_data, schema=schema)
valid = spark.createDataFrame(valid_data, schema=schema)

feature_columns = ["feature1", "feature2", "feature3"]
assembler = VectorAssembler(inputCols=feature_columns, outputCol="features")
train = assembler.transform(train)
valid = assembler.transform(valid)

svm = LinearSVC(maxIter=10, regParam=0.1)
model = svm.fit(train)
predictions = model.transform(valid)

recallByLabel = MulticlassClassificationEvaluator(metricName="recallByLabel")
weightedRecall = MulticlassClassificationEvaluator(metricName="weightedRecall")
print(f"Recall by label: {recallByLabel.evaluate(predictions)}")
print(f"Weighted recall: {weightedRecall.evaluate(predictions)}")
It produces:
Recall by label: 1.0
Weighted recall: 0.3333333333333333
but predictions.show() implies the following hand-calculated confusion matrix (columns are actual labels 1 and 0, rows are predicted labels 1 and 0):

            | actual 1 | actual 0 |
predicted 1 |    0     |    0     |
predicted 0 |    2     |    1     |

where the recall for label 1 is 0, i.e., 0 / (0 + 2).
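For reference, the hand calculation can be cross-checked directly on the predictions DataFrame; a minimal sketch, assuming the predictions variable from the PySpark snippet above:

# Tabulate (label, prediction) pair counts to reconstruct the confusion matrix.
predictions.groupBy("label", "prediction").count().orderBy("label", "prediction").show()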
What is the nature of this discrepancy? Note also that it is not restricted to recall, and other classifiers, which include a probability column in predictions, behave similarly.
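For instance, a minimal sketch using LogisticRegression, which does add a probability column to predictions, reusing the train, valid, and evaluator objects from the PySpark snippet above; per the observation above, its metrics exhibit the same pattern:

from pyspark.ml.classification import LogisticRegression

# Swap LinearSVC for LogisticRegression; the transformed DataFrame now
# contains a `probability` column in addition to `rawPrediction`.
lr = LogisticRegression(maxIter=10, regParam=0.1)
lr_model = lr.fit(train)
lr_predictions = lr_model.transform(valid)
print(f"Recall by label: {recallByLabel.evaluate(lr_predictions)}")
print(f"Weighted recall: {weightedRecall.evaluate(lr_predictions)}")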
Furthermore, the translation of the example to Scala, namely:
import org.apache.spark.ml.classification.LinearSVC
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator
import org.apache.spark.sql.DataFrame

val trainData = Seq((0, 1.0, 2.0, 3.0), (1, 4.0, 5.0, 6.0), (0, 7.0, 8.0, 9.0))
val validData = Seq((1, 2.0, 3.0, 4.0), (0, 5.0, 6.0, 7.0), (1, 8.0, 9.0, 10.0))
val schema = Seq("label", "feature1", "feature2", "feature3")
val train: DataFrame = spark.createDataFrame(trainData).toDF(schema: _*)
val valid: DataFrame = spark.createDataFrame(validData).toDF(schema: _*)

val featureColumns = Array("feature1", "feature2", "feature3")
val assembler = new VectorAssembler()
  .setInputCols(featureColumns)
  .setOutputCol("features")
val trainAssembled = assembler.transform(train)
val validAssembled = assembler.transform(valid)

val svm = new LinearSVC()
  .setMaxIter(10)
  .setRegParam(0.1)
val model = svm.fit(trainAssembled)
val predictions = model.transform(validAssembled)

val recallByLabel = new MulticlassClassificationEvaluator()
  .setMetricName("recallByLabel")
val weightedRecall = new MulticlassClassificationEvaluator()
  .setMetricName("weightedRecall")
println(s"Recall by label: ${recallByLabel.evaluate(predictions)}")
println(s"Weighted recall: ${weightedRecall.evaluate(predictions)}")
produces the same recall by label and weighted recall as described above.
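One thing worth noting when reading these numbers: the per-label metrics of MulticlassClassificationEvaluator (such as recallByLabel) are parameterized by metricLabel, which defaults to 0.0. A minimal sketch that queries the recall for label 1 explicitly, reusing predictions from the PySpark snippet above:

# recallByLabel reports the recall of the class selected by metricLabel
# (default 0.0); here we ask for label 1 instead.
recallLabel1 = MulticlassClassificationEvaluator(
    metricName="recallByLabel", metricLabel=1.0)
print(f"Recall for label 1: {recallLabel1.evaluate(predictions)}")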