  1. Spark
  2. SPARK-44848

MLlib GBTClassifier has wrong impurity method 'variance' instead of 'gini' or 'entropy'.


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.4.1
    • Fix Version/s: None
    • Component/s: MLlib
    • Labels: None

    Description

      The impurity method 'variance' should only be used for regressors, not classifiers. For classifiers, 'gini' and 'entropy' should be available, as is already the case for RandomForestClassifier: https://spark.apache.org/docs/3.1.3/api/python/reference/api/pyspark.ml.classification.RandomForestClassifier.html

      Because of this bug, the 'minInfoGain' hyperparameter cannot be tuned to combat overfitting.
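
As a rough illustration of why the impurity measure matters when picking a 'minInfoGain' threshold, the sketch below computes the information gain of one toy split under gini, entropy, and variance. It is plain Python, not Spark code. The helper names, the toy labels, and the threshold value are all hypothetical, and it assumes the tree sees raw 0/1 labels, which may not match what GBTClassifier fits internally.

```python
import math
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum of squared class frequencies."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Entropy in bits: -sum p * log2(p) over the observed classes."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def variance(labels):
    """Population variance of the labels treated as numbers."""
    n = len(labels)
    mean = sum(labels) / n
    return sum((y - mean) ** 2 for y in labels) / n


def info_gain(impurity, parent, left, right):
    """Parent impurity minus the size-weighted impurity of the children."""
    n = len(parent)
    return (impurity(parent)
            - len(left) / n * impurity(left)
            - len(right) / n * impurity(right))


# Toy binary labels and one candidate split (hypothetical data).
parent = [0, 0, 0, 0, 1, 1, 1, 1]
left = [0, 0, 0, 1]
right = [0, 1, 1, 1]

gains = {f.__name__: info_gain(f, parent, left, right)
         for f in (gini, entropy, variance)}

# Hypothetical threshold playing the role of 'minInfoGain'.
min_info_gain = 0.1
accepted = {name: gain >= min_info_gain for name, gain in gains.items()}

for name, gain in gains.items():
    print(f"{name}: gain={gain:.4f} accepted={accepted[name]}")
```

On this toy split the gain is 0.125 under gini, about 0.189 under entropy, and 0.0625 under variance, so the same threshold of 0.1 accepts the split for the first two measures and rejects it for the third. A 'minInfoGain' value therefore only has meaning relative to the impurity measure in use.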


          People

            Assignee: Unassigned
            Reporter: Elisabeth Niederbacher (lisi)
            Votes: 1
            Watchers: 2
