Details
- Type: Bug
- Status: Resolved
- Priority: Minor
- Resolution: Duplicate
- Affects Version/s: 2.4.0, 2.4.1, 2.4.2, 2.4.3
Description
Feature importance values computed in a binary classification project differ depending on whether Spark 2.3.3 or 2.4.0 is used. This happens with both Random Forest and GBT. The values that match the sklearn output are the ones from version 2.3.3.
As an example:
SPARK 2.4
MODEL RandomForestClassifier_gini [0.0, 0.4117930839002269, 0.06894132653061226, 0.15857667209786705, 0.2974447311021076, 0.06324418636918638]
MODEL RandomForestClassifier_entropy [0.0, 0.3864372497988694, 0.06578883597468652, 0.17433924485055197, 0.31754597164210124, 0.055888697733790925]
MODEL GradientBoostingClassifier [0.0, 0.7555555555555556, 0.24444444444444438, 0.0, 1.4602196686471875e-17, 0.0]
SPARK 2.3.3
MODEL RandomForestClassifier_gini [0.0, 0.40957086167800455, 0.06894132653061226, 0.16413222765342259, 0.2974447311021076, 0.05991085303585305]
MODEL RandomForestClassifier_entropy [0.0, 0.3864372497988694, 0.06578883597468652, 0.18789704501922055, 0.30398817147343266, 0.055888697733790925]
MODEL GradientBoostingClassifier [0.0, 0.7555555555555555, 0.24444444444444438, 0.0, 2.4326753518951276e-17, 0.0]
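To make the discrepancy easier to see, here is a minimal sketch in plain Python that compares the vectors listed above feature by feature. The helper name `differing_indices` and the tolerance of 1e-9 are assumptions chosen for illustration; they are not part of Spark or of the original report.

```python
# Feature importance vectors copied verbatim from the report above.
spark_24 = {
    "RandomForestClassifier_gini": [0.0, 0.4117930839002269, 0.06894132653061226,
                                    0.15857667209786705, 0.2974447311021076, 0.06324418636918638],
    "RandomForestClassifier_entropy": [0.0, 0.3864372497988694, 0.06578883597468652,
                                       0.17433924485055197, 0.31754597164210124, 0.055888697733790925],
    "GradientBoostingClassifier": [0.0, 0.7555555555555556, 0.24444444444444438,
                                   0.0, 1.4602196686471875e-17, 0.0],
}
spark_233 = {
    "RandomForestClassifier_gini": [0.0, 0.40957086167800455, 0.06894132653061226,
                                    0.16413222765342259, 0.2974447311021076, 0.05991085303585305],
    "RandomForestClassifier_entropy": [0.0, 0.3864372497988694, 0.06578883597468652,
                                       0.18789704501922055, 0.30398817147343266, 0.055888697733790925],
    "GradientBoostingClassifier": [0.0, 0.7555555555555555, 0.24444444444444438,
                                   0.0, 2.4326753518951276e-17, 0.0],
}


def differing_indices(a, b, tol=1e-9):
    """Return the feature indices where two vectors differ by more than tol.

    The tolerance is an assumed threshold that filters out pure
    floating-point noise.
    """
    return [i for i, (x, y) in enumerate(zip(a, b)) if abs(x - y) > tol]


for model in spark_24:
    print(model, differing_indices(spark_24[model], spark_233[model]))
```

With this tolerance, the gini forest differs at features 1, 3 and 5, the entropy forest at features 3 and 4, and the GBT vectors differ only at the level of floating-point noise in this particular example.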
Issue Links
- duplicates SPARK-26721 Bug in feature importance calculation in GBM (and possibly other decision tree classifiers) (Resolved)