[SPARK-33661] Unable to load RandomForestClassificationModel trained in Spark 2.x - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Duplicate
Affects Version/s: 3.0.1
Fix Version/s: None
Component/s: ML
Labels: None

Description

Loading a RandomForestClassificationModel that was trained and saved in Spark 2.x fails in Spark 3.x with the following exception:

...
    RandomForestClassificationModel.load('/path/to/my/model')
  File "/usr/spark/python/lib/pyspark.zip/pyspark/ml/util.py", line 330, in load
  File "/usr/spark/python/lib/pyspark.zip/pyspark/ml/pipeline.py", line 291, in load
  File "/usr/spark/python/lib/pyspark.zip/pyspark/ml/util.py", line 280, in load
  File "/usr/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1305, in __call__
  File "/usr/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 134, in deco
  File "<string>", line 3, in raise_from
pyspark.sql.utils.AnalysisException: No such struct field rawCount in id, prediction, impurity, impurityStats, gain, leftChild, rightChild, split;

There seems to be a schema incompatibility between the model data saved by Spark 2.x and the data Spark 3.x expects when loading a model: the saved struct has no rawCount field, which the Spark 3.x loader tries to read.
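To confirm that a failed load is this schema mismatch and not some other problem, the exception text itself can be inspected. The following is a minimal sketch that assumes the message has the same shape as the one in the traceback above; the helper name parse_missing_field is hypothetical and not part of PySpark.

```python
import re


def parse_missing_field(message):
    """Extract the missing struct field and the fields that were found.

    Hypothetical helper (not a PySpark API). Assumes the message looks like:
    "No such struct field <name> in <field>, <field>, ...;"
    Returns (missing_field, available_fields), or None if the text does not match.
    """
    match = re.search(r"No such struct field (\w+) in ([\w, ]+)", message)
    if match is None:
        return None
    missing = match.group(1)
    # The available fields are comma-separated; strip surrounding whitespace.
    available = [name.strip() for name in match.group(2).split(",")]
    return missing, available


# Message text copied from the traceback in this report.
message = (
    "No such struct field rawCount in id, prediction, impurity, "
    "impurityStats, gain, leftChild, rightChild, split;"
)
missing, available = parse_missing_field(message)
print("missing:", missing)
print("present:", available)
```

If the reported missing field is rawCount and the listed fields match those above, the saved model was most likely written by Spark 2.x.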

If this issue is not resolved, users will have to retrain in Spark 3.x every random forest model they originally trained in Spark 2.x before they can upgrade.

Issue Links

duplicates: SPARK-33398 AnalysisException when loading a PipelineModel with Spark 3 (Resolved)

People

Assignee: Unassigned

Reporter: Marcus Levine

Votes: 0

Watchers: 1

Dates

Created: 04/Dec/20 16:38

Updated: 12/Dec/22 18:10

Resolved: 07/Jan/21 20:19