Spark / SPARK-33661

Unable to load RandomForestClassificationModel trained in Spark 2.x

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 3.0.1
    • Fix Version/s: None
    • Component/s: ML
    • Labels: None

    Description

      When Spark 3.x attempts to load a RandomForestClassificationModel that was trained in Spark 2.x, an exception is raised:

      ...
          RandomForestClassificationModel.load('/path/to/my/model')
        File "/usr/spark/python/lib/pyspark.zip/pyspark/ml/util.py", line 330, in load
        File "/usr/spark/python/lib/pyspark.zip/pyspark/ml/pipeline.py", line 291, in load
        File "/usr/spark/python/lib/pyspark.zip/pyspark/ml/util.py", line 280, in load
        File "/usr/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1305, in __call__
        File "/usr/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 134, in deco
        File "<string>", line 3, in raise_from
      pyspark.sql.utils.AnalysisException: No such struct field rawCount in id, prediction, impurity, impurityStats, gain, leftChild, rightChild, split;
      

      There seems to be a schema incompatibility between the model data saved by Spark 2.x and the data Spark 3.x expects when loading a model: the loader looks for a rawCount struct field that is absent from the saved node data.
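
      To confirm which field is missing, the exception message can be parsed and compared against what is actually on disk. The sketch below is illustrative only: parse_missing_field and node_fields_on_disk are hypothetical helper names, and the second helper assumes the forest's nodes are stored as Parquet under <model_path>/data in a struct column named nodeData, which may not match every saved model (for a model saved inside a pipeline, the path would point at the relevant stage directory).

```python
import re

# Field names copied verbatim from the AnalysisException in this report:
# these are the struct fields present in the model saved by Spark 2.x.
FIELDS_IN_SAVED_MODEL = [
    "id", "prediction", "impurity", "impurityStats",
    "gain", "leftChild", "rightChild", "split",
]


def parse_missing_field(message):
    """Return (missing_field, available_fields) parsed from a
    'No such struct field ...' message, or None if it does not match."""
    match = re.search(r"No such struct field (\w+) in ([\w, ]+)", message)
    if match is None:
        return None
    missing = match.group(1)
    available = [name.strip() for name in match.group(2).split(",")]
    return missing, available


def node_fields_on_disk(spark, model_path):
    """Hypothetical helper: list the struct fields of the saved node data.

    Assumption: node data lives in Parquet files under <model_path>/data,
    in a struct column named nodeData. Adjust if your layout differs.
    """
    schema = spark.read.parquet(model_path + "/data").schema
    return schema["nodeData"].dataType.fieldNames()
```

      Given an active SparkSession, node_fields_on_disk(spark, '/path/to/my/model') should list the same fields the exception reports as available if the assumed layout holds; rawCount would be absent for a model saved by Spark 2.x.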

      Unless this issue is resolved, users who upgrade will be forced to retrain, in Spark 3.x, every random forest model they trained in Spark 2.x.

            People

              Assignee: Unassigned
              Reporter: Marcus Levine (marcusian)
              Votes: 0
              Watchers: 1
