Details
Type: Bug
Status: Open
Priority: Minor
Resolution: Unresolved
Affects Version/s: 2.4.3
None
None
Description
Hi, this error affects several of our nested use cases.
A PipelineModel that has another PipelineModel as one of its stages fails to load from Scala if it was saved from Python.
Python side:
from pyspark.ml import Pipeline
from pyspark.ml.feature import Tokenizer

t = Tokenizer()
p = Pipeline().setStages([t])
d = spark.createDataFrame([["Hello Peter Parker"]])
pm = p.fit(d)
np = Pipeline().setStages([pm])
npm = np.fit(d)
npm.write().save('./npm_test')
Scala side:
scala> import org.apache.spark.ml.PipelineModel
scala> val pp = PipelineModel.load("./npm_test")
java.lang.IllegalArgumentException: requirement failed: Error loading metadata: Expected class name org.apache.spark.ml.PipelineModel but found class name pyspark.ml.pipeline.PipelineModel
  at scala.Predef$.require(Predef.scala:224)
  at org.apache.spark.ml.util.DefaultParamsReader$.parseMetadata(ReadWrite.scala:638)
  at org.apache.spark.ml.util.DefaultParamsReader$.loadMetadata(ReadWrite.scala:616)
  at org.apache.spark.ml.Pipeline$SharedReadWrite$.load(Pipeline.scala:267)
  at org.apache.spark.ml.PipelineModel$PipelineModelReader.load(Pipeline.scala:348)
  at org.apache.spark.ml.PipelineModel$PipelineModelReader.load(Pipeline.scala:342)
  at org.apache.spark.ml.util.MLReadable$class.load(ReadWrite.scala:380)
  at org.apache.spark.ml.PipelineModel$.load(Pipeline.scala:332)
  ... 50 elided
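
To see which class name was written for each level of the saved pipeline, the metadata can be inspected directly. The following is only a diagnostic sketch, not part of Spark. It assumes the model was saved to the local filesystem, and that every "metadata" directory under the save path holds a "part-*" text file whose first line is a JSON object with a "class" field (the field the Scala loader complains about in the trace above). The helper name saved_stage_classes is made up for illustration.

```python
import json
import os


def saved_stage_classes(model_dir):
    """Map each metadata directory under model_dir to the class name it records.

    Assumption: each "metadata" directory contains a "part-*" text file whose
    first line is a JSON object with a "class" key.
    """
    classes = {}
    for root, _dirs, files in os.walk(model_dir):
        if os.path.basename(root) != "metadata":
            continue
        for name in sorted(files):
            # Skip checksum files (".part-00000.crc") and "_SUCCESS" markers.
            if not name.startswith("part-"):
                continue
            with open(os.path.join(root, name)) as fh:
                line = fh.readline().strip()
            if line:
                rel = os.path.relpath(root, model_dir)
                classes[rel] = json.loads(line).get("class")
                break
    return classes


if __name__ == "__main__":
    # "./npm_test" is the path used in the Python snippet of this report.
    for path, cls in sorted(saved_stage_classes("./npm_test").items()):
        print(path, "->", cls)
```

Any entry whose class starts with "pyspark." rather than "org.apache.spark." would be one that the Scala-side class name check rejects.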