Spark / SPARK-42825

setParams() only sets explicitly named params. Is this intentional or a bug?


Details

    • Type: Question
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 3.3.2
    • Fix Version/s: None
    • Component/s: ML, PySpark
    • Labels: None

    Description

The Python signature/docstring of the setParams() method for the estimators and transformers under pyspark.ml implies that any params you don't explicitly set will be reset to their default values.

      Example from https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.ml.clustering.GaussianMixture.html#pyspark.ml.clustering.GaussianMixture.setParams :

setParams(self, *, featuresCol="features", predictionCol="prediction", k=2, probabilityCol="probability", tol=0.01, maxIter=100, seed=None, aggregationDepth=2, weightCol=None)

Taken to the extreme, this would imply that calling setParams() with no arguments resets all the params to their default values.

But what actually happens is that only the params explicitly passed in the call get changed; the values of all other params are unaffected. So if you call setParams() with no arguments, no params get changed at all!
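The behavior described above follows from how PySpark's @keyword_only decorator works: it records only the kwargs the caller explicitly passed (in self._input_kwargs), so the defaults in the signature never reach the param-setting logic. Here is a minimal self-contained sketch of that mechanism (ToyEstimator and its params are hypothetical; this is not the actual pyspark.ml code):

```python
import functools

def keyword_only(func):
    # Simplified sketch of pyspark.ml's @keyword_only decorator:
    # record only the kwargs the caller explicitly passed, so the
    # signature defaults are never seen by the wrapped method's body.
    @functools.wraps(func)
    def wrapper(self, **kwargs):
        self._input_kwargs = kwargs
        return func(self, **kwargs)
    return wrapper

class ToyEstimator:
    def __init__(self):
        # Current param values (analogous to an estimator's param map).
        self._params = {"k": 2, "maxIter": 100}

    @keyword_only
    def setParams(self, *, k=2, maxIter=100):
        # Only explicitly passed kwargs are in _input_kwargs,
        # so unspecified params are left untouched.
        self._params.update(self._input_kwargs)
        return self

est = ToyEstimator()
est.setParams(k=5)   # changes k only; maxIter keeps its current value
est.setParams()      # changes nothing, despite the defaults in the signature
print(est._params)   # {'k': 5, 'maxIter': 100}
```

This is why the defaults shown in the docstring signature are effectively cosmetic for setParams(): they document the estimator's initial defaults, but the method itself never applies them.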

So is this behavior by design? I guess it is, given the name of the method. But it is counter-intuitive given the docstring. If this behavior is intentional, then perhaps the default docstring should make it explicit by saying something like:

      "Sets the named params. The values of other params are not affected."


People

    Assignee: Unassigned
    Reporter: Lucas Partridge (lucas.partridge)
    Votes: 0
    Watchers: 2
