Spark / SPARK-25544

Slow/failed convergence in Spark ML models due to internal predictor scaling


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: 2.3.2
    • Fix Version/s: None
    • Component/s: ML
    • Environment: Databricks runtime 4.2: Spark 2.3.1, Scala 2.11

    Description

      The LinearRegression and LogisticRegression estimators in Spark ML can take a large number of iterations to converge, or fail to converge altogether, when trained using the l-bfgs method with standardization turned off.

      Details:

      LinearRegression and LogisticRegression standardize their input features by default. In SPARK-8522 the option to disable standardization was added. Internally, this is implemented by changing the effective strength of regularization rather than by disabling the feature scaling. Mathematically, changing the effective regularization strength and disabling feature scaling should give the same solution, but the two approaches can have very different convergence properties.

      The usual justification for scaling features is that it keeps all covariances O(1) and should improve numerical convergence, but this argument does not account for the regularization term. That is not a problem when standardization is set to true, since every feature then has an O(1) regularization strength. It does cause problems when standardization is set to false, because the effective regularization strength of feature i becomes O(1/sigma_i^2), where sigma_i is the standard deviation of the feature. Predictors with small standard deviations (which can occur legitimately, e.g. via one-hot encoding) therefore get very large effective regularization strengths, and consequently very large gradients and poor convergence in the solver.
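      To make the scale of the problem concrete, here is a small illustrative sketch (plain Scala, with made-up standard deviations rather than values taken from any real model): it simply prints the 1/sigma_i^2 blow-up of the effective regularization strength for a low-variance feature.

      // Illustrative only: hypothetical per-feature standard deviations; the last one mimics a
      // near-constant feature such as the interaction of a rare one-hot category with a numeric column.
      val regParam = 1.0
      val featureStd = Seq("feature_a" -> 0.5, "feature_b" -> 1.0, "rare_interaction" -> 0.005)
      featureStd.foreach { case (name, sigma) =>
        // Effective L2 strength on feature i when standardization is off but internal scaling is kept.
        println(f"$name%-18s sigma = $sigma%7.3f  effective reg strength = ${regParam / (sigma * sigma)}%10.1f")
      }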

      Example code to recreate:

      To demonstrate just how bad these convergence issues can be, here is a very simple test case which builds a linear regression model with a categorical feature, a numerical feature and their interaction. When fed the specified training data, this model will fail to converge before it hits the maximum iteration limit. In this case, it is the interaction between category "2" and the numeric feature that leads to a feature with a small standard deviation.

      Training data:

      category  numericFeature  label
      1         1.0             0.5
      1         0.5             1.0
      2         0.01            2.0

       

      import org.apache.spark.ml.Pipeline
      import org.apache.spark.ml.feature.{Interaction, OneHotEncoder, StringIndexer, VectorAssembler}
      import org.apache.spark.ml.regression.{LinearRegression, LinearRegressionModel}
      import spark.implicits._  // assumes a SparkSession named `spark`, e.g. in spark-shell

      val df = Seq(("1", 1.0, 0.5), ("1", 0.5, 1.0), ("2", 1e-2, 2.0))
        .toDF("category", "numericFeature", "label")

      val indexer = new StringIndexer().setInputCol("category").setOutputCol("categoryIndex")
      val encoder = new OneHotEncoder().setInputCol("categoryIndex").setOutputCol("categoryEncoded").setDropLast(false)
      val interaction = new Interaction().setInputCols(Array("categoryEncoded", "numericFeature")).setOutputCol("interaction")
      val assembler = new VectorAssembler().setInputCols(Array("categoryEncoded", "interaction")).setOutputCol("features")
      val model = new LinearRegression()
        .setFeaturesCol("features").setLabelCol("label").setPredictionCol("prediction")
        .setStandardization(false).setSolver("l-bfgs").setRegParam(1.0).setMaxIter(100)
      val pipeline = new Pipeline().setStages(Array(indexer, encoder, interaction, assembler, model))

      val pipelineModel = pipeline.fit(df)

      // The LinearRegressionModel is the fifth stage (index 4) of the fitted pipeline.
      val numIterations = pipelineModel.stages(4).asInstanceOf[LinearRegressionModel].summary.totalIterations
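      For reference, one extra line can be appended to the snippet above to surface the problem; per the behaviour described here, numIterations comes back equal to the 100-iteration cap rather than a converged count.

      // With setStandardization(false) this run hits the maxIter cap instead of converging.
      println(s"Total iterations: $numIterations (maxIter = 100)")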

       Possible fix:

      These convergence issues can be fixed by turning off feature scaling when standardization is set to false rather than using an effective regularization strength. This can be hacked into LinearRegression.scala by simply replacing line 423

      val featuresStd = featuresSummarizer.variance.toArray.map(math.sqrt)
      

      with

      val featuresStd =
        if ($(standardization)) featuresSummarizer.variance.toArray.map(math.sqrt)
        else featuresSummarizer.variance.toArray.map(_ => 1.0)
      

      Rerunning the above test code with that hack in place leads to convergence after just 4 iterations instead of hitting the maximum iteration limit!

      Impact:

      I can't speak for other people, but I've personally encountered these convergence issues several times when building production-scale Spark ML models, and have resorted to writing my own implementation of LinearRegression with the above hack in place. The issue is made worse by the fact that Spark does not raise an error when the maximum number of iterations is hit, so the first time you encounter the issue it can take a while to figure out what is going on.
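      Since Spark does not surface this condition itself, one possible workaround (a sketch only; requireConverged is a hypothetical helper, and it assumes the fitted model still carries its training summary) is to compare totalIterations against maxIter after fitting:

      // Workaround sketch (not part of Spark): fail loudly when the solver stops because it
      // exhausted maxIter rather than because it converged.
      import org.apache.spark.ml.regression.LinearRegressionModel

      def requireConverged(model: LinearRegressionModel, maxIter: Int): Unit = {
        val iters = model.summary.totalIterations
        require(iters < maxIter,
          s"LinearRegression used $iters iterations and hit maxIter ($maxIter); the coefficients may not have converged.")
      }

      requireConverged(pipelineModel.stages(4).asInstanceOf[LinearRegressionModel], maxIter = 100)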

       

          People

            Assignee: Unassigned
            Reporter: Andrew Crosby
            Votes: 1
            Watchers: 3
