Spark / SPARK-29832

Unnecessary persist on instances in ml.regression.IsotonicRegression.fit

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.1.0
    • Fix Version/s: None
    • Component/s: ML
    • Labels: None

    Description

      Persisting instances in ml.regression.IsotonicRegression.fit() is unnecessary, because instances is used only once, in run(instances).

        override def fit(dataset: Dataset[_]): IsotonicRegressionModel = instrumented { instr =>
          transformSchema(dataset.schema, logging = true)
          // Extract columns from data.  If dataset is persisted, do not persist oldDataset.
          val instances = extractWeightedLabeledPoints(dataset)
          val handlePersistence = dataset.storageLevel == StorageLevel.NONE
          // Unnecessary persist
          if (handlePersistence) instances.persist(StorageLevel.MEMORY_AND_DISK)
          instr.logPipelineStage(this)
          instr.logDataset(dataset)
          instr.logParams(this, labelCol, featuresCol, weightCol, predictionCol, featureIndex, isotonic)
          instr.logNumFeatures(1)
          val isotonicRegression = new MLlibIsotonicRegression().setIsotonic($(isotonic))
          val oldModel = isotonicRegression.run(instances) // instances is used only once, here
          if (handlePersistence) instances.unpersist()
      

      This issue was reported by our tool CacheCheck, which dynamically detects misuses of the persist()/unpersist() API.
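
      The reasoning in this report can be illustrated with a small, Spark-free sketch. It is only a toy model, not Spark's RDD implementation: LazyData, PersistDemo and evaluations are hypothetical names introduced here. It assumes, as the report states, that the data is consumed by exactly one action; under that assumption caching saves no recomputation, and it only pays off when the same data is consumed more than once.

```scala
// Toy model of a lazily evaluated, optionally cached dataset.
// Hypothetical illustration only; this is NOT Spark's RDD API.
final class LazyData[A](compute: () => Seq[A]) {
  private var cached: Option[Seq[A]] = None
  private var persisted = false
  var computations = 0 // how many times the data was actually recomputed

  def persist(): this.type = { persisted = true; this }
  def unpersist(): this.type = { persisted = false; cached = None; this }

  // An "action": materializes the data, reusing the cache when one exists.
  def collect(): Seq[A] = cached.getOrElse {
    computations += 1
    val result = compute()
    if (persisted) cached = Some(result)
    result
  }
}

object PersistDemo {
  // Runs `actions` actions over a fresh dataset and returns how many
  // times the underlying data had to be computed.
  def evaluations(actions: Int, usePersist: Boolean): Int = {
    val data = new LazyData(() => Seq(1.0, 2.0, 3.0))
    if (usePersist) data.persist()
    (1 to actions).foreach(_ => data.collect())
    if (usePersist) data.unpersist()
    data.computations
  }

  def main(args: Array[String]): Unit = {
    // Single use: persisting changes nothing (1 computation either way).
    println(evaluations(actions = 1, usePersist = false)) // prints 1
    println(evaluations(actions = 1, usePersist = true))  // prints 1
    // Repeated use: persisting avoids recomputation (3 vs 1).
    println(evaluations(actions = 3, usePersist = false)) // prints 3
    println(evaluations(actions = 3, usePersist = true))  // prints 1
  }
}
```

      In the single-use case the persisted and non-persisted runs do the same amount of computation, so the persist()/unpersist() pair only adds storage overhead. Whether run(instances) really evaluates its input just once is the report's claim and is not verified by this sketch.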

          People

            Assignee: Unassigned
            Reporter: spark_cachecheck IcySanwitch