SPARK-23318

FP-growth: WARN FPGrowth: Input data is not cached

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.2.1
    • Fix Version/s: 2.4.0
    • Component/s: ML

    Description

      When running FPGrowth.fit() from the ml package, one can see a warning:

      WARN FPGrowth: Input data is not cached.

      This warning occurs even when the dataset of transactions is cached.

      The warning actually comes from the FPGrowth implementation in the old mllib package. The new ml FPGrowth performs some transformations on the input dataset of transactions and then passes the result to the old FPGrowth without caching it, hence the warning.
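A minimal reproduction sketch of the behavior described above, assuming a local SparkSession and a made-up three-row dataset. The object name `FPGrowthWarningRepro` and the helper `cachedTransactions` are hypothetical and exist only for illustration; the Spark calls (`cache`, `storageLevel`, `ml.fpm.FPGrowth`) are the public API.

```scala
import org.apache.spark.ml.fpm.{FPGrowth, FPGrowthModel}
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.storage.StorageLevel

object FPGrowthWarningRepro {

  // Hypothetical helper: builds a tiny transactions DataFrame and caches it.
  def cachedTransactions(spark: SparkSession): DataFrame = {
    import spark.implicits._
    val df = Seq(
      Array("a", "b", "e"),
      Array("a", "b", "c", "e"),
      Array("a", "b")
    ).toDF("items").cache()
    df.count() // run an action so the cache is materialized
    df
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("fpgrowth-warning-repro")
      .getOrCreate()

    val transactions = cachedTransactions(spark)
    // The input Dataset itself reports a storage level other than NONE.
    println(s"input storage level: ${transactions.storageLevel}")
    assert(transactions.storageLevel != StorageLevel.NONE)

    // According to this issue, on the affected version (2.2.1) fit() still
    // logs "WARN FPGrowth: Input data is not cached." at this point.
    val model: FPGrowthModel = new FPGrowth()
      .setItemsCol("items")
      .setMinSupport(0.5)
      .fit(transactions)

    model.freqItemsets.show()
    spark.stop()
  }
}
```

The point of the sketch is that the caller has done everything right (the input is cached), so the warning has to originate from data derived inside fit().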

      The problem looks similar to SPARK-18356.
      If you don't mind, I can push a similar fix:

      // ml.FPGrowth
      val handlePersistence = dataset.storageLevel == StorageLevel.NONE
      if (handlePersistence) {
        // cache the data
      }
      // then call mllib.FPGrowth
      // finally unpersist the data, if it was cached above
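The proposal above could be fleshed out roughly as follows. This is a sketch under stated assumptions, not the patch that was merged: the object `PersistBeforeMllib`, the helper `runWithPersistence`, the String item type, and the choice of `MEMORY_AND_DISK` are all illustrative. Only `Dataset.storageLevel`, `RDD.persist`, `RDD.unpersist`, and `mllib.fpm.FPGrowth` are existing Spark APIs.

```scala
import org.apache.spark.mllib.fpm.{FPGrowth => MLlibFPGrowth, FPGrowthModel => MLlibFPGrowthModel}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.Dataset
import org.apache.spark.sql.functions.col
import org.apache.spark.storage.StorageLevel

object PersistBeforeMllib {

  // Hypothetical helper illustrating the proposed pattern; assumes the
  // items column holds arrays of strings.
  def runWithPersistence(
      dataset: Dataset[_],
      itemsCol: String,
      minSupport: Double): MLlibFPGrowthModel[String] = {

    // Transformation step: this derived RDD is what mllib.FPGrowth receives.
    // It is a new RDD, so it is not cached even when `dataset` is.
    val items: RDD[Array[String]] = dataset
      .select(col(itemsCol))
      .where(col(itemsCol).isNotNull)
      .rdd
      .map(_.getSeq[String](0).toArray)

    // As in the proposal: cache only if the caller has not cached the input.
    val handlePersistence = dataset.storageLevel == StorageLevel.NONE
    if (handlePersistence) {
      items.persist(StorageLevel.MEMORY_AND_DISK)
    }

    try {
      new MLlibFPGrowth().setMinSupport(minSupport).run(items)
    } finally {
      // Release only what this method cached itself.
      if (handlePersistence) {
        items.unpersist()
      }
    }
  }
}
```

Note that when the caller has already cached `dataset`, this sketch follows the proposal literally and leaves the derived RDD uncached.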
      

          People

            Assignee: Arseniy Tashoyan (tashoyan)
            Reporter: Arseniy Tashoyan (tashoyan)
            Votes: 0
            Watchers: 2

              Time Tracking

                Original Estimate: 24h
                Remaining Estimate: 24h
                Time Spent: Not Specified