Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 2.4.3
    • Fix Version/s: None
    • Component/s: MLlib
    • Labels: None

    Description

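      The RDD dataset is used by more than two actions, in learnVocab(dataset) and in doFit. Because it is not persisted, it is recomputed from its lineage for each of those actions, so it should be persisted before the first action runs.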

      def fit[S <: Iterable[String]](dataset: RDD[S]): Word2VecModel = {
          // Needs to persist dataset here
          learnVocab(dataset) // has action on dataset
          createBinaryTree()
          val sc = dataset.context
          val expTable = sc.broadcast(createExpTable())
          val bcVocab = sc.broadcast(vocab)
          val bcVocabHash = sc.broadcast(vocabHash)
          try {
            doFit(dataset, sc, expTable, bcVocab, bcVocabHash) // has action on dataset
      

      This issue was reported by our tool CacheCheck, which dynamically detects misuses of the persist()/unpersist() API.
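
      A minimal sketch of the kind of change being requested, not the fix that was actually applied to Spark. The helper name withPersisted and the choice of StorageLevel.MEMORY_AND_DISK are assumptions made for illustration. It caches the RDD only if the caller has not already chosen a storage level, and releases it once the work is done.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel

object PersistSketch {
  // Hypothetical helper (not part of Spark): keeps `rdd` cached while `body`
  // runs, so that several actions inside `body` reuse the same computed data.
  def withPersisted[T, R](rdd: RDD[T])(body: RDD[T] => R): R = {
    // Only persist if the caller has not already picked a storage level;
    // Spark does not allow changing the level of an already persisted RDD.
    val persistedHere = rdd.getStorageLevel == StorageLevel.NONE
    if (persistedHere) rdd.persist(StorageLevel.MEMORY_AND_DISK) // assumed level
    try {
      body(rdd)
    } finally {
      // Release the cache only if this helper created it.
      if (persistedHere) rdd.unpersist()
    }
  }
}
```

      Under these assumptions, the body of fit could wrap both calls, for example withPersisted(dataset) { ds => learnVocab(ds); ...; doFit(ds, ...) }, so that the action in learnVocab and the actions in doFit read the cached data. Checking getStorageLevel first avoids unpersisting a dataset that the caller cached deliberately.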

            People

              Assignee: Unassigned
              Reporter: IcySanwitch (spark_cachecheck)
              Votes: 0
              Watchers: 2
