SPARK-6717

Clear shuffle files after checkpointing in ALS

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.4.0
    • Fix Version/s: 2.0.0
    • Component/s: MLlib

    Description

      In ALS iterations, we checkpoint RDDs to cut lineage and to reduce the number of shuffle files. However, shuffle files are cleaned only after the JVM garbage collector has collected the corresponding RDD objects, and a GC may not be triggered during ALS iterations. So after checkpointing, and before we let the RDD object go out of scope, we should clean its shuffle dependencies explicitly. This function could either stay inside ALS or go to Core.

      Without this feature, a workaround is to call System.gc() periodically so that the shuffle files of RDDs that have gone out of scope are cleaned.
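
      The explicit cleanup proposed above can be sketched with a small toy model. This is only an illustration under stated assumptions, not Spark's API and not necessarily how the fix was implemented: `Node`, `Dep`, `ShuffleStore`, `collectShuffleIds`, and `checkpointAndClean` are all hypothetical names. The sketch assumes the checkpoint data has already been written, walks the lineage to collect shuffle ids, removes them from a stand-in store, and only then truncates the lineage.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy model of the idea in this ticket. Every name here is hypothetical;
// none of these classes exist in Spark.
final class ShuffleCleanupSketch {

    /** An edge to a parent node; shuffleId >= 0 marks a shuffle dependency. */
    static final class Dep {
        final Node parent;
        final int shuffleId;
        Dep(Node parent, int shuffleId) { this.parent = parent; this.shuffleId = shuffleId; }
        boolean isShuffle() { return shuffleId >= 0; }
    }

    /** A node in the lineage graph, standing in for an RDD. */
    static final class Node {
        final String name;
        final List<Dep> deps = new ArrayList<>();
        Node(String name) { this.name = name; }
        Node dependsOn(Node parent) { deps.add(new Dep(parent, -1)); return this; }
        Node shuffleFrom(Node parent, int shuffleId) { deps.add(new Dep(parent, shuffleId)); return this; }
    }

    /** Stand-in for the shuffle files that currently exist on disk. */
    static final class ShuffleStore {
        final Set<Integer> live = new HashSet<>();
        void remove(int shuffleId) { live.remove(shuffleId); }
    }

    /** Collects every shuffle id reachable through the lineage of root (iterative DFS). */
    static Set<Integer> collectShuffleIds(Node root) {
        Set<Integer> ids = new LinkedHashSet<>();
        Set<Node> seen = new HashSet<>();
        Deque<Node> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            Node n = stack.pop();
            if (!seen.add(n)) continue; // shared ancestors are visited once
            for (Dep d : n.deps) {
                if (d.isShuffle()) ids.add(d.shuffleId);
                stack.push(d.parent);
            }
        }
        return ids;
    }

    /**
     * Assumes the node's data has already been checkpointed. Removes the
     * shuffles in its lineage explicitly, then truncates the lineage, so
     * the cleanup does not wait for a garbage collection.
     */
    static Set<Integer> checkpointAndClean(Node rdd, ShuffleStore store) {
        Set<Integer> ids = collectShuffleIds(rdd); // must happen before the lineage is cut
        for (int id : ids) store.remove(id);
        rdd.deps.clear();
        return ids;
    }

    public static void main(String[] args) {
        Node ratings = new Node("ratings");
        Node blocks = new Node("blocks").shuffleFrom(ratings, 0);
        Node factors = new Node("factors").dependsOn(blocks).shuffleFrom(ratings, 1);

        ShuffleStore store = new ShuffleStore();
        store.live.add(0);
        store.live.add(1);

        Set<Integer> cleaned = checkpointAndClean(factors, store);
        System.out.println("cleaned shuffle ids: " + cleaned);
        System.out.println("shuffles still on disk: " + store.live);
    }
}
```

      The key ordering in the sketch is that the shuffle ids are collected before the lineage is truncated; once the dependencies are gone, there is nothing left to walk. Shuffles that other live RDDs still need would have to be excluded in a real implementation, which this toy model does not attempt.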

            People

              Holden Karau (holden)
              Xiangrui Meng (mengxr)
              Nicholas Pentreath