Spark / SPARK-25035

Replicating disk-stored blocks should avoid memory mapping


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version: 2.3.1
    • Fix Version: 3.0.0
    • Component: Spark Core

    Description

      This is a follow-up to SPARK-24296.

      When replicating a disk-cached block, even if we fetch-to-disk, we still memory-map the file, just to copy it to another location.

      Ideally we'd just move the tmp file to the right location. But even without that, we could read the file as an input stream instead of memory-mapping the whole thing. Memory-mapping is particularly a problem when running under YARN: the OS may believe there is plenty of memory available, while YARN decides to kill the process for exceeding its memory limits.
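      The streaming alternative the description suggests can be sketched in plain Java. This is a minimal illustration of copying a file through a bounded buffer rather than memory-mapping it, not Spark's actual BlockManager code; the name copyStreaming is hypothetical:

      ```java
      import java.io.IOException;
      import java.io.InputStream;
      import java.io.OutputStream;
      import java.nio.file.Files;
      import java.nio.file.Path;

      public class StreamCopy {
          // Copy a file using a small fixed buffer instead of memory-mapping it.
          // Mapped pages can inflate the resident set that YARN's memory monitor
          // observes, whereas a streaming copy keeps the footprint bounded by the
          // buffer size regardless of how large the block file is.
          static long copyStreaming(Path src, Path dst) throws IOException {
              byte[] buf = new byte[64 * 1024]; // 64 KiB bounds memory use
              long total = 0;
              try (InputStream in = Files.newInputStream(src);
                   OutputStream out = Files.newOutputStream(dst)) {
                  int n;
                  while ((n = in.read(buf)) != -1) {
                      out.write(buf, 0, n);
                      total += n;
                  }
              }
              return total;
          }

          public static void main(String[] args) throws IOException {
              Path src = Files.createTempFile("block", ".bin");
              Path dst = Files.createTempFile("copy", ".bin");
              Files.write(src, "disk-stored block data".getBytes());
              long copied = copyStreaming(src, dst);
              // The copy is byte-for-byte complete without mapping the source.
              System.out.println(copied == Files.size(dst));
          }
      }
      ```

      The "move the tmp file" option the description prefers would avoid even this read, e.g. via Files.move, when source and destination are on the same filesystem.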


    People

      Assignee: attilapiros Attila Zsolt Piros
      Reporter: irashid Imran Rashid
      Votes: 2
      Watchers: 8
