Spark / SPARK-22625

Properly clean up inheritable thread-locals


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: 2.2.0
    • Fix Version/s: None
    • Component/s: Spark Core

    Description

      A memory leak is present due to inherited thread-locals; SPARK-20558 did not fix it properly.

      Our production application has the following logic: one thread reads from HDFS, while another creates a SparkContext, processes HDFS files, and then closes the context on a regular schedule.

      Depending on which thread starts first, the SparkContext thread-local may or may not be inherited by the HDFS daemon thread (DataStreamer). When the streamer is created after the SparkContext, it inherits the value and keeps it alive, causing a memory leak: memory consumption grows every time a new SparkContext is created. Related YourKit paths: https://screencast.com/t/tgFBYMEpW
      The problem is more general and is not specific to HDFS.
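The creation-order dependence described above can be reproduced with the JDK alone. This is a minimal sketch, not Spark or HDFS code: `LOCAL_PROPS`, the `demo.key` property, and the "streamer" thread are hypothetical stand-ins for SparkContext's local properties and the `DataStreamer` daemon. A `Thread` copies its creator's `InheritableThreadLocal` entries when it is constructed, so a thread built after the value was set keeps its own reference even after the creating thread calls `remove()` (used here as a stand-in for closing the context). The demo uses the default `childValue`, which returns the parent's object; per the issue, Spark's `localProperties#childValue` returns a clone instead, but that clone is retained by the streamer thread in the same way.

```java
import java.util.Properties;
import java.util.concurrent.CountDownLatch;

/**
 * Illustration only (not Spark code): a thread created after an
 * InheritableThreadLocal was set keeps its own reference to the value,
 * so cleaning up the creating thread does not release it.
 */
class InheritedLeakDemo {
    // Hypothetical stand-in for SparkContext's inheritable local properties.
    static final InheritableThreadLocal<Properties> LOCAL_PROPS = new InheritableThreadLocal<>();

    /** Runs one "context lifecycle" and returns what the streamer thread sees after close. */
    static Properties valueSeenAfterClose(boolean streamerCreatedAfterContext) {
        CountDownLatch contextClosed = new CountDownLatch(1);
        Properties[] seen = new Properties[1];
        Runnable streamerBody = () -> {
            try {
                contextClosed.await();            // wait until the creator has cleaned up
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            seen[0] = LOCAL_PROPS.get();          // the streamer's own inherited entry
        };

        Thread streamer;
        if (streamerCreatedAfterContext) {
            openContext();
            streamer = new Thread(streamerBody);  // inherits the value at construction time
        } else {
            streamer = new Thread(streamerBody);  // nothing to inherit yet
            openContext();
        }
        streamer.start();

        LOCAL_PROPS.remove();                     // "close" cleans only the creating thread
        contextClosed.countDown();
        try {
            streamer.join();                      // join() makes seen[0] visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    private static void openContext() {
        Properties props = new Properties();
        props.setProperty("demo.key", "job-1");   // hypothetical property
        LOCAL_PROPS.set(props);
    }

    public static void main(String[] args) {
        // prints "streamer created first sees: null"
        System.out.println("streamer created first sees: " + valueSeenAfterClose(false));
        // prints "streamer created after sees: job-1"
        Properties late = valueSeenAfterClose(true);
        System.out.println("streamer created after sees: " + late.getProperty("demo.key"));
    }
}
```

Only the second ordering leaks: the creating thread's `remove()` has no effect on the entry that was already copied into the streamer's thread-local map.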

      Proper fix: register every cloned properties object (created in `localProperties#childValue`) in a ConcurrentHashMap and forcefully clear all of them in `SparkContext#close`.

    People

    • Assignee: Unassigned
    • Reporter: Tolstopyatov Vsevolod (qwwdfsad)
    • Votes: 1
    • Watchers: 2