Spark / SPARK-39702

Reduce memory overhead of TransportCipher$EncryptedMessage's byteRawChannel buffer


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0
    • Fix Version/s: 3.4.0
    • Component/s: Spark Core, YARN
    • Labels: None

    Description

      With Spark's network encryption enabled (spark.network.crypto.enabled set to true and spark.network.crypto.saslFallback set to false), I ran into excessive memory usage in the external shuffle service.

      This was caused by a problem very similar to SPARK-24801: each TransportCipher$EncryptedMessage eagerly allocates a buffer (byteRawChannel) that is used during encryption. The buffer is only needed once transferTo is called, but it is allocated in the EncryptedMessage constructor, so every message waiting in an outgoing channel's queue holds its own buffer before it is ever written. This leads to high memory usage when many messages are queued.
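      To make the cost concrete, here is a minimal Java sketch contrasting eager allocation with the lazy approach taken in SPARK-24801. The classes EagerMessage and LazyMessage, the transferTo() stand-in, and the 32 KiB buffer size are hypothetical illustrations, not Spark's actual TransportCipher code. With these assumptions, 1,000 queued but unwritten eager messages retain 32,768,000 bytes of scratch space, while the lazy version retains none until a message is written.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

public class EagerVsLazyBuffers {
  // Hypothetical scratch-buffer size, chosen only for illustration.
  static final int BUFFER_SIZE = 32 * 1024;

  /** Allocates its scratch buffer in the constructor (the eager behavior described above). */
  static final class EagerMessage {
    private final ByteBuffer scratch = ByteBuffer.allocate(BUFFER_SIZE);

    long retainedBytes() {
      return scratch.capacity();
    }
  }

  /** Defers allocation until the first write, in the spirit of the SPARK-24801 fix. */
  static final class LazyMessage {
    private ByteBuffer scratch; // stays null while the message sits in the queue

    void transferTo() {
      if (scratch == null) {
        scratch = ByteBuffer.allocate(BUFFER_SIZE);
      }
      // Encrypting into `scratch` and writing to a channel is omitted here.
    }

    long retainedBytes() {
      return scratch == null ? 0 : scratch.capacity();
    }
  }

  /** Scratch bytes retained by `queued` eager messages that have not been written yet. */
  static long eagerQueueBytes(int queued) {
    List<EagerMessage> queue = new ArrayList<>();
    for (int i = 0; i < queued; i++) {
      queue.add(new EagerMessage());
    }
    long total = 0;
    for (EagerMessage m : queue) {
      total += m.retainedBytes();
    }
    return total;
  }

  /** Same measurement for lazy messages that have not been written yet. */
  static long lazyQueueBytes(int queued) {
    List<LazyMessage> queue = new ArrayList<>();
    for (int i = 0; i < queued; i++) {
      queue.add(new LazyMessage());
    }
    long total = 0;
    for (LazyMessage m : queue) {
      total += m.retainedBytes();
    }
    return total;
  }

  public static void main(String[] args) {
    System.out.println("eager: " + eagerQueueBytes(1000) + " bytes"); // eager: 32768000 bytes
    System.out.println("lazy:  " + lazyQueueBytes(1000) + " bytes");  // lazy:  0 bytes
  }
}
```

      Lazy allocation removes the cost for queued messages, but each message still allocates its own buffer once it is written.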

      One possible fix would be to mimic SPARK-24801 and make the initialization lazy. However, we can go one step further and share a single reused buffer across multiple messages. This is safe because those messages already share a different buffer that is accessed on the same write paths, so sharing byteRawChannel as well introduces no new concurrent access.
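      The shared-buffer idea can be sketched as follows, assuming (as the description implies) that all writes for a channel happen on one thread, one message at a time. Handler, Message, writeAll, the 4-byte chunk size and the XOR transform are all hypothetical; the XOR is a placeholder for the real cipher, not encryption, and this is not Spark's actual implementation.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.WritableByteChannel;
import java.nio.charset.StandardCharsets;

public class SharedBufferSketch {
  // Hypothetical chunk size, kept tiny so the demo reuses the buffer many times.
  static final int CHUNK_SIZE = 4;

  /** Hypothetical per-channel handler that owns the one scratch buffer. */
  static final class Handler {
    // Shared by every message this handler creates. This is only safe under the
    // assumption that transferTo calls for this channel never run concurrently.
    final ByteBuffer scratch = ByteBuffer.allocate(CHUNK_SIZE);

    Message newMessage(byte[] payload) {
      return new Message(payload, scratch);
    }
  }

  /** Holds only its payload plus a reference to the handler's buffer; allocates nothing. */
  static final class Message {
    final byte[] payload;
    final ByteBuffer scratch; // borrowed from the handler, not owned

    Message(byte[] payload, ByteBuffer scratch) {
      this.payload = payload;
      this.scratch = scratch;
    }

    void transferTo(WritableByteChannel target) throws IOException {
      for (int off = 0; off < payload.length; off += CHUNK_SIZE) {
        int len = Math.min(CHUNK_SIZE, payload.length - off);
        scratch.clear(); // reuse: discard whatever the previous chunk or message left
        for (int i = 0; i < len; i++) {
          // Placeholder transform standing in for the real cipher; NOT encryption.
          scratch.put((byte) (payload[off + i] ^ 0x5A));
        }
        scratch.flip();
        while (scratch.hasRemaining()) {
          target.write(scratch);
        }
      }
    }
  }

  /** Writes the payloads one after another, as a single writer thread would. */
  static byte[] writeAll(Handler handler, byte[]... payloads) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    WritableByteChannel channel = Channels.newChannel(out);
    try {
      for (byte[] p : payloads) {
        handler.newMessage(p).transferTo(channel);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return out.toByteArray();
  }

  public static void main(String[] args) {
    Handler handler = new Handler();
    byte[] out = writeAll(handler,
        "hello".getBytes(StandardCharsets.UTF_8),
        "world!".getBytes(StandardCharsets.UTF_8));
    // prints "11 bytes written through one 4-byte buffer"
    System.out.println(out.length + " bytes written through one " + CHUNK_SIZE + "-byte buffer");
  }
}
```

      Because the handler allocates the buffer once and each message clears it before use, scratch memory stays constant no matter how many messages are queued. The trade-off is that the design is only correct while transferTo calls on the same handler never overlap.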


            People

              Assignee: Josh Rosen (joshrosen)
              Reporter: Josh Rosen (joshrosen)
              Votes: 0
              Watchers: 3

              Dates

                Created:
                Updated:
                Resolved: