Spark / SPARK-26728

Make rdd.unpersist blocking configurable

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 2.1.0, 2.4.0
    • Fix Version/s: None
    • Component/s: Spark Core
    • Labels: None

    Description

      Currently, the blocking argument of rdd.unpersist defaults to true. However, in a real production cluster (especially a large one), node loss and network issues can happen at any time.

      Users generally treat rdd.unpersist as a call that never throws, so a blocking unpersist that hits such a failure can fail the user's job. This has happened many times in our cluster:

      2018-05-16,13:28:33,489 WARN org.apache.spark.storage.BlockManagerMaster: Failed to remove RDD 15 - Failed to send RPC 7571440800577648876 to c3-hadoop-prc-st2325.bj/10.136.136.25:43474: java.nio.channels.ClosedChannelException
      java.io.IOException: Failed to send RPC 7571440800577648876 to c3-hadoop-prc-st2325.bj/10.136.136.25:43474: java.nio.channels.ClosedChannelException
      	at org.apache.spark.network.client.TransportClient$3.operationComplete(TransportClient.java:239)
      	at org.apache.spark.network.client.TransportClient$3.operationComplete(TransportClient.java:226)
      	at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680)
      	at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:567)
      	at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:424)
      	at io.netty.channel.AbstractChannel$AbstractUnsafe.safeSetFailure(AbstractChannel.java:801)
      	at io.netty.channel.AbstractChannel$AbstractUnsafe.write(AbstractChannel.java:699)
      	at io.netty.channel.DefaultChannelPipeline$HeadContext.write(DefaultChannelPipeline.java:1122)
      	at io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:633)
      	at io.netty.channel.AbstractChannelHandlerContext.access$1900(AbstractChannelHandlerContext.java:32)
      	at io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.write(AbstractChannelHandlerContext.java:908)
      	at io.netty.channel.AbstractChannelHandlerContext$WriteAndFlushTask.write(AbstractChannelHandlerContext.java:960)
      	at io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.run(AbstractChannelHandlerContext.java:893)
      	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357)
      	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
      	at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
      	at java.lang.Thread.run(Thread.java:745)
      Caused by: java.nio.channels.ClosedChannelException
      2018-05-16,13:28:33,489 ERROR org.apache.spark.deploy.yarn.ApplicationMaster: User class threw exception: java.io.IOException: Failed to send RPC 7571440800577648876 to c3-hadoop-prc-st2325.bj/10.136.136.25:43474: java.nio.channels.ClosedChannelException
      java.io.IOException: Failed to send RPC 7571440800577648876 to c3-hadoop-prc-st2325.bj/10.136.136.25:43474: java.nio.channels.ClosedChannelException
      	at org.apache.spark.network.client.TransportClient$3.operationComplete(TransportClient.java:239)
      	at org.apache.spark.network.client.TransportClient$3.operationComplete(TransportClient.java:226)
      	at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680)
      	at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:567)
      	at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:424)
      	at io.netty.channel.AbstractChannel$AbstractUnsafe.safeSetFailure(AbstractChannel.java:801)
      	at io.netty.channel.AbstractChannel$AbstractUnsafe.write(AbstractChannel.java:699)
      	at io.netty.channel.DefaultChannelPipeline$HeadContext.write(DefaultChannelPipeline.java:1122)
      	at io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:633)
      	at io.netty.channel.AbstractChannelHandlerContext.access$1900(AbstractChannelHandlerContext.java:32)
      	at io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.write(AbstractChannelHandlerContext.java:908)
      	at io.netty.channel.AbstractChannelHandlerContext$WriteAndFlushTask.write(AbstractChannelHandlerContext.java:960)
      	at io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.run(AbstractChannelHandlerContext.java:893)
      	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357)
      	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
      	at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
      	at java.lang.Thread.run(Thread.java:745)
      Caused by: java.nio.channels.ClosedChannelException
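
Until the default can be changed, a caller-side workaround is possible. The following is a minimal sketch, not part of Spark: BestEffortCleanup and its run method are hypothetical names, and the helper only assumes that the cleanup action may throw, as the blocking unpersist does in the log above.

```java
import java.util.function.Consumer;

// Hypothetical helper (not a Spark class): runs a cleanup action and
// reports a failure to the caller instead of propagating it.
final class BestEffortCleanup {
    private BestEffortCleanup() {}

    /**
     * Runs the cleanup action. Returns true if it completed normally,
     * false if it threw; the exception is handed to onFailure.
     */
    static boolean run(Runnable cleanup, Consumer<Exception> onFailure) {
        try {
            cleanup.run();
            return true;
        } catch (Exception e) {
            // Cleanup is treated as optional, so the job keeps running.
            onFailure.accept(e);
            return false;
        }
    }
}
```

With a JavaRDD named rdd, the call site would look like BestEffortCleanup.run(() -> rdd.unpersist(true), e -> { /* log it */ }). Calling rdd.unpersist(false) directly is the other option: it returns without waiting for the executors to respond.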
      

      I think we can expose the default of this blocking argument as a configuration option, so that we can control it through our gray-release (staged rollout) system.
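
A rough sketch of how such a default could be resolved is shown below. The key spark.rdd.unpersist.blocking and the class UnpersistDefaults are assumptions made for illustration only; neither exists in Spark. An explicit argument from the caller still wins, and the current default of true is kept when the key is absent.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: neither this class nor the key below exists in Spark.
final class UnpersistDefaults {
    // Assumed configuration key, named here for illustration only.
    static final String KEY = "spark.rdd.unpersist.blocking";

    private UnpersistDefaults() {}

    /**
     * Resolves the effective blocking flag:
     * 1. an explicit argument from the caller takes precedence;
     * 2. otherwise the configured value is used;
     * 3. otherwise the current default (true) is kept.
     */
    static boolean resolveBlocking(Map<String, String> conf, Optional<Boolean> explicit) {
        if (explicit.isPresent()) {
            return explicit.get();
        }
        String configured = conf.get(KEY);
        if (configured == null) {
            return true;
        }
        return Boolean.parseBoolean(configured.trim());
    }
}
```

Keeping true as the fallback means existing jobs behave the same unless the key is set, which is what a staged rollout needs.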

       

            People

              Assignee: Unassigned
              Reporter: liupengcheng