  1. Spark
  2. SPARK-28726

Spark with dynamic allocation always gets 'Connection reset by peer'


    Details

    • Type: Wish
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.4.0
    • Fix Version/s: None
    • Component/s: Spark Core
    • Labels:
      None

      Description

      When using Spark with dynamic allocation, we set the idle timeout to 5s.

      We constantly get Netty 'Connection reset by peer' exceptions.

       

      I suspect the 5s idle timeout is too short: by the time the BlockManager makes a Netty I/O call, the executor has already been removed because of the timeout,

      but the driver's BlockManager has not been notified of the removal in time.
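To make the setup concrete, here is a minimal sketch of the configuration being described. It assumes the "idle time" in the report refers to `spark.dynamicAllocation.executorIdleTimeout` (the report does not name the key), and that the external shuffle service is enabled, as is common with dynamic allocation. The helper names are hypothetical and only render `spark-submit` arguments; raising the timeout is a way to test the suspicion above, not a confirmed fix.

```python
# Settings illustrating the reported setup. Only the 5s value comes from the
# report; the exact keys are an assumption about what "idle time" refers to.
reported_conf = {
    "spark.dynamicAllocation.enabled": "true",
    "spark.shuffle.service.enabled": "true",  # assumed, not stated in the report
    "spark.dynamicAllocation.executorIdleTimeout": "5s",  # value from the report
}


def with_longer_idle_timeout(conf, timeout="60s"):
    """Return a copy of conf with a longer executor idle timeout.

    60s is Spark's documented default for executorIdleTimeout.
    """
    updated = dict(conf)
    updated["spark.dynamicAllocation.executorIdleTimeout"] = timeout
    return updated


def to_submit_args(conf):
    """Render settings as spark-submit --conf arguments (hypothetical helper)."""
    args = []
    for key in sorted(conf):
        args += ["--conf", f"{key}={conf[key]}"]
    return args


if __name__ == "__main__":
    print(" ".join(to_submit_args(with_longer_idle_timeout(reported_conf))))
```

If the warnings become rarer with the longer timeout, that would be consistent with the suspected race between executor removal and the driver's cleanup requests.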

      
      19/08/14 00:00:46 WARN org.apache.spark.network.server.TransportChannelHandler: "Exception in connection from /host:port"
      java.io.IOException: Connection reset by peer
       at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
       at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
       at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
       at sun.nio.ch.IOUtil.read(IOUtil.java:192)
       at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
       at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:288)
       at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1106)
       at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:343)
       at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:123)
       at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645)
       at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580)
       at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497)
       at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
       at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
       at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
      --
      19/08/14 00:00:46 WARN org.apache.spark.storage.BlockManagerMasterEndpoint: "Error trying to remove broadcast 67 from block manager BlockManagerId(967, host, port, None)"
      java.io.IOException: Connection reset by peer
       at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
       at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
       at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
       at sun.nio.ch.IOUtil.read(IOUtil.java:192)
       at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
       at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:288)
       at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1106)
       at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:343)
       at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:123)
       at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645)
       at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580)
       at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497)
       at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
       at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
       at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
      --
      19/08/14 00:00:46 INFO org.apache.spark.ContextCleaner: "Cleaned accumulator 162174"
      19/08/14 00:00:46 WARN org.apache.spark.storage.BlockManagerMaster: "Failed to remove shuffle 22 - Connection reset by peer"
      java.io.IOException: Connection reset by peer
       at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
       at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
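To see which components report the resets most often, the driver log can be summarized with a small script. The sketch below assumes log lines shaped like the excerpt above (timestamp, level, logger name, message) and counts WARN entries whose message, or whose following line, mentions "Connection reset by peer". The function name and regular expression are illustrative, not part of Spark.

```python
import re
from collections import Counter

# Header shape taken from the excerpt above:
# 19/08/14 00:00:46 WARN org.apache.spark...ClassName: "message"
LOG_LINE = re.compile(
    r"^\s*(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) (\w+) ([\w.$]+): (.*)$"
)
NEEDLE = "Connection reset by peer"


def count_reset_warnings(lines):
    """Count WARN entries per logger that involve a connection reset."""
    lines = list(lines)
    counts = Counter()
    for i, line in enumerate(lines):
        match = LOG_LINE.match(line)
        if not match or match.group(2) != "WARN":
            continue
        # The exception text may be in the message or on the next line.
        following = lines[i + 1] if i + 1 < len(lines) else ""
        if NEEDLE in match.group(4) or NEEDLE in following:
            counts[match.group(3)] += 1
    return counts
```

Running it over a full driver log, for example `count_reset_warnings(open("driver.log"))` with a hypothetical file name, gives a per-logger tally that can be compared across different idle timeout settings.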

            People

            • Assignee:
              Unassigned
              Reporter:
              angerszhuuu angerszhu
            • Votes:
              0
              Watchers:
              3
