ZEPPELIN-4447

Zeppelin notebook fails to run Spark jobs (org.apache.spark.shuffle.FetchFailedException: Failed to open file)

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.7.3
    • Fix Version/s: None
    • Component/s: Interpreters, pySpark
    • Labels: None

    Description

      I'm running Zeppelin on a Kubernetes cluster (v1.13.4). The Spark interpreter ran smoothly until I installed and started using pandas and seaborn for PySpark (on both the Zeppelin image and the Spark executor image). The executor pods have 5 cores and 1 GB of memory each.

      // Sample Zeppelin paragraph
      %pyspark
      import seaborn as sns
      # Plot success rate by age bin
      x = df_age.groupby("*").agg({"*": "mean"}).sort("*").toPandas()
      sns.barplot(x="*", y="avg(*)", data=x, color="cadetblue")
      

      I looked for the directory shown in the error and found that only one of the three executor pods has it. The failure is intermittent: sometimes the same paragraphs run without errors.
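
      To make it easier to search the pods for the missing file, the path from the error can be split into its parts. This is a minimal sketch, assuming the usual Spark layout `<local dir>/blockmgr-<uuid>/<subdir>/shuffle_<shuffleId>_<mapId>_<reduceId>.index`. `parse_shuffle_index_path` is a hypothetical helper name, not part of Spark or Zeppelin.

```python
import re
from typing import NamedTuple


class ShuffleIndexFile(NamedTuple):
    local_dir: str          # value of spark.local.dir on the pod that wrote the file
    block_manager_dir: str  # blockmgr-<uuid> directory created by one executor
    sub_dir: str            # two-character hashed subdirectory
    shuffle_id: int
    map_id: int
    reduce_id: int


# Assumed layout; adjust the pattern if your paths look different.
_PATTERN = re.compile(
    r"^(?P<local_dir>.+)/(?P<bm>blockmgr-[0-9a-f-]+)/(?P<sub>[0-9a-f]{2})/"
    r"shuffle_(?P<shuffle>\d+)_(?P<map>\d+)_(?P<reduce>\d+)\.index$"
)


def parse_shuffle_index_path(path: str) -> ShuffleIndexFile:
    """Split a shuffle index path from a FetchFailedException into its parts."""
    match = _PATTERN.match(path)
    if match is None:
        raise ValueError("not a shuffle index path: " + path)
    return ShuffleIndexFile(
        local_dir=match.group("local_dir"),
        block_manager_dir=match.group("bm"),
        sub_dir=match.group("sub"),
        shuffle_id=int(match.group("shuffle")),
        map_id=int(match.group("map")),
        reduce_id=int(match.group("reduce")),
    )


info = parse_shuffle_index_path(
    "/tmp/spark-local/blockmgr-042e15ea-f4cd-4c8e-b5c1-0cc147f2c68f/0e/shuffle_39_0_0.index"
)
print(info.block_manager_dir, info.shuffle_id, info.map_id, info.reduce_id)
```

      The `block_manager_dir` value is the name to look for under `spark.local.dir` on each executor pod and on the pod that serves port 7337, since the fetch in the error goes through the external shuffle service.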

      Side note: I tried increasing the executor memory to 5G, but the change was not reflected in the spawned pods.
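
      One way to confirm which settings the running interpreter actually picked up is to compare the requested values with the effective ones. This is a minimal sketch: `find_config_mismatches` is a hypothetical helper, and the "effective" values below are made up for illustration. In a `%pyspark` paragraph the real pairs could come from `sc.getConf().getAll()`, assuming `sc` is the SparkContext that Zeppelin provides.

```python
from typing import Dict, Iterable, Optional, Tuple


def find_config_mismatches(
    requested: Dict[str, str],
    effective: Iterable[Tuple[str, str]],
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return {key: (requested, effective)} for settings that differ or are unset."""
    effective_map = dict(effective)
    mismatches = {}
    for key, wanted in requested.items():
        actual = effective_map.get(key)  # None when the key is not set at all
        if actual != wanted:
            mismatches[key] = (wanted, actual)
    return mismatches


requested = {"spark.executor.memory": "5g", "spark.executor.cores": "5"}

# Stand-in for sc.getConf().getAll(); illustrative values only.
example_effective = [("spark.executor.memory", "1g"), ("spark.executor.cores", "5")]

print(find_config_mismatches(requested, example_effective))
```

      An empty result would suggest the setting reached the driver and the discrepancy lies in how the executor pods are created. A non-empty result would point at the interpreter or spark-submit configuration instead.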

      // Spark-submit configs
      
      --conf spark.kubernetes.authenticate.driver.serviceAccountName=*
      --conf spark.executor.cores=5
      --conf spark.executor.memory=5g
      --conf spark.driver.memory=5g
      --conf spark.kubernetes.driver.docker.image=*
      --conf spark.kubernetes.executor.docker.image=*
      --conf spark.local.dir=/tmp/spark-local
      --conf spark.executor.instances=3    
      --conf spark.dynamicAllocation.enabled=true
      --conf spark.shuffle.service.enabled=true    
      --conf spark.kubernetes.shuffle.labels="*"    
      --conf spark.dynamicAllocation.maxExecutors=3   
      --conf spark.dynamicAllocation.minExecutors=1    
      --conf spark.kubernetes.shuffle.namespace=*    
      --conf spark.kubernetes.docker.image.pullPolicy=IfNotPresent    
      --conf spark.kubernetes.initcontainer.docker.image=kubespark/spark-init:v2.2.0-kubernetes-0.5.0    
      --conf spark.kubernetes.resourceStagingServer.uri=http://*

       

      Error logs

      FetchFailed(BlockManagerId(1, 192.168.3.148, 7337, None), shuffleId=39, mapId=0, reduceId=100, message=

      org.apache.spark.shuffle.FetchFailedException: Failure while fetching StreamChunkId{streamId=1309473202020, chunkIndex=0}: java.lang.RuntimeException: Failed to open file: /tmp/spark-local/blockmgr-042e15ea-f4cd-4c8e-b5c1-0cc147f2c68f/0e/shuffle_39_0_0.index

      at org.apache.spark.network.shuffle.ExternalShuffleBlockResolver.getSortBasedShuffleBlockData(ExternalShuffleBlockResolver.java:249)

      at org.apache.spark.network.shuffle.ExternalShuffleBlockResolver.getBlockData(ExternalShuffleBlockResolver.java:174)

      at org.apache.spark.network.shuffle.ExternalShuffleBlockHandler$1.next(ExternalShuffleBlockHandler.java:105)

      at org.apache.spark.network.shuffle.ExternalShuffleBlockHandler$1.next(ExternalShuffleBlockHandler.java:95)

      at org.apache.spark.network.server.OneForOneStreamManager.getChunk(OneForOneStreamManager.java:89)

      at org.apache.spark.network.server.TransportRequestHandler.processFetchRequest(TransportRequestHandler.java:125)

      at org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:103)

      at org.apache.spark.network.server.TransportChannelHandler.channelRead(TransportChannelHandler.java:118)

      at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:357)

      at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:343)

      at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:336)

      at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:287)

      at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:357)

      at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:343)

      at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:336)

      at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102)

      at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:357)

      at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:343)

      at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:336)

      at org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:85)

      at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:357)

      at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:343)

      at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:336)

      at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1294)

      at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:357)

      at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:343)

      at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:911)

      at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)

      at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:643)

      at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:566)

      at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:480)

      at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:442)

      at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:131)

      at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144)

      at java.lang.Thread.run(Thread.java:748)

      Caused by: java.util.concurrent.ExecutionException: java.io.FileNotFoundException: /tmp/spark-local/blockmgr-042e15ea-f4cd-4c8e-b5c1-0cc147f2c68f/0e/shuffle_39_0_0.index (No such file or directory)

      at org.spark_project.guava.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:306)

      at org.spark_project.guava.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:293)

      at org.spark_project.guava.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)

      at org.spark_project.guava.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:135)

      at org.spark_project.guava.cache.LocalCache$Segment.getAndRecordStats(LocalCache.java:2410)

      at org.spark_project.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2380)

      at org.spark_project.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342)

      at org.spark_project.guava.cache.LocalCache$Segment.get(LocalCache.java:2257)

      at org.spark_project.guava.cache.LocalCache.get(LocalCache.java:4000)

      at org.spark_project.guava.cache.LocalCache.getOrLoad(LocalCache.java:4004)

      at org.spark_project.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874)

      at org.apache.spark.network.shuffle.ExternalShuffleBlockResolver.getSortBasedShuffleBlockData(ExternalShuffleBlockResolver.java:240)

      ... 34 more

      Caused by: java.io.FileNotFoundException: /tmp/spark-local/blockmgr-042e15ea-f4cd-4c8e-b5c1-0cc147f2c68f/0e/shuffle_39_0_0.index (No such file or directory)

      at java.io.FileInputStream.open0(Native Method)

      at java.io.FileInputStream.open(FileInputStream.java:195)

      at java.io.FileInputStream.<init>(FileInputStream.java:138)

      at org.apache.spark.network.shuffle.ShuffleIndexInformation.<init>(ShuffleIndexInformation.java:41)

      at org.apache.spark.network.shuffle.ExternalShuffleBlockResolver$1.load(ExternalShuffleBlockResolver.java:111)

      at org.apache.spark.network.shuffle.ExternalShuffleBlockResolver$1.load(ExternalShuffleBlockResolver.java:109)

      at org.spark_project.guava.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599)

      at org.spark_project.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2379)

      ... 40 more

       

      at org.apache.spark.storage.ShuffleBlockFetcherIterator.throwFetchFailedException(ShuffleBlockFetcherIterator.scala:442)

      at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:418)

      at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:59)

      at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)

      at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)

      at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)

      at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)

      at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)

      at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)

      at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.agg_doAggregateWithKeys$(Unknown Source)

      at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)

      at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)

      at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:395)

      at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:234)

      at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:228)

      at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)

      at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)

      at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)

      at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)

      at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)

      at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)

      at org.apache.spark.scheduler.Task.run(Task.scala:108)

      at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)

      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

      at java.lang.Thread.run(Thread.java:748)

      Caused by: org.apache.spark.network.client.ChunkFetchFailureException: Failure while fetching StreamChunkId{streamId=1309473202020, chunkIndex=0}: java.lang.RuntimeException: Failed to open file: /tmp/spark-local/blockmgr-042e15ea-f4cd-4c8e-b5c1-0cc147f2c68f/0e/shuffle_39_0_0.index

      at org.apache.spark.network.shuffle.ExternalShuffleBlockResolver.getSortBasedShuffleBlockData(ExternalShuffleBlockResolver.java:249)

      at org.apache.spark.network.shuffle.ExternalShuffleBlockResolver.getBlockData(ExternalShuffleBlockResolver.java:174)

      at org.apache.spark.network.shuffle.ExternalShuffleBlockHandler$1.next(ExternalShuffleBlockHandler.java:105)

      at org.apache.spark.network.shuffle.ExternalShuffleBlockHandler$1.next(ExternalShuffleBlockHandler.java:95)

      at org.apache.spark.network.server.OneForOneStreamManager.getChunk(OneForOneStreamManager.java:89)

      at org.apache.spark.network.server.TransportRequestHandler.processFetchRequest(TransportRequestHandler.java:125)

      at org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:103)

      at org.apache.spark.network.server.TransportChannelHandler.channelRead(TransportChannelHandler.java:118)

      at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:357)

      at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:343)

      at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:336)

      at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:287)

      at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:357)

      at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:343)

      at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:336)

      at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102)

      at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:357)

      at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:343)

      at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:336)

      at org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:85)

      at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:357)

      at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:343)

      at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:336)

      at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1294)

      at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:357)

      at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:343)

      at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:911)

      at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)

      at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:643)

      at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:566)

      at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:480)

      at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:442)

      at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:131)

      at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144)

      at java.lang.Thread.run(Thread.java:748)

      Caused by: java.util.concurrent.ExecutionException: java.io.FileNotFoundException: /tmp/spark-local/blockmgr-042e15ea-f4cd-4c8e-b5c1-0cc147f2c68f/0e/shuffle_39_0_0.index (No such file or directory)

      at org.spark_project.guava.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:306)

      at org.spark_project.guava.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:293)

      at org.spark_project.guava.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)

      at org.spark_project.guava.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:135)

      at org.spark_project.guava.cache.LocalCache$Segment.getAndRecordStats(LocalCache.java:2410)

      at org.spark_project.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2380)

      at org.spark_project.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342)

      at org.spark_project.guava.cache.LocalCache$Segment.get(LocalCache.java:2257)

      at org.spark_project.guava.cache.LocalCache.get(LocalCache.java:4000)

      at org.spark_project.guava.cache.LocalCache.getOrLoad(LocalCache.java:4004)

      at org.spark_project.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874)

      at org.apache.spark.network.shuffle.ExternalShuffleBlockResolver.getSortBasedShuffleBlockData(ExternalShuffleBlockResolver.java:240)

      ... 34 more

      Caused by: java.io.FileNotFoundException: /tmp/spark-local/blockmgr-042e15ea-f4cd-4c8e-b5c1-0cc147f2c68f/0e/shuffle_39_0_0.index (No such file or directory)

      at java.io.FileInputStream.open0(Native Method)

      at java.io.FileInputStream.open(FileInputStream.java:195)

      at java.io.FileInputStream.<init>(FileInputStream.java:138)

      at org.apache.spark.network.shuffle.ShuffleIndexInformation.<init>(ShuffleIndexInformation.java:41)

      at org.apache.spark.network.shuffle.ExternalShuffleBlockResolver$1.load(ExternalShuffleBlockResolver.java:111)

      at org.apache.spark.network.shuffle.ExternalShuffleBlockResolver$1.load(ExternalShuffleBlockResolver.java:109)

      at org.spark_project.guava.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599)

      at org.spark_project.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2379)

      ... 40 more

       

      at org.apache.spark.network.client.TransportResponseHandler.handle(TransportResponseHandler.java:182)

      at org.apache.spark.network.server.TransportChannelHandler.channelRead(TransportChannelHandler.java:120)

      at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:357)

      at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:343)

      at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:336)

      at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:287)

      at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:357)

      at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:343)

      at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:336)

      at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102)

      at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:357)

      at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:343)

      at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:336)

      at org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:85)

      at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:357)

      at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:343)

      at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:336)

      at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1294)

      at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:357)

      at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:343)

      at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:911)

      at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)

      at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:643)

      at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:566)

      at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:480)

      at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:442)

      at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:131)

      at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144)

      ... 1 more
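
      Because the trace repeats the same frames several times, it can help to reduce it to its `Caused by:` lines. This is a rough sketch in plain Python: `caused_by_lines` is a hypothetical helper, and it assumes the log is available as a single string with one entry per line.

```python
from typing import List

_PREFIX = "Caused by: "


def caused_by_lines(trace: str) -> List[str]:
    """Collect the distinct 'Caused by:' messages of a Java stack trace, in order."""
    causes: List[str] = []
    for raw in trace.splitlines():
        line = raw.strip()
        if line.startswith(_PREFIX):
            cause = line[len(_PREFIX):]
            if cause not in causes:  # the same cause is often printed more than once
                causes.append(cause)
    return causes


sample = """
org.apache.spark.shuffle.FetchFailedException: Failure while fetching ...
    at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:418)
Caused by: java.io.FileNotFoundException: shuffle_39_0_0.index (No such file or directory)
    at java.io.FileInputStream.open0(Native Method)
Caused by: java.io.FileNotFoundException: shuffle_39_0_0.index (No such file or directory)
"""

for cause in caused_by_lines(sample):
    print(cause)
```

      Applied to the full log above, this should leave only the `ChunkFetchFailureException`, `ExecutionException`, and `FileNotFoundException` messages, all of which name the same missing `shuffle_39_0_0.index` file.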

       

      People

        Assignee: Unassigned
        Reporter: Joshua Villanueva (joshuav11)