KUDU-3169

Kudu Java client throws a scanner expired error while processing a large scan on a high-load cluster


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.8.0, 1.9.0, 1.10.0, 1.10.1, 1.11.0, 1.11.1, 1.12.0
    • Fix Version/s: None
    • Component/s: client, java

    Description

      A user submitted a Spark job that scans a Kudu table with a large number of records. After only a few minutes the job failed after 4 attempts, each attempt failing with the following error (a rough reproduction sketch follows the stack trace):

org.apache.kudu.client.NonRecoverableException: Scanner 4e34e6f821be42b889022ec681e235cc not found (it may have expired)
org.apache.kudu.client.NonRecoverableException: Scanner 4e34e6f821be42b889022ec681e235cc not found (it may have expired)
    at org.apache.kudu.client.KuduException.transformException(KuduException.java:110)
    at org.apache.kudu.client.KuduClient.joinAndHandleException(KuduClient.java:402)
    at org.apache.kudu.client.KuduScanner.nextRows(KuduScanner.java:57)
    at org.apache.kudu.spark.kudu.RowIterator.hasNext(KuduRDD.scala:153)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:614)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
    at org.apache.spark.shuffle.sort.UnsafeShuffleWriter.write(UnsafeShuffleWriter.java:187)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
    at org.apache.spark.scheduler.Task.run(Task.scala:109)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
    Suppressed: org.apache.kudu.client.KuduException$OriginalException: Original asynchronous stack trace
        at org.apache.kudu.client.RpcProxy.dispatchTSError(RpcProxy.java:341)
        at org.apache.kudu.client.RpcProxy.responseReceived(RpcProxy.java:263)
        at org.apache.kudu.client.RpcProxy.access$000(RpcProxy.java:59)
        at org.apache.kudu.client.RpcProxy$1.call(RpcProxy.java:152)
        at org.apache.kudu.client.RpcProxy$1.call(RpcProxy.java:148)
        at org.apache.kudu.client.Connection.messageReceived(Connection.java:391)
        at org.apache.kudu.shaded.org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
        at org.apache.kudu.client.Connection.handleUpstream(Connection.java:243)
        at org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
        at org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
        at org.apache.kudu.shaded.org.jboss.netty.handler.timeout.ReadTimeoutHandler.messageReceived(ReadTimeoutHandler.java:184)
        at org.apache.kudu.shaded.org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
        at org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
        at org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
        at org.apache.kudu.shaded.org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
        at org.apache.kudu.shaded.org.jboss.netty.handler.codec.oneone.OneToOneDecoder.handleUpstream(OneToOneDecoder.java:70)
        at org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
        at org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
        at org.apache.kudu.shaded.org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
        at org.apache.kudu.shaded.org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:462)
        at org.apache.kudu.shaded.org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:443)
        at org.apache.kudu.shaded.org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303)
        at org.apache.kudu.shaded.org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
        at org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
        at org.apache.kudu.shaded.org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
        at org.apache.kudu.shaded.org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
        at org.apache.kudu.shaded.org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
        at org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
        at org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
        at org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
        at org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
        at org.apache.kudu.shaded.org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
        at org.apache.kudu.shaded.org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
        at org.apache.kudu.shaded.org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
        ... 3 more
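      For context, the failing read is roughly equivalent to the sketch below using the kudu-spark connector (a minimal sketch of my own, not the reporter's code; the master addresses and table name are placeholders):

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class KuduLargeScanRepro {
      public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
            .appName("kudu-large-scan")
            .getOrCreate();

        // Full scan of a large Kudu table. Each Spark task drives one KuduScanner and
        // repeatedly calls nextRows(); that is the call that fails in the trace above.
        Dataset<Row> df = spark.read()
            .format("org.apache.kudu.spark.kudu")
            .option("kudu.master", "master1:7051,master2:7051,master3:7051") // placeholder masters
            .option("kudu.table", "large_table")                             // placeholder table
            .load();

        System.out.println(df.count());
        spark.stop();
      }
    }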

       Each task ran for only about 19 seconds before throwing the scanner-not-found error, even though the tablet server uses the default scanner_ttl_ms (60 seconds). In the tablet server logs we found that the scanner mentioned in the client log expired only after the Spark job had already failed, and that a different tablet server received the scan request carrying that scanner id.
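
       For reference, each task's scan corresponds roughly to the plain Java-client loop sketched below (my own sketch, with placeholder master and table names). After the first batch, every nextRows() call carries the scanner id and must reach the tablet server that created the scanner; recent client versions also offer KuduScanner.keepAlive() to reset the server-side TTL between batches:

    import org.apache.kudu.client.KuduClient;
    import org.apache.kudu.client.KuduScanner;
    import org.apache.kudu.client.KuduTable;
    import org.apache.kudu.client.RowResultIterator;

    public class ScanLoopSketch {
      public static void main(String[] args) throws Exception {
        KuduClient client = new KuduClient.KuduClientBuilder("master1:7051").build(); // placeholder
        try {
          KuduTable table = client.openTable("large_table");                          // placeholder
          KuduScanner scanner = client.newScannerBuilder(table).build();
          try {
            while (scanner.hasMoreRows()) {
              // Continuation RPC: carries the scanner id issued by the tablet server that
              // opened the scanner. "Scanner not found" means the server that received this
              // request has no such scanner (expired via scanner_ttl_ms, or a different server).
              RowResultIterator batch = scanner.nextRows();
              while (batch.hasNext()) {
                batch.next(); // process the row
              }
              // Optional: reset the scanner's TTL on the server if batch processing is slow.
              scanner.keepAlive();
            }
          } finally {
            scanner.close();
          }
        } finally {
          client.close();
        }
      }
    }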

       It seems that AsyncKuduScanner in the Kudu Java client chooses a random tablet server when retrying scanNextRows, even though the AsyncKuduScanner already holds a scannerId issued by the tablet server that opened the scanner.
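
       Assuming that is what happens, one possible mitigation (a sketch of my own, not a fix for the retry routing itself) is to open the scanner as fault tolerant: a fault-tolerant scan is designed to be resumable on a different tablet server, so a retry that lands elsewhere should not fail with "Scanner not found". The kudu-spark connector exposes a corresponding faultTolerantScanner read option.

    import org.apache.kudu.client.KuduClient;
    import org.apache.kudu.client.KuduScanner;
    import org.apache.kudu.client.KuduTable;

    public class FaultTolerantScanSketch {
      // Build a scanner that can be resumed on another tablet server if the original
      // one no longer has the scanner state (fault-tolerant scans use snapshot reads).
      static KuduScanner newFaultTolerantScanner(KuduClient client, KuduTable table) {
        return client.newScannerBuilder(table)
            .setFaultTolerant(true)
            .build();
      }
    }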

          People

            Assignee: Unassigned
            Reporter: tao min (mintao)
            Votes: 0
            Watchers: 5
