Spark / SPARK-23470

org.apache.spark.ui.jobs.ApiHelper.lastStageNameAndDescription is too slow

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 2.3.0
    • Fix Version/s: 2.3.0
    • Component/s: Web UI
    • Labels: None

    Description

      While testing 2.3.0 RC3, I found that it's easy to hit a "read timeout" when accessing the All Jobs page. The thread dump shows the UI thread running "org.apache.spark.ui.jobs.ApiHelper.lastStageNameAndDescription":

      "SparkUI-59" #59 daemon prio=5 os_prio=0 tid=0x00007fc15b0a3000 nid=0x8dc runnable [0x00007fc0ce9f8000]
         java.lang.Thread.State: RUNNABLE
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      	at java.lang.reflect.Method.invoke(Method.java:498)
      	at org.apache.spark.util.kvstore.KVTypeInfo$MethodAccessor.get(KVTypeInfo.java:154)
      	at org.apache.spark.util.kvstore.InMemoryStore$InMemoryView.compare(InMemoryStore.java:248)
      	at org.apache.spark.util.kvstore.InMemoryStore$InMemoryView.lambda$iterator$2(InMemoryStore.java:214)
      	at org.apache.spark.util.kvstore.InMemoryStore$InMemoryView$$Lambda$36/1834982692.compare(Unknown Source)
      	at java.util.TimSort.binarySort(TimSort.java:296)
      	at java.util.TimSort.sort(TimSort.java:239)
      	at java.util.Arrays.sort(Arrays.java:1512)
      	at java.util.ArrayList.sort(ArrayList.java:1460)
      	at java.util.stream.SortedOps$RefSortingSink.end(SortedOps.java:387)
      	at java.util.stream.Sink$ChainedReference.end(Sink.java:258)
      	at java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.fillBuffer(StreamSpliterators.java:210)
      	at java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.doAdvance(StreamSpliterators.java:161)
      	at java.util.stream.StreamSpliterators$WrappingSpliterator.tryAdvance(StreamSpliterators.java:300)
      	at java.util.Spliterators$1Adapter.hasNext(Spliterators.java:681)
      	at org.apache.spark.util.kvstore.InMemoryStore$InMemoryIterator.hasNext(InMemoryStore.java:278)
      	at org.apache.spark.status.AppStatusStore.lastStageAttempt(AppStatusStore.scala:101)
      	at org.apache.spark.ui.jobs.ApiHelper$$anonfun$38.apply(StagePage.scala:1014)
      	at org.apache.spark.ui.jobs.ApiHelper$$anonfun$38.apply(StagePage.scala:1014)
      	at org.apache.spark.status.AppStatusStore.asOption(AppStatusStore.scala:408)
      	at org.apache.spark.ui.jobs.ApiHelper$.lastStageNameAndDescription(StagePage.scala:1014)
      	at org.apache.spark.ui.jobs.JobDataSource.org$apache$spark$ui$jobs$JobDataSource$$jobRow(AllJobsPage.scala:434)
      	at org.apache.spark.ui.jobs.JobDataSource$$anonfun$24.apply(AllJobsPage.scala:412)
      	at org.apache.spark.ui.jobs.JobDataSource$$anonfun$24.apply(AllJobsPage.scala:412)
      	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
      	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
      	at scala.collection.immutable.List.foreach(List.scala:381)
      	at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
      	at scala.collection.mutable.ListBuffer.foreach(ListBuffer.scala:45)
      	at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
      	at scala.collection.AbstractTraversable.map(Traversable.scala:104)
      	at org.apache.spark.ui.jobs.JobDataSource.<init>(AllJobsPage.scala:412)
      	at org.apache.spark.ui.jobs.JobPagedTable.<init>(AllJobsPage.scala:504)
      	at org.apache.spark.ui.jobs.AllJobsPage.jobsTable(AllJobsPage.scala:246)
      	at org.apache.spark.ui.jobs.AllJobsPage.render(AllJobsPage.scala:295)
      	at org.apache.spark.ui.WebUI$$anonfun$3.apply(WebUI.scala:98)
      	at org.apache.spark.ui.WebUI$$anonfun$3.apply(WebUI.scala:98)
      	at org.apache.spark.ui.JettyUtils$$anon$3.doGet(JettyUtils.scala:90)
      	at javax.servlet.http.HttpServlet.service(HttpServlet.java:687)
      	at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
      	at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:848)
      	at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:584)
      	at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
      	at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
      	at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
      	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
      	at org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:493)
      	at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
      

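      According to the heap dump, there are 954 JobDataWrapper and 54690 StageDataWrapper instances. The stack trace suggests that rendering each job row calls AppStatusStore.lastStageAttempt, which sorts the stored stage entries in InMemoryStore before returning the first match. The page therefore sorts 54690 items for each of the 954 jobs, which explains why the UI is slow.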
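
      To illustrate the cost pattern described above, here is a minimal, self-contained sketch. It is not Spark's code and not the fix that was applied: `StageAttempt`, `LastAttemptLookup`, `lastAttemptBySort`, and `lastAttemptByScan` are hypothetical names used only for illustration. The first method mimics "sort everything, then take the first match", which with the numbers above touches about 954 × 54690 ≈ 52 million entries per page load before counting comparison costs. The second shows one possible cheaper shape, a single linear pass per lookup.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for a stored stage entry; not Spark's StageDataWrapper.
final class StageAttempt {
    final int stageId;
    final int attemptId;

    StageAttempt(int stageId, int attemptId) {
        this.stageId = stageId;
        this.attemptId = attemptId;
    }
}

final class LastAttemptLookup {
    // Slow pattern: copy and sort every stored entry, then take the first match.
    // Each call costs O(n log n) in the total number of entries.
    static int lastAttemptBySort(List<StageAttempt> all, int stageId) {
        List<StageAttempt> sorted = new ArrayList<>(all);
        sorted.sort(Comparator
            .comparingInt((StageAttempt s) -> s.stageId)
            .thenComparingInt(s -> s.attemptId)
            .reversed());
        for (StageAttempt s : sorted) {
            if (s.stageId == stageId) {
                return s.attemptId; // highest attempt comes first after the reversed sort
            }
        }
        return -1; // no attempt stored for this stage
    }

    // Cheaper pattern: one linear pass that tracks the highest attempt id seen.
    static int lastAttemptByScan(List<StageAttempt> all, int stageId) {
        int last = -1;
        for (StageAttempt s : all) {
            if (s.stageId == stageId && s.attemptId > last) {
                last = s.attemptId;
            }
        }
        return last;
    }

    public static void main(String[] args) {
        List<StageAttempt> stages = new ArrayList<>();
        stages.add(new StageAttempt(1, 0));
        stages.add(new StageAttempt(2, 0));
        stages.add(new StageAttempt(1, 2));
        stages.add(new StageAttempt(1, 1));
        // Both lookups should agree on the last attempt of stage 1.
        System.out.println(lastAttemptBySort(stages, 1));
        System.out.println(lastAttemptByScan(stages, 1));
    }
}
```

      Both methods return the same answer; only the work per lookup differs. An index keyed by stage id would avoid even the linear scan, at the cost of keeping that index up to date.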

            People

              Assignee: vanzin Marcelo Masiero Vanzin
              Reporter: zsxwing Shixiong Zhu