Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Duplicate
- Affects Version/s: 2.1.0
- Fix Version/s: None
- Component/s: None
Description
The history server becomes unavailable when an event log file is very large; "large" here means that replaying the file takes too long. One fix is to add a timeout for event log replaying.
Here is an example:
After a restart, no application submitted afterwards could open its history UI. In the event log directory we found an event log file larger than 130 GB:
hadoop *144149840801* 2017-08-29 14:03 /spark/xxx/log/history/application_1501588539284_1118255_1.lz4.inprogress
From jstack and the server log, we can see the replaying task blocked on this event log.
Server log:
2017-10-12,16:00:12,151 INFO org.apache.spark.deploy.history.FsHistoryProvider: Replaying log path: hdfs://xxx/spark/xxx/log/history/application_1501588539284_1118255_1.lz4.inprogress
2017-10-12,16:00:12,167 INFO org.apache.spark.scheduler.ReplayListenerBus: Begin to replay hdfs://xxx/spark/xxx/log/history/application_1501588539284_1118255_1.lz4.inprogress!
jstack:
"log-replay-executor-0" daemon prio=10 tid=0x00007f0f48014800 nid=0x6160 runnable [0x00007f0f4f6f5000]
   java.lang.Thread.State: RUNNABLE
	at net.jpountz.lz4.LZ4JNI.LZ4_decompress_fast(Native Method)
	at net.jpountz.lz4.LZ4JNIFastDecompressor.decompress(LZ4JNIFastDecompressor.java:37)
	at org.apache.spark.io.LZ4BlockInputStream.refill(LZ4BlockInputStream.java:205)
	at org.apache.spark.io.LZ4BlockInputStream.read(LZ4BlockInputStream.java:125)
	at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:283)
	at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:325)
	at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:177)
	- locked <0x00000005f0096948> (a java.io.InputStreamReader)
	at java.io.InputStreamReader.read(InputStreamReader.java:184)
	at java.io.BufferedReader.fill(BufferedReader.java:154)
	at java.io.BufferedReader.readLine(BufferedReader.java:317)
	- locked <0x00000005f0096948> (a java.io.InputStreamReader)
	at java.io.BufferedReader.readLine(BufferedReader.java:382)
	at scala.io.BufferedSource$BufferedLineIterator.hasNext(BufferedSource.scala:72)
	at scala.collection.Iterator$$anon$21.hasNext(Iterator.scala:836)
	at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:461)
	at org.apache.spark.scheduler.ReplayListenerBus.replay(ReplayListenerBus.scala:79)
	at org.apache.spark.scheduler.ReplayListenerBus.replay(ReplayListenerBus.scala:58)
	at org.apache.spark.deploy.history.FsHistoryProvider.org$apache$spark$deploy$history$FsHistoryProvider$$replay(FsHistoryProvider.scala:776)
	at org.apache.spark.deploy.history.FsHistoryProvider.org$apache$spark$deploy$history$FsHistoryProvider$$mergeApplicationListing(FsHistoryProvider.scala:584)
	at org.apache.spark.deploy.history.FsHistoryProvider$$anonfun$checkForLogs$3$$anon$4.run(FsHistoryProvider.scala:464)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
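The timeout proposed above could be sketched roughly as follows. This is a minimal, self-contained Java illustration, not Spark's actual code: the class `ReplayWithTimeout`, the stand-in `replay` method, and the timeout value are all hypothetical. The real fix would need to bound the tasks that `FsHistoryProvider` submits to its `log-replay-executor` pool in a similar way.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class ReplayWithTimeout {

    // Hypothetical stand-in for the real event-log replay; here it just
    // simulates a replay that takes far longer than the timeout.
    static void replay(String logPath) throws InterruptedException {
        Thread.sleep(10_000);
    }

    // Run the replay on a worker thread and give up after timeoutMs.
    // Returns true if the replay finished in time, false otherwise.
    static boolean replayWithTimeout(String logPath, long timeoutMs)
            throws InterruptedException {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Future<?> task = pool.submit(() -> {
            try {
                replay(logPath);
            } catch (InterruptedException e) {
                // Cancelled because the timeout fired; stop replaying.
                Thread.currentThread().interrupt();
            }
        });
        try {
            task.get(timeoutMs, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            task.cancel(true); // interrupt the stuck replay thread
            return false;
        } catch (ExecutionException e) {
            return false;      // replay itself failed
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        boolean finished = replayWithTimeout(
            "hdfs://xxx/spark/xxx/log/history/oversized.lz4.inprogress", 500);
        System.out.println(finished ? "replayed" : "timed out");
        // prints "timed out"
    }
}
```

With a bound like this, one oversized `.inprogress` file would no longer block the replay executor indefinitely; the timed-out application would simply be skipped (or retried later) instead of making the whole history UI unavailable.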
Attachments
Issue Links
- duplicates: SPARK-20656 "Incremental parsing of event logs in SHS" (Resolved)