Description
The Fetcher treats a fetch as a connection failure only when http.connect throws an exception. In a Kubernetes environment, where there can be intermediate proxies, connect can succeed while getInputStream on the HTTP connection throws a connection reset error (5xx). These errors should be considered connection failures as well.
2020-05-08 17:03:54.080 WARN [Fetcher_B {Map_3} #3] shuffle.Fetcher: Fetch Failure while connecting from 10.117.155.27 to: 10.117.154.115:25551, attempt: InputAttemptIdentifier [inputIdentifier=0, attemptNumber=0, pathComponent=attempt_1588982534035_0000_1_00_000000_0_10030, spillType=0, spillId=-1] Informing ShuffleManager:
java.net.SocketException: Connection reset
    at java.net.SocketInputStream.read(SocketInputStream.java:210)
    at java.net.SocketInputStream.read(SocketInputStream.java:141)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
    at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:735)
    at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:678)
    at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:706)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1593)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1498)
    at org.apache.tez.http.HttpConnection.getInputStream(HttpConnection.java:260)
    at org.apache.tez.runtime.library.common.shuffle.Fetcher.setupConnection(Fetcher.java:530)
    at org.apache.tez.runtime.library.common.shuffle.Fetcher.doHttpFetch(Fetcher.java:563)
    at org.apache.tez.runtime.library.common.shuffle.Fetcher.doHttpFetch(Fetcher.java:487)
    at org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:285)
    at org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:76)
    at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
    at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
    at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:69)
    at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
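
The requested behavior could be sketched roughly as follows. This is a minimal, self-contained illustration and not the actual Tez code: the `Connection` interface, the `Outcome` enum, the `Result` class, and this `setupConnection` method are hypothetical stand-ins. The only point it makes is that an IOException thrown while opening the input stream is reported the same way as one thrown by connect.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.SocketException;

public class ConnectFailureSketch {

    /** Hypothetical stand-in for the HTTP connection used by the fetcher. */
    public interface Connection {
        void connect() throws IOException;
        InputStream getInputStream() throws IOException;
    }

    /** Hypothetical outcome of the connection setup phase. */
    public enum Outcome { CONNECTED, CONNECT_FAILED }

    /** Hypothetical result holder: the stream on success, the cause on failure. */
    public static final class Result {
        public final Outcome outcome;
        public final InputStream input;
        public final IOException cause;

        Result(Outcome outcome, InputStream input, IOException cause) {
            this.outcome = outcome;
            this.input = input;
            this.cause = cause;
        }
    }

    /**
     * Treats both steps as part of establishing the connection: a failure in
     * connect() or in getInputStream() (for example a "Connection reset"
     * raised behind a proxy) yields CONNECT_FAILED.
     */
    public static Result setupConnection(Connection conn) {
        try {
            conn.connect();
            InputStream in = conn.getInputStream();
            return new Result(Outcome.CONNECTED, in, null);
        } catch (IOException e) {
            return new Result(Outcome.CONNECT_FAILED, null, e);
        }
    }

    public static void main(String[] args) {
        // Simulated connection: connect() succeeds, reading the response is reset.
        Connection resetOnRead = new Connection() {
            @Override
            public void connect() {
                // no-op: the TCP connect to the proxy succeeds
            }

            @Override
            public InputStream getInputStream() throws IOException {
                throw new SocketException("Connection reset");
            }
        };

        Result r = setupConnection(resetOnRead);
        System.out.println(r.outcome + ": " + r.cause.getMessage());
    }
}
```

In the real Fetcher the failure would have to be routed through whatever path connect failures already take; the sketch only shows the classification.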