Apache Tez / TEZ-4174

[Kubernetes] Fetcher should report connection failure on SocketException

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.10.0
    • Fix Version/s: 0.10.0
    • Component/s: None
    • Labels: None

    Description

      Fetcher treats a fetch as a connection failure only when http.connect throws an exception. In a Kubernetes environment, where there can be intermediate proxies, getInputStream on the HTTP connection can also throw a connection reset error (5xx). These errors should be treated as connection failures as well.

      2020-05-08 17:03:54.080  WARN [Fetcher_B {Map_3} #3] shuffle.Fetcher: Fetch Failure while connecting from 10.117.155.27 to: 10.117.154.115:25551, attempt: InputAttemptIdentifier [inputIdentifier=0, attemptNumber=0, pathComponent=attempt_1588982534035_0000_1_00_000000_0_10030, spillType=0, spillId=-1] Informing ShuffleManager:
      java.net.SocketException: Connection reset
              at java.net.SocketInputStream.read(SocketInputStream.java:210)
              at java.net.SocketInputStream.read(SocketInputStream.java:141)
              at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
              at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
              at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
              at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:735)
              at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:678)
              at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:706)
              at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1593)
              at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1498)
              at org.apache.tez.http.HttpConnection.getInputStream(HttpConnection.java:260)
              at org.apache.tez.runtime.library.common.shuffle.Fetcher.setupConnection(Fetcher.java:530)
              at org.apache.tez.runtime.library.common.shuffle.Fetcher.doHttpFetch(Fetcher.java:563)
              at org.apache.tez.runtime.library.common.shuffle.Fetcher.doHttpFetch(Fetcher.java:487)
              at org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:285)
              at org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:76)
              at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
              at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
              at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:69)
              at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
              at java.lang.Thread.run(Thread.java:748) 
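
The requested behavior can be illustrated with a minimal sketch. This is not the Tez Fetcher code or the committed patch: the class FetchFailureSketch, the Connection interface, the Outcome enum and the setupConnection method below are hypothetical names chosen for illustration. The only assumption taken from the description is that a SocketException raised while obtaining the input stream should be classified the same way as a failure in connect.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.SocketException;

/** Hypothetical sketch; it does not mirror the real Tez Fetcher API. */
public class FetchFailureSketch {

    /** Hypothetical stand-in for the two HTTP steps a fetcher performs. */
    public interface Connection {
        void connect() throws IOException;
        InputStream getInputStream() throws IOException;
    }

    /** How a failed (or successful) connection setup is classified. */
    public enum Outcome { OK, CONNECTION_FAILURE, READ_FAILURE }

    public static Outcome setupConnection(Connection conn) {
        try {
            conn.connect();
        } catch (IOException e) {
            // Existing behavior per the description: a failure in connect
            // counts as a connection failure.
            return Outcome.CONNECTION_FAILURE;
        }
        // A real fetcher would keep the stream open and read from it;
        // the sketch closes it immediately to stay self-contained.
        try (InputStream in = conn.getInputStream()) {
            return Outcome.OK;
        } catch (SocketException e) {
            // Requested behavior: "Connection reset" and similar socket
            // errors (for example from an intermediate proxy) are also
            // classified as connection failures.
            return Outcome.CONNECTION_FAILURE;
        } catch (IOException e) {
            // Any other I/O error keeps its separate classification.
            return Outcome.READ_FAILURE;
        }
    }
}
```

The SocketException clause has to come before the IOException clause, because SocketException is a subclass of IOException and Java rejects a catch block that can never be reached.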

          People

            Assignee: Prasanth Jayachandran (prasanth_j)
            Reporter: Prasanth Jayachandran (prasanth_j)

              Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 0.5h