Ignite / IGNITE-21630

Cluster falls apart on topology change when DNS service is unavailable

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • None
    • Fix Version/s: 2.17
    • None
    • Fixed cluster failure on topology change when DNS service is unavailable - Fixes
    • Release Notes Required

    Description

      Requests to the DNS service are performed synchronously by some critical discovery threads. The timeout for such requests cannot be controlled from Java code (see https://bugs.openjdk.org/browse/JDK-6450279). When the DNS service is unavailable, these threads block, which leads to node segmentation and the cluster falling apart.

      For example, the stack of a tcp-disco-msg-worker thread blocked on a request to the DNS service:

          at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:929)
          at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1330)
          at java.net.InetAddress.getAllByName0(InetAddress.java:1283)
          at java.net.InetAddress.getAllByName(InetAddress.java:1199)
          at java.net.InetAddress.getAllByName(InetAddress.java:1127)
          at java.net.InetAddress.getByName(InetAddress.java:1077)
          at java.net.InetSocketAddress.<init>(InetSocketAddress.java:220)
          at org.apache.ignite.internal.util.IgniteUtils.createResolved(IgniteUtils.java:9829)
          at org.apache.ignite.internal.util.IgniteUtils.toSocketAddresses(IgniteUtils.java:9792)
          at org.apache.ignite.internal.util.IgniteUtils.toSocketAddresses(IgniteUtils.java:9770)
          at org.apache.ignite.spi.discovery.tcp.internal.TcpDiscoveryNode.socketAddresses(TcpDiscoveryNode.java:392)
          at org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.getNodeAddresses(TcpDiscoverySpi.java:1267)
          at org.apache.ignite.spi.discovery.tcp.ServerImpl.interruptPing(ServerImpl.java:985)
          at org.apache.ignite.spi.discovery.tcp.ServerImpl.access$6800(ServerImpl.java:206)
          at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processNodeLeftMessage(ServerImpl.java:5433)
          at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMessage(ServerImpl.java:3221)
          at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMessage(ServerImpl.java:2894)
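
One general way to keep a critical thread from blocking on name resolution is to run the lookup on a separate thread and wait for it with a bounded timeout. The following is a minimal sketch of that idea only; it is not the fix applied in Ignite for this issue. The class name BoundedDnsLookup, its methods, and the fallback to an unresolved address are hypothetical and chosen for illustration.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper (not part of Ignite): bounds the time a caller waits for a DNS lookup. */
final class BoundedDnsLookup {
    /** Daemon threads, so a lookup stuck in the resolver cannot prevent JVM shutdown. */
    private static final ExecutorService RESOLVER = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "dns-resolver");
        t.setDaemon(true);
        return t;
    });

    private BoundedDnsLookup() {
        // Static helper, no instances.
    }

    /** Resolves host via InetAddress.getByName, waiting at most timeoutMs milliseconds. */
    static InetSocketAddress resolve(String host, int port, long timeoutMs) {
        return resolve(host, port, timeoutMs, () -> InetAddress.getByName(host));
    }

    /** Same as above, with the lookup passed in so that it can be replaced in tests. */
    static InetSocketAddress resolve(String host, int port, long timeoutMs, Callable<InetAddress> lookup) {
        Future<InetAddress> fut = RESOLVER.submit(lookup);
        try {
            return new InetSocketAddress(fut.get(timeoutMs, TimeUnit.MILLISECONDS), port);
        } catch (TimeoutException | ExecutionException e) {
            // Lookup timed out or failed: the caller continues with an unresolved address.
            // A thread stuck in a native lookup may not react to the cancel request.
            fut.cancel(true);
            return InetSocketAddress.createUnresolved(host, port);
        } catch (InterruptedException e) {
            // Preserve the interrupt status for the calling thread.
            Thread.currentThread().interrupt();
            fut.cancel(true);
            return InetSocketAddress.createUnresolved(host, port);
        }
    }
}
```

With this sketch, a call such as BoundedDnsLookup.resolve(host, port, 1000) returns after roughly one second at most. The resolver thread itself may stay blocked until the operating system gives up on the lookup, but the calling thread is free to continue, and the caller can check isUnresolved() on the result to decide whether to skip that address.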
      

            People

              Assignee: Aleksey Plekhanov (alex_pl)
              Reporter: Aleksey Plekhanov (alex_pl)

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 3h 50m