  1. ZooKeeper
  2. ZOOKEEPER-3723

ZooKeeper client should not fail with ZSYSTEMERROR if DNS cannot resolve one of the servers in the ZK ensemble.


Details

    Description

      This is a minor enhancement request: do not fail session initiation when DNS cannot resolve the hostname of one of the servers in the ZooKeeper ensemble.

       

      The ZooKeeper client resolves all the hostnames in the ensemble while establishing the session.

      In a Kubernetes environment with CoreDNS, a pod's hostname entry is removed from CoreDNS while the pod restarts. Although we can tune the CoreDNS settings to delay removal of the entry, we don't want to leave a race in which the ZooKeeper client fails to establish a session because DNS temporarily cannot resolve one hostname. As long as at least one server in the ensemble is resolvable, shouldn't the client avoid failing session establishment with a hard error and instead try to connect to one of the other servers?

       

      See the snippet below, where resolve_hosts() fails with ZSYSTEMERROR.

      if ((rc = getaddrinfo(host, port_spec, &hints, &res0)) != 0) {
          // bug in getaddrinfo implementation when it returns
          // EAI_BADFLAGS or EAI_ADDRFAMILY with AF_UNSPEC and
          // ai_flags as AI_ADDRCONFIG
      #ifdef AI_ADDRCONFIG
          if ((hints.ai_flags == AI_ADDRCONFIG) &&
      // ZOOKEEPER-1323 EAI_NODATA and EAI_ADDRFAMILY are deprecated in FreeBSD.
      #ifdef EAI_ADDRFAMILY
              ((rc == EAI_BADFLAGS) || (rc == EAI_ADDRFAMILY))) {
      #else
              (rc == EAI_BADFLAGS)) {
      #endif
              // reset ai_flags to null
              hints.ai_flags = 0;
              // retry getaddrinfo
              rc = getaddrinfo(host, port_spec, &hints, &res0);
          }
      #endif
          if (rc != 0) {
              errno = getaddrinfo_errno(rc);
      #ifdef _WIN32
              LOG_ERROR(LOGCALLBACK(zh), "Win32 message: %s\n", gai_strerror(rc));
      #elif __linux__ && __GNUC__
              LOG_ERROR(LOGCALLBACK(zh), "getaddrinfo: %s\n", gai_strerror(rc));
      #else
              LOG_ERROR(LOGCALLBACK(zh), "getaddrinfo: %s\n", strerror(errno));
      #endif
              rc = ZSYSTEMERROR;
              goto fail;
          }
      }
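
      To make the request concrete, here is a minimal, self-contained sketch of the behaviour being asked for. It is not the ZooKeeper client code and not a proposed patch: resolve_tolerant() and its parameters are hypothetical names invented for illustration. The idea is to log and skip a host that getaddrinfo() cannot resolve, and to report a hard failure only when no host in the ensemble resolves.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>

/*
 * Hypothetical helper (not part of the ZooKeeper C client).
 * Tries to resolve every host in `hosts`. A host that fails to resolve
 * is logged and skipped instead of aborting the whole loop.
 * Returns the number of hosts that resolved; the caller would treat
 * a return value of 0 as the hard error (e.g. ZSYSTEMERROR).
 */
static int resolve_tolerant(const char *const *hosts, size_t nhosts,
                            const char *port, int ai_flags)
{
    int resolved = 0;

    for (size_t i = 0; i < nhosts; i++) {
        struct addrinfo hints;
        struct addrinfo *res = NULL;

        memset(&hints, 0, sizeof(hints));
        hints.ai_family = AF_UNSPEC;
        hints.ai_socktype = SOCK_STREAM;
        hints.ai_flags = ai_flags;

        int rc = getaddrinfo(hosts[i], port, &hints, &res);
        if (rc != 0) {
            /* Soft failure: remember nothing for this host and move on. */
            fprintf(stderr, "skipping %s: %s\n", hosts[i], gai_strerror(rc));
            continue;
        }

        /* A real client would copy the addresses into its server list here. */
        freeaddrinfo(res);
        resolved++;
    }

    return resolved;
}
```

      A caller would fail only when resolve_tolerant() returns 0. Passing AI_NUMERICHOST as ai_flags disables DNS lookups, which is a convenient way to exercise the skip path without a network. The sketch omits the AI_ADDRCONFIG retry shown in the snippet above.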
      

            People

              Assignee: Unassigned
              Reporter: Suhas Dantkale (suhas.dantkale)
              Votes: 1
              Watchers: 3

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 1h 20m