ZooKeeper / ZOOKEEPER-4367

Zookeeper#Login thread leak in case of Sasl AuthFailed.

Details

    Description

      We are seeing thousands of Zookeeper#Login threads leaking in our production clusters.
      ZooKeeperSaslClient#createSaslClient creates the Login thread.
      ZooKeeperSaslClient#createSaslToken then throws a SaslException, which propagates all the way back to the ClientCnxn#SendThread#run method.
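A leak like this can be tracked inside the affected JVM by counting live threads by name. The following is only a minimal sketch: the class `ThreadNameCensus` and its method are hypothetical helpers, and the name fragment passed in is an assumption, since the exact name of the Login thread is not stated in this report and should be taken from a thread dump.

```java
final class ThreadNameCensus {

    /**
     * Counts live threads in this JVM whose name contains the given fragment.
     * The fragment is supplied by the caller; no thread name is assumed here.
     */
    static long countByNameFragment(String fragment) {
        return Thread.getAllStackTraces().keySet().stream()
                .filter(t -> t.getName().contains(fragment))
                .count();
    }

    public static void main(String[] args) {
        // Hypothetical usage: pass the fragment seen in your own thread dump.
        String fragment = args.length > 0 ? args[0] : "main";
        System.out.println("live threads matching '" + fragment + "': "
                + countByNameFragment(fragment));
    }
}
```

Sampling this count periodically should show whether the number of matching threads grows with each failed connection attempt.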

      ClientCnxn#SendThread#run handles the SaslException by setting the state to AUTH_FAILED, queueing the eventOfDeath for the EventThread, and exiting/cleaning up the SendThread, but we DON'T close the zookeeperSaslClient, which in turn would shut down the Login thread.
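To illustrate the sequence described above, here is a small self-contained model of the leak. It is only a sketch under stated assumptions: `LoginModel`, `SaslClientModel` and `runOnce` are simplified stand-ins invented for illustration, not the real ZooKeeper classes, and the `closeOnAuthFailure` flag only models the cleanup step that the description says is missing.

```java
import javax.security.sasl.SaslException;

// Simplified stand-ins for illustration only; NOT the real ZooKeeper classes.
final class LoginLeakSketch {

    /** Models Login: owns a background thread that runs until shutdown() is called. */
    static final class LoginModel {
        private final Thread refresher;

        LoginModel() {
            refresher = new Thread(() -> {
                try {
                    // Stands in for "sleep until the next TGT refresh".
                    Thread.sleep(Long.MAX_VALUE);
                } catch (InterruptedException e) {
                    // Interrupted by shutdown(): let the thread exit.
                }
            }, "login-model-refresh");
            refresher.setDaemon(true);
            refresher.start();
        }

        boolean isRefreshThreadAlive() {
            return refresher.isAlive();
        }

        void shutdown() {
            refresher.interrupt();
            try {
                refresher.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    /** Models the SASL client: creates the login, then always fails the token step. */
    static final class SaslClientModel {
        final LoginModel login = new LoginModel();

        byte[] createSaslToken() throws SaslException {
            throw new SaslException("GSS initiate failed (simulated)");
        }

        void shutdown() {
            login.shutdown();
        }
    }

    /** Models one pass of the send loop; returns the client so callers can inspect it. */
    static SaslClientModel runOnce(boolean closeOnAuthFailure) {
        SaslClientModel client = new SaslClientModel();
        try {
            client.createSaslToken();
        } catch (SaslException e) {
            // The real code would set AUTH_FAILED and queue the event of death here.
            if (closeOnAuthFailure) {
                client.shutdown(); // the cleanup step the description says is missing
            }
        }
        return client;
    }

    public static void main(String[] args) {
        SaslClientModel leaked = runOnce(false);
        System.out.println("without close, refresh thread alive: "
                + leaked.login.isRefreshThreadAlive());
        SaslClientModel cleaned = runOnce(true);
        System.out.println("with close, refresh thread alive: "
                + cleaned.login.isRefreshThreadAlive());
        leaked.shutdown(); // tidy up the deliberately leaked thread
    }
}
```

In this model, each failed attempt without the close step leaves one background thread running, which matches the growth pattern reported here. Whether the real fix belongs in the SaslException handler or in the general SendThread cleanup path is a design question for the actual patch.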

      Logs are added below for one failed connection.

      20210831053800.393 jute.maxbuffer value is 4194304 Bytes
      20210831053800.393 Initiating client connection, connectString=<zookeeper-ensemble string> sessionTimeout=4000 watcher=org.apache.curator.ConnectionState@7b974f93

      20210831053800.401 zookeeper.request.timeout value is 10000. feature enabled=
      20210831053800.404 Client successfully logged in.
      20210831053800.405 Client will use GSSAPI as SASL mechanism.
      20210831053800.405 TGT refresh sleeping until: Wed Sep 01 00:59:06 GMT 2021
      20210831053800.405 TGT refresh thread started.
      20210831053800.405 TGT valid starting at:        Tue Aug 31 05:38:00 GMT 2021
      20210831053800.405 TGT expires:                  Wed Sep 01 05:38:00 GMT 2021

      20210831053800.407 Opening socket connection to server <zookeeper-server-1>. Will attempt to SASL-authenticate using Login Context section 'Client'

      20210831053800.419 Socket connection established to <zookeeper-server-1>, initiating session

      20210831053800.435 Session establishment complete on server <zookeeper-server-1>, sessionid = 0x1000004066cc52b, negotiated timeout = 6000

      20210831053800.438 An error: (java.security.PrivilegedActionException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Server not found in Kerberos database (7) - UNKNOWN_SERVER)]) occurred when evaluating Zookeeper Quorum Member's  received SASL token. This may be caused by Java's being unable to resolve the Zookeeper Quorum Member's hostname correctly. You may want to try to adding '-Dsun.net.spi.nameservice.provider.1=dns,sun' to your client's JVMFLAGS environment. Zookeeper Client will go to AUTH_FAILED state.

      20210831053800.438 EventThread shut down for session: 0x1000004066cc52b

      20210831053800.438 SASL authentication with Zookeeper Quorum member failed: javax.security.sasl.SaslException: An error: (java.security.PrivilegedActionException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Server not found in Kerberos database (7) - UNKNOWN_SERVER)]) occurred when evaluating Zookeeper Quorum Member's  received SASL token. This may be caused by Java's being unable to resolve the Zookeeper Quorum Member's hostname correctly. You may want to try to adding '-Dsun.net.spi.nameservice.provider.1=dns,sun' to your client's JVMFLAGS environment. Zookeeper Client will go to AUTH_FAILED state.
      

      What is the correct way to shut down the Login thread in case of a SaslException?
      We use the Curator framework to connect to ZooKeeper.

      We fixed a similar bug, ZOOKEEPER-3059, where we were leaking EventThreads.
      This one is similar, except that it leaks Login threads. Please help.

            People

              Assignee: Unassigned
              Reporter: Rushabh Shah (shahrs87)
              Votes: 0
              Watchers: 4

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 4h 20m