Details
Type: Improvement
Status: Open
Priority: Minor
Resolution: Unresolved
Affects Version/s: 3.5.6
None
None
Description
Currently, the QuorumCnxManager connectOne method logs a full exception stack trace, in addition to an error message, when it encounters java.net.SocketTimeoutException: Read timed out or java.net.ConnectException: Connection refused.
As an example, the following output is seen:
[2020-01-20 00:21:23,828] WARN Cannot open channel to 3 at election address aaa-3/10.0.1.3:3888 (org.apache.zookeeper.server.quorum.QuorumCnxManager)
java.net.ConnectException: Connection refused (Connection refused)
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:607)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:558)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:610)
at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:838)
These exceptions are logged frequently when launching or restarting several ZooKeeper servers, and they make it hard to tell normal operation and expected errors apart from real problems. I suggest detecting a few of these specific, expected errors and reducing their output to the error message alone, without the accompanying stack trace.
When launching the first node in a three-node quorum, a successful launch generates about 120 lines of error output.
I would be happy to make some of these changes if this approach is agreeable to the maintainers. My approach would be to check for these specific, well-known conditions in the exception handling and omit the stack trace in those cases.
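To make the proposal concrete, here is a minimal sketch of the idea, not the actual QuorumCnxManager code. The class name ConnectFailureLogging and the helpers isExpectedConnectFailure and formatWarning are hypothetical names chosen for illustration. To stay self-contained it builds the log text as a String instead of calling the real logger. In the server itself, the equivalent change would presumably be to choose between logging the message alone and logging the message together with the exception.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.net.ConnectException;
import java.net.SocketTimeoutException;

// Hypothetical helper, not part of ZooKeeper: illustrates the proposed handling only.
final class ConnectFailureLogging {

    // Failures that are routine while the other quorum members are still starting.
    static boolean isExpectedConnectFailure(Exception e) {
        return e instanceof ConnectException || e instanceof SocketTimeoutException;
    }

    // Builds the warning text: one line for expected failures,
    // message plus full stack trace for anything else.
    static String formatWarning(long sid, String electionAddr, Exception e) {
        String base = "Cannot open channel to " + sid
                + " at election address " + electionAddr;
        if (isExpectedConnectFailure(e)) {
            // Keep the exception class and message, drop the stack trace.
            return base + ": " + e;
        }
        StringWriter trace = new StringWriter();
        e.printStackTrace(new PrintWriter(trace, true));
        return base + System.lineSeparator() + trace;
    }

    public static void main(String[] args) {
        String addr = "aaa-3/10.0.1.3:3888";
        // Expected failure: a single line of output.
        System.out.println(formatWarning(3, addr,
                new ConnectException("Connection refused (Connection refused)")));
        // Unexpected failure: the stack trace is kept for diagnosis.
        System.out.println(formatWarning(3, addr,
                new IllegalStateException("unexpected state")));
    }
}
```

Matching on the exception type rather than on the message text avoids depending on platform-specific wording such as "Connection refused (Connection refused)". Whether every ConnectException and SocketTimeoutException should be treated as expected is a judgment call for the maintainers.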