Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
1.3.0, 1.4.0
None
Description
I got a UBSAN error on Jenkins when running TsRecoveryITest.TestCrashBeforeWriteLogSegmentHeader under ASAN, while testing an unrelated one-line change. This is the error message I got:
E0510 00:47:17.810153 656 fault_injection.cc:54] Injecting fault: FLAGS_fault_crash_before_write_log_segment_header (process will exit)
W0510 00:47:17.821826 451 connection.cc:462] client connection to 127.121.138.0:50064 recv error: Network error: failed to read from TLS socket: Connection reset by peer (error 104)
../../src/kudu/security/tls_socket.cc:80:19: runtime error: signed integer overflow: 2018308256 + 2018308256 cannot be represented in type 'int'
W0510 00:47:17.821890 460 connection.cc:462] server connection from 127.121.138.0:36990 recv error: Network error: failed to read from TLS socket: Connection reset by peer (error 104)
W0510 00:47:17.822394 460 connection.cc:462] client connection to 127.121.138.0:50064 recv error: Network error: failed to read from TLS socket: Connection reset by peer (error 104)
SUMMARY: AddressSanitizer: undefined-behavior ../../src/kudu/security/tls_socket.cc:80:19 in
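For reference, the arithmetic in the UBSAN message: 2018308256 + 2018308256 = 4036616512, which is larger than INT32_MAX (2147483647), and signed integer overflow is undefined behavior in C++. The standalone sketch below shows two ways such a sum can be computed or checked without triggering that undefined behavior: widening to int64_t, or checking before adding. The helper names (WideSum, AddOverflowsInt32, AddOverflowsInt32Portable) are hypothetical and not part of Kudu; __builtin_add_overflow is a GCC/Clang builtin.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper: compute the sum in 64 bits, where adding two
// int32_t values can never overflow.
int64_t WideSum(int32_t a, int32_t b) {
  return static_cast<int64_t>(a) + static_cast<int64_t>(b);
}

// Hypothetical helper: true if a + b does not fit in int32_t.
// __builtin_add_overflow (GCC/Clang) performs the check without UB.
bool AddOverflowsInt32(int32_t a, int32_t b) {
  int32_t result;
  return __builtin_add_overflow(a, b, &result);
}

// Hypothetical portable variant: compare against the limits before adding,
// so the overflowing addition is never evaluated.
bool AddOverflowsInt32Portable(int32_t a, int32_t b) {
  if (b > 0) return a > std::numeric_limits<int32_t>::max() - b;
  return a < std::numeric_limits<int32_t>::min() - b;
}
```

With the operands from the log, WideSum(2018308256, 2018308256) yields 4036616512, and both overflow checks report true.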
The code in question looks like this as of master rev a877566e9477242c015758d105c8e616248af7c6:
69 Status TlsSocket::Writev(const struct ::iovec *iov, int iov_len, int32_t *nwritten) {
70   SCOPED_OPENSSL_NO_PENDING_ERRORS;
71   CHECK(ssl_);
72   int32_t total_written = 0;
73   // Allows packets to be aggresively be accumulated before sending.
74   RETURN_NOT_OK(SetTcpCork(1));
75   Status write_status = Status::OK();
76   for (int i = 0; i < iov_len; ++i) {
77     int32_t frame_size = iov[i].iov_len;
78     // Don't return before unsetting TCP_CORK.
79     write_status = Write(static_cast<uint8_t*>(iov[i].iov_base), frame_size, nwritten);
80     total_written += *nwritten;
81     if (*nwritten < frame_size) break;
82   }
83   RETURN_NOT_OK(SetTcpCork(0));
84   *nwritten = total_written;
85   return write_status;
86 }
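I'm guessing that the out-param *nwritten was never set because Write() returned an error status, so line 80 added whatever value happened to be on the stack. The identical operands in the UBSAN message would be consistent with this: the loop breaks only when *nwritten < frame_size, not when write_status is an error, so a large stale value that is never overwritten gets added on every iteration and overflows on the second one.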
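One possible shape for a more robust accumulation loop is sketched below as standalone C++. It is a sketch under stated assumptions, not the fix that was actually committed for this issue: FrameWriter and WritevSketch are hypothetical names, a bool stands in for Kudu's Status, and the TCP_CORK calls are omitted (the real function must still unset TCP_CORK after the loop, which is why it breaks instead of returning). The loop gives the per-frame count a defined value before each write, stops on the first failed write, and sums into an int64_t.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical stand-in for a single-frame write such as TlsSocket::Write():
// returns false on error and, like the suspected bug, may leave *nwritten
// untouched when it fails.
using FrameWriter = std::function<bool(int32_t frame_size, int32_t* nwritten)>;

// Hypothetical sketch of the Writev() loop with three changes:
//  1. nwritten is reset before every write, so a stale value is never summed;
//  2. the loop stops on a failed write, not only on a short write;
//  3. the running total is 64-bit, so summing frames cannot overflow int32_t.
// On failure, *total_written holds the bytes written by earlier frames.
bool WritevSketch(const std::vector<int32_t>& frame_sizes,
                  const FrameWriter& write_frame, int64_t* total_written) {
  *total_written = 0;
  bool ok = true;
  for (int32_t frame_size : frame_sizes) {
    int32_t nwritten = 0;  // defined even if write_frame fails without setting it
    ok = write_frame(frame_size, &nwritten);
    if (!ok) break;  // don't trust nwritten after an error
    assert(nwritten >= 0 && nwritten <= frame_size);
    *total_written += nwritten;
    if (nwritten < frame_size) break;  // short write: stop, as the original loop does
  }
  // Real code would unset TCP_CORK here before returning.
  return ok;
}
```

Design note: keeping a single exit point after the loop mirrors the original function's "Don't return before unsetting TCP_CORK" constraint, while the early break on error prevents an unset count from ever reaching the sum.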
At the time of writing, the logs can be found here: http://dist-test.cloudera.org/job?job_id=mpercy.1494377196.9182