MESOS-8563: Windows executors cannot re-register

Details

    Description

      This issue documents an important (but already resolved) bug caused by incorrect inheritance of sockets.

      While enabling agent recovery, we discovered that executors could not re-register with the new agent. They would send the re-register message and then fail silently; the agent never received the message.

      This turned out to be caused by incorrect socket inheritance semantics. os::cloexec is used to prevent file descriptors on POSIX systems (or socket handles on Windows) from being inherited by child processes. On Windows, we were creating SOCKET handles with the Winsock function ::socket, which creates inheritable handles by default. The subsequent call to os::cloexec that was meant to prevent this was a no-op, so socket handles leaked to every child process, which caused the bug described above.

      The solution was to split net::socket into separate POSIX and Windows implementations. On Windows we use the Winsock 2 API WSASocket, which lets us create the socket with WSA_FLAG_NO_HANDLE_INHERIT up front, preventing the leak. This is somewhat like passing O_CLOEXEC to open (or SOCK_CLOEXEC to socket) on Linux.
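
      The approach described above can be sketched roughly as follows. This is a minimal illustration, not the actual Mesos net::socket code: the names create_noninheritable_socket, is_noninheritable, close_socket, socket_t and kInvalidSocket are hypothetical, the Windows branch assumes WSAStartup has already succeeded, and adding WSA_FLAG_OVERLAPPED is an assumption made here to match what ::socket does by default.

```cpp
#ifdef _WIN32
#include <winsock2.h>
#include <windows.h>
using socket_t = SOCKET;
const socket_t kInvalidSocket = INVALID_SOCKET;
#else
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>
using socket_t = int;
const socket_t kInvalidSocket = -1;
#endif

// Hypothetical helper: creates a socket that child processes do not inherit.
// Returns kInvalidSocket on failure.
socket_t create_noninheritable_socket(int family, int type, int protocol)
{
#ifdef _WIN32
  // The flag is applied at creation time, so there is no window in which
  // the handle is inheritable. Assumes WSAStartup was already called.
  return ::WSASocketW(
      family, type, protocol, nullptr, 0,
      WSA_FLAG_OVERLAPPED | WSA_FLAG_NO_HANDLE_INHERIT);
#elif defined(SOCK_CLOEXEC)
  // Linux and some BSDs: set close-on-exec atomically at creation.
  return ::socket(family, type | SOCK_CLOEXEC, protocol);
#else
  // Fallback for platforms without SOCK_CLOEXEC: create the socket, then
  // set FD_CLOEXEC in a second step (not atomic).
  socket_t s = ::socket(family, type, protocol);
  if (s != kInvalidSocket) {
    int flags = ::fcntl(s, F_GETFD);
    if (flags == -1 || ::fcntl(s, F_SETFD, flags | FD_CLOEXEC) == -1) {
      ::close(s);
      return kInvalidSocket;
    }
  }
  return s;
#endif
}

// Hypothetical helper: reports whether the socket is marked non-inheritable.
bool is_noninheritable(socket_t s)
{
#ifdef _WIN32
  DWORD flags = 0;
  if (!::GetHandleInformation(reinterpret_cast<HANDLE>(s), &flags)) {
    return false;
  }
  return (flags & HANDLE_FLAG_INHERIT) == 0;
#else
  int flags = ::fcntl(s, F_GETFD);
  return flags != -1 && (flags & FD_CLOEXEC) != 0;
#endif
}

// Hypothetical helper: closes the socket with the platform's close call.
void close_socket(socket_t s)
{
#ifdef _WIN32
  ::closesocket(s);
#else
  ::close(s);
#endif
}
```

      A caller would create the socket with create_noninheritable_socket(AF_INET, SOCK_STREAM, 0) and could use is_noninheritable to confirm the flag before launching any child process. Setting the flag at creation time, rather than in a later call, avoids depending on a second step that may silently do nothing.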

            People

              andschwa Andrew Schwartzmeyer