  1. TinkerPop
  2. TINKERPOP-2820

gremlin-python _close_session race condition/FD leak


Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 3.6.1
    • Fix Version/s: 3.7.0, 3.6.3, 3.5.6
    • Component/s: python
    • Labels: None

    Description

      Closing a session-based connection in gremlin-python has a race condition that leaves the event loop opened for the session's connection running. Each leaked event loop holds file descriptors, so after enough transactions the process fails with `OSError: [Errno 24] Too many open files`.
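To illustrate why an event loop that is never closed costs file descriptors, here is a minimal standalone sketch. It does not use gremlin-python at all. The helper names `open_fd_count` and `measure_loop_fds` are made up for this example, and it assumes a Unix-like system that exposes the process's descriptors under `/proc/self/fd` or `/dev/fd`.

```python
import asyncio
import os


def open_fd_count():
    """Count this process's open file descriptors (Unix-like systems only)."""
    for path in ("/proc/self/fd", "/dev/fd"):
        if os.path.isdir(path):
            return len(os.listdir(path))
    raise RuntimeError("no per-process fd directory available on this platform")


def measure_loop_fds(n):
    """Open n event loops, then close them, sampling the fd count each time."""
    baseline = open_fd_count()
    loops = [asyncio.new_event_loop() for _ in range(n)]
    while_open = open_fd_count()   # each open loop holds descriptors
    for loop in loops:
        loop.close()               # closing the loop releases them
    after_close = open_fd_count()
    return baseline, while_open, after_close


if __name__ == "__main__":
    baseline, while_open, after_close = measure_loop_fds(5)
    print(f"baseline={baseline} open={while_open} closed={after_close}")
```

If the loops are never closed, as happens with the abandoned session connection, the count only grows until the process hits its descriptor limit.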

      The problem is confined to these two methods of the client in `gremlin_python/driver/client.py`:

      ```py
      def close(self):
          # prevent the Client from being closed more than once. it raises errors if new jobby jobs
          # get submitted to the executor when it is shutdown
          if self._closed:
              return

          if self._session_enabled:
              self._close_session() # 1. (see below)
          log.info("Closing Client with url '%s'", self._url)
          while not self._pool.empty(): # 3. (see below)
              conn = self._pool.get(True)
              conn.close()
          self._executor.shutdown()
          self._closed = True

      def _close_session(self):
          message = request.RequestMessage(
              processor='session', op='close',
              args={'session': str(self._session)})
          conn = self._pool.get(True) 
          return conn.write(message).result() # 2. (see below)
      ```

      1. `close()` calls `_close_session()`.
      2. `.result()` waits for the write to finish but not for the read. `conn` is only put back into `self._pool` after the read finishes (in `gremlin_python.driver.connection.Connection._receive()`), so `_close_session()` returns while the pool is still empty and execution continues at 3.
      3. Because the pool is empty, the `while` loop body never runs and `conn.close()` is never called. The connection's event loop is left running and is never closed.
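The three steps above can be reproduced without a server. The following is a simplified simulation, not the real driver: `FakeConnection`, `close_client`, and `run` are hypothetical names for this sketch, and a `threading.Event` stands in for the server's reply so the ordering is deterministic.

```python
import queue
import threading
from concurrent.futures import Future, ThreadPoolExecutor


class FakeConnection:
    """Hypothetical stand-in for a pooled connection (not the gremlin-python class)."""

    def __init__(self, pool, executor):
        self._pool = pool
        self._executor = executor
        self.closed = False
        self.read_can_finish = threading.Event()  # simulates the server reply arriving
        self.read_done = None

    def write(self, message):
        # The read is scheduled separately; only the read returns us to the pool.
        self.read_done = self._executor.submit(self._receive)
        write_done = Future()
        write_done.set_result(None)  # the write itself completes immediately
        return write_done

    def _receive(self):
        self.read_can_finish.wait()
        self._pool.put(self)

    def close(self):
        self.closed = True


def close_client(pool, wait_for_read):
    conn = pool.get(True)
    conn.write("close session").result()  # step 2: waits for the write only
    if wait_for_read:
        conn.read_done.result()           # also wait for the read to finish
    closed = 0
    while not pool.empty():               # step 3: skipped if the pool is empty
        pool.get(True).close()
        closed += 1
    return closed


def run(wait_for_read):
    pool = queue.Queue()
    with ThreadPoolExecutor(max_workers=1) as executor:
        conn = FakeConnection(pool, executor)
        pool.put(conn)
        if wait_for_read:
            conn.read_can_finish.set()    # reply arrives while we are waiting
        closed = close_client(pool, wait_for_read)
        conn.read_can_finish.set()        # otherwise the reply arrives too late
    return closed, conn.closed


if __name__ == "__main__":
    print(run(wait_for_read=False))  # (0, False): connection never closed
    print(run(wait_for_read=True))   # (1, True): connection closed
```

In the first run the connection re-enters the pool only after the drain loop has already been skipped, which mirrors the leak described above.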

      I was able to solve this by modifying `_close_session` as follows:

      ```py
      def _close_session(self):
          message = request.RequestMessage(
              processor='session', op='close',
              args={'session': str(self._session)})
          conn = self._pool.get(True)
          try:
              write_result_set = conn.write(message).result()
              return write_result_set.all().result() # wait for _receive() to finish
          except protocol.GremlinServerError:
              pass
      ```

      I'm not sure if this is the correct solution, but wanted to point out the bug.

      In the meantime, I wrote a context manager to handle this cleanup for me:

      ```py
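      import contextlib
      import logging

      logger = logging.getLogger(__name__)

      # `g` is assumed to be an existing remote GraphTraversalSource.
      @contextlib.contextmanager
      def transaction():
          tx = g.tx()
          gtx = tx.begin()

          try:
              yield tx, gtx
              tx.commit()
          except Exception:
              tx.rollback()
              raise  # re-raise so the caller still sees the failure
          finally:
              # close any connection that was returned to the pool after close() gave up on it
              pool = tx._session_based_connection._client._pool
              while not pool.empty():
                  conn = pool.get(True)
                  conn.close()
                  logger.info("Closed abandoned session connection")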

      with transaction() as (tx, gtx):
          foo = gtx.some_traversal().to_list()

          # do something with foo
          gtx.some_other_traversal().iterate()
      ```

      Cheers

          People

            valentyn Valentyn Kahamlyk
            hamilton-earthscope Alex Hamilton