Mesos / MESOS-9307

Libprocess should have a way to detect stuck actors.


Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: libprocess
    • Labels: None

    Description

      We spent two days on a bug that turned out to be an infinite loop in an actor, which blocked that actor from processing any other events.

      Currently, the only way to find out that an actor is stuck is to attach gdb. We should think about a way to log an error when an actor has been stuck for longer than a threshold.

      For instance, the Linux kernel prints a warning to the kernel log when a task has been stuck for more than 120 seconds. Something similar would be extremely helpful.

      Another option is to expose metrics for this.
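
      To make the idea concrete, here is a minimal sketch of a threshold-based check in plain C++. It is not based on libprocess internals: `ActorHeartbeat`, `begin`, `end`, `toNanos`, and `checkStuck` are hypothetical names, and the sketch assumes the worker thread can record a timestamp around each event it dispatches. A periodic watchdog would call `checkStuck`, which logs each offending actor and returns a count that could also back a metric.

```cpp
#include <atomic>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch: none of these names are libprocess APIs.
using Clock = std::chrono::steady_clock;

inline std::int64_t toNanos(Clock::time_point t) {
  return std::chrono::duration_cast<std::chrono::nanoseconds>(
      t.time_since_epoch()).count();
}

// Per-actor record of when the event currently being handled started.
// A stored value of 0 is used as the "idle" sentinel (assumption: real
// steady_clock readings are never exactly 0).
struct ActorHeartbeat {
  explicit ActorHeartbeat(std::string n) : name(std::move(n)) {}

  // The worker thread would call these around each event it dispatches.
  void begin(Clock::time_point now) { startedNs.store(toNanos(now)); }
  void end() { startedNs.store(0); }

  std::string name;
  std::atomic<std::int64_t> startedNs{0};
};

// One watchdog pass: logs every actor whose current event has been running
// longer than `threshold` and returns how many such actors were found.
inline std::size_t checkStuck(
    const std::vector<const ActorHeartbeat*>& actors,
    Clock::time_point now,
    std::chrono::seconds threshold) {
  std::size_t stuck = 0;
  for (const ActorHeartbeat* actor : actors) {
    const std::int64_t started = actor->startedNs.load();
    if (started == 0) {
      continue;  // Idle actor, nothing to report.
    }
    const std::chrono::nanoseconds running(toNanos(now) - started);
    if (running > threshold) {
      ++stuck;
      std::cerr
          << "Actor '" << actor->name << "' has been handling one event for "
          << std::chrono::duration_cast<std::chrono::seconds>(running).count()
          << "s" << std::endl;
    }
  }
  return stuck;
}
```

      Taking `now` as a parameter keeps the check independent of the wall clock, so it can be exercised without actually waiting for the threshold to elapse. The threshold itself (for example 120 seconds, mirroring the kernel behavior mentioned above) would presumably be configurable.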

            People

              Assignee: Unassigned
              Reporter: Jie Yu (jieyu)
