Mesos / MESOS-10203

Agent process crashes on newer Linux kernels if 'linux/capabilities' isolation is enabled

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • None
    • 1.12.0
    • agent
    • None

    Description

      The Mesos agent crashes with the following stack trace on newer Linux kernels (>= 5.8) when started with MESOS_ISOLATION=linux/capabilities.
      It runs fine on 5.7.19 but fails on 5.8.18, 5.9.11 and 5.10.

      Dec 13 05:08:28 mesosbox mesos-agent[465]: sh: hadoop: command not found
      Dec 13 05:08:28 mesosbox mesos-agent[466]: I1213 05:08:28.234824 458 fetcher.cpp:66] Skipping URI fetcher plugin 'hadoop' as it could not be created: Failed to create HDFS client: Hadoop client is not available, exit status: 32512
      Dec 13 05:08:28 mesosbox mesos-agent[466]: Reached unreachable statement at linux/capabilities.cpp:497
      Dec 13 05:08:28 mesosbox mesos-agent[466]: *** Aborted at 1607836108 (unix time) try "date -d @1607836108" if you are using GNU date ***
      Dec 13 05:08:28 mesosbox mesos-agent[466]: PC: @ 0x7f875bd62387 __GI_raise
      Dec 13 05:08:28 mesosbox mesos-agent[466]: *** SIGABRT (@0x1ca) received by PID 458 (TID 0x7f8760ddca00) from PID 458; stack trace: ***
      Dec 13 05:08:28 mesosbox mesos-agent[466]: @ 0x7f875c626630 (unknown)
      Dec 13 05:08:28 mesosbox mesos-agent[466]: @ 0x7f875bd62387 __GI_raise
      Dec 13 05:08:28 mesosbox mesos-agent[466]: @ 0x7f875bd63a78 __GI_abort
      Dec 13 05:08:28 mesosbox mesos-agent[466]: @ 0x7f875e60f237 (unknown)
      Dec 13 05:08:28 mesosbox mesos-agent[466]: @ 0x7f875ef6e7c1 (unknown)
      Dec 13 05:08:28 mesosbox mesos-agent[466]: @ 0x7f875ef723cc (unknown)
      Dec 13 05:08:28 mesosbox mesos-agent[466]: @ 0x7f875ef70c96 (unknown)
      Dec 13 05:08:28 mesosbox mesos-agent[466]: @ 0x7f875f05389d (unknown)
      Dec 13 05:08:28 mesosbox mesos-agent[466]: @ 0x7f875ed837fc (unknown)
      Dec 13 05:08:28 mesosbox mesos-agent[466]: @ 0x7f875ed72332 (unknown)
      Dec 13 05:08:28 mesosbox mesos-agent[466]: @ 0x7f875ecf54c6 (unknown)
      Dec 13 05:08:28 mesosbox mesos-agent[466]: @ 0x55f5d9c1a256 (unknown)
      Dec 13 05:08:28 mesosbox mesos-agent[466]: @ 0x7f875bd4e555 __libc_start_main
      Dec 13 05:08:28 mesosbox mesos-agent[466]: @ 0x55f5d9c1d10f (unknown)
      Dec 13 05:08:28 mesosbox kernel: audit: type=1701 audit(1607836108.250:274): auid=4294967295 uid=0 gid=0 ses=4294967295 subj==unconfined pid=4772 comm="mesos-agent" exe="/usr/sbin/mesos-agent" sig=6 res=1

       

      Looking further, I found that the abort is raised from linux/capabilities.cpp, in the stream operator that converts capability enum values to human-readable names.

      ostream& operator<<(ostream& stream, const Capability& capability)
      {
        switch (capability) {
          case CHOWN:             return stream << "CHOWN";
          case DAC_OVERRIDE:      return stream << "DAC_OVERRIDE";
          case AUDIT_READ:        return stream << "AUDIT_READ";
          ...
          ...
          case MAX_CAPABILITY:    UNREACHABLE(); // !!! Crash site
        }
        UNREACHABLE();
      }
      

      MAX_CAPABILITY is defined as 38, but newer Linux kernels have since introduced additional capabilities, namely:

      • CAP_PERFMON = 38 (since Linux 5.8): employ various performance-monitoring mechanisms
      • CAP_BPF = 39 (since Linux 5.8): employ privileged BPF operations
      • CAP_CHECKPOINT_RESTORE = 40 (since Linux 5.9): allow checkpoint/restore related operations

      ref: https://github.com/torvalds/linux/blob/master/include/uapi/linux/capability.h

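      The Mesos code above does not account for such kernel changes: CAP_PERFMON has the value 38, which equals MAX_CAPABILITY, so streaming it reaches an UNREACHABLE() branch and aborts the agent. Any capability the kernel adds in the future will break the isolator in the same way.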
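One way to make such a stream operator tolerant of capabilities the enum does not know is to print a numeric placeholder instead of aborting. The following is a minimal sketch under stated assumptions, not the actual Mesos code and not necessarily how this issue was resolved: the enum is a small illustrative subset with values taken from include/uapi/linux/capability.h, and the "UNKNOWN(n)" output format and the stringify helper are hypothetical.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Illustrative subset of a capability enum. The fixed underlying type
// makes it well-defined to hold values the enum does not name.
enum Capability : int {
  CHOWN = 0,
  DAC_OVERRIDE = 1,
  AUDIT_READ = 37,
  MAX_CAPABILITY = 38
};

std::ostream& operator<<(std::ostream& stream, const Capability& capability)
{
  switch (capability) {
    case CHOWN:          return stream << "CHOWN";
    case DAC_OVERRIDE:   return stream << "DAC_OVERRIDE";
    case AUDIT_READ:     return stream << "AUDIT_READ";
    case MAX_CAPABILITY: break; // Handled by the fallback below.
  }

  // Fallback for any value without a name, including capabilities
  // introduced by kernels newer than this enum.
  return stream << "UNKNOWN(" << static_cast<int>(capability) << ")";
}

// Hypothetical helper that renders a capability to a string.
std::string stringify(Capability capability)
{
  std::ostringstream out;
  out << capability;
  return out.str();
}
```

With this fallback, a value such as 38 or 40 reported by a newer kernel is rendered as "UNKNOWN(38)" or "UNKNOWN(40)" rather than terminating the process. Whether unknown capabilities should then be passed through, dropped, or rejected is a separate design decision for the isolator.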

            People

              Unassigned
              Arun M J (arunmj)
              Benjamin Bannier