  1. Hadoop YARN
  2. YARN-11572

hadoop-yarn cgroup directory is deleted after each "systemctl daemon-reload" command


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.3.4
    • Fix Version/s: None
    • Component/s: nodemanager
    • Labels: None

       

    Description

      I have a Hadoop cluster and I need to enable cgroups in order to use GPUs in a Docker environment. I followed the documentation for the setup.

       

      To summarize: I manage the cgroup creation myself (cpu, cpuacct and devices), which, as expected, creates three hadoop-yarn directories under /sys/fs/cgroup/. However, upon each systemctl daemon-reload, the /sys/fs/cgroup/devices/hadoop-yarn directory is systematically deleted, which prevents the YARN NodeManager from working.

       

      In detail:

      As described in the documentation, I left the parameter yarn.nodemanager.linux-container-executor.cgroups.mount set to false in order to manage the cgroups myself (for security reasons).

      As I'm on CentOS 8, I use cgroup v1. I defined the following parameters:

      • yarn.nodemanager.linux-container-executor.cgroups.hierarchy to /hadoop-yarn
      • yarn.nodemanager.linux-container-executor.cgroups.mount-path to /sys/fs/cgroup

      YARN needs 3 cgroups: cpu, cpuacct and devices.
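      To make the relationship between the two parameters and the three controllers concrete, here is a minimal sketch that prints the directories YARN would be expected to use. The function name yarn_cgroup_dirs is hypothetical, and the layout <mount-path>/<controller><hierarchy> is an assumption inferred from the directory listings in this report, not taken from the YARN source.

```shell
#!/bin/sh
# Print the per-controller cgroup v1 directories for a given mount-path and
# hierarchy. Assumption: the layout is <mount-path>/<controller><hierarchy>,
# as suggested by the listings in this report.
yarn_cgroup_dirs() {
    local mount_path="${1:-/sys/fs/cgroup}"
    local hierarchy="${2:-/hadoop-yarn}"
    local controller

    # Normalise so that exactly one slash joins the path components.
    mount_path="${mount_path%/}"
    hierarchy="/${hierarchy#/}"

    for controller in cpu cpuacct devices; do
        printf '%s/%s%s\n' "$mount_path" "$controller" "$hierarchy"
    done
}
```

      Called as yarn_cgroup_dirs /sys/fs/cgroup /hadoop-yarn, it lists one path per controller, matching the three directories shown later in this description.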

      In order to make the /hadoop-yarn cgroup persistent, I installed the libcgroup rpm and then updated /etc/cgconfig.conf with

      group hadoop-yarn {
           perm {
               admin {
                   uid = yarn;
                   gid = hadoop;
               }
               task {
                   uid = yarn;
                   gid = hadoop;
               }
           }
           cpu {}
           cpuacct {}
           devices {}
       }
      

      and I started the cgconfig service. The 3 directories are created:

      $ ll /sys/fs/cgroup/{cpu,cpuacct,devices}/hadoop-yarn/ -d
      drwxr-xr-x 2 yarn hadoop 0 Sep 8 13:15 /sys/fs/cgroup/cpuacct/hadoop-yarn/
      drwxr-xr-x 2 yarn hadoop 0 Sep 8 13:15 /sys/fs/cgroup/cpu/hadoop-yarn/
      drwxr-xr-x 2 yarn hadoop 0 Sep 8 13:27 /sys/fs/cgroup/devices/hadoop-yarn/
      

       

      At this point, I can restart the Yarn NodeManager.
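      Before restarting the NodeManager, the same check can be scripted. The following is a minimal sketch, not part of the original setup: the function name check_yarn_cgroups is hypothetical, the yarn:hadoop owner comes from the cgconfig.conf above, and GNU stat (with its -c option) is assumed to be available, as it is on CentOS.

```shell
#!/bin/sh
# Report every controller whose hadoop-yarn directory is missing or is not
# owned by the expected user:group. Returns non-zero if any problem is found.
# Assumption: GNU stat is available for the -c format option.
check_yarn_cgroups() {
    local root="${1:-/sys/fs/cgroup}"
    local owner="${2:-yarn:hadoop}"
    local status=0
    local controller dir actual

    for controller in cpu cpuacct devices; do
        dir="$root/$controller/hadoop-yarn"
        if [ ! -d "$dir" ]; then
            echo "MISSING $dir"
            status=1
            continue
        fi
        actual="$(stat -c '%U:%G' "$dir")"
        if [ "$actual" != "$owner" ]; then
            echo "OWNER $dir is $actual, expected $owner"
            status=1
        fi
    done
    return "$status"
}
```

      Running check_yarn_cgroups with no arguments checks /sys/fs/cgroup for yarn:hadoop ownership and prints nothing when all three directories are in place.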

      However, each time someone executes systemctl daemon-reload, the devices directory is deleted:

      $ ll /sys/fs/cgroup/{cpu,cpuacct,devices}/hadoop-yarn/ -d
      ls: cannot access '/sys/fs/cgroup/devices/hadoop-yarn/': No such file or directory
      drwxr-xr-x 2 yarn hadoop 0 Sep 8 13:15 /sys/fs/cgroup/cpuacct/hadoop-yarn/
      drwxr-xr-x 2 yarn hadoop 0 Sep 8 13:15 /sys/fs/cgroup/cpu/hadoop-yarn/

       

      I see nothing in the logs and have no idea why this directory is deleted. Since the YARN NodeManager needs this directory, it stops working and has to be restarted (once the directory has been re-created).

      As an alternative to the cgconfig service, I tried creating my own systemd service to create these directories.

      vim /etc/systemd/system/hadoop-yarn-cgroup.service
      
      [Unit]
      Description=Custom cgroup for Hadoop YARN
      
      [Service]
      ExecStartPre=/bin/mkdir -p /sys/fs/cgroup/cpu/hadoop-yarn
      ExecStartPre=/bin/mkdir -p /sys/fs/cgroup/cpuacct/hadoop-yarn
      ExecStartPre=/bin/mkdir -p /sys/fs/cgroup/devices/hadoop-yarn
      ExecStartPre=/usr/bin/chown -R yarn:hadoop /sys/fs/cgroup/cpu/hadoop-yarn/
      ExecStartPre=/usr/bin/chown -R yarn:hadoop /sys/fs/cgroup/cpuacct/hadoop-yarn/
      ExecStartPre=/usr/bin/chown -R yarn:hadoop /sys/fs/cgroup/devices/hadoop-yarn/
      ExecStart=/bin/true
      Slice=hadoop-yarn.slice
      MemoryAccounting=yes
      MemoryLimit=1G
      
      [Install]
      WantedBy=multi-user.target

       

      The behaviour is the same:

      • directories are created
      • systemctl daemon-reload
      • devices/hadoop-yarn directory is deleted
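      The reproduction steps above can be sketched as a small script that records which directories exist, runs a command, and reports any that disappeared. The function names existing_yarn_cgroups and removed_by are hypothetical, and the sketch assumes the directory paths contain no whitespace.

```shell
#!/bin/sh
# List the hadoop-yarn cgroup directories that currently exist under a
# cgroup v1 root (first argument).
existing_yarn_cgroups() {
    local root="$1"
    local controller
    for controller in cpu cpuacct devices; do
        if [ -d "$root/$controller/hadoop-yarn" ]; then
            echo "$root/$controller/hadoop-yarn"
        fi
    done
}

# Run a command (everything after the first argument) and print each
# directory that existed before the command ran but not after it.
# Assumption: paths contain no whitespace, so word splitting is safe.
removed_by() {
    local root="$1"
    local before dir
    shift
    before="$(existing_yarn_cgroups "$root")"
    "$@" || return 1
    for dir in $before; do
        [ -d "$dir" ] || echo "REMOVED $dir"
    done
}
```

      On an affected host, removed_by /sys/fs/cgroup systemctl daemon-reload (run as root) would be expected to report the devices directory, matching the behaviour described here.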

          People

            Assignee: Unassigned
            Reporter: Jean-Baptiste Guet (guetjean)