Samza / SAMZA-333

Large Samza configurations result in YARN job failure

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: container
    • Labels: None

    Description

      Application application_1404246879802_0019 failed 50 times due to AM Container for appattempt_1404246879802_0019_000050 exited with exitCode: 0 due to: Exception from container-launch: java.io.IOException: Cannot run program "nice" (in directory "/export/content/data/samsa-yarn/usercache/samza-job/appcache/application_1404246879802_0019/container_1404246879802_0019_50_000001"): error=7, Argument list too long
      java.io.IOException: Cannot run program "nice" (in directory "/export/content/data/samsa-yarn/usercache/samza/appcache/application_1404246879802_0019/container_1404246879802_0019_50_000001"): error=7, Argument list too long
      at java.lang.ProcessBuilder.start(ProcessBuilder.java:1042)
      at org.apache.hadoop.util.Shell.runCommand(Shell.java:448)
      at org.apache.hadoop.util.Shell.run(Shell.java:418)
      at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650)
      at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
      at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:300)
      at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:81)
      at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      at java.lang.Thread.run(Thread.java:745)
      Caused by: java.io.IOException: error=7, Argument list too long
      at java.lang.UNIXProcess.forkAndExec(Native Method)
      at java.lang.UNIXProcess.<init>(UNIXProcess.java:187)
      at java.lang.ProcessImpl.start(ProcessImpl.java:134)
      at java.lang.ProcessBuilder.start(ProcessBuilder.java:1023)
      ... 10 more
      .Failing this attempt.. Failing the application.
      

      This happens because the launch_container.sh script generated by YARN contains all the export statements (including the Samza configs) as well as the run_container invocation. Once a sufficiently large config variable has been exported, the shell running the script can no longer execute any command: every exec fails with "Argument list too long".

      For example, the line in launch_container.sh that exports the variable "SAMZA_SYSTEM_STREAMS" has the following size in bytes:

      bash-4.1$ sed '12q;d' launch_container.sh | wc -c
      167546
      

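      As described at http://www.in-ulm.de/~mascheck/various/argmax/, the maximum size of a single argument or environment string is bounded by MAX_ARG_STRLEN (131072 bytes on Linux with 4 KiB pages), and the 167546-byte line above exceeds it.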
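
      To find every offending line rather than checking one line at a time, the sed/wc check above can be generalized. The following is a minimal sketch, not part of YARN or Samza: find_oversized_lines is a hypothetical helper name, the limit is the MAX_ARG_STRLEN value quoted above, and the line length only approximates the size of the resulting NAME=value string, since it also counts the export keyword and any quoting. It assumes ASCII content, so characters and bytes coincide.

```shell
# Limit quoted in the description (MAX_ARG_STRLEN); an assumption for other systems.
MAX_ARG_STRLEN=131072

# Hypothetical helper: print "line_number length" for every line of the given
# script whose length reaches the limit. The trailing newline is not counted.
find_oversized_lines() {
  local file=$1 n=0 line
  # "|| [ -n ... ]" also handles a last line without a trailing newline.
  while IFS= read -r line || [ -n "$line" ]; do
    n=$((n + 1))
    if [ "${#line}" -ge "$MAX_ARG_STRLEN" ]; then
      printf '%s %s\n' "$n" "${#line}"
    fi
  done < "$file"
}

# Example call (file name taken from the description):
# find_oversized_lines launch_container.sh
```

      Each reported line is a candidate for the failure described here; a line just under the limit may still be fine once the export keyword and quotes are discounted.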

      This can be reproduced by exporting a large variable:

      [nsomasun@eat1-app201 usercache]$ sudo -uapp bash
      bash-4.1$ export b1=A
      bash-4.1$ export b2=$b1$b1
      bash-4.1$ export b4=$b2$b2
      bash-4.1$ export b8=$b4$b4
      bash-4.1$ export b16=$b8$b8
      bash-4.1$ export b32=$b16$b16
      bash-4.1$ export b64=$b32$b32
      bash-4.1$ export b128=$b64$b64
      bash-4.1$ export b256=$b128$b128
      bash-4.1$ export b512=$b256$b256
      bash-4.1$ export b1k=$b512$b512
      bash-4.1$ export b2k=$b1k$b1k
      bash-4.1$ export b4k=$b2k$b2k
      bash-4.1$ export b8k=$b4k$b4k
      bash-4.1$ export b16k=$b8k$b8k
      bash-4.1$ export b32k=$b16k$b16k
      bash-4.1$ export b64k=$b32k$b32k
      bash-4.1$ export b128k=$b64k$b64k
      bash-4.1$ ls
      bash: /bin/ls: Argument list too long
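
      A less destructive way to observe the same failure is to export the large value inside a subshell, so the calling shell stays usable. This is a sketch that assumes a Linux kernel where MAX_ARG_STRLEN is 131072; make_string and try_exec_with_env_size are hypothetical helper names. On such a system the 131072-byte case is expected to report a failed exec, matching the session above, while a small value should run normally.

```shell
# Hypothetical helper: emit N copies of "A" on stdout. The string is produced
# by a pipeline, so it is never passed as a command-line argument.
make_string() {
  head -c "$1" /dev/zero | tr '\0' 'A'
}

# Hypothetical helper: export a value of the given size inside a subshell and
# try to execute an external command. Prints "ok" if the exec succeeded and
# "failed" otherwise. The oversized variable never reaches the calling shell.
try_exec_with_env_size() {
  (
    big=$(make_string "$1")
    export big
    if /bin/sh -c 'exit 0' 2>/dev/null; then
      echo ok
    else
      echo failed
    fi
  )
}

# Example calls:
# try_exec_with_env_size 1024
# try_exec_with_env_size 131072
```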
      

      We need an alternate mechanism for passing configuration to the Samza container, since we are bound by the maximum size of a variable that the shell can export.
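
      One possible direction, shown purely as an illustration and not as the approach adopted for this issue, is to write the configuration to a file and export only its path. In the sketch below, stash_config, load_config, the samza-config file name, and the SAMZA_CONFIG_FILE variable are all made-up names; nothing here reflects an existing Samza or YARN interface.

```shell
# Hypothetical helper: read a (possibly very large) config from stdin, store it
# in the given directory, and print the path of the file that was written.
stash_config() {
  local dir=$1
  local path="$dir/samza-config.$$"
  cat > "$path"            # the config travels over stdin, never as an argument
  printf '%s\n' "$path"
}

# Hypothetical helper: read the config back, e.g. inside the container.
load_config() {
  cat "$1"
}

# Example usage (generate_config is a placeholder for whatever produces the config):
# SAMZA_CONFIG_FILE=$(generate_config | stash_config /tmp)
# export SAMZA_CONFIG_FILE          # only a short path is exported
# load_config "$SAMZA_CONFIG_FILE"
```

      With this shape the exported value stays a few dozen bytes long regardless of how large the configuration grows.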

            People

              Assignee: Yi Pan (nickpan47)
              Reporter: Naveen (nsomasun)
              Votes: 0
              Watchers: 5
