Hadoop Map/Reduce / MAPREDUCE-6249

Streaming task will not untar tgz uploaded with -archives


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Not A Problem
    • Affects Version/s: 2.5.2
    • Fix Version/s: None
    • Component/s: contrib/streaming
    • Labels: None
    • Environment: hadoop-2.5.2
      hadoop-streaming-2.5.2.jar

    Description

      While writing a Hadoop Streaming job, I used -archives to upload a .tgz from my local machine to the task's working directory, but it was not untarred as the documentation says it should be. I've searched a lot without any luck.

      Here is the command used to launch the streaming job on Hadoop 2.5.2:

      hadoop jar /opt/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.5.2.jar \
      -files mapper.sh \
      -archives /home/hadoop/tmp/test.tgz#test \
      -D mapreduce.job.maps=1 \
      -D mapreduce.job.reduces=1 \
      -input "/test/test.txt" \
      -output "/res/" \
      -mapper "sh mapper.sh" \
      -reducer "cat"

      The mapper script, "mapper.sh":

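      # Drain stdin so the streaming framework can finish feeding input.
      cat > /dev/null
      # Show what -archives placed under the "test" link name.
      ls -l test
      exit 0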

      "test.tgz" contains two files, "test.1.txt" and "test.2.txt", created with:

      echo "abcd" > test.1.txt
      echo "efgh" > test.2.txt
      tar zcvf test.tgz test.1.txt test.2.txt
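Before submitting the job, it may be worth confirming that the archive holds the two files at its top level (no enclosing directory), since the mapper expects to find them directly under the "test" link name. A minimal local check in a scratch directory, using only standard tar options; the "unpacked" directory name is arbitrary and only mimics an extraction target:

```shell
#!/bin/sh
set -e
# Recreate the two sample files and pack them as in the report
# (without -v, to keep the output short).
echo "abcd" > test.1.txt
echo "efgh" > test.2.txt
tar zcf test.tgz test.1.txt test.2.txt

# List archive members without extracting; both files should appear
# at the top level, with no leading directory component.
tar tzf test.tgz

# Extract into a fresh directory ("unpacked" is an arbitrary name)
# to see the layout an unpacked archive would have.
rm -rf unpacked
mkdir unpacked
tar xzf test.tgz -C unpacked
ls -l unpacked
```

If "tar tzf" shows paths such as "somedir/test.1.txt", the files will end up one directory deeper under the link name.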

      The output of the task above:

      lrwxrwxrwx 1 hadoop hadoop 71 Feb 8 23:25 test -> /tmp/hadoop-hadoop/nm-local-dir/usercache/hadoop/filecache/116/test.tgz

      but the desired output is something like:

      -rw-r--r-- 1 hadoop hadoop 5 Feb 8 23:25 test.1.txt
      -rw-r--r-- 1 hadoop hadoop 5 Feb 8 23:25 test.2.txt

      So why was test.tgz not untarred automatically as the documentation says? Or is there another way to get the .tgz untarred?
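One possible reading of the output above, consistent with the "Not A Problem" resolution but not stated anywhere in this issue: on YARN, an -archives entry is, to our understanding, unpacked into a directory named after the archive file (here .../filecache/116/test.tgz), and "test" is a symbolic link to that directory. GNU "ls -l" given a symlink on the command line reports the link itself rather than following it, which would produce exactly the single lrwxrwxrwx line shown. The sketch below simulates that layout locally; the ./filecache/116 path is a hypothetical stand-in for the NodeManager cache, not the real location.

```shell
#!/bin/sh
set -e
# Hypothetical stand-in for the NodeManager file cache; the real path
# (under nm-local-dir/usercache/...) is chosen by YARN.
cache=./filecache/116
rm -rf ./filecache test

# Assumption: the archive is unpacked into a DIRECTORY named after the
# archive file, so "test.tgz" here is a directory, not a gzip file.
mkdir -p "$cache/test.tgz"
printf 'abcd\n' > "$cache/test.tgz/test.1.txt"
printf 'efgh\n' > "$cache/test.tgz/test.2.txt"

# The "#test" fragment names the link created in the working directory.
ln -s "$cache/test.tgz" test

echo '--- ls -l test (describes the link itself) ---'
ls -l test

echo '--- ls -l test/ (trailing slash follows the link) ---'
ls -l test/
```

If this reading is right, changing the mapper to "ls -l test/" (or "ls -lL test") should list test.1.txt and test.2.txt, and the files can be opened as test/test.1.txt and test/test.2.txt.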


          People

            Assignee: Unassigned
            Reporter: Tios Liu Xiao
            Votes: 0
            Watchers: 2
