Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Not A Problem
- Affects Version/s: 2.5.2
- Fix Version/s: None
- Component/s: None
- Environment: hadoop-2.5.2, hadoop-streaming-2.5.2.jar
Description
When writing a Hadoop streaming task, I used -archives to upload a .tgz from the local machine to the HDFS task working directory, but it was not untarred as the documentation says it should be. I've searched a lot without any luck.
Here is the command used to start the streaming task with hadoop-2.5.2:
hadoop jar /opt/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.5.2.jar \
-files mapper.sh \
-archives /home/hadoop/tmp/test.tgz#test \
-D mapreduce.job.maps=1 \
-D mapreduce.job.reduces=1 \
-input "/test/test.txt" \
-output "/res/" \
-mapper "sh mapper.sh" \
-reducer "cat"
and here is "mapper.sh":
cat > /dev/null
ls -l test
exit 0
"test.tgz" contains two files, "test.1.txt" and "test.2.txt", created like this:
echo "abcd" > test.1.txt
echo "efgh" > test.2.txt
tar zcvf test.tgz test.1.txt test.2.txt
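(As a quick local sanity check, not part of the original report, one can confirm the archive itself is well-formed and lists both files before submitting the job:)

```shell
#!/bin/sh
# Sanity check: rebuild the archive exactly as above, in a scratch
# directory, then list its contents without extracting.
set -e
cd "$(mktemp -d)"
echo "abcd" > test.1.txt
echo "efgh" > test.2.txt
tar zcf test.tgz test.1.txt test.2.txt
tar tzf test.tgz   # lists test.1.txt and test.2.txt
```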
The output from the task above:
lrwxrwxrwx 1 hadoop hadoop 71 Feb 8 23:25 test -> /tmp/hadoop-hadoop/nm-local-dir/usercache/hadoop/filecache/116/test.tgz
but the desired output would be something like this:
-rw-r--r-- 1 hadoop hadoop 5 Feb 8 23:25 test.1.txt
-rw-r--r-- 1 hadoop hadoop 5 Feb 8 23:25 test.2.txt
So why has test.tgz not been untarred automatically, as the documentation says it should be? Or is there another way to make the .tgz be untarred?
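(As a possible workaround, a sketch only: assuming the "test" symlink points at the raw .tgz file, as in the ls output above, the mapper could extract the archive itself. The snippet below mimics that working-directory layout locally; the directory names are hypothetical.)

```shell
#!/bin/sh
# Sketch: recreate the situation from the report, where "test" is a
# symlink to the raw test.tgz, then extract through the symlink the
# way a mapper could.
set -e
cd "$(mktemp -d)"

# Build the archive as in the description.
echo "abcd" > test.1.txt
echo "efgh" > test.2.txt
tar zcf test.tgz test.1.txt test.2.txt
rm test.1.txt test.2.txt

# Mimic the task working directory: "test" points at the raw archive.
ln -s "$PWD/test.tgz" test

# A mapper can untar through the symlink into its working directory.
mkdir unpacked
tar zxf test -C unpacked
ls -l unpacked
```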