Spark / SPARK-18509

spark-ec2 init.sh requests .tgz files not available at http://s3.amazonaws.com/spark-related-packages


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Not A Problem
    • Affects Version/s: 1.6.3, 2.0.1
    • Fix Version/s: None
    • Component/s: EC2
    • Environment: AWS EC2, AWS Linux, OS X 10.12.x (local)

    Description

      When I run the spark-ec2 script in a local spark-1.6.3 installation, the error 'ERROR: Unknown Spark version' is generated:

      Initializing spark
      --2016-11-18 22:33:06--  http://s3.amazonaws.com/spark-related-packages/spark-1.6.3-bin-hadoop1.tgz
      Resolving s3.amazonaws.com (s3.amazonaws.com)... 52.216.1.3
      Connecting to s3.amazonaws.com (s3.amazonaws.com)|52.216.1.3|:80... connected.
      HTTP request sent, awaiting response... 404 Not Found
      2016-11-18 22:33:06 ERROR 404: Not Found.

      ERROR: Unknown Spark version
      spark/init.sh: line 137: return: -1: invalid option
      return: usage: return [n]
      Unpacking Spark
      tar (child): spark-*.tgz: Cannot open: No such file or directory
      tar (child): Error is not recoverable: exiting now
      tar: Child returned status 2
      tar: Error is not recoverable: exiting now
      rm: cannot remove `spark-*.tgz': No such file or directory
      mv: missing destination file operand after `spark'
      Try `mv --help' for more information.
      [timing] spark init: 00h 00m 00s

      I think this happens when init.sh executes these lines:
      if [[ "$HADOOP_MAJOR_VERSION" == "1" ]]; then
        wget http://s3.amazonaws.com/spark-related-packages/spark-$SPARK_VERSION-bin-hadoop1.tgz
      elif [[ "$HADOOP_MAJOR_VERSION" == "2" ]]; then
        wget http://s3.amazonaws.com/spark-related-packages/spark-$SPARK_VERSION-bin-cdh4.tgz
      else
        wget http://s3.amazonaws.com/spark-related-packages/spark-$SPARK_VERSION-bin-hadoop2.4.tgz
      fi
      if [ $? != 0 ]; then
        echo "ERROR: Unknown Spark version"
        return -1
      fi
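      A minor aside on the `return: -1: invalid option` message in the log: bash's `return` builtin only accepts a non-negative integer (exit statuses are 0-255), so `-1` is parsed as an option and rejected. The sketch below (hypothetical function name, not part of init.sh) reproduces the same error-handling pattern with a valid status:

      ```shell
      #!/usr/bin/env bash
      # Sketch: the init.sh pattern with `return 1` instead of the invalid `return -1`.
      download_spark() {
        # `false` stands in for the failing wget to s3.amazonaws.com/spark-related-packages.
        false
        if [ $? != 0 ]; then
          echo "ERROR: Unknown Spark version"
          return 1   # valid: return takes a non-negative status 0-255
        fi
      }

      download_spark || echo "caller saw failure status $?"
      ```

      With `return 1`, the caller can detect the failure and stop before the subsequent `tar`, `rm`, and `mv` steps run against a tarball that was never downloaded.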

      spark-1.6.3-bin-hadoop1.tgz does not exist on <http://s3.amazonaws.com/spark-related-packages/>. Similarly, spark-2.0.1-bin-hadoop1.tgz does not exist at that location. So for these versions, if `[[ "$HADOOP_MAJOR_VERSION" == "1" ]]` evaluates to true in init.sh, Spark installation on the EC2 cluster will fail.
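      For reference, the branch logic above can be reproduced locally to see exactly which tarball URL a given SPARK_VERSION / HADOOP_MAJOR_VERSION pair would request (a standalone sketch mirroring init.sh; the helper name is mine, not init.sh's):

      ```shell
      #!/usr/bin/env bash
      # Sketch: mirror init.sh's URL selection so a given version pair can be
      # checked against what actually exists on the S3 bucket.
      spark_url() {
        local SPARK_VERSION=$1 HADOOP_MAJOR_VERSION=$2
        local base=http://s3.amazonaws.com/spark-related-packages
        if [[ "$HADOOP_MAJOR_VERSION" == "1" ]]; then
          echo "$base/spark-$SPARK_VERSION-bin-hadoop1.tgz"
        elif [[ "$HADOOP_MAJOR_VERSION" == "2" ]]; then
          echo "$base/spark-$SPARK_VERSION-bin-cdh4.tgz"
        else
          echo "$base/spark-$SPARK_VERSION-bin-hadoop2.4.tgz"
        fi
      }

      spark_url 1.6.3 1   # matches the 404 URL in the log above
      ```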

      Related (perhaps a different bug?): I have installed spark-1.6.3-bin-hadoop2.6.tgz, but since init.sh generated the error above, it appears that HADOOP_MAJOR_VERSION == 1 evaluated to true; otherwise a different tarball would have been requested from <http://s3.amazonaws.com/spark-related-packages/>. I am not experienced enough to verify this. My installed Hadoop version should be 2.6. Please tell me if this should be a different bug report.


          People

            Assignee: Unassigned
            Reporter: Peter B. Pearman (pbpearman)


              Time Tracking

                Original Estimate: 3h
                Remaining Estimate: 3h
                Time Spent: Not Specified