Spark / SPARK-11509

IPython notebooks do not work on clusters created using the spark-1.5.1-bin-hadoop2.6/ec2/spark-ec2 script


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Not A Problem
    • Affects Version/s: 1.5.1
    • Fix Version/s: None
    • Component/s: Documentation, EC2, PySpark
    • Labels: None
    • Environment: AWS cluster
      [ec2-user@ip-172-31-29-60 ~]$ uname -a
      Linux ip-172-31-29-60.us-west-1.compute.internal 3.4.37-40.44.amzn1.x86_64 #1 SMP Thu Mar 21 01:17:08 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

    Description

      I recently downloaded spark-1.5.1-bin-hadoop2.6 to my local Mac.

      I used spark-1.5.1-bin-hadoop2.6/ec2/spark-ec2 to create an AWS cluster. I am able to run the Java SparkPi example on the cluster; however, I am not able to run IPython notebooks on the cluster. (I connect using an ssh tunnel.)

      According to the 1.5.1 programming guide (http://spark.apache.org/docs/latest/programming-guide.html#using-the-shell), the following should work:

      PYSPARK_DRIVER_PYTHON=ipython PYSPARK_DRIVER_PYTHON_OPTS="notebook --no-browser --port=7000" /root/spark/bin/pyspark

      I am able to connect to the notebook server and start a notebook; however:

      Bug 1) The default SparkContext does not exist.

      from pyspark import SparkContext
      textFile = sc.textFile("file:///home/ec2-user/dataScience/readme.md")
      textFile.take(3)

      ---------------------------------------------------------------------------
      NameError Traceback (most recent call last)
      <ipython-input-1-127b6a58d5cc> in <module>()
      1 from pyspark import SparkContext
      ----> 2 textFile = sc.textFile("file:///home/ec2-user/dataScience/readme.md")
      3 textFile.take(3)

      NameError: name 'sc' is not defined
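
      As far as I can tell, sc is only injected automatically by pyspark's shell startup code; when the notebook kernel starts without running it, the context has to be built by hand, which is what I try next in bug 2. A rough sketch of that workaround (the app name and the commented-out local master are placeholders, not values taken from this cluster):

      from pyspark import SparkConf, SparkContext

      # No `sc` was injected for us, so build one explicitly. When the kernel
      # is launched through bin/pyspark, the master URL normally comes from
      # spark-defaults.conf; uncomment setMaster to force local mode instead.
      conf = SparkConf().setAppName("notebook-test")  # app name is arbitrary
      # conf.setMaster("local[*]")
      sc = SparkContext(conf=conf)

      textFile = sc.textFile("file:///home/ec2-user/dataScience/readme.md")
      print(textFile.take(3))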

      Bug 2) If I create a SparkContext myself, I get the following Python version mismatch error:

      sc = SparkContext("local", "Simple App")
      textFile = sc.textFile("file:///home/ec2-user/dataScience/readme.md")
      textFile.take(3)

      File "/root/spark/python/lib/pyspark.zip/pyspark/worker.py", line 64, in main
      ("%d.%d" % sys.version_info[:2], version))
      Exception: Python in worker has different version 2.7 than that in driver 2.6, PySpark cannot run with different minor versions
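
      My understanding is that PySpark only needs the worker interpreter to match the driver's minor version. One way to line them up, assuming the interpreter running the notebook kernel is also installed at the same path on the worker nodes, is to point PYSPARK_PYTHON at that interpreter before creating the context:

      import os
      import sys

      # The driver is whatever Python the notebook kernel runs under (2.6 in
      # the traceback above). Point the workers at the same interpreter so
      # the minor versions match. If that interpreter is not installed on
      # the workers, substitute an explicit path that is.
      os.environ["PYSPARK_PYTHON"] = sys.executable

      from pyspark import SparkContext

      sc = SparkContext("local", "Simple App")
      print(sc.textFile("file:///home/ec2-user/dataScience/readme.md").take(3))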

      I am able to run IPython notebooks on my local Mac as follows. (By default you would get an error that the driver and workers are using different versions of Python.)

      $ cat ~/bin/pySparkNotebook.sh
      #!/bin/sh

      set -x # turn debugging on
      #set +x # turn debugging off

      export PYSPARK_PYTHON=python3
      export PYSPARK_DRIVER_PYTHON=python3
      IPYTHON_OPTS=notebook $SPARK_ROOT/bin/pyspark $*

      I have spent a lot of time trying to debug the pyspark script; however, I cannot figure out what the problem is.

      Please let me know if there is something I can do to help

      Andy

Attachments

Activity

People

    Assignee: Unassigned
    Reporter: aedwip (Andrew Davidson)
    Votes: 0
    Watchers: 1
