Livy / LIVY-859

Starting a PySpark application via sparkmagic fails with Python 3.7.6


Details

    • Type: Question
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Fix Version/s: 0.9.0
    • Environment:
      CDH: 6.3.2
      Spark: 2.4.0
      Python: 3.7.6
      Livy: 0.7.0-incubating

    Description

      I have two CDH clusters (stage and prod) with the same environment: the same Livy service is installed on both, and the Python 3.7 environment is configured via Ansible. I write PySpark code in jupyter-lab, calling Livy through sparkmagic. The stage environment works fine, but the prod environment gives an error.

      Error log:

       

      ```
      21/05/20 14:43:45 INFO driver.SparkEntries: Created Spark session (with Hive support).
      21/05/20 14:43:50 ERROR repl.PythonInterpreter: Process has died with 134
      21/05/20 14:43:50 ERROR repl.PythonInterpreter: Fatal Python error: initfsencoding: unable to load the file system codec
      ModuleNotFoundError: No module named 'encodings'
      ```

      On the prod machines, the interpreter that PYSPARK_PYTHON points to (/opt/miniconda/bin/python) can in fact ``import encodings``, and I can run the pyspark shell there directly without any problems.
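      For reference, here is a minimal sketch of how that can be double-checked on a prod node. The interpreter path is the PYSPARK_PYTHON value quoted above; printing PYTHONHOME/PYTHONPATH is only a diagnostic idea, since a stray value in either is a common reason a child Python cannot find ``encodings`` at start-up:

      ```
      import os
      import subprocess

      # Interpreter that PYSPARK_PYTHON points to in this setup.
      interpreter = "/opt/miniconda/bin/python"

      # Ask the interpreter to import 'encodings' and report where it lives.
      result = subprocess.run(
          [interpreter, "-c", "import encodings, sys; print(sys.prefix)"],
          capture_output=True,
          text=True,
      )
      print("exit code :", result.returncode)
      print("stdout    :", result.stdout.strip())
      print("stderr    :", result.stderr.strip())

      # What the current shell would hand down to a child interpreter.
      print("PYTHONHOME:", os.environ.get("PYTHONHOME"))
      print("PYTHONPATH:", os.environ.get("PYTHONPATH"))
      ```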

      My environment configuration:

      CDH: 6.3.2
      Spark: 2.4.0
      Python: 3.7.6

      In CDH's "Spark Service Advanced Configuration Snippet (Safety Valve) for spark-conf/spark-env.sh", I configured PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON:

      ```
      export PYSPARK_PYTHON=${PYSPARK_PYTHON:-/opt/miniconda/bin/python}
      export PYSPARK_DRIVER_PYTHON=${PYSPARK_DRIVER_PYTHON:-/opt/miniconda/bin/python}
      ```
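      Since both exports use the ``${VAR:-default}`` form, any PYSPARK_PYTHON already exported in the parent environment silently wins over /opt/miniconda/bin/python. A quick way to see which interpreters are actually used end to end is something like the sketch below, run from the pyspark shell that does work on prod (it assumes the shell's default ``sc``):

      ```
      # Run inside the pyspark shell on a prod node.
      import sys

      # Interpreter the driver is running on.
      print("driver   :", sys.executable)

      # Interpreter(s) the executors picked up from PYSPARK_PYTHON.
      executor_pythons = (
          sc.parallelize(range(2), 2)
            .map(lambda _: __import__("sys").executable)
            .collect()
      )
      print("executors:", set(executor_pythons))
      ```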

      livy-env.sh also configures PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON:

      ```
      PYSPARK_PYTHON=/opt/miniconda/bin/python
      PYSPARK_DRIVER_PYTHON=/opt/miniconda/bin/python

      JAVA_HOME=/usr/java/jdk1.8.0_181-cloudera/
      HADOOP_HOME=/opt/cloudera/parcels/CDH/lib/hadoop
      HADOOP_CONF_DIR=/etc/hadoop/conf
      SPARK_HOME=/opt/cloudera/parcels/CDH/lib/spark
      ```
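      The error itself comes from repl.PythonInterpreter, i.e. the Python process started by the Livy-launched driver, so it can also help to confirm that the values from livy-env.sh really ended up in the running Livy server's environment. Below is a rough Linux-only sketch; the ``pgrep`` pattern is an assumption about how the server process is named, and reading /proc/<pid>/environ may require running as root or as the livy user:

      ```
      import subprocess

      # Find the Livy server process (adjust the pattern, or use the PID file).
      pid = subprocess.run(
          ["pgrep", "-f", "LivyServer"], capture_output=True, text=True
      ).stdout.split()[0]

      # /proc/<pid>/environ holds the NUL-separated environment the process
      # inherited at start-up (Linux only).
      with open(f"/proc/{pid}/environ", "rb") as f:
          env = dict(
              entry.split("=", 1)
              for entry in f.read().decode(errors="replace").split("\x00")
              if "=" in entry
          )

      for key in ("PYSPARK_PYTHON", "PYSPARK_DRIVER_PYTHON", "PYTHONHOME", "PYTHONPATH"):
          print(key, "=", env.get(key))
      ```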


People

    Assignee: Unassigned
    Reporter: Eric Luo (lxneng)
    Votes: 0
    Watchers: 2
