Spark / SPARK-40874

Fix broadcasts in Python UDFs when encryption is enabled

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.4.0
    • Fix Version/s: 3.1.4, 3.2.3, 3.3.2, 3.4.0
    • Component/s: PySpark
    • Labels: None

    Description

      The following PySpark session:

      bin/pyspark --conf spark.io.encryption.enabled=true
      
      ...
      
      bar = {"a": "aa", "b": "bb"}
      foo = spark.sparkContext.broadcast(bar)
      spark.udf.register("MYUDF", lambda x: foo.value[x] if x else "")
      spark.sql("SELECT MYUDF('a') AS a, MYUDF('b') AS b").collect()
      

      fails with:

      22/10/21 17:14:32 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
      org.apache.spark.api.python.PythonException: Traceback (most recent call last):
        File "/Users/petertoth/git/apache/spark/python/lib/pyspark.zip/pyspark/worker.py", line 811, in main
          func, profiler, deserializer, serializer = read_command(pickleSer, infile)
        File "/Users/petertoth/git/apache/spark/python/lib/pyspark.zip/pyspark/worker.py", line 87, in read_command
          command = serializer._read_with_length(file)
        File "/Users/petertoth/git/apache/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 173, in _read_with_length
          return self.loads(obj)
        File "/Users/petertoth/git/apache/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 471, in loads
          return cloudpickle.loads(obj, encoding=encoding)
      EOFError: Ran out of input
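
      The final line of the traceback is the message Python's standard pickle module raises when it is asked to load an empty byte string. The sketch below shows that behaviour using only the standard library, without Spark. It suggests the worker received no command bytes to deserialize, but it does not establish the root cause of this issue. The helper name try_load is hypothetical and is not part of PySpark.

```python
import pickle


def try_load(payload: bytes):
    """Unpickle payload; return the error text if the input ends before any data."""
    try:
        return pickle.loads(payload)
    except EOFError as exc:
        return f"EOFError: {exc}"


# A well-formed payload round-trips as expected.
command = pickle.dumps({"a": "aa", "b": "bb"})
full_result = try_load(command)

# An empty payload produces the same message seen in the traceback above.
empty_result = try_load(b"")

print(full_result)
print(empty_result)
```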
      

          People

            Assignee: Peter Toth (petertoth)
            Reporter: Peter Toth (petertoth)