
ARROW-6382: [Python] Unable to catch Spark Python UDF exceptions


Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Won't Fix
    • Affects Version/s: 0.14.1
    • Fix Version/s: None
    • Component/s: Python
    • Labels: None
    • Environment: Ubuntu 18.04

    Description

      When PyArrow is enabled, Python UDF exceptions raised by the executor become impossible to catch: see the example below. Is this expected behavior?

      If so, what is the rationale? If not, how do I fix this?

      Behavior confirmed with PyArrow 0.11 and 0.14.1 (latest) under PySpark 2.4.0 and 2.4.3, on Python 3.6.5.

      To reproduce:

      import pandas as pd
      from pyspark.sql import SparkSession
      from pyspark.sql.functions import udf
      
      spark = SparkSession.builder.getOrCreate()
      
      # setting this to false will allow the exception to be caught
      spark.conf.set("spark.sql.execution.arrow.enabled", "true")
      
      @udf
      def disrupt(x):
          raise Exception("Test EXCEPTION")
      
      data = spark.createDataFrame(pd.DataFrame({"A": [1, 2, 3]}))
      try: 
          test = data.withColumn("test", disrupt("A")).toPandas()
      except:
          print("exception caught")
      
      print('end')

      I would hope there's a way to catch the exception with a general except clause.

       


People

    • Assignee: Unassigned
    • Reporter: Valendin Jan
    • Votes: 0
    • Watchers: 5
