Spark / SPARK-30989

TABLE.COLUMN reference doesn't work with new columns created by UDF

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: 2.4.4
    • Fix Version/s: None
    • Component/s: SQL

    Description

      When a DataFrame is given an alias (`.as("...")`), its columns can be referenced as `TABLE.COLUMN`, but this does not work for columns added afterwards with a UDF via `withColumn`.

      
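      import org.apache.spark.sql.functions.{col, udf}

      // `l` is not shown in the report; assumed to be a local Seq of (Int, Int) pairs.
      val df1 = sc.parallelize(l).toDF("x", "y").as("cat")
      val squared = udf((s: Int) => s * s)
      val df2 = df1.withColumn("z", squared(col("y")))
      df2.columns // Array[String] = Array(x, y, z)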
      
      df2.select("cat.x") // works
      
      df2.select("cat.z") // Doesn't work
      // org.apache.spark.sql.AnalysisException: cannot resolve '`cat.z`' given input 
      // columns: [cat.x, cat.y, z];;
      

      Might be related to: https://issues.apache.org/jira/browse/SPARK-30532
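
A possible workaround, sketched here under assumptions rather than taken from the report: the column added by `withColumn` carries no `cat` qualifier, so it can be selected by its bare name, or the alias can be applied again after the column is added so that every output column is qualified. The object name `AliasAfterWithColumn`, the helper `realiased`, and the sample data are hypothetical; the report does not show the contents of `l`.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, udf}

object AliasAfterWithColumn {

  // Builds the DataFrame from the report, then re-applies the alias
  // after the UDF column has been added.
  def realiased(spark: SparkSession): DataFrame = {
    import spark.implicits._

    // Hypothetical sample data standing in for `l` from the report.
    val l = Seq((1, 2), (3, 4))
    val squared = udf((s: Int) => s * s)

    val df1 = l.toDF("x", "y").as("cat")
    val df2 = df1.withColumn("z", squared(col("y")))

    // `z` has no qualifier at this point, so the bare name resolves.
    df2.select("z").show()

    // Aliasing again wraps all current output columns, including `z`.
    df2.as("cat")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("SPARK-30989-sketch")
      .getOrCreate()

    realiased(spark).select("cat.x", "cat.z").show()

    spark.stop()
  }
}
```

This does not address the underlying resolution behaviour described above; it only avoids relying on the original alias for columns created later.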

          People

            Assignee: Unassigned
            Reporter: Chris Suchanek (chris_suchanek)
            Votes: 0
            Watchers: 4
