Spark / SPARK-30065

Unable to drop na with duplicate columns

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.2, 2.1.3, 2.2.3, 2.3.4, 2.4.4, 3.0.0
    • Fix Version/s: 2.4.5, 3.0.0
    • Component/s: SQL
    • Labels: None

    Description

      Calling na.drop on a DataFrame that contains duplicate column names fails with an AnalysisException, even when no columns are specified. This should be allowed:

      scala> val left = Seq(("1", null), ("3", "4")).toDF("col1", "col2")
      left: org.apache.spark.sql.DataFrame = [col1: string, col2: string]
      
      scala> val right = Seq(("1", "2"), ("3", null)).toDF("col1", "col2")
      right: org.apache.spark.sql.DataFrame = [col1: string, col2: string]
      
      scala> val df = left.join(right, Seq("col1"))
      df: org.apache.spark.sql.DataFrame = [col1: string, col2: string ... 1 more field]
      
      scala> df.show
      +----+----+----+
      |col1|col2|col2|
      +----+----+----+
      |   1|null|   2|
      |   3|   4|null|
      +----+----+----+
      
      
      scala> df.na.drop("any")
      org.apache.spark.sql.AnalysisException: Reference 'col2' is ambiguous, could be: col2, col2.;
        at org.apache.spark.sql.catalyst.expressions.package$AttributeSeq.resolve(package.scala:240)
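
On affected versions, one possible workaround is to give the joined columns unique names before calling na.drop, so that no name is ambiguous. The sketch below is not the fix that was merged for this issue. It assumes a local SparkSession, and the names left_col2, right_col2, renamed and cleaned are hypothetical, chosen only for illustration.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session for illustration; in spark-shell, `spark` already exists.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("SPARK-30065-workaround")
  .getOrCreate()
import spark.implicits._

// Same data as in the report above.
val left = Seq(("1", null), ("3", "4")).toDF("col1", "col2")
val right = Seq(("1", "2"), ("3", null)).toDF("col1", "col2")
val df = left.join(right, Seq("col1"))

// toDF renames columns by position, so it works even though "col2" appears twice.
// left_col2 and right_col2 are hypothetical names.
val renamed = df.toDF("col1", "left_col2", "right_col2")

// With unique names, na.drop can resolve every column.
val cleaned = renamed.na.drop("any")
cleaned.show()
```

Both rows in this example contain a null, so the cleaned DataFrame is expected to be empty. Renaming by position avoids touching the join itself, at the cost of changing the output column names.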
      

            People

              Assignee: Terry Kim (imback82)
              Reporter: Terry Kim (imback82)