Spark / SPARK-47845

Support column type in split function in scala and python



    Description

I have a use case where I need to split a String-typed column using delimiters defined in other columns of the DataFrame. SQL already supports this, but the Scala/Python `split` functions currently do not.

       

      A hypothetical example to illustrate:

import org.apache.spark.sql.functions.{col, split}

val example = spark.createDataFrame(
    Seq(
      ("Doe, John", ", ", 2),
      ("Smith,Jane", ",", 2),
      ("Johnson", ",", 1)
    )
  )
  .toDF("name", "delim", "expected_parts_count")

example.createOrReplaceTempView("test_data")

// works in SQL
spark.sql("SELECT split(name, delim, expected_parts_count) AS name_parts FROM test_data").show()

// doesn't compile in Scala today: the split overloads only accept a String
// pattern and an Int limit, but a Column-based overload is easy to support
example.withColumn("name_parts", split(col("name"), col("delim"), col("expected_parts_count"))).show()
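Until such an overload exists, one workaround is to route the call through the SQL parser with `expr`, since the SQL path already accepts column-valued arguments (a sketch against the `example` DataFrame above):

```scala
import org.apache.spark.sql.functions.expr

// expr parses the string as a SQL expression, so the SQL split(),
// which already supports column-valued delimiter and limit, is used.
example
  .withColumn("name_parts", expr("split(name, delim, expected_parts_count)"))
  .show()
```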

       

This is a pretty simple patch; I can open a PR soon.

People

liucao Liu Cao
