Spark / SPARK-23377

Bucketizer with multiple columns persistence bug

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 2.3.0
    • Fix Version/s: 2.3.0, 2.4.0
    • Component/s: ML
    • Labels: None

    Description

      A Bucketizer with multiple input/output columns gets "outputCol" set to its default value on write -> read, which causes it to throw an error on transform. Here's an example.

      import org.apache.spark.ml.feature._
      
      val splits = Array(Double.NegativeInfinity, 0, 10, 100, Double.PositiveInfinity)
      val bucketizer = new Bucketizer()
        .setSplitsArray(Array(splits, splits))
        .setInputCols(Array("foo1", "foo2"))
        .setOutputCols(Array("bar1", "bar2"))
      
      val data = Seq((1.0, 2.0), (10.0, 100.0), (101.0, -1.0)).toDF("foo1", "foo2")
      bucketizer.transform(data)
      
      val path = "/temp/bucketrizer-persist-test"
      bucketizer.write.overwrite.save(path)
      val bucketizerAfterRead = Bucketizer.read.load(path)
      println(bucketizerAfterRead.isDefined(bucketizerAfterRead.outputCol))
      // This line throws an error because "outputCol" is set
      bucketizerAfterRead.transform(data)
      

      And the trace:

      java.lang.IllegalArgumentException: Bucketizer bucketizer_6f0acc3341f7 has the inputCols Param set for multi-column transform. The following Params are not applicable and should not be set: outputCol.
      	at org.apache.spark.ml.param.ParamValidators$.checkExclusiveParams$1(params.scala:300)
      	at org.apache.spark.ml.param.ParamValidators$.checkSingleVsMultiColumnParams(params.scala:314)
      	at org.apache.spark.ml.feature.Bucketizer.transformSchema(Bucketizer.scala:189)
      	at org.apache.spark.ml.feature.Bucketizer.transform(Bucketizer.scala:141)
      	at line251821108a8a433da484ee31f166c83725.$read$$iw$$iw$$iw$$iw$$iw$$iw.<init>(command-6079631:17)
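
The failure above can be pictured with a toy model in plain Scala. This is only a sketch of one plausible mechanism, not Spark's implementation: `ToyParams`, `RoundTripDemo`, and the default value string are hypothetical names made up for illustration. It assumes that saving flattens default and user-set values into one map, and that loading treats every persisted entry as explicitly set, so a default for "outputCol" comes back as a set value and trips an exclusivity check like the one in the trace.

```scala
// Toy model only: ToyParams is a hypothetical class, not part of Spark.
final case class ToyParams(
    defaults: Map[String, String],   // values supplied by the library
    userSet: Map[String, String]) {  // values the user set explicitly

  def isSet(name: String): Boolean = userSet.contains(name)
  def isDefined(name: String): Boolean = isSet(name) || defaults.contains(name)

  // Naive persistence: flatten defaults and user-set values into one map.
  def save(): Map[String, String] = defaults ++ userSet

  // Same kind of check as in the trace: a single-column param must not be
  // explicitly set when the multi-column param is set.
  def validate(): Unit =
    if (isSet("inputCols") && isSet("outputCol"))
      throw new IllegalArgumentException(
        "inputCols is set for multi-column transform; outputCol should not be set")
}

object ToyParams {
  // Naive load: every persisted entry comes back as explicitly set.
  def load(saved: Map[String, String], defaults: Map[String, String]): ToyParams =
    ToyParams(defaults, saved)
}

object RoundTripDemo {
  // Hypothetical default value, chosen only for the demo.
  val defaults: Map[String, String] = Map("outputCol" -> "bucketizer__output")

  val original: ToyParams =
    ToyParams(defaults, Map("inputCols" -> "foo1,foo2", "outputCols" -> "bar1,bar2"))

  val reloaded: ToyParams = ToyParams.load(original.save(), defaults)

  def main(args: Array[String]): Unit = {
    original.validate()                   // passes: outputCol is only a default
    println(original.isSet("outputCol"))  // false
    println(reloaded.isSet("outputCol"))  // true: the default was persisted as set
    println(scala.util.Try(reloaded.validate()).isFailure)  // true
  }
}
```

In this model the distinction that matters is `isSet` versus `isDefined`: a param with a default is defined both before and after the round trip, so only `isSet` shows the change.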
      
      

            People

              Assignee: L. C. Hsieh (viirya)
              Reporter: Bago Amirbekian (bago.amirbekian)
              Votes: 0
              Watchers: 6
