Description
I have a trained ML model and a large data table stored in Parquet. I want to add a new column to this table with the predicted values. I can't lose any rows, but the data can contain null values.
RFormula.fit() creates a new StringIndexer with the default handleInvalid="error" parameter. VectorAssembler also throws an exception on NULL values. So I have to call df.na.drop() before I can transform this DataFrame, and I don't want to do that.
RFormula needs a new parameter like handleInvalid in StringIndexer.
Alternatively, add a transform(Seq<Column>): Vector method that the user can call as a UDF: df.withColumn("predicted", functions.callUDF(rFormulaModel::transform, Seq<Column>))
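To make the requested behavior concrete, here is a minimal plain-Python sketch of the three handleInvalid modes ("error", "skip", "keep") as documented for StringIndexer. It does not use Spark and is not Spark's implementation. The helper names fit_labels and index_column are hypothetical, and the tie-breaking order for equally frequent labels is an assumption made only for this illustration.

```python
from typing import Dict, List, Optional


def fit_labels(values: List[Optional[str]]) -> Dict[str, int]:
    """Map each non-null label to an index, most frequent label first.

    Ties are broken alphabetically (an assumption for this sketch).
    """
    counts: Dict[str, int] = {}
    for v in values:
        if v is not None:
            counts[v] = counts.get(v, 0) + 1
    ordered = sorted(counts, key=lambda k: (-counts[k], k))
    return {label: i for i, label in enumerate(ordered)}


def index_column(
    values: List[Optional[str]],
    labels: Dict[str, int],
    handle_invalid: str = "error",
) -> List[int]:
    """Index values; nulls and unseen labels are treated as invalid."""
    if handle_invalid not in ("error", "skip", "keep"):
        raise ValueError(f"unsupported handle_invalid: {handle_invalid!r}")
    out: List[int] = []
    for v in values:
        if v in labels:
            out.append(labels[v])
        elif handle_invalid == "error":
            # Default mode: the whole transform fails on the first bad value.
            raise ValueError(f"unseen or null label: {v!r}")
        elif handle_invalid == "keep":
            # Invalid values share one extra bucket after the known labels.
            out.append(len(labels))
        # "skip": the row is dropped, so nothing is appended.
    return out


labels = fit_labels(["a", "b", "a", None])
print(index_column(["a", "c", None, "b"], labels, "keep"))
print(index_column(["a", "c", None, "b"], labels, "skip"))
```

Only the "keep" mode preserves every input row, which is the property this issue asks for; "skip" has the same effect as the df.na.drop() workaround.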
Issue Links
- is related to SPARK-23562 RFormula handleInvalid should handle invalid values in non-string columns. (Resolved)
- relates to SPARK-20307 SparkR: pass on setHandleInvalid to spark.mllib functions that use StringIndexer (Resolved)