[SPARK-27867] RegressionEvaluator cache latest RegressionMetrics to avoid duplicated computation - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Not A Problem
Affects Version/s: 3.0.0
Fix Version/s: None
Component/s: ML
Labels: None

Description

In most cases, we need to compute several metrics for a given model.

For example, for a regression model we may need R2, MAE and MSE.

However, the current design of `Evaluator` does not support computing multiple metrics at once.

In practice, we usually use RegressionEvaluator like this:

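import org.apache.spark.ml.evaluation.RegressionEvaluator

// df holds the model's predictions together with the true labels
val evaluator = new RegressionEvaluator()
val r2 = evaluator.setMetricName("r2").evaluate(df)
val mae = evaluator.setMetricName("mae").evaluate(df)
val mse = evaluator.setMetricName("mse").evaluate(df)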

However, the current implementation of RegressionEvaluator needs one pass over the whole input dataset to compute a single metric, so the example above needs 3 passes.

This can be optimized, since `RegressionMetrics` can compute all metrics at once.
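To illustrate why one pass is enough, here is a minimal plain-Scala sketch that accumulates a few running sums over (prediction, label) pairs and derives MSE, MAE and R2 from them. It is not Spark's `RegressionMetrics`; the names `SinglePassMetrics` and `RegressionSummary` are hypothetical, and the R2 formula assumed here is `1 - SSerr / SStot` with the usual mean-centered total sum of squares.

```scala
object SinglePassMetrics {

  // Hypothetical summary type: running sums collected in one pass over the data.
  final case class RegressionSummary(
      n: Long,
      sumLabel: Double,
      sumLabelSq: Double,
      sumSqErr: Double,
      sumAbsErr: Double) {

    // Fold one (prediction, label) pair into the running sums.
    def add(prediction: Double, label: Double): RegressionSummary = {
      val err = label - prediction
      RegressionSummary(
        n + 1,
        sumLabel + label,
        sumLabelSq + label * label,
        sumSqErr + err * err,
        sumAbsErr + math.abs(err))
    }

    def mse: Double = sumSqErr / n
    def mae: Double = sumAbsErr / n

    // R2 = 1 - SSerr / SStot, with SStot = sum(label^2) - n * mean(label)^2.
    def r2: Double = {
      val mean = sumLabel / n
      val ssTot = sumLabelSq - n * mean * mean
      1.0 - sumSqErr / ssTot
    }
  }

  val empty: RegressionSummary = RegressionSummary(0L, 0.0, 0.0, 0.0, 0.0)

  // A single traversal of the data yields every metric above.
  def summarize(data: Iterable[(Double, Double)]): RegressionSummary =
    data.foldLeft(empty) { case (s, (p, l)) => s.add(p, l) }
}
```

After one call to `SinglePassMetrics.summarize(data)`, reading `mse`, `mae` and `r2` from the result costs no further traversal. The naive sum-of-squares form is chosen for brevity and is not numerically robust for large or badly scaled labels.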

If we cache the latest inputs and the next evaluate call keeps the same inputs (everything except metricName), we can obtain the metric directly from the cached intermediate summary.
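The caching idea can be sketched in plain Scala as follows. This is only an illustration under simplifying assumptions, not the change proposed in the pull request: `LastInputCache`, `Summary` and `summarize` are hypothetical names, the input is an in-memory `Seq` instead of a DataFrame, and "same inputs" is approximated by reference equality on the input object. A real evaluator would also have to compare the column parameters.

```scala
object CachedEvaluation {

  // Hypothetical helper: remembers the summary of the most recent input only.
  final class LastInputCache[I <: AnyRef, S](compute: I => S) {
    private var last: Option[(I, S)] = None
    private var count: Int = 0

    // Number of times the (expensive) summary was actually computed.
    def computations: Int = synchronized(count)

    def get(input: I): S = synchronized {
      last match {
        // Same input object as last time: reuse the cached summary.
        case Some((cachedInput, summary)) if cachedInput eq input => summary
        // New input: recompute and replace the cached entry.
        case _ =>
          count += 1
          val summary = compute(input)
          last = Some((input, summary))
          summary
      }
    }
  }

  // Toy summary holding two metrics derived from the same errors.
  final case class Summary(mse: Double, mae: Double)

  def summarize(data: Seq[(Double, Double)]): Summary = {
    val n = data.size.toDouble
    val errs = data.map { case (prediction, label) => label - prediction }
    Summary(errs.map(e => e * e).sum / n, errs.map(e => math.abs(e)).sum / n)
  }
}
```

With `val cache = new CachedEvaluation.LastInputCache[Seq[(Double, Double)], CachedEvaluation.Summary](CachedEvaluation.summarize)`, calling `cache.get(data).mse` and then `cache.get(data).mae` computes the summary once, and passing a different input replaces the cached entry.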

Issue Links

links to GitHub Pull Request #24727

People

Assignee: Unassigned

Reporter: Ruifeng Zheng

Votes: 0

Watchers: 1

Dates

Created: 28/May/19 09:45

Updated: 10/Jun/19 09:51

Resolved: 10/Jun/19 09:51