[SPARK-29826] Missing persist on data in mllib.feature.ChiSqSelector.fit - ASF JIRA

Details

Type: Sub-task
Status: Resolved
Priority: Major
Resolution: Duplicate
Affects Version/s: 2.4.3
Fix Version/s: None
Component/s: MLlib
Labels: None

Description

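The RDD data passed to mllib.feature.ChiSqSelector.fit() is consumed by an action inside Statistics.chiSqTest(data) and by other actions later in the method, but it is never persisted. Without a persist, each of those actions recomputes the RDD from its lineage. The relevant part of fit() is: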

 
  def fit(data: RDD[LabeledPoint]): ChiSqSelectorModel = {
    val chiSqTestResult = Statistics.chiSqTest(data).zipWithIndex
    val features = selectorType match {
      case ChiSqSelector.NumTopFeatures =>
        chiSqTestResult
          .sortBy { case (res, _) => res.pValue }
          .take(numTopFeatures)

This issue was reported by our tool CacheCheck, which dynamically detects misuses of the persist()/unpersist() API.
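
As an illustration only, here is a minimal sketch of a caller-side workaround, assuming data really is evaluated by more than one action inside fit() as reported. The object PersistBeforeFit, the helper fitWithPersist, and the toy data in sampleData are hypothetical names introduced for this example; they are not part of Spark and this is not the fix applied upstream.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.feature.{ChiSqSelector, ChiSqSelectorModel}
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel

object PersistBeforeFit {

  // Hypothetical helper: persist the input around fit() so that repeated
  // actions on it reuse the cached partitions instead of recomputing lineage.
  def fitWithPersist(data: RDD[LabeledPoint], numTopFeatures: Int): ChiSqSelectorModel = {
    // Only persist (and later unpersist) if the caller has not already chosen a level.
    val persistedHere = data.getStorageLevel == StorageLevel.NONE
    if (persistedHere) data.persist(StorageLevel.MEMORY_AND_DISK)
    try {
      new ChiSqSelector(numTopFeatures).fit(data)
    } finally {
      if (persistedHere) data.unpersist()
    }
  }

  // Hypothetical toy data with categorical (0/1) feature values.
  def sampleData(sc: SparkContext): RDD[LabeledPoint] =
    sc.parallelize(Seq(
      LabeledPoint(0.0, Vectors.dense(0.0, 1.0, 0.0)),
      LabeledPoint(1.0, Vectors.dense(1.0, 1.0, 0.0)),
      LabeledPoint(0.0, Vectors.dense(0.0, 0.0, 1.0)),
      LabeledPoint(1.0, Vectors.dense(1.0, 0.0, 1.0))
    ))

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("PersistBeforeFit").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val model = fitWithPersist(sampleData(sc), numTopFeatures = 2)
      println(model.selectedFeatures.mkString(", "))
    } finally {
      sc.stop()
    }
  }
}
```

Checking getStorageLevel first leaves an RDD that the caller has already cached untouched, so the helper neither changes its storage level nor unpersists it afterwards.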

Issue Links

duplicates

SPARK-29818 Missing persist on RDD

Resolved

People

Assignee: Unassigned

Reporter: IcySanwitch

Votes: 0

Watchers: 1

Dates

Created: 10/Nov/19 13:01

Updated: 10/Nov/19 19:21

Resolved: 10/Nov/19 19:21