Details
-
Sub-task
-
Status: Resolved
-
Major
-
Resolution: Duplicate
-
2.4.3
-
None
-
None
Description
The rdd data in mllib.feature.ChiSqSelector.fit() is used by an action in Statistics.chiSqTest(data) and other actions in the following code, but it is not persisted.
def fit(data: RDD[LabeledPoint]): ChiSqSelectorModel = { val chiSqTestResult = Statistics.chiSqTest(data).zipWithIndex val features = selectorType match { case ChiSqSelector.NumTopFeatures => chiSqTestResult .sortBy { case (res, _) => res.pValue } .take(numTopFeatures)
This issue is reported by our tool CacheCheck, which is used to dynamically detecting persist()/unpersist() api misuses.
Attachments
Issue Links
- duplicates
-
SPARK-29818 Missing persist on RDD
- Resolved