In binary metrics, a threshold means any instance with a score >= threshold will be considered as positive.
However, in the existing implementation:
- When `numBins` is set on a `BinaryClassificationMetrics` object, all records (sorted by score in descending order) are grouped into chunks.
- Within each chunk, the record statistics (held in a `BinaryLabelCounter`) are accumulated, and the first record's score (also the largest in the chunk) is selected as the chunk's threshold.
- The resulting sampled records form a new, smaller data set from which the binary metrics are calculated.
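The BUG is introduced in the second step: the chunk's threshold is its largest score, but the accumulated statistics also cover records whose scores fall below that threshold. As a result, the threshold is paired with wrong values, such as a larger `true positive` count and a smaller `false negative` count, when calculating `recallByThresholds`, `precisionByThresholds`, etc.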
Thus, the BUG fix is straightforward: pick the last record's score in each chunk (the smallest) as the threshold while the statistics are merged.
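
The effect of the threshold choice can be illustrated with a minimal, Spark-free sketch. It is not the actual `BinaryClassificationMetrics` code: the helper names `binned_counts` and `exact_counts`, the `chunk_size` parameter, and the sample data are all hypothetical, and the scores are assumed to be distinct. It compares the cumulative counts attached to each chunk's threshold with the counts obtained by applying the `score >= threshold` rule directly.

```python
from typing import List, Tuple

Record = Tuple[float, int]  # (score, label), label is 1 (positive) or 0 (negative)


def binned_counts(scored: List[Record], chunk_size: int,
                  use_last_score: bool) -> List[Tuple[float, int, int]]:
    """Return (threshold, cumulative TP, cumulative FP) for each chunk.

    Hypothetical helper: records are sorted by score descending and split
    into chunks of `chunk_size`; counts are accumulated across chunks.
    """
    ordered = sorted(scored, key=lambda r: r[0], reverse=True)
    results = []
    tp = fp = 0
    for start in range(0, len(ordered), chunk_size):
        chunk = ordered[start:start + chunk_size]
        tp += sum(1 for _, label in chunk if label == 1)
        fp += sum(1 for _, label in chunk if label == 0)
        # First score = largest in the chunk; last score = smallest.
        threshold = chunk[-1][0] if use_last_score else chunk[0][0]
        results.append((threshold, tp, fp))
    return results


def exact_counts(scored: List[Record], threshold: float) -> Tuple[int, int]:
    """Count TP and FP by applying `score >= threshold` to every record."""
    tp = sum(1 for s, label in scored if s >= threshold and label == 1)
    fp = sum(1 for s, label in scored if s >= threshold and label == 0)
    return tp, fp


# Made-up sample data with distinct scores.
data = [(0.9, 1), (0.8, 0), (0.7, 1), (0.6, 1), (0.5, 0), (0.4, 0)]

for use_last in (False, True):
    for threshold, tp, fp in binned_counts(data, chunk_size=3, use_last_score=use_last):
        matches = (tp, fp) == exact_counts(data, threshold)
        print(f"use_last_score={use_last} threshold={threshold} "
              f"tp={tp} fp={fp} matches_exact={matches}")
```

With the first score as the threshold, the first chunk reports 2 true positives at threshold 0.9, although only one record actually has a score >= 0.9. With the last score, the counts of every chunk agree with the direct computation on this data.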