  1. Spark
  2. SPARK-44585

Fix warning condition in MLlib RankingMetrics ndcgAt


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 3.4.1
    • Fix Version/s: 3.4.2, 3.5.0, 4.0.0
    • Component/s: MLlib
    • Labels: None

    Description

      The nDCG evaluation with relevance scores in MLlib (added in 3.4.0, see https://issues.apache.org/jira/browse/SPARK-39446 and the associated pull request) logs the following warning when the input data is inconsistent: "# of ground truth set and # of relevance value set should be equal, check input data"

       

      The logic for raising warnings is faulty at the moment: it raises a warning when the following conditions are both true:

      1. rel is empty
      2. lab.size and rel.size are not equal.

       

      With the current logic, RankingMetrics will:

      • raise a spurious warning when a user is using it in "binary" mode (i.e. no relevance values in the input), because rel is empty while lab usually is not
      • not raise the warning (which could be necessary) when the user is using it in "non-binary" mode (i.e. with relevance values in the input), because rel is never empty there

       

      Instead, the warning should be raised when the following conditions are both true:

      1. rel is not empty
      2. lab.size and rel.size are not equal.
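
      The difference between the two conditions can be illustrated with a minimal standalone sketch. It does not reproduce the actual RankingMetrics source: the object and function names (NdcgWarningSketch, faultyShouldWarn, fixedShouldWarn) are hypothetical, and only lab and rel follow the names used in this report.

```scala
object NdcgWarningSketch {
  // Condition as described in this report (hypothetical helper, not Spark code):
  // it can only fire when no relevance values were supplied.
  def faultyShouldWarn(lab: Array[Int], rel: Array[Double]): Boolean =
    rel.isEmpty && lab.size != rel.size

  // Proposed condition (hypothetical helper): fires only when relevance
  // values were supplied and their count differs from the label count.
  def fixedShouldWarn(lab: Array[Int], rel: Array[Double]): Boolean =
    rel.nonEmpty && lab.size != rel.size

  def main(args: Array[String]): Unit = {
    val lab = Array(1, 2, 3)
    val cases = Seq(
      "binary mode (no relevance values)" -> Array.empty[Double],
      "non-binary mode, sizes match" -> Array(3.0, 2.0, 1.0),
      "non-binary mode, sizes differ" -> Array(3.0, 2.0)
    )
    // Print what each predicate decides for every input shape.
    for ((name, rel) <- cases)
      println(s"$name: faulty=${faultyShouldWarn(lab, rel)}, fixed=${fixedShouldWarn(lab, rel)}")
  }
}
```

      Under these assumptions, the faulty predicate warns only for the binary-mode input, while the fixed predicate warns only for the mismatched non-binary input.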

       


          People

            Assignee: gvuillier Guilhem Vuillier
            Reporter: gvuillier Guilhem Vuillier