Spark / SPARK-26351

Documented formula of precision at k does not match the actual code


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.4.0
    • Fix Version/s: 2.3.3, 2.4.1, 3.0.0
    • Component/s: Documentation, MLlib
    • Labels: None

    Description

      The documented formula for precision at k, used to measure the quality of recommendations, is at:

      https://spark.apache.org/docs/latest/mllib-evaluation-metrics.html#ranking-systems

      That formula says that j goes from 0 to min(|D|, k), but the code uses the number of predictions as the bound instead:

      https://github.com/apache/spark/blob/a63e7b2a212bab94d080b00cf1c5f397800a276a/mllib/src/main/scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala#L65

      val n = math.min(pred.length, k)

      The Spark documentation defines:

      D_i as the set of ground truth relevant documents for user i

      R_i as the set of recommended documents (i.e. predictions) given for user i

      To match the code, the documentation should say that j goes from 0 to min(|R_i|, k).
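
      The difference between the two bounds can be seen with a small worked example. The sketch below is not the Spark implementation: `PrecisionAtKSketch` and `precisionAt` are hypothetical names, the sample data is made up, and dividing the hit count by k is an assumption based on the usual definition of precision at k. Only the bound `math.min(pred.length, k)` comes from the code quoted above.

```scala
// Hypothetical sketch, not the Spark source: it contrasts the two loop bounds
// discussed in this issue on made-up data.
object PrecisionAtKSketch {

  /** Counts relevant items among the first `bound` predictions and divides by k.
    * Dividing by k is an assumption (the usual precision-at-k definition). */
  def precisionAt(pred: Array[Int], labels: Set[Int], k: Int, bound: Int): Double = {
    require(k > 0, "k must be positive")
    val n = math.min(bound, pred.length) // never read past the end of pred
    val hits = pred.take(n).count(labels.contains)
    hits.toDouble / k
  }

  def main(args: Array[String]): Unit = {
    val pred = Array(1, 2, 3, 4, 5) // R_i: five recommended documents, in rank order
    val labels = Set(2, 4)          // D_i: two ground truth relevant documents
    val k = 5

    // Bound used by the code: min(|R_i|, k) = 5, so all five predictions are checked.
    val fromCode = precisionAt(pred, labels, k, math.min(pred.length, k))

    // Bound stated in the documentation: min(|D_i|, k) = 2, so only two are checked.
    val fromDocs = precisionAt(pred, labels, k, math.min(labels.size, k))

    println(s"bound from code: $fromCode, bound from documentation: $fromDocs")
  }
}
```

      With this data the bound from the code finds both relevant documents (2 hits out of k = 5), while the documented bound stops after two predictions and finds only one, so the two readings give different values whenever |D_i| < min(|R_i|, k) and a relevant document is ranked below position |D_i|.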

            People

              shahid
              Pablo J. Villacorta (olbapjose)
              Sean Owen
              Votes: 0
              Watchers: 2
