Spark / SPARK-35357

Allow to turn off the normalization applied by static PageRank utilities

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 3.1.1
    • Fix Version/s: 3.2.0
    • Component/s: GraphX
    • Labels: None

    Description

      Since SPARK-18847, the static PageRank computations in `PageRank.scala` normalize the sum of the ranks once the fixed number of iterations has completed, and a developer has no way to access the raw, non-normalized rank values.

      Since SPARK-29877, one can run a fixed number of PageRank iterations starting from the ranks of a previous `preRankGraph`.
      This nice feature opens the door to interesting incremental algorithms, for example:
      "Run some initial PageRank iterations using `PageRank.runWithOptions`, then update the graph's edges and update the ranks with a call to `PageRank.runWithOptionsWithPreviousPageRank`, and so on...".
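
      The incremental workflow quoted above can be sketched with a plain-Scala toy model. This does not call GraphX: `IncrementalPageRankSketch`, `step` and `iterate` are hypothetical names introduced for illustration, and the update rule (ranks starting at 1.0, each vertex receiving `resetProb` plus the damped contributions of its in-neighbours) is an assumption about what a static PageRank iteration looks like.

```scala
object IncrementalPageRankSketch {
  type Ranks = Map[Long, Double]
  type Edge = (Long, Long) // (source, destination)

  // One static PageRank iteration (assumed un-normalized form):
  // rank(v) = resetProb + (1 - resetProb) * sum over edges (s -> v) of rank(s) / outDegree(s)
  def step(ranks: Ranks, edges: Seq[Edge], resetProb: Double): Ranks = {
    val outDegree = edges.groupBy(_._1).map { case (src, out) => src -> out.size }
    val incoming = edges
      .map { case (src, dst) => dst -> ranks(src) / outDegree(src) }
      .groupBy(_._1)
      .map { case (dst, msgs) => dst -> msgs.map(_._2).sum }
    ranks.map { case (v, _) => v -> (resetProb + (1.0 - resetProb) * incoming.getOrElse(v, 0.0)) }
  }

  // Run a fixed number of iterations starting from the given ranks.
  def iterate(start: Ranks, edges: Seq[Edge], numIter: Int, resetProb: Double = 0.15): Ranks =
    (1 to numIter).foldLeft(start)((ranks, _) => step(ranks, edges, resetProb))

  val vertices: Seq[Long] = Seq(0L, 1L, 2L)

  // Initial edge set: a cycle 0 <-> 1, with vertex 2 as a sink.
  val initialEdges: Seq[Edge] = Seq((0L, 1L), (1L, 0L), (1L, 2L))

  // Initial run: every vertex starts with rank 1.0.
  val firstRanks: Ranks = iterate(vertices.map(_ -> 1.0).toMap, initialEdges, numIter = 5)

  // The edge set changes (vertex 2 gains an outgoing edge) ...
  val updatedEdges: Seq[Edge] = initialEdges :+ ((2L, 0L))

  // ... and the next run starts from the previous ranks instead of from 1.0.
  val updatedRanks: Ranks = iterate(firstRanks, updatedEdges, numIter = 5)
}
```

      In this model the second call plays the role that `PageRank.runWithOptionsWithPreviousPageRank` plays in the quoted workflow: it reuses `firstRanks` as the starting point rather than restarting from scratch.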

      Algorithms of this kind would benefit greatly (a gain in precision) from being able to manipulate the raw ranks directly, rather than the normalized ones, when the graph has a substantial proportion of sinks (vertices without outgoing edges).

      It would be nice to add a method signature with a boolean parameter that turns off the automatic normalization run at the end of `PageRank.runWithOptions` and `PageRank.runWithOptionsWithPreviousPageRank`, leaving developers free to apply the normalization only when they really need it.
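
      A minimal plain-Scala sketch of what such a flag could change, under stated assumptions. It is a toy model, not the GraphX implementation: `NormalizationFlagSketch`, `run` and the `normalized` parameter are hypothetical names, and the normalization is assumed to rescale the ranks so that they sum to the number of vertices.

```scala
object NormalizationFlagSketch {
  type Ranks = Map[Long, Double]

  // Toy graph: a cycle 0 <-> 1 plus the sink 2 (no outgoing edge).
  val edges: Seq[(Long, Long)] = Seq((0L, 1L), (1L, 0L), (1L, 2L))
  val outDegree: Map[Long, Int] = edges.groupBy(_._1).map { case (s, out) => s -> out.size }
  val initial: Ranks = Map(0L -> 1.0, 1L -> 1.0, 2L -> 1.0)

  // One iteration: rank(v) = resetProb + (1 - resetProb) * sum of rank(src) / outDegree(src).
  def step(ranks: Ranks, resetProb: Double): Ranks = {
    val incoming = edges
      .map { case (src, dst) => dst -> ranks(src) / outDegree(src) }
      .groupBy(_._1)
      .map { case (dst, msgs) => dst -> msgs.map(_._2).sum }
    ranks.map { case (v, _) => v -> (resetProb + (1.0 - resetProb) * incoming.getOrElse(v, 0.0)) }
  }

  // Assumed normalization: rescale so that the ranks sum to the number of vertices.
  def normalize(ranks: Ranks): Ranks = {
    val factor = ranks.size / ranks.values.sum
    ranks.map { case (v, r) => v -> r * factor }
  }

  // Hypothetical entry point: `normalized` switches the final rescaling on or off.
  def run(start: Ranks, numIter: Int, normalized: Boolean, resetProb: Double = 0.15): Ranks = {
    val raw = (1 to numIter).foldLeft(start)((r, _) => step(r, resetProb))
    if (normalized) normalize(raw) else raw
  }

  // Largest absolute per-vertex difference between two rank maps.
  def maxDiff(a: Ranks, b: Ranks): Double =
    a.keysIterator.map(v => math.abs(a(v) - b(v))).max

  // Six iterations in one go, without normalization.
  val uninterrupted: Ranks = run(initial, numIter = 6, normalized = false)

  // Three iterations, then three more, resuming from the raw ranks.
  val resumedFromRaw: Ranks =
    run(run(initial, numIter = 3, normalized = false), numIter = 3, normalized = false)

  // Same split, but the intermediate ranks were normalized before resuming.
  val resumedFromNormalized: Ranks =
    run(run(initial, numIter = 3, normalized = true), numIter = 3, normalized = false)
}
```

      In this toy graph the sink makes the raw ranks sum to less than the number of vertices, so the rescaling moves the intermediate ranks away from the raw trajectory. `resumedFromRaw` matches `uninterrupted`, while `resumedFromNormalized` does not, which is one way to see why an incremental algorithm might prefer to keep the raw values and normalize only at the end.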

          People

            Assignee: EnzoBnl bonnal-enzo
            Reporter: EnzoBnl bonnal-enzo