  1. Spark
  2. SPARK-1580

[MLlib] ALS: Estimate communication and computation costs given a partitioner


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.1.0
    • Component/s: MLlib
    • Labels: None

    Description

      It would be nice to be able to estimate the amount of work needed to solve an ALS problem. The chief components of this "work" are computation time (the time spent forming and solving the least squares problems) and communication cost (the number of bytes sent across the network). Communication cost depends heavily on how the users and products are partitioned.

      We currently do not try to cluster users or products so that fewer feature vectors need to be communicated. This issue is intended as a first step toward that end: we ought to be able to tell whether one partitioning is better than another.
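
      To make the idea concrete, here is a minimal sketch in plain Python of how such an estimate might be computed. It is not the MLlib implementation. The function name `estimate_als_costs`, the partitioner callables `user_block` and `product_block`, and the flop-count formula are all assumptions made for illustration. The sketch assumes that, in each half-iteration, a feature vector is sent once to every block on the other side that holds at least one rating involving it, and that each feature value occupies 8 bytes.

```python
from collections import defaultdict


def estimate_als_costs(ratings, user_block, product_block, rank, bytes_per_value=8):
    """Rough per-iteration cost estimate for ALS under a given partitioning.

    ratings: iterable of (user, product) pairs that have an observed rating.
    user_block, product_block: hypothetical partitioner functions mapping an
        id to a block index (assumptions, not a Spark API).
    rank: number of latent features per user or product.
    """
    # For each user, collect the product blocks that need its feature vector,
    # and for each product, the user blocks that need its feature vector.
    user_dests = defaultdict(set)
    product_dests = defaultdict(set)
    num_ratings = 0
    for user, product in ratings:
        user_dests[user].add(product_block(product))
        product_dests[product].add(user_block(user))
        num_ratings += 1

    # One vector is counted per (sender, destination block) pair.
    vectors_sent = (sum(len(d) for d in user_dests.values())
                    + sum(len(d) for d in product_dests.values()))
    comm_bytes = vectors_sent * rank * bytes_per_value

    # Order-of-magnitude flop count (an assumption): each rating adds a
    # rank-by-rank outer product to one least squares problem on each side,
    # and each user and product then needs one rank-by-rank solve.
    num_solves = len(user_dests) + len(product_dests)
    flops = 2 * num_ratings * rank ** 2 + num_solves * rank ** 3

    return {"vectors_sent": vectors_sent, "comm_bytes": comm_bytes, "flops": flops}
```

      Calling the function twice on the same ratings with different partitioner functions gives two `comm_bytes` figures that can be compared directly. Under these assumptions the flop estimate does not depend on the partitioning, so only the communication term distinguishes one partitioning from another.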


          People

            Assignee: tmyklebu Tor Myklebust
            Reporter: tmyklebu Tor Myklebust
            Votes: 0
            Watchers: 4

            Dates

              Created:
              Updated:
              Resolved: