Spark / SPARK-31032

GMM compute summary and update distributions in one pass

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 3.1.0
    • Fix Version/s: 3.1.0
    • Component/s: ML
    • Labels: None

    Description

      In the current implementation, GMM needs to trigger two jobs in each iteration:

      1. one to compute the summary;

      2. if {{shouldDistributeGaussians}} holds, i.e. ((k - 1.0) / k) * numFeatures > 25.0, another to update the distributions.

      {{shouldDistributeGaussians}} is almost always true in practice, since numFeatures is likely to be greater than 25.
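      The condition quoted above can be checked by hand with a small sketch. This is a plain Python restatement of the inequality, not Spark's own code; the function name `should_distribute_gaussians` and its parameter names are hypothetical. Going only by the formula, the threshold on numFeatures depends on k: it is 25 * k / (k - 1), so with k = 2 more than 50 features are needed, and the threshold approaches 25 as k grows.

```python
def should_distribute_gaussians(k: int, num_features: int) -> bool:
    """Restates the inequality from the description (hypothetical helper).

    k            -- number of Gaussian components
    num_features -- dimensionality of the input vectors
    """
    return ((k - 1.0) / k) * num_features > 25.0


if __name__ == "__main__":
    # Print the predicate for a few (k, num_features) combinations.
    for k, n in [(2, 50), (2, 51), (10, 26), (10, 28)]:
        print(k, n, should_distribute_gaussians(k, n))
```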


       

      We can use a single job to implement the above computation.
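      To illustrate the general idea of folding both steps into one pass, here is a minimal single-machine sketch for a one-dimensional mixture. It is not Spark's implementation and does not reflect how the actual change distributes the work; the function names and the restriction to 1-D data are assumptions made to keep the example short. One loop over the data accumulates per-component sufficient statistics (the "summary"), and the new distributions are derived directly from those statistics without a second pass over the data.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def em_step_one_pass(data, weights, means, variances):
    """One EM iteration for a 1-D GMM using a single pass over `data`.

    Hypothetical helper: accumulates sum(r), sum(r * x) and sum(r * x^2)
    per component, then updates weights, means and variances from them.
    """
    k = len(weights)
    s0 = [0.0] * k  # sum of responsibilities
    s1 = [0.0] * k  # sum of responsibility * x
    s2 = [0.0] * k  # sum of responsibility * x^2
    log_likelihood = 0.0

    for x in data:
        dens = [weights[j] * gaussian_pdf(x, means[j], variances[j]) for j in range(k)]
        total = sum(dens)
        log_likelihood += math.log(total)
        for j in range(k):
            r = dens[j] / total
            s0[j] += r
            s1[j] += r * x
            s2[j] += r * x * x

    # Update the distributions from the accumulated statistics only.
    n = sum(s0)
    new_weights = [s0[j] / n for j in range(k)]
    new_means = [s1[j] / s0[j] for j in range(k)]
    new_vars = [s2[j] / s0[j] - new_means[j] ** 2 for j in range(k)]
    return new_weights, new_means, new_vars, log_likelihood


if __name__ == "__main__":
    data = [0.0, 0.1, -0.1, 5.0, 5.1, 4.9]
    print(em_step_one_pass(data, [0.5, 0.5], [0.0, 5.0], [1.0, 1.0]))
```

      In a distributed setting the same statistics would be accumulated per partition and merged, which is what makes a single job plausible; how the merge and update are scheduled is left out of this sketch.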

            People

              Assignee: Ruifeng Zheng (podongfeng)
              Reporter: Ruifeng Zheng (podongfeng)