Description
While running `Graphx.connectedComponents` on a graph with 20+ billion vertices and edges, I found that the `messages.count()` calls in `Pregel` are very slow.
    object Pregel extends Logging {
      ...
      def apply[VD: ClassTag, ED: ClassTag, A: ClassTag]
          (...): Graph[VD, ED] = {
        ...
        // Maybe messages.isEmpty() is better than messages.count()
        var activeMessages = messages.count()
        // Loop
        var prevG: Graph[VD, ED] = null
        var i = 0
        while (activeMessages > 0 && i < maxIterations) {
          ...
          activeMessages = messages.count()
          ...
        }
        ...
        g
      } // end of apply
    } // end of class Pregel
The loop only needs an action operator here, plus a check that the active messages are not empty. It does not need the exact number of messages, so `isEmpty` is a better fit than `count`. I verified this change and it worked very well.
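To illustrate the idea, here is a minimal sketch of such a loop driven by `isEmpty()` instead of `count()`. It is not the actual Pregel code: `IsEmptyVsCount`, `step`, and `run` are hypothetical names, the "superstep" is a toy transformation, and it assumes Spark core is on the classpath with a local master. `RDD.isEmpty()` only has to find a single element, whereas `count()` has to visit every element in every partition.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object IsEmptyVsCount {
  // Hypothetical stand-in for one superstep: halve every message value
  // and drop the messages whose value reaches zero.
  def step(messages: RDD[(Long, Int)]): RDD[(Long, Int)] =
    messages.mapValues(_ / 2).filter { case (_, v) => v > 0 }

  // Runs supersteps until no messages remain or maxIterations is reached,
  // and returns the number of supersteps executed.
  def run(initial: RDD[(Long, Int)], maxIterations: Int): Int = {
    var messages = initial
    var i = 0
    // Only emptiness matters for the loop condition, not the exact count.
    while (!messages.isEmpty() && i < maxIterations) {
      messages = step(messages)
      i += 1
    }
    i
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("isEmpty-vs-count")
    val sc = new SparkContext(conf)
    try {
      val initial = sc.parallelize(Seq((1L, 8), (2L, 3), (3L, 1)))
      println(run(initial, maxIterations = 10))
    } finally {
      sc.stop()
    }
  }
}
```

One caveat with this sketch: if other code needs the number of active messages (for logging, for example), that value is no longer available and would have to be computed separately.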