Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-42856

Inconsistent output from label propagation algorithm in Graph X due to tie-breaking logic in vertexProgram

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Open
    • Major
    • Resolution: Unresolved
    • 1.1.0, 3.3.2
    • None
    • GraphX
    • None

    Description

      We are experiencing inconsistent output from the label propagation algorithm in Graph X. When we run the algorithm on the same input, we observe different outputs each time. This behavior is unexpected since the algorithm is not designed to be random, and we should be getting the same output for the same input.

      We suspect that the issue lies in the tie-breaking logic used by the vertexProgram when picking labels. Currently, the vertexProgram chooses the label with the maximum frequency, and in case of a tie, it selects the label that appears first. This logic does not handle tie cases correctly, resulting in different outputs for the same input.

      def vertexProgram(vid: VertexId, attr: Long, message: Map[VertexId, Long]): VertexId = {  
          if (message.isEmpty) attr else message.maxBy(_._2)._1
      }
      

      To solve this issue, we propose changing the tie-breaking logic to something like vertex ID. This change will ensure that the same label is always selected in case of a tie, resulting in consistent output from the algorithm for the same input.

      def vertexProgram(vid: VertexId, attr: Long, message: Map[VertexId, Long]): VertexId = {
          if (message.isEmpty) attr else message.maxBy{ case (key, value) => (value, key) }._1
      }
      

      Thanks!

      Attachments

        Activity

          People

            Unassigned Unassigned
            m.haghpanah Mohammadreza Haghpanah
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

              Created:
              Updated: