  1. Spark
  2. SPARK-28575

Time lag between two consecutive spark actions using Spark 2.3.1

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Invalid
    • Affects Version/s: 2.3.1
    • Fix Version/s: None
    • Component/s: Scheduler, Spark Core
    • Labels: None

    Description

      Steps to reproduce:

      1. Read a directory of txt files using SparkContext's wholeTextFiles method.
      2. Apply a transformation to the resulting pair RDD.
      3. Perform an action (foreach) on each entry, one entry per txt file.
      4. Observe the time lag between these actions in the Spark UI.
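
The steps above could be sketched in PySpark roughly as follows. This is only an illustration, not the reporter's actual job: the input directory argument, the count_words transformation, and the choice to trigger one foreach job per file are all assumptions made for the sketch.

```python
import time


def count_words(text):
    """Hypothetical transformation: count whitespace-separated tokens."""
    return len(text.split())


def run_repro(input_dir):
    # Imported here so count_words can be used without a Spark installation.
    from pyspark import SparkContext

    sc = SparkContext.getOrCreate()

    # Step 1: one (path, content) pair per text file in the directory.
    files = sc.wholeTextFiles(input_dir)

    # Step 2: a transformation on the resulting pair RDD (assumed here).
    counts = files.mapValues(count_words).cache()

    # Step 3: one foreach action per file. Each action is a separate Spark
    # job, so each appears as its own row in the Spark UI. Note that the
    # collect() call below is itself an action and shows up as a job too.
    for path in counts.keys().collect():
        started = time.time()
        counts.filter(lambda kv, p=path: kv[0] == p).foreach(lambda kv: None)
        print(path, "action took", time.time() - started, "seconds")

    sc.stop()
```

Calling run_repro with a directory of text files and comparing the printed per-action times against the job submission times in the Spark UI (step 4) is one way to check whether the time is spent inside the jobs or between them.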

      The actions themselves do not take much time. The lag is between the start times of consecutive actions, excluding the time taken by each job itself. Please refer to the attachments.
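
To make the lag described here concrete, it can be computed as the idle time between the end of one job and the submission of the next. The sketch below assumes the submission time and duration of each job have been read off the Spark UI by hand; the idle_gaps helper and the sample timings are hypothetical and are not taken from the attached screenshots.

```python
def idle_gaps(jobs):
    """Return the idle time between consecutive jobs.

    `jobs` is a list of (submitted, duration) pairs in seconds,
    ordered by submission time.
    """
    gaps = []
    for (start, duration), (next_start, _) in zip(jobs, jobs[1:]):
        # Idle time = next job's submission minus this job's end time.
        gaps.append(next_start - (start + duration))
    return gaps


# Made-up timings for illustration only.
example = [(0.0, 2.0), (5.0, 1.0), (9.0, 0.5)]
print(idle_gaps(example))  # [3.0, 3.0]
```

Gaps that stay large while the durations stay small would match the behaviour described in this report.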

      PS: This time lag does not occur when the same job is run on Spark 2.1.1.

      Attachments

        1. spark_2.1_screenshot.PNG
          208 kB
          Kushal Mahajan
        2. spark_2.3_screenshot.PNG
          136 kB
          Kushal Mahajan

          People

            Assignee: Unassigned
            Reporter: Kushal Mahajan (kumahaja)
            Votes: 2
            Watchers: 1
