Spark / SPARK-28005

SparkRackResolver should not log for resolving empty list


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0
    • Fix Version/s: 3.0.0
    • Component/s: Scheduler, Spark Core
    • Labels: None

    Description

      After SPARK-13704, SparkRackResolver generates an INFO message every time it is called with an empty list of hosts:

      https://github.com/apache/spark/blob/master/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/SparkRackResolver.scala#L73-L76
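
      The linked block boils down to roughly the following (a paraphrased sketch, not the exact source; dnsToSwitchMapping and logInfo come from the surrounding SparkRackResolver class). With an empty host list the resolved result is empty, which trips the fallback branch and logs; an empty-list guard up front would skip both the resolution and the message:

      import scala.collection.JavaConverters._
      import org.apache.hadoop.net.{Node, NodeBase, NetworkTopology}
      
      // Paraphrased sketch of the resolver path, not the exact source.
      def coreResolve(hostNames: Seq[String]): Seq[Node] = {
        if (hostNames.isEmpty) {
          // Nothing to resolve, so return early instead of logging the fallback.
          return Seq.empty
        }
        val rNameList = dnsToSwitchMapping.resolve(hostNames.toList.asJava).asScala
        if (rNameList == null || rNameList.isEmpty) {
          // This is the branch an empty (or failed) resolution lands in today.
          logInfo("Got an error when resolving hostNames. " +
            "Falling back to /default-rack for all")
          hostNames.map(host => new NodeBase(host, NetworkTopology.DEFAULT_RACK))
        } else {
          hostNames.zip(rNameList).map { case (host, rack) => new NodeBase(host, rack) }
        }
      }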

      That actually happens every second when there are no active executors, because of the repeated offers made as part of delay scheduling:
      https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala#L134-L139
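
      Those offers come from a periodic revive timer in the scheduler backend, roughly of the shape below (a sketch; reviveThread, conf, self and Utils are assumed from the surrounding class). spark.scheduler.revive.interval defaults to 1s, which matches the once-per-second cadence:

      import java.util.concurrent.TimeUnit
      
      // Sketch of the periodic ReviveOffers loop.
      val reviveIntervalMs = conf.getTimeAsMs("spark.scheduler.revive.interval", "1s")
      reviveThread.scheduleAtFixedRate(new Runnable {
        override def run(): Unit = Utils.tryLogNonFatalError {
          // Each tick re-offers resources; with no executors this resolves an
          // empty host list, producing the INFO message once per interval.
          Option(self).foreach(_.send(ReviveOffers))
        }
      }, 0, reviveIntervalMs, TimeUnit.MILLISECONDS)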

      While this is relatively benign, it's a pretty annoying thing to be logging at INFO level every second.

      This is easy to reproduce: in spark-shell with dynamic allocation, set the log level to INFO and watch the messages appear every second. Then run something and watch the messages stop. After the executors time out, the messages reappear.

      scala> :paste
      // Entering paste mode (ctrl-D to finish)
      
      sc.setLogLevel("info")
      Thread.sleep(5000)
      sc.parallelize(1 to 10).count()
      
      // Exiting paste mode, now interpreting.
      
      19/06/11 12:43:40 INFO yarn.SparkRackResolver: Got an error when resolving hostNames. Falling back to /default-rack for all
      19/06/11 12:43:41 INFO yarn.SparkRackResolver: Got an error when resolving hostNames. Falling back to /default-rack for all
      19/06/11 12:43:42 INFO yarn.SparkRackResolver: Got an error when resolving hostNames. Falling back to /default-rack for all
      19/06/11 12:43:43 INFO yarn.SparkRackResolver: Got an error when resolving hostNames. Falling back to /default-rack for all
      19/06/11 12:43:44 INFO yarn.SparkRackResolver: Got an error when resolving hostNames. Falling back to /default-rack for all
      19/06/11 12:43:45 INFO spark.SparkContext: Starting job: count at <pastie>:28
      19/06/11 12:43:45 INFO scheduler.DAGScheduler: Got job 0 (count at <pastie>:28) with 2 output partitions
      19/06/11 12:43:45 INFO scheduler.DAGScheduler: Final stage: ResultStage 0 (count at <pastie>:28)
      ...
      19/06/11 12:43:54 INFO cluster.YarnScheduler: Removed TaskSet 0.0, whose tasks have all completed, from pool 
      19/06/11 12:43:54 INFO scheduler.DAGScheduler: ResultStage 0 (count at <pastie>:28) finished in 9.548 s
      19/06/11 12:43:54 INFO scheduler.DAGScheduler: Job 0 finished: count at <pastie>:28, took 9.613049 s
      res2: Long = 10                                                                 
      
      scala> 
      ...
      19/06/11 12:44:56 INFO yarn.SparkRackResolver: Got an error when resolving hostNames. Falling back to /default-rack for all
      19/06/11 12:44:57 INFO yarn.SparkRackResolver: Got an error when resolving hostNames. Falling back to /default-rack for all
      19/06/11 12:44:58 INFO yarn.SparkRackResolver: Got an error when resolving hostNames. Falling back to /default-rack for all
      19/06/11 12:44:59 INFO yarn.SparkRackResolver: Got an error when resolving hostNames. Falling back to /default-rack for all
      ...
      


            People

              Assignee: Gabor Somogyi (gsomogyi)
              Reporter: Imran Rashid (irashid)
              Votes: 0
              Watchers: 1
