SPARK-42596

[YARN] OMP_NUM_THREADS not set to number of executor cores by default

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.3.2
    • Fix Version/s: 3.2.4, 3.3.3, 3.4.0
    • Component/s: PySpark, YARN
    • Labels: None

    Description

      Run this PySpark script with `spark.executor.cores=1`:

      import os
      from pyspark.sql import SparkSession
      from pyspark.sql.functions import udf

      spark = SparkSession.builder.getOrCreate()

      var_name = 'OMP_NUM_THREADS'

      def get_env_var():
          # Evaluated inside the UDF; os.getenv returns None if the variable is unset.
          return os.getenv(var_name)

      udf_get_env_var = udf(get_env_var)
      # Show the value of the variable as seen from inside the UDF.
      spark.range(1).toDF("id").withColumn(f"env_{var_name}", udf_get_env_var()).show(truncate=False)
      

      Output with release `3.3.2`:

      +---+-----------------------+
      |id |env_OMP_NUM_THREADS    |
      +---+-----------------------+
      |0  |null                   |
      +---+-----------------------+
      

      Output with release `3.3.0`:

      +---+-----------------------+
      |id |env_OMP_NUM_THREADS    |
      +---+-----------------------+
      |0  |1                      |
      +---+-----------------------+
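
A possible workaround on an affected release is to set the variable explicitly rather than rely on the default. The sketch below is not part of the original report and is an assumption: it uses the documented `spark.executorEnv.[EnvironmentVariableName]` setting, and whether that value reaches the Python workers on the affected release has not been verified here. The helper names `omp_threads_conf` and `apply_conf` are hypothetical.

```python
def omp_threads_conf(executor_cores: int) -> dict:
    """Build Spark settings that pin OMP_NUM_THREADS to the executor core count.

    Hypothetical helper: it only assembles configuration keys and does not
    start or contact Spark.
    """
    if executor_cores < 1:
        raise ValueError("executor_cores must be >= 1")
    cores = str(executor_cores)
    return {
        "spark.executor.cores": cores,
        # spark.executorEnv.<NAME> adds an environment variable to executors.
        "spark.executorEnv.OMP_NUM_THREADS": cores,
    }


def apply_conf(builder, conf: dict):
    """Apply each key/value pair to a SparkSession.builder-like object."""
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder
```

With PySpark installed, this could be used as `spark = apply_conf(SparkSession.builder, omp_threads_conf(1)).getOrCreate()`, after which the script above can be rerun to check the reported value.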
      

People

    • Assignee: John Zhuge (jzhuge)
    • Reporter: John Zhuge (jzhuge)