Spark / SPARK-31187

Sort the whole-stage codegen debug output by codegenStageId

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.4.0, 2.4.1, 2.4.2, 2.4.3, 2.4.4, 2.4.5, 3.0.0
    • Fix Version/s: 3.0.0
    • Component/s: SQL
    • Labels: None

    Description

      Spark SQL's whole-stage codegen (WSCG) supports dumping the generated code to help with debugging. The generated code can be obtained through df.queryExecution.debug.codegen or through the SQL EXPLAIN CODEGEN statement.

      The generated code is currently printed in no particular order, which makes the dump harder to read when debugging. This ticket tracks a minor improvement: sort the codegen dump by codegenStageId, ascending.

      After this change, the following query:

      spark.range(10).agg(sum('id)).queryExecution.debug.codegen
      

      will always dump the generated code in a natural, stable order.

      The number of codegen stages within a single SQL query tends to be very small, most likely fewer than 50, so the overhead of sorting should not be significant.
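      The ordering described above can be illustrated with a minimal, Spark-free sketch. StageDump, SortCodegenDump, and the header format below are hypothetical names chosen for illustration; they are not Spark's internal classes or its actual dump format, and this is not the patch itself.

```scala
// Hypothetical stand-in for one whole-stage codegen subtree and its generated source.
// Not a Spark class; it only carries the two fields this sketch needs.
final case class StageDump(codegenStageId: Int, source: String)

object SortCodegenDump {
  // Sort the dumps by codegenStageId, ascending.
  // Scala's sortBy is a stable sort, so entries with equal IDs keep
  // their original relative order.
  def sortByStageId(dumps: Seq[StageDump]): Seq[StageDump] =
    dumps.sortBy(_.codegenStageId)

  // Render the dumps in sorted order. The header line is illustrative only.
  def render(dumps: Seq[StageDump]): String =
    sortByStageId(dumps)
      .map(d => s"codegenStageId=${d.codegenStageId}\n${d.source}")
      .mkString("\n\n")
}
```

      With an input collected in arbitrary order, for example stages 2, 1, 3, render emits them as 1, 2, 3 on every run. For a few dozen stages the cost of the sort is negligible next to code generation itself.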

            People

              Assignee: Kris Mok (rednaxelafx)
              Reporter: Kris Mok (rednaxelafx)
              Votes: 0
              Watchers: 3

              Dates

                Created:
                Updated:
                Resolved: