Description
Spark SQL's whole-stage codegen (WSCG) supports dumping the generated code to help with debugging. The dump can be obtained through df.queryExecution.debug.codegen or through the SQL EXPLAIN CODEGEN statement.
The generated code is currently printed in no particular order, which makes it harder to find a given stage while debugging. This ticket tracks a minor improvement: sort the codegen dump by codegenStageId in ascending order.
After this change, the following query:
spark.range(10).agg(sum('id)).queryExecution.debug.codegen
will always dump the generated code in a natural, stable order.
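For reference, here is a minimal standalone sketch of both ways to obtain the dump. The object name, app name, and local master setting are illustrative assumptions, and sum("id") is used in place of the symbol syntax so the snippet also compiles outside spark-shell. No output is shown because the generated code varies across Spark versions.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.sum

// Hypothetical wrapper object; in spark-shell only the body of main is needed.
object CodegenDumpExample {
  def main(args: Array[String]): Unit = {
    // Local session for experimentation; app name and master are arbitrary choices.
    val spark = SparkSession.builder()
      .appName("codegen-dump-example")
      .master("local[*]")
      .getOrCreate()

    val df = spark.range(10).agg(sum("id"))

    // Scala API: prints the generated code of each WSCG subtree to stdout.
    df.queryExecution.debug.codegen()

    // SQL route: EXPLAIN CODEGEN returns the dump as a one-row result.
    spark.sql("EXPLAIN CODEGEN SELECT sum(id) FROM range(10)").show(false)

    spark.stop()
  }
}
```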
The number of codegen stages within a single SQL query tends to be very small, most likely fewer than 50, so the overhead of the added sort should not be significant.
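The proposed ordering can be sketched in plain Scala, independent of Spark. StageDump, SortedCodegenDump, and the header format are hypothetical stand-ins for illustration, not Spark's actual internal types or output. The sketch only shows that an ascending sort on the stage ID yields a deterministic dump.

```scala
// Hypothetical stand-in for one WSCG subtree and its generated source.
final case class StageDump(codegenStageId: Int, source: String)

object SortedCodegenDump {
  // Sort ascending by stage ID. sortBy is stable, so entries with
  // equal IDs keep their original relative order.
  def sortByStageId(dumps: Seq[StageDump]): Seq[StageDump] =
    dumps.sortBy(_.codegenStageId)

  // Render the sorted dump; the header format here is illustrative only.
  def render(dumps: Seq[StageDump]): String = {
    val sorted = sortByStageId(dumps)
    sorted.zipWithIndex.map { case (d, i) =>
      s"== Subtree ${i + 1} / ${sorted.size} (codegenStageId=${d.codegenStageId}) ==\n${d.source}"
    }.mkString("\n")
  }
}
```

With a few dozen entries at most, a sort like this costs far less than generating and compiling the code itself.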