[SPARK-31187] Sort the whole-stage codegen debug output by codegenStageId - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Minor
Resolution: Fixed
Affects Version/s: 2.4.0, 2.4.1, 2.4.2, 2.4.3, 2.4.4, 2.4.5, 3.0.0
Fix Version/s: 3.0.0
Component/s: SQL
Labels: None

Description

Spark SQL's whole-stage codegen (WSCG) supports dumping the generated code to help with debugging. The generated code can be obtained through df.queryExecution.debug.codegen or through the SQL EXPLAIN CODEGEN statement.

The generated code is currently printed in no particular order, which makes the dump harder to follow when debugging. This ticket tracks a minor improvement: sort the codegen dump by codegenStageId in ascending order.

After this change, the following query:

spark.range(10).agg(sum('id)).queryExecution.debug.codegen

will always dump the generated code in a natural, stable order.

The number of codegen stages within a single SQL query tends to be very small, most likely fewer than 50, so the overhead of the added sort should not be significant.
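The ordering described above can be illustrated with a minimal, Spark-free sketch. CodegenDumpEntry, CodegenDumpOrdering, and the header format produced by render are hypothetical names invented for this example; they are not Spark's internal types or its actual dump layout, and the real change is in the linked pull request.

```scala
// Hypothetical stand-in for one whole-stage codegen subtree and its generated code.
// These names are assumptions for illustration, not Spark's internal classes.
final case class CodegenDumpEntry(codegenStageId: Int, planSummary: String, source: String)

object CodegenDumpOrdering {
  // Sort entries by codegenStageId, ascending. Scala's sortBy is a stable sort,
  // so entries that share an ID keep their original relative order.
  def sortByStageId(entries: Seq[CodegenDumpEntry]): Seq[CodegenDumpEntry] =
    entries.sortBy(_.codegenStageId)

  // Render the sorted entries as one string. The header format is illustrative only.
  def render(entries: Seq[CodegenDumpEntry]): String = {
    val sorted = sortByStageId(entries)
    sorted.zipWithIndex.map { case (entry, index) =>
      val header =
        s"== Subtree ${index + 1} / ${sorted.size} (codegenStageId=${entry.codegenStageId}) =="
      s"$header\n${entry.planSummary}\n${entry.source}"
    }.mkString("\n\n")
  }
}
```

Calling CodegenDumpOrdering.render on entries collected in arbitrary order yields the same text on every run, which is the stable output this ticket asks for. With only a few dozen entries per query, the sort cost is negligible next to code generation itself.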

Issue Links

links to

GitHub Pull Request #27955

People

Assignee: Kris Mok

Reporter: Kris Mok

Votes: 0

Watchers: 3

Dates

Created: 19/Mar/20 06:28

Updated: 19/Mar/20 11:57

Resolved: 19/Mar/20 11:55