SPARK-35355

Improve execution performance in the insert...select...limit case

Details

    • Type: Improvement
    • Status: In Progress
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 3.0.0
    • Fix Version/s: 3.0.0
    • Component/s: SQL
    • Labels: None

    Description

      For `insert into ... select ... limit` queries, `CollectLimitExec` has better execution performance than `GlobalLimit`.

      Before:

      == Physical Plan ==
      Execute InsertIntoHadoopFsRelationCommand ...
      +- *(2) GlobalLimit 5
         +- Exchange SinglePartition, true, id=#39
            +- *(1) LocalLimit 5
               +- *(1) ColumnarToRow
                  +- FileScan ...
      

      After:

      == Physical Plan ==
      Execute InsertIntoHadoopFsRelationCommand ...
      +- CollectLimit 5
         +- *(1) ColumnarToRow
            +- FileScan ....
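
The difference between the two plans can be illustrated with a toy model. The sketch below is not Spark's Catalyst code: `Node`, `collapse_limit`, and `render` are hypothetical names invented for this example, and the plan is modeled as a simple single-child chain. It only shows the shape of the rewrite, in which a `GlobalLimit` over an `Exchange` over a `LocalLimit` with the same limit is replaced by a single `CollectLimit` node.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    """Toy plan node (hypothetical): operator name, optional limit, one child."""
    name: str
    limit: Optional[int] = None
    child: Optional["Node"] = None


def collapse_limit(plan: Optional[Node]) -> Optional[Node]:
    """Replace GlobalLimit n -> Exchange -> LocalLimit n with CollectLimit n."""
    if plan is None:
        return None
    exchange = plan.child
    local = exchange.child if exchange is not None else None
    if (
        plan.name == "GlobalLimit"
        and exchange is not None and exchange.name == "Exchange"
        and local is not None and local.name == "LocalLimit"
        and local.limit == plan.limit
    ):
        # Drop the three-operator pattern and keep only what was below it.
        return Node("CollectLimit", plan.limit, collapse_limit(local.child))
    # No match at this node: rebuild it and keep walking down the chain.
    return Node(plan.name, plan.limit, collapse_limit(plan.child))


def render(plan: Optional[Node]) -> str:
    """Print the chain in an explain-like '+-' layout."""
    lines, depth, node = [], 0, plan
    while node is not None:
        label = node.name if node.limit is None else f"{node.name} {node.limit}"
        prefix = "" if depth == 0 else "   " * (depth - 1) + "+- "
        lines.append(prefix + label)
        node, depth = node.child, depth + 1
    return "\n".join(lines)


# The "Before" plan from the description, minus codegen stage markers.
before = Node(
    "InsertIntoHadoopFsRelationCommand",
    child=Node(
        "GlobalLimit", 5,
        Node("Exchange", child=Node(
            "LocalLimit", 5,
            Node("ColumnarToRow", child=Node("FileScan")),
        )),
    ),
)
after = collapse_limit(before)
print(render(after))
```

How the real planner would choose `CollectLimit` under an insert command is outside the scope of this sketch; it only mirrors the before/after plan shapes shown above.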
      

          People

            Assignee: Unassigned
            Reporter: kaifeiYi (yikaifei)
            Votes: 0
            Watchers: 2
