Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-37063 SQL Adaptive Query Execution QA: Phase 2
  3. SPARK-35414

Completely fix the broadcast timeout issue in AQE

Attach filesAttach ScreenshotVotersWatch issueWatchersLinkCloneUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

Details

    • Sub-task
    • Status: Resolved
    • Major
    • Resolution: Duplicate
    • 3.0.0, 3.0.1
    • None
    • SQL
    • None

    Description

      SPARK-33933 report a issue that in AQE, when the resources is limited, broadcast timeout could happened. 

      #31269 gives a partial fix by reorder newStages by class type to make sure BroadcastQueryState precede others when calling materialized(). However, it only guarantee that the order of task to be scheduled in normal circumstances, but, the guarantee is not strict since the submit of broadcast job and shuffle map job are in different thread.

      So we need a completely fix to avoid the edge case triggering broadcast timeout.

      Attachments

        Issue Links

        Activity

          This comment will be Viewable by All Users Viewable by All Users
          Cancel

          People

            Unassigned Unassigned
            zhongyu09 Yu Zhong
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Slack

                Issue deployment