  1. Spark
  2. SPARK-31412 New Adaptive Query Execution in Spark SQL
  3. SPARK-29954

Collect runtime row count statistics in the map stage


Details

    • Type: Sub-task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.0.0
    • Fix Version/s: None
    • Component/s: Shuffle, Spark Core
    • Labels: None

    Description

      We need row count information to estimate data skew more accurately when the data contains many duplicate values. This PR collects the row count of each partition in the map stage.
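
      One reading of the description is that heavily duplicated data compresses very well, so a partition's shuffle size in bytes can look ordinary even when its row count is far above the rest. The sketch below is a toy illustration of that idea in plain Python, not Spark's implementation: `zlib` stands in for shuffle compression, and the helper names (`partition_stats`, `skewed`) and the threshold factor of 5 are assumptions made for this example.

```python
import zlib
from statistics import median


def partition_stats(partitions):
    """Return a (compressed_bytes, row_count) pair for each partition.

    Hypothetical helper: each partition is a list of string rows, and
    zlib is only a stand-in for whatever codec the shuffle would use.
    """
    stats = []
    for rows in partitions:
        payload = "\n".join(rows).encode("utf-8")
        stats.append((len(zlib.compress(payload)), len(rows)))
    return stats


def skewed(values, factor=5.0):
    """Return indices whose value exceeds factor * median (assumed rule)."""
    med = median(values)
    return [i for i, v in enumerate(values) if v > factor * med]


# Three partitions of distinct keys, plus one partition that repeats a
# single "hot" key 100,000 times.
partitions = [[f"key-{p}-{i}" for i in range(1_000)] for p in range(3)]
partitions.append(["hot-key"] * 100_000)

stats = partition_stats(partitions)
sizes = [size for size, _ in stats]
rows = [count for _, count in stats]

# The duplicated rows compress to a small payload, so the byte-size
# check does not flag the hot partition; the row-count check does.
skew_by_size = skewed(sizes)
skew_by_rows = skewed(rows)
print("sizes:", sizes, "skewed by size:", skew_by_size)
print("rows: ", rows, "skewed by rows:", skew_by_rows)
```

      With this toy data, only the row-count check flags the last partition, which is the kind of case the issue appears to target.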


          People

            Assignee: Unassigned
            Reporter: Ke Jia (Jk_Self)
            Votes: 0
            Watchers: 4

            Dates

              Created:
              Updated: