Flink / FLINK-35165

AdaptiveBatch Scheduler should not restrict the default source parallelism to the max parallelism set

Details

    Description

      Copy-pasting the reasoning mentioned on this discussion thread.

      Let me state why I think "jobmanager.adaptive-batch-scheduler.default-source-parallelism" should not be bound by the "jobmanager.adaptive-batch-scheduler.max-parallelism".

      • The source vertex is unique in that it has no upstream vertices. Downstream vertices read shuffled data partitioned by key; the source vertex does not.
      • Limiting source parallelism by the downstream vertices' max parallelism is therefore incorrect.
      • If we say that, for "semantic consistency", the source vertex parallelism has to be bound by the overall job's max parallelism, it can lead to the following issues:
        • High filter selectivity with huge amounts of data to read: the source needs a high parallelism to read the input, while downstream vertices receive only a small fraction of it.
        • Setting a high "jobmanager.adaptive-batch-scheduler.max-parallelism" just so that the source parallelism can be set higher can lead to small blocks and sub-optimal performance.
        • Setting a high "jobmanager.adaptive-batch-scheduler.max-parallelism" also requires careful tuning of the network buffer configuration, which is unnecessary work when the only goal is a higher source parallelism.
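
      To make the difference concrete, here is a minimal sketch in Python. It is not Flink's scheduler code. It only models the two behaviours discussed above, assuming the current behaviour amounts to capping the default source parallelism at the max parallelism. The function names and the numeric values are hypothetical; only the two configuration keys come from this issue.

```python
# Hypothetical model of the two behaviours; this is NOT Flink's implementation.
DEFAULT_SOURCE_KEY = "jobmanager.adaptive-batch-scheduler.default-source-parallelism"
MAX_PARALLELISM_KEY = "jobmanager.adaptive-batch-scheduler.max-parallelism"


def bounded_source_parallelism(conf: dict) -> int:
    """Source parallelism when it is capped by the job-wide max parallelism."""
    return min(conf[DEFAULT_SOURCE_KEY], conf[MAX_PARALLELISM_KEY])


def unbounded_source_parallelism(conf: dict) -> int:
    """Source parallelism as proposed in this issue: used as configured."""
    return conf[DEFAULT_SOURCE_KEY]


# Illustrative values only: a large scan feeding a highly selective filter.
conf = {DEFAULT_SOURCE_KEY: 2000, MAX_PARALLELISM_KEY: 128}

print(bounded_source_parallelism(conf))    # 128
print(unbounded_source_parallelism(conf))  # 2000
```

      With the cap in place, the only way to get 2000 source subtasks in this sketch would be to raise the max parallelism to 2000 for every vertex, which is exactly the tuning burden described in the last two bullets.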

            People

              Assignee: Unassigned
              Reporter: Venkata krishnan Sowrirajan (vsowrirajan)