  1. Beam
  2. BEAM-7848

Add possibility to manage quantity of instances (threads) per worker in Python SDK

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: P2
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: Not applicable
    • Component/s: runner-dataflow
    • Labels: None
    • Environment: Python SDK
      ApacheBeam version==2.13.0
      worker_type==n1-standard-4

    Description

      I'm developing a streaming pipeline in which one of the PTransforms consumes a lot of memory.
      Some time after starting, the pipeline fails without any specific logs (see the attached file).

      It looks like this happens because the worker runs out of memory (OutOfMemory).

      It would be great to be able to limit the number of threads used in a single worker, to control memory load.
      I found such an option in the Java SDK (--numberOfWorkerHarnessThreads), but it is absent from the Python SDK.
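
To illustrate why a per-worker thread limit would help, here is a minimal standard-library sketch. It is not Beam's worker harness and uses no Beam API. The names `run_with_thread_cap` and `memory_heavy_task` are hypothetical, and the sleep only stands in for the memory-hungry PTransform. The assumption is that peak memory grows roughly with the number of elements processed at the same time, so capping the thread count caps that number.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_with_thread_cap(num_tasks, max_threads):
    """Run num_tasks dummy tasks on at most max_threads threads.

    Returns the peak number of tasks that were in flight at the same time.
    """
    lock = threading.Lock()
    in_flight = 0
    peak = 0

    def memory_heavy_task(_):
        # Hypothetical stand-in for the memory-hungry processing step.
        nonlocal in_flight, peak
        with lock:
            in_flight += 1
            peak = max(peak, in_flight)
        time.sleep(0.01)
        with lock:
            in_flight -= 1

    # The pool size is the "threads per worker" limit in this sketch.
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        list(pool.map(memory_heavy_task, range(num_tasks)))
    return peak


peak_in_flight = run_with_thread_cap(num_tasks=20, max_threads=2)
```

In this sketch `peak_in_flight` can never exceed `max_threads`, which is the kind of bound on concurrent memory use that the requested option would give for a worker.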

      Attachments

        1. Selection_042.png
          86 kB
          Severyn Parkhomenko

          People

            Assignee: Unassigned
            Reporter: Severyn Parkhomenko (pseveryn)
            Votes: 0
            Watchers: 2
