FLINK-33613

Python UDF Runner process leak in Process Mode

Details

    Description

      While working with PyFlink, we found that in Process Mode the Python UDF process may leak after a job failover. Each failover can leave behind extra processes and their threads on the host machine, and the growing count eventually causes new thread creation to fail.

       

      You can try to reproduce it with the attached test job `streaming_word_count.py`.

      (Note that the job will fail over repeatedly, and you can observe the leaked processes with `ps -ef` on the TaskManager.)

       

      Our test environment:

      • Kubernetes Application Mode
      • 4 TaskManagers with 12 slots per TaskManager
      • Job parallelism set to 48

      The number of UDF processes `pyflink.fn_execution.beam.beam_boot` should be consistent with the number of slots per TaskManager (12), but we found 180 such processes on one TaskManager after several failovers.
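
      The check described above can be scripted. The following is a minimal sketch, not part of the original report: it counts the lines of `ps -ef` output that mention `pyflink.fn_execution.beam.beam_boot` and compares the count with the slot count. The function names are hypothetical, and matching on a substring of the command line is an assumption about how the process appears in `ps -ef`. With the numbers reported here (180 processes, 12 slots), it would report 168 processes above the expected count.

```python
import subprocess

# Process name taken from the issue text. Matching it as a substring of
# the `ps -ef` command column is an assumption of this sketch.
UDF_MARKER = "pyflink.fn_execution.beam.beam_boot"


def count_udf_processes(ps_output: str, marker: str = UDF_MARKER) -> int:
    """Count the lines of `ps -ef` output that contain the marker."""
    return sum(1 for line in ps_output.splitlines() if marker in line)


def leaked_processes(ps_output: str, slots_per_tm: int) -> int:
    """Return how many UDF processes exceed the slot count (0 if none)."""
    return max(0, count_udf_processes(ps_output) - slots_per_tm)


if __name__ == "__main__":
    # Run on the TaskManager host or inside its container.
    output = subprocess.run(
        ["ps", "-ef"], capture_output=True, text=True, check=True
    ).stdout
    print("UDF processes:", count_udf_processes(output))
    print("Above slot count (12):", leaked_processes(output, slots_per_tm=12))
```

      Running this after each failover and watching the second number grow is one way to confirm the leak without reading the full `ps -ef` listing by hand.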

      Attachments

        1. ps-ef.txt
          54 kB
          Yu Chen
        2. streaming_word_count-1.py
          4 kB
          Yu Chen

            People

              Dian Fu (dianfu)
              Yu Chen