  1. Flink
  2. FLINK-33315

Optimize memory usage of large StreamOperator


Details

    Description

      Some of our batch jobs were upgraded from flink-1.15 to flink-1.17, and their TMs always fail with java.lang.OutOfMemoryError: Java heap space.

       

      Here is an example: a Hive table with a lot of data, where HiveSource#partitionBytes is 281MB.

      After analysis, the root cause is that the TM keeps 3 replicas of this big object:

      • Replica_1: The SourceOperatorFactory itself (it is necessary for running the task).
      • Replica_2: A temporarily generated duplicate of the SourceOperatorFactory object.
        • It was introduced in FLINK-30536 (1.17) and is not necessary. (code link)
        • When creating a successor operator to a SourceOperator, the call stack is:
          • OperatorChain#createOperatorChain ->
          • wrapOperatorIntoOutput ->
          • getOperatorRecordsOutCounter ->
          • operatorConfig.getStreamOperatorFactory(userCodeClassloader)
        • This generates the SourceOperatorFactory temporarily, only to check whether it is a SinkWriterOperatorFactory.
      • Replica_3: The value of StreamConfig#SERIALIZEDUDF.
        • It is used to generate the SourceOperatorFactory.
        • Currently the value is always kept in heap memory.
        • However, once the factory has been generated, we can release the value or store it on disk if needed.
          • We can define a threshold: when the value size is less than the threshold, the release strategy does not take effect.
        • Doing so would save a lot of heap memory.
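
      The threshold-based release idea for Replica_3 could look roughly like the sketch below. It is only an illustration under stated assumptions: SerializedValueHolder, releaseAfterUse, and the temp-file spill are hypothetical names and choices, not Flink classes or the approach actually taken for this issue.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical holder for a large serialized value; not part of Flink. */
final class SerializedValueHolder {
    private final long thresholdBytes;
    private byte[] onHeap; // serialized bytes while still held in heap memory
    private Path onDisk;   // spill file, set only after a release

    SerializedValueHolder(byte[] serialized, long thresholdBytes) {
        this.onHeap = serialized;
        this.thresholdBytes = thresholdBytes;
    }

    /** Call once the operator factory has been generated from the bytes. */
    void releaseAfterUse() {
        // Values smaller than the threshold stay on the heap: the release
        // strategy does not take effect for them.
        if (onHeap == null || onHeap.length < thresholdBytes) {
            return;
        }
        try {
            Path file = Files.createTempFile("serialized-udf-", ".bin");
            file.toFile().deleteOnExit();
            Files.write(file, onHeap);
            onDisk = file;
            // Drop our reference; the array becomes collectable once no
            // other object still references it.
            onHeap = null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    boolean isOnHeap() {
        return onHeap != null;
    }

    /** Returns the bytes, reading them back from disk if they were spilled. */
    byte[] read() {
        if (onHeap != null) {
            return onHeap;
        }
        try {
            return Files.readAllBytes(onDisk);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

      With a threshold of, say, a few MB, small operator configs would be unaffected, while a 281MB value would leave the heap after the factory is created and could still be re-read from disk if needed.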

      These three replicas use about 800MB of memory, and that is for a single subtask. Each TM has 4 slots, so it runs 4 HiveSources at the same time and keeps 12 replicas in TM memory, which is about 3.3 GB.
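
      The arithmetic behind these figures can be reproduced with a small sketch. It assumes each replica is roughly the size of HiveSource#partitionBytes (281MB); ReplicaFootprint is a hypothetical helper used only for illustration.

```java
/** Hypothetical helper; assumes every replica is about the same size. */
final class ReplicaFootprint {
    /** Total size in MB of all replicas held by one TM. */
    static long totalMegabytes(long objectMb, int replicasPerSubtask, int slotsPerTm) {
        return objectMb * replicasPerSubtask * slotsPerTm;
    }

    public static void main(String[] args) {
        long perSubtask = totalMegabytes(281, 3, 1); // 843 MB, "about 800MB"
        long perTm = totalMegabytes(281, 3, 4);      // 3372 MB for 12 replicas
        System.out.println(perSubtask + " MB per subtask");
        System.out.println(perTm + " MB per TM, about " + (perTm / 1024.0) + " GB");
    }
}
```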

      These large objects cannot be garbage-collected by the JVM, causing the TM to OOM frequently.

      This JIRA focuses on optimizing Replica_2 and Replica_3.
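
      For Replica_2, one possible direction is to answer the "is it a SinkWriterOperatorFactory?" question from the factory's class alone, without generating the factory object. The sketch below is a minimal illustration of that idea, assuming the class name is stored next to the serialized value; OperatorConfigSketch, OperatorFactory, SinkWriterFactory, and BigSourceFactory are hypothetical stand-ins, not Flink's StreamConfig or its factory types.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-ins for the real factory types. */
interface OperatorFactory {}
final class SinkWriterFactory implements OperatorFactory {}
final class BigSourceFactory implements OperatorFactory {}

/** Hypothetical config that records the factory class name separately. */
final class OperatorConfigSketch {
    private static final String FACTORY_CLASS_KEY = "factoryClassName";
    private final Map<String, String> config = new HashMap<>();

    void setFactory(OperatorFactory factory) {
        // A real config would also store the (possibly huge) serialized
        // factory; only the class name matters for the type check.
        config.put(FACTORY_CLASS_KEY, factory.getClass().getName());
    }

    /** Type check that loads the class but never instantiates the factory. */
    boolean isFactoryOfType(Class<?> expected, ClassLoader classLoader) {
        String name = config.get(FACTORY_CLASS_KEY);
        try {
            Class<?> actual = Class.forName(name, false, classLoader);
            return expected.isAssignableFrom(actual);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("Unknown factory class: " + name, e);
        }
    }
}
```

      Under this assumption, the check in a method like getOperatorRecordsOutCounter would only need a class lookup, so no temporary duplicate of a large factory would be created.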

       

       

       

       


          People

            fanrui Rui Fan