Details
- Type: Improvement
- Status: Open
- Priority: Minor
- Resolution: Unresolved
- Version: 1.20.0
Description
We ran performance tests on Flink CDC 3.0 and found, using the Arthas profiler, a significant performance bottleneck in the deserialization of row data. The main cost is the String.format call in the BinaryRecordDataGenerator class, so we made a simple performance optimization.
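For illustration, here is a minimal sketch of the kind of change involved. It assumes the hot spot is an error message built with String.format on every call, even though the message is only needed when a check fails. The class and method names below are hypothetical and are not the actual Flink CDC code.

```java
public class LazyMessageSketch {

    // Hypothetical argument check: the caller must build the message up front.
    static void checkArgumentEager(boolean condition, String message) {
        if (!condition) {
            throw new IllegalArgumentException(message);
        }
    }

    // Eager variant: String.format runs on every call, even when the check passes.
    static int generateEager(int typeCount, Object[] rowFields) {
        checkArgumentEager(
                typeCount == rowFields.length,
                String.format(
                        "Expected %d fields but got %d", typeCount, rowFields.length));
        return rowFields.length;
    }

    // Lazy variant: the message is formatted only on the failure path.
    static int generateLazy(int typeCount, Object[] rowFields) {
        if (typeCount != rowFields.length) {
            throw new IllegalArgumentException(
                    String.format(
                            "Expected %d fields but got %d", typeCount, rowFields.length));
        }
        return rowFields.length;
    }

    public static void main(String[] args) {
        Object[] row = new Object[] {1, "a", 2.0};
        int iterations = 1_000_000;
        long sink = 0; // accumulated so the loops are not optimized away

        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink += generateEager(3, row);
        }
        long eagerNanos = System.nanoTime() - start;

        start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink += generateLazy(3, row);
        }
        long lazyNanos = System.nanoTime() - start;

        // Rough timing only; a real measurement should use a harness such as JMH.
        System.out.println("eager: " + eagerNanos / 1_000_000 + " ms");
        System.out.println("lazy:  " + lazyNanos / 1_000_000 + " ms");
        System.out.println("checksum: " + sink);
    }
}
```

Both variants throw the same exception with the same message on a length mismatch; only the cost of the success path differs.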
Test environment:
- flink: 1.20-SNAPSHOT master
- flink-cdc: 3.2-SNAPSHOT master
- 1 CU, MiniCluster mode
source:
  type: mysql
  hostname: localhost
  port: 3308
  username: root
  password: 123456
  tables: test.user_behavior
  server-id: 5400-5404
  #server-time-zone: UTC
  scan.startup.mode: earliest-offset
  debezium.poll.interval.ms: 10

sink:
  type: values
  name: Values Sink
  materialized.in.memory: false
  print.enabled: false

pipeline:
  name: Sync MySQL Database to Values
  parallelism: 1
Before optimization: 3.5w/s (about 35,000 records per second)
The flame graph shows that approximately 24.45% of the time is spent in String.format.
After optimization: 5w/s (about 50,000 records per second, roughly 43% higher than the 3.5w/s measured before)
After optimization, 4.7% of the time (extractBeforeDataRecord + extractAfterDataRecord) is still spent in org/apache/flink/cdc/runtime/typeutils/BinaryRecordDataGenerator.<init>. Perhaps this can be optimized further.
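One possible direction for the remaining constructor cost, sketched under the assumption that a new generator is built for each record even though the schema rarely changes, is to cache one instance per schema. The RowGenerator class and its list-of-type-names key below are hypothetical stand-ins, not the real BinaryRecordDataGenerator API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class GeneratorCacheSketch {

    // Hypothetical stand-in for an object that is costly to construct.
    static final class RowGenerator {
        private final List<String> fieldTypes;

        RowGenerator(List<String> fieldTypes) {
            // Defensive copy so later changes to the caller's list do not leak in.
            this.fieldTypes = new ArrayList<>(fieldTypes);
        }

        int arity() {
            return fieldTypes.size();
        }
    }

    // One generator per distinct schema. Not thread-safe: assumes a single-threaded caller.
    private final Map<List<String>, RowGenerator> cache = new HashMap<>();

    // Counts constructor calls so the effect of the cache can be observed.
    int constructedCount = 0;

    RowGenerator generatorFor(List<String> fieldTypes) {
        return cache.computeIfAbsent(
                fieldTypes,
                key -> {
                    constructedCount++;
                    return new RowGenerator(key);
                });
    }

    public static void main(String[] args) {
        GeneratorCacheSketch sketch = new GeneratorCacheSketch();
        List<String> schema = new ArrayList<>();
        schema.add("INT");
        schema.add("STRING");

        // Simulate many records that share the same schema.
        for (int i = 0; i < 100_000; i++) {
            sketch.generatorFor(schema);
        }
        System.out.println("generators constructed: " + sketch.constructedCount);
    }
}
```

The key must not be mutated after insertion, and a real implementation would need to invalidate or replace entries when a schema change event arrives.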