[BEAM-9451] Optimize translation when Schema information is available in Spark Structured Streaming runner - ASF JIRA

Details

Type: Improvement
Status: Open
Priority: P3
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: runner-spark
Labels:
- structured-streaming

Description

The Spark Structured Streaming runner supports Datasets that already carry Schema information. Spark uses this information to optimize jobs through its Catalyst optimizer. This issue is to implement optimized translations of the runner's transforms so that we can benefit from the performance improvements Spark performs internally.

Note that we may also need to map Beam's core internal representations, such as WindowedValue, so that intermediate steps can be optimized as well.
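As a rough illustration of what mapping an internal representation could mean, the sketch below flattens a windowed element into a row with one named field per component (value, timestamp, windows, pane). All names here (`SimpleWindowedValue`, `WindowedRow`, `toRow`) are hypothetical stand-ins, not Beam's actual `WindowedValue` API, and windows and panes are simplified to strings. In Spark, a JavaBean version of such a row class could be given a schema through `Encoders.bean`, exposing its fields as columns Catalyst can reason about, whereas a binary encoder such as `Encoders.kryo` exposes only a single opaque byte-array column.

```java
import java.time.Instant;
import java.util.List;

public class WindowedRowSketch {
    // Hypothetical stand-in for a windowed element; NOT Beam's real WindowedValue.
    record SimpleWindowedValue<T>(T value, Instant timestamp, List<String> windows, String pane) {}

    // Hypothetical flat row: one named field per component, the shape a
    // schema-aware encoder could expose as separate columns.
    record WindowedRow(String value, long timestampMillis, List<String> windows, String pane) {}

    // Map the nested windowed element to the flat, schema-friendly row.
    static WindowedRow toRow(SimpleWindowedValue<String> wv) {
        return new WindowedRow(
                wv.value(),
                wv.timestamp().toEpochMilli(),
                List.copyOf(wv.windows()),
                wv.pane());
    }

    // Column names that a schema-aware representation of WindowedRow would expose.
    static List<String> columns() {
        return List.of("value", "timestampMillis", "windows", "pane");
    }

    public static void main(String[] args) {
        SimpleWindowedValue<String> wv = new SimpleWindowedValue<>(
                "hello", Instant.ofEpochMilli(1_000L), List.of("GlobalWindow"), "ON_TIME");
        System.out.println(columns());
        System.out.println(toRow(wv));
    }
}
```

Because each component becomes its own field, an operation that only needs, say, the timestamp could in principle read that one column instead of decoding the whole element.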

People

Assignee: Unassigned

Reporter: Ismaël Mejía

Votes: 0

Watchers: 4

Dates

Created: 05/Mar/20 14:48

Updated: 04/Jun/22 14:41