Details
-
New Feature
-
Status: Resolved
-
Major
-
Resolution: Fixed
-
3.5.1
Description
SimpleDataSourceStreamReader is a simplified version of the DataStreamReader interface.
- It doesn’t require developers to reason about data partitioning.
- It doesn’t require getting the latest offset before reading data.
There are 3 functions that needs to be defined
1. Read data and return the end offset.
def read(self, start: Offset) -> (Iterator[Tuple], Offset)
2. Read data between start and end offset, this is required for exactly once read.
def read2(self, start: Offset, end: Offset) -> Iterator[Tuple]
3. initial start offset of the streaming query.
def initialOffset() -> dict
Implementation: Wrap the SimpleDataSourceStreamReader instance in a DataSourceStreamReader internally and make the prefetching and caching transparent to the data source developer. The record prefetched in python process will be sent to JVM as arrow record batches.
Attachments
Issue Links
- links to