Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Invalid
- Affects Version: 2.4.5
- Fix Version: None
- Component: None
Description
Writing a streaming DataFrame created from a Kafka source to a CSV file gives the following error in PySpark.
NOTE: The same streaming DataFrame is displayed in the console without any error.
sdf.writeStream.format("console").start().awaitTermination()  # Working
sdf.writeStream\
    .format("csv")\
    .option("path", "C://output")\
    .option("checkpointLocation", "C://Checkpoint")\
    .outputMode("append")\
    .start().awaitTermination()  # Not working
Error
---------
File "C:\Spark\python\pyspark\sql\utils.py", line 63, in deco
    return f(*a, **kw)
File "C:\Spark\python\lib\py4j-0.10.7-src.zip\py4j\protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o63.awaitTermination.
: org.apache.spark.sql.streaming.StreamingQueryException: Expected e.g. {"topicA": ,"topicB":{"0":-2}}, got {"logOffset":1}

=== Streaming Query ===
Identifier: [id = 6718625c-489e-44c8-b273-0da3429e97a8, runId = b64887ba-ca32-499e-9ab5-f839fd44ec26]
Current Committed Offsets: {KafkaV2[Subscribe[test1]]: {"logOffset":1}}
Current Available Offsets: {KafkaV2[Subscribe[test1]]: {"logOffset":1}}
Current State: ACTIVE
Thread State: RUNNABLE
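The `{"logOffset":1}` in the committed offsets is a strong hint about why this was closed as Invalid: that JSON shape is what a file-based source writes to its checkpoint, not what the Kafka source writes, which suggests the checkpoint directory `C://Checkpoint` was previously used by a query with a different source. A minimal sketch of the two offset formats (topic, partition, and offset values here are illustrative, not taken from the report):

```python
import json

# Offset format the Kafka source expects to find in the checkpoint:
# a map of topic -> {partition: offset}, matching the "Expected e.g."
# part of the exception message.
kafka_offset = json.loads('{"test1": {"0": 1}}')

# Offset format actually found in the checkpoint, which a file-based
# source serializes as a single "logOffset" counter.
file_offset = json.loads('{"logOffset": 1}')

# The Kafka source cannot interpret "logOffset", hence the
# StreamingQueryException when the checkpoint directory is reused.
assert "logOffset" not in kafka_offset
assert "logOffset" in file_offset
```

If this diagnosis is right, pointing `checkpointLocation` at a fresh, empty directory for the Kafka query would avoid the mismatch, since each query needs its own checkpoint location.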