Spark / SPARK-28242

DataStreamer keeps logging errors even after fixing writeStream output sink


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Incomplete
    • Affects Version/s: 2.4.3
    • Fix Version/s: None
    • Component/s: Structured Streaming
    • Environment: Hadoop 2.8.4

       

    Description

      I have been testing what happens to a running structured streaming query that writes to HDFS when all the datanodes are down/stopped, or the whole cluster is down (including the namenode).

      So I created a structured stream from Kafka to a file output sink on HDFS and tested some scenarios.

      We used a very simple streaming query:

      // Read records from Kafka and write their values as text files to HDFS
      import static org.apache.spark.sql.functions.col;
      import org.apache.spark.sql.types.DataTypes;

      spark.readStream()
           .format("kafka")
           .option("kafka.bootstrap.servers", "kafka.server:9092...")
           .option("subscribe", "test_topic")
           .load()
           .select(col("value").cast(DataTypes.StringType))
           .writeStream()
           .format("text")
           .option("path", "HDFS/PATH")
           .option("checkpointLocation", "checkpointPath")
           .start()
           .awaitTermination();

       

      After stopping all the datanodes, the process starts logging an error saying that the datanodes are bad.

      That's expected...

      2019-07-03 15:55:00 [spark-listener-group-eventLog] ERROR org.apache.spark.scheduler.AsyncEventQueue:91 - Listener EventLoggingListener threw an exception
      java.io.IOException: All datanodes [DatanodeInfoWithStorage[10.2.12.202:50010,DS-d2fba01b-28eb-4fe4-baaa-4072102a2172,DISK]] are bad. Aborting...
          at org.apache.hadoop.hdfs.DataStreamer.handleBadDatanode(DataStreamer.java:1530)
          at org.apache.hadoop.hdfs.DataStreamer.setupPipelineForAppendOrRecovery(DataStreamer.java:1465)
          at org.apache.hadoop.hdfs.DataStreamer.processDatanodeError(DataStreamer.java:1237)
          at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:657)
      

      The problem is that even after restarting the datanodes, the process keeps logging the same error continuously.

      We checked, and the writeStream to HDFS recovered successfully once the datanodes were back up; the output sink worked again without problems.

      I have been trying different HDFS configurations to make sure it's not a client-configuration problem, but I have no clue how to fix it.

      It seems that something is stuck indefinitely in an error loop.
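      Note that the stack trace above comes from the EventLoggingListener (Spark's event-log writer), not from the file sink itself, which would match the observation that the sink recovered while the errors continued: the event log's HDFS output stream apparently is never reopened after the outage. As a hedged workaround sketch, assuming the history-server event log can be sacrificed, event logging can be turned off in spark-defaults.conf. The key spark.eventLog.enabled is a standard Spark setting; disabling it only silences this particular writer and is not a fix for the underlying recovery bug:

      # Workaround sketch (assumption): stop the EventLoggingListener from
      # writing to HDFS at all, so its stream cannot get stuck after an outage.
      # The trade-off is that the application will not appear in the history server.
      spark.eventLog.enabled false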

       

      Attachments

        Activity

          People

            Assignee: Unassigned
            Reporter: Miquel Canes (mcanes)
            Votes: 1
            Watchers: 4
