  1. Spark
  2. SPARK-11350

There is no best practice to handle warnings or messages produced by Executors in a distributed manner

Details

    • Type: Wish
    • Status: Resolved
    • Priority: Major
    • Resolution: Invalid
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Spark Core

    Description

      I looked around on the web and couldn’t find any way to deal with malformed or faulty records during a computation in a distributed way. All I was able to find was the flatMap/Some/None technique plus logging.
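
      For readers unfamiliar with it, the flatMap/Some/None technique can be sketched as below. This is a minimal illustration on a plain Scala List rather than an RDD; the names OptionFlatMapDemo, parseInt, raw, and parsed are hypothetical and do not come from the issue.

```scala
import scala.util.Try

object OptionFlatMapDemo {
  // Hypothetical parser: Some(value) on success, None for a faulty record.
  def parseInt(s: String): Option[Int] = Try(s.trim.toInt).toOption

  val raw: List[String] = List("1", "two", "3")

  // flatMap keeps the contents of every Some and silently drops every None.
  val parsed: List[Int] = raw.flatMap(s => parseInt(s))

  def main(args: Array[String]): Unit = println(parsed)
}
```

      The faulty record simply disappears: nothing in the output says which record failed or why, which is why the technique is usually paired with logging.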
      I’m facing this problem because I have a processing algorithm that extracts more than one value from each record but can fail to extract any one of those values, and I want to keep track of those failures. Logging is not feasible because this “warning” happens so frequently that the logs would become overwhelming and impossible to read.
      Since my processing has three possible outcomes, I modeled it with this class hierarchy:

      http://i.imgur.com/NIesYUm.png?1

      Each outcome holds a result, warnings, or both. Since Result implements Traversable, it can be used in a flatMap, which discards all warnings and failed results. On the other hand, if we want to keep track of the warnings, we can process them and output them as needed.
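
      A hierarchy of this kind might look like the sketch below. The issue only shows the classes as an image, so every name here (Success, Partial, Failure, Person, parse) is an assumption, as is the comma-separated record format. The sketch extends Iterable, which is a Traversable in Scala 2.12 and earlier, so that it also compiles on newer Scala versions.

```scala
import scala.util.Try

object ResultDemo {
  // Hypothetical hierarchy: a Result yields zero or one value plus warnings.
  sealed trait Result[+A] extends Iterable[A] {
    def warnings: List[String]
  }

  // Every value was extracted.
  final case class Success[A](value: A) extends Result[A] {
    val warnings: List[String] = Nil
    def iterator: Iterator[A] = Iterator.single(value)
  }

  // A usable value was produced, but some extraction failed.
  final case class Partial[A](value: A, warnings: List[String]) extends Result[A] {
    def iterator: Iterator[A] = Iterator.single(value)
  }

  // Nothing usable could be extracted.
  final case class Failure(warnings: List[String]) extends Result[Nothing] {
    def iterator: Iterator[Nothing] = Iterator.empty
  }

  // Assumed record format: "name,age".
  final case class Person(name: String, age: Option[Int])

  def parse(line: String): Result[Person] =
    line.split(",", -1).map(_.trim) match {
      case Array(name, age) if name.nonEmpty =>
        Try(age.toInt).toOption match {
          case Some(n) => Success(Person(name, Some(n)))
          case None    => Partial(Person(name, None), List(s"bad age '$age' in '$line'"))
        }
      case _ => Failure(List(s"malformed record '$line'"))
    }

  val lines: List[String] = List("alice,30", "bob,abc", "garbage")
  val results: List[Result[Person]] = lines.map(parse)

  // Using Result in a flatMap keeps the values and drops warnings and failures.
  val people: List[Person] = lines.flatMap(l => parse(l))

  // The warnings stay available as ordinary data instead of log lines.
  val warnings: List[String] = results.flatMap(_.warnings)

  def main(args: Array[String]): Unit = {
    people.foreach(println)
    warnings.foreach(println)
  }
}
```

      The sketch runs on a local List, but RDD.flatMap also accepts a function returning a TraversableOnce, so the same calls should carry over to an RDD, where the warnings become a second distributed dataset that can be counted, sampled, or saved.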

          People

            Assignee: Unassigned
            Reporter: Antonio Murgia (tmnd91)
            Votes: 0
            Watchers: 1