Spark / SPARK-36919

Make BadRecordException serializable

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 3.2.0, 3.2.1, 3.3.0
    • Fix Version/s: 3.2.0, 3.1.3, 3.0.4
    • Component/s: Spark Core
    • Labels: None

    Description

      After migrating a Spark application from 2.4.x to 3.1.x, we found a difference in the exception chaining behavior. When parsing a malformed CSV, the root cause should be Caused by: java.lang.RuntimeException: Malformed CSV record, but only the top-level exception is kept; all lower-level exceptions and the root cause are lost. As a result, calling ExceptionUtils.getRootCause on the exception returns the exception itself.
      The difference arises because the RuntimeException is wrapped in a BadRecordException, which has non-serializable fields. When the exception is serialized in the task and deserialized by the scheduler, the cause chain is lost.
      This PR makes the non-serializable fields of BadRecordException transient, so that the rest of the exception can be serialized and deserialized properly.
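
      The effect of marking a field transient can be illustrated with a minimal, self-contained Java sketch. It is not the Spark code: BrokenRecordException, FixedRecordException, and causeAfterRoundTrip are hypothetical stand-ins, and a plain Supplier lambda plays the role of the non-serializable field. The helper returns null when serialization fails, which is only an assumed approximation of the cause being dropped between task and scheduler.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.function.Supplier;

public class TransientFieldDemo {

    // Hypothetical stand-in for the pre-fix exception: the Supplier field holds
    // a plain lambda, which Java serialization cannot write.
    public static class BrokenRecordException extends Exception {
        private final Supplier<String> record;

        public BrokenRecordException(Supplier<String> record, Throwable cause) {
            super(cause);
            this.record = record;
        }
    }

    // Same shape, but the field is transient, so serialization skips it and
    // the rest of the exception (including the cause) is written normally.
    public static class FixedRecordException extends Exception {
        private final transient Supplier<String> record;

        public FixedRecordException(Supplier<String> record, Throwable cause) {
            super(cause);
            this.record = record;
        }
    }

    // Serialize to a byte array and read the object back.
    public static Throwable roundTrip(Throwable t) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(t);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Throwable) in.readObject();
        }
    }

    // Returns the cause's message after a round trip, or null if the exception
    // could not be serialized or came back without a cause.
    public static String causeAfterRoundTrip(Throwable t) {
        try {
            Throwable cause = roundTrip(t).getCause();
            return cause == null ? null : cause.getMessage();
        } catch (IOException | ClassNotFoundException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        Throwable root = new RuntimeException("Malformed CSV record");
        System.out.println("without transient: "
            + causeAfterRoundTrip(new BrokenRecordException(() -> "a,b,\"c", root)));
        System.out.println("with transient: "
            + causeAfterRoundTrip(new FixedRecordException(() -> "a,b,\"c", root)));
    }
}
```

      In this sketch the non-transient variant fails to serialize at all, while the transient variant round-trips with its cause intact. The trade-off is that the transient field is null after deserialization.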

          People

            Assignee: Tianhan Hu (adrianhu96)
            Reporter: Tianhan Hu (adrianhu96)
