SPARK-27873

CSV reader: adding a corrupt record column causes error if enforceSchema=false


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.4.3
    • Fix Version/s: 2.4.4, 3.0.0
    • Component/s: SQL
    • Labels: None

Description

    In the Spark CSV reader, if you're using PERMISSIVE mode with a column for storing corrupt records, you need to add an extra column to the schema corresponding to columnNameOfCorruptRecord.

    However, if you have a header row and enforceSchema=false, the schema-vs-header validation fails because of the extra column corresponding to columnNameOfCorruptRecord.

    Since FAILFAST mode doesn't print informative error messages about which rows failed to parse, there is no way to track down broken rows other than setting a corrupt record column.
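
    A minimal spark-shell sketch of the failing setup (the file path and data column names are hypothetical; _corrupt_record is Spark's default corrupt-record column name). The header row has two fields but the schema has three because of the corrupt-record column, so with enforceSchema=false the header-vs-schema validation rejects the read:

        import org.apache.spark.sql.SparkSession
        import org.apache.spark.sql.types._

        val spark = SparkSession.builder().master("local[*]").getOrCreate()

        // Two data columns, plus the extra column PERMISSIVE mode needs
        // for storing corrupt records.
        val schema = new StructType()
          .add("id", IntegerType)
          .add("name", StringType)
          .add("_corrupt_record", StringType) // matches columnNameOfCorruptRecord below

        val df = spark.read
          .schema(schema)
          .option("header", "true")
          .option("mode", "PERMISSIVE")
          .option("columnNameOfCorruptRecord", "_corrupt_record")
          .option("enforceSchema", "false") // header is validated against the schema
          .csv("/tmp/people.csv")           // hypothetical file whose header row is: id,name

        df.show()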

People

    Assignee: L. C. Hsieh (viirya)
    Reporter: Marcin Mejran (mejran)
    Votes: 0
    Watchers: 2
