Spark / SPARK-25669

Check CSV header only when it exists


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.4.0
    • Fix Version/s: 2.4.0, 3.0.0
    • Component/s: SQL
    • Labels: None

    Description

      Currently, Spark checks the header in CSV files against the field names in the provided or inferred schema. The check is bypassed if the header doesn't exist and the CSV content is read from files. However, when the input CSV comes as a dataset of strings, Spark always compares the first row to the user-specified or inferred schema, even if that row is not a header. For example, parsing the following dataset:

      val input = Seq("1,2").toDS()
      spark.read.option("enforceSchema", false).csv(input)
      

      throws the exception:

      java.lang.IllegalArgumentException: CSV header does not conform to the schema.
       Header: 1, 2
       Schema: _c0, _c1
      Expected: _c0 but found: 1   
      

      Spark should not compare the first row to the user-specified or inferred schema when that row is not a header.
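
      The intended behavior can be sketched in plain Scala, without Spark. This is only a simplified model under stated assumptions, not the actual patch: `HeaderCheckSketch`, `checkHeader`, and the `headerOption` parameter are hypothetical names, and the real check may differ in details such as case sensitivity. The point is that the comparison runs only when the first row is declared to be a header and `enforceSchema` is disabled.

```scala
// Simplified, hypothetical model of the header check; not Spark internals.
object HeaderCheckSketch {

  /** Returns Some(error message) if the header check fails, None otherwise. */
  def checkHeader(
      firstRow: Seq[String],      // tokens of the first CSV row
      schemaFields: Seq[String],  // field names of the provided or inferred schema
      headerOption: Boolean,      // models option("header", ...)
      enforceSchema: Boolean      // models option("enforceSchema", ...)
  ): Option[String] = {
    if (!headerOption || enforceSchema) {
      // The first row is data, or the schema is forcibly applied: skip the check.
      None
    } else if (firstRow.length != schemaFields.length) {
      Some(s"Header length: ${firstRow.length}, schema size: ${schemaFields.length}")
    } else {
      // Report the first column whose header name differs from the schema field.
      firstRow.zip(schemaFields).collectFirst {
        case (found, expected) if found != expected =>
          s"Expected: $expected but found: $found"
      }
    }
  }
}
```

      In this model, the dataset from the description (`"1,2"` with no header declared) passes without an error, while the same row declared as a header would still be reported as a mismatch against `_c0, _c1`.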

          People

            Assignee: Max Gekk (maxgekk)
            Reporter: Max Gekk (maxgekk)