[SPARK-25669] Check CSV header only when it exists - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Minor
Resolution: Fixed
Affects Version/s: 2.4.0
Fix Version/s: 2.4.0, 3.0.0
Component/s: SQL
Labels: None

Description

Currently, Spark checks the header in CSV files against the field names in the provided or inferred schema. The check is bypassed if the header doesn't exist and the CSV content is read from files. However, when the input CSV comes as a dataset of strings, Spark always compares the first row to the user-specified or inferred schema, even when that row is data rather than a header. For example, parsing the following dataset:

import spark.implicits._  // provides toDS(); already imported in spark-shell
val input = Seq("1,2").toDS()
spark.read.option("enforceSchema", false).csv(input)

throws the exception:

java.lang.IllegalArgumentException: CSV header does not conform to the schema.
 Header: 1, 2
 Schema: _c0, _c1
Expected: _c0 but found: 1

The first row must not be compared to the user-specified or inferred schema when it is not a header.
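The intended behavior can be sketched in plain Scala, without Spark. This is a minimal illustration under stated assumptions: `HeaderCheckSketch`, `checkHeader`, and its parameters are hypothetical names, not Spark's internal API, and the messages only approximate Spark's. `headerExists` stands in for the `header` option and `enforceSchema` for the option of the same name. The point is that the comparison runs only when a header is actually present and schema enforcement is off.

```scala
// Hypothetical sketch of "check the CSV header only when it exists".
// None of these names are Spark APIs; they only illustrate the rule.
object HeaderCheckSketch {
  // Compares a header name with a schema field name, optionally ignoring case.
  private def sameName(a: String, b: String, caseSensitive: Boolean): Boolean =
    if (caseSensitive) a == b else a.equalsIgnoreCase(b)

  /** Returns None if the check passes or is skipped, otherwise an error message. */
  def checkHeader(
      firstRow: Seq[String],
      fieldNames: Seq[String],
      headerExists: Boolean,
      enforceSchema: Boolean,
      caseSensitive: Boolean = false): Option[String] = {
    if (!headerExists || enforceSchema) {
      // No header (the first row is data) or the schema is forced: nothing to compare.
      None
    } else if (firstRow.length != fieldNames.length) {
      Some(s"Header has ${firstRow.length} columns but the schema has ${fieldNames.length} fields")
    } else {
      // Report the first header name that does not match its schema field.
      firstRow.zip(fieldNames).collectFirst {
        case (found, expected) if !sameName(found, expected, caseSensitive) =>
          s"Expected: $expected but found: $found"
      }
    }
  }

  def main(args: Array[String]): Unit = {
    val row = Seq("1", "2")
    val fields = Seq("_c0", "_c1")
    // Input without a header: "1,2" is data, so it must not be checked.
    println(checkHeader(row, fields, headerExists = false, enforceSchema = false))
    // The same row treated as a header: the mismatch is reported.
    println(checkHeader(row, fields, headerExists = true, enforceSchema = false))
  }
}
```

With `headerExists = false`, the row `1,2` from the example above is treated as data and no exception-worthy mismatch is reported; only when the row is declared to be a header does the name comparison apply.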

Issue Links

is related to

SPARK-27873 Csv reader, adding a corrupt record column causes error if enforceSchema=false

Resolved

links to

[Github] Pull Request #22656 (MaxGekk)

People

Assignee: Max Gekk

Reporter: Max Gekk

Votes: 0

Watchers: 2

Dates

Created: 06/Oct/18 12:59

Updated: 12/Dec/22 18:11

Resolved: 09/Oct/18 06:38
