Details
Type: New Feature
Status: Resolved
Priority: Major
Resolution: Fixed
Description
Spark currently can only parse JSON files in the JSON Lines format, i.e. each record occupies exactly one line and records are separated by newlines. In practice, many users want Spark to parse regular JSON files, where a single record may span multiple lines, and are surprised to learn that it cannot.
We could introduce a new mode (wholeJsonFile?) in which files are not split; instead, each file is streamed through and parsed as a whole.
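The following minimal, Spark-independent sketch in Python (standard library only) makes the difference between the two modes concrete. The function names `parse_json_lines` and `parse_whole_file` are hypothetical and not part of Spark's API. The first function mirrors the current line-delimited behavior. The second scans an unsplit file and decodes one top-level JSON value after another, ignoring line breaks. For simplicity it works on an in-memory string rather than truly streaming from disk. Flattening a top-level array into one record per element is a design choice of this sketch.

```python
import json
from typing import Any, Iterator, List


def parse_json_lines(text: str) -> List[Any]:
    """Hypothetical line-delimited parser: every non-empty line must be
    one complete JSON record (the JSON Lines behavior)."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def parse_whole_file(text: str) -> Iterator[Any]:
    """Hypothetical whole-file parser: the text is not split on newlines.
    It decodes one top-level JSON value after another, so a record may
    span several lines. A top-level array yields one record per element
    (an assumption of this sketch)."""
    decoder = json.JSONDecoder()
    pos, end = 0, len(text)
    while True:
        # raw_decode does not skip leading whitespace, so skip it here,
        # including the newlines between top-level values.
        while pos < end and text[pos].isspace():
            pos += 1
        if pos >= end:
            return
        value, pos = decoder.raw_decode(text, pos)
        if isinstance(value, list):
            yield from value
        else:
            yield value


if __name__ == "__main__":
    # One record spread over several lines, followed by a second record.
    sample = '{\n  "name": "a",\n  "value": 1\n}\n{"name": "b",\n "value": 2}\n'
    try:
        parse_json_lines(sample)
    except json.JSONDecodeError as err:
        # The first line, "{", is not a complete JSON value on its own.
        print("JSON Lines parsing fails:", err.msg)
    print(list(parse_whole_file(sample)))
```

Per the linked SPARK-20980, the option implementing this mode was named `wholeFile` and then renamed to `multiLine`. In PySpark it is used as `spark.read.option("multiLine", True).json(path)`.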
Issue Links
- is duplicated by
  - SPARK-10840 SparkSQL doesn't work well with JSON (Closed)
  - SPARK-17969 I think it's user unfriendly to process standard json file with DataFrame (Closed)
  - SPARK-7366 Support multi-line JSON objects (Closed)
- is related to
  - SPARK-20980 Rename the option `wholeFile` to `multiLine` for JSON and CSV (Resolved)
- relates to
  - SPARK-16496 Add wholetext as option for reading text in SQL. (Resolved)
  - SPARK-7366 Support multi-line JSON objects (Closed)
- links to