Spark / SPARK-18352: Parse normal, multi-line JSON files (not just JSON Lines) / SPARK-18658

Writing to a text DataSource buffers one or more lines in memory


Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.2.0
    • Component/s: SQL
    • Labels: None

Description

    The JSON and CSV writing paths buffer an entire line (or several lines) in memory before writing it to disk. For large rows this is inefficient, because the whole serialized row must be held in memory before any of it reaches the output. It may make sense to bypass the TextOutputFormat record writer and write directly to the underlying FSDataOutputStream, so that the writers can append arbitrary byte arrays (fractions of a row) instead of a full row.
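    The difference between the two strategies can be illustrated with a minimal sketch. It is not Spark's implementation: a plain java.io.OutputStream stands in for Hadoop's FSDataOutputStream (which is a subclass of it), rows are assumed to be lists of already-formatted field strings, and the class and method names (RowWriters, writeBuffered, writeStreaming) are hypothetical. No CSV quoting or escaping is performed.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical illustration: contrasts building a full line before writing
// with appending row fragments directly to the destination stream.
// Assumption: any OutputStream stands in for Hadoop's FSDataOutputStream.
final class RowWriters {
    private static final byte[] SEPARATOR = ",".getBytes(StandardCharsets.UTF_8);
    private static final byte[] NEWLINE = "\n".getBytes(StandardCharsets.UTF_8);

    private RowWriters() {}

    // Buffered approach: materialize the complete line, then write it in one call
    // (the pattern the issue attributes to the TextOutputFormat path).
    static void writeBuffered(List<String> row, OutputStream out) {
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < row.size(); i++) {
            if (i > 0) {
                line.append(',');
            }
            line.append(row.get(i));
        }
        line.append('\n');
        try {
            // The full line now exists twice: in the StringBuilder and as encoded bytes.
            out.write(line.toString().getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Streaming approach: append each fragment as soon as it is produced.
    // No full-line buffer is built; only one field's encoded bytes exist at a time.
    static void writeStreaming(List<String> row, OutputStream out) {
        try {
            for (int i = 0; i < row.size(); i++) {
                if (i > 0) {
                    out.write(SEPARATOR);
                }
                out.write(row.get(i).getBytes(StandardCharsets.UTF_8));
            }
            out.write(NEWLINE);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

    Both methods emit identical bytes for the same row; only the peak memory held per row differs. In a real writer, wrapping the stream in a java.io.BufferedOutputStream would batch the many small fragment writes into fixed-size chunks, keeping memory bounded independently of row size.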

People

    • Assignee: Nathan Howell
    • Reporter: Nathan Howell
    • Votes: 0
    • Watchers: 2

Dates

    • Created:
    • Updated:
    • Resolved: