Commons CSV / CSV-229

Allow byte position tracking in CSVParser

Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Parser
    • Labels: None
    • Flags: Patch

    Description

      This patch significantly modifies ExtendedBufferedReader.

      The problem is that efficient CSV parsing requires byte positions, not the character positions currently provided.

      The cases where byte positioning is necessary:

      • Suspend/resume parsing
      • Pagination/splitting, where a large CSV file is read in chunks using file positioning.
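
      To illustrate why a character position is not enough for these cases, here is a minimal sketch (not taken from the patch) that compares the character offset and the UTF-8 byte offset of a record, then resumes decoding from the byte offset as a file seek would. The class name ByteOffsetDemo and both helper methods are hypothetical, and UTF-8 is assumed as the file encoding.

```java
import java.nio.charset.StandardCharsets;

final class ByteOffsetDemo {

    /** Number of bytes the first charCount characters of text occupy in UTF-8. */
    static int utf8ByteOffset(String text, int charCount) {
        return text.substring(0, charCount).getBytes(StandardCharsets.UTF_8).length;
    }

    /** Decodes everything from byteOffset to the end, as a resumed read would. */
    static String resumeFrom(byte[] data, int byteOffset) {
        return new String(data, byteOffset, data.length - byteOffset, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // \u00EB and \u0141 each take two bytes in UTF-8.
        String csv = "id,name\n1,Zo\u00EB\n2,\u0141ukasz\n";
        byte[] data = csv.getBytes(StandardCharsets.UTF_8);

        int charOffset = csv.indexOf("2,");               // character position of the last record
        int byteOffset = utf8ByteOffset(csv, charOffset); // where that record starts in the file

        System.out.println("char offset " + charOffset + ", byte offset " + byteOffset);
        System.out.print(resumeFrom(data, byteOffset));
    }
}
```

      Seeking to the character offset instead would land one byte early here, inside a multi-byte character, which is why a suspended or paginated read needs the byte position.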

      I found that ExtendedBufferedReader cannot track bytes in its current state, because it relies on BufferedReader and operates on characters, so I had to redesign and merge these two classes.

      This modification is what we use in our system, so I'm hoping to get it released (otherwise we have to maintain a custom build of Commons CSV).

      Architecturally the solution might be incomplete; however, it provides what I need: getBytePosition() from a CSVParser. The entire chain only works if you provide both a Reader and a charset!
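
      As a rough sketch of why a charset is needed alongside the Reader, the hypothetical wrapper below counts how many bytes each character it hands out would occupy in the given charset. It is not the code from the attached patch; the class ByteCountingReader is invented for illustration and only borrows the getBytePosition() name from the description above. It assumes a charset such as UTF-8 whose encoder adds no byte-order mark, and it counts characters as they leave the wrapper, so any buffering layered on top would make the count run ahead of the parser.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper; not part of Commons CSV or of the attached patch. */
final class ByteCountingReader extends Reader {
    private final Reader in;
    private final Charset charset;
    private long bytePosition;
    private char pendingHighSurrogate; // 0 when no high surrogate is waiting

    ByteCountingReader(Reader in, Charset charset) {
        this.in = in;
        this.charset = charset;
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        int n = in.read(buf, off, len);
        for (int i = off; i < off + n; i++) { // loop is skipped when n is -1
            count(buf[i]);
        }
        return n;
    }

    /** Adds the encoded size of c, pairing surrogates so they encode as one code point. */
    private void count(char c) {
        if (Character.isHighSurrogate(c)) {
            pendingHighSurrogate = c; // wait for the low half before encoding
            return;
        }
        String unit = (pendingHighSurrogate != 0 && Character.isLowSurrogate(c))
                ? new String(new char[] {pendingHighSurrogate, c})
                : String.valueOf(c);
        pendingHighSurrogate = 0;
        bytePosition += unit.getBytes(charset).length;
    }

    /** Bytes that the characters read so far occupy in the configured charset. */
    long getBytePosition() {
        return bytePosition;
    }

    @Override
    public void close() throws IOException {
        in.close();
    }

    public static void main(String[] args) throws IOException {
        String csv = "a,\u00E9\n\uD83D\uDE00,b\n"; // 2-byte and 4-byte UTF-8 characters
        try (ByteCountingReader r = new ByteCountingReader(new StringReader(csv), StandardCharsets.UTF_8)) {
            int c;
            while ((c = r.read()) != -1) {
                if (c == '\n') {
                    System.out.println("record ends at byte " + r.getBytePosition());
                }
            }
        }
    }
}
```

      Re-encoding each character is slow and is shown only to make the dependency on the charset explicit; a production version would more likely count bytes while decoding the underlying stream.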

      Attachments

        1. csv_bytes4.patch
          26 kB
          Serge P. Nekoval

          People

            Assignee: Unassigned
            Reporter: Serge P. Nekoval (snekoval)
            Votes: 0
            Watchers: 2