Uploaded image for project: 'Crunch (Retired)'
  1. Crunch (Retired)
  2. CRUNCH-564

Add support for using escape character same as open/close quote character

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Closed
    • Trivial
    • Resolution: Fixed
    • None
    • 0.14.0
    • Core

    Description

      As a user I would like to use CSVInputFormat to handle the CSV files following this RFC http://www.ietf.org/rfc/rfc4180.txt.

      Many developers use Apache StringEscapeUtils.escapeCsv( ) method to escape their CSVs. The method escapes the CSV following the RFC4180.

      https://commons.apache.org/proper/commons-lang/javadocs/api-2.6/org/apache/commons/lang/StringEscapeUtils.html

      The CSVLineReader throws exception in such a case. We can enhance the code to support the CSVs that use escape same as the quote characters.

      https://github.com/apache/crunch/blob/master/crunch-core/src/main/java/org/apache/crunch/io/text/csv/CSVLineReader.java#L152

      I would appreciate a comment, if someone has knowingly rejected the idea due to some technical limitation or a problem with allowing escape and quote as same characters. By the way Apache HAWQ seem to get around this issue somehow and reads such CSVs alright.

      Attachments

        Issue Links

          Activity

            People

              champgm mac champion
              aliiqbal Muhammad
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: