  1. Apache Cordova
  2. CB-13570

FileReader#readAsText fails with multi-byte UTF-8 characters

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 5.0.0, 4.2.0
    • Fix Version/s: None
    • Component/s: cordova-plugin-file
    • Labels: None

    Description

      `FileReader#readAsText` reads the file in chunks of 256 KB. If the bytes of a multi-byte UTF-8 character are split across two chunks, reading fails with an encoding error (ENCODING_ERR: 5).

      For many apps this is not an issue. However, if a file is larger than 256 KB and contains many multi-byte characters, such a split is likely to happen.

      I have not experienced this issue on Android yet.

      Code that demonstrates the issue: https://gist.github.com/anonymous/0fdc1ec212be1e29309820477257a0c3

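      In the example, the read splits the two-byte UTF-8 encoding of the '\u0153' character (0xC5 0x93) across a chunk boundary: one chunk ends with '...\xC5' and the next starts with '\x93'. Neither chunk is valid UTF-8 on its own, so decoding fails.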
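
The failure mode can be reproduced without the plugin. The following is a minimal sketch, not the code from the gist: it assumes an environment with a global `TextEncoder` and `TextDecoder` (Node.js or a recent browser), and the helper names `decodeStrict` and `tryDecode` are made up for illustration. U+0153 encodes to the two bytes 0xC5 0x93 in UTF-8, and the sketch places a chunk boundary between them.

```javascript
// U+0153 ("œ") is two bytes in UTF-8: 0xC5 0x93.
const bytes = new TextEncoder().encode("a\u0153b"); // 0x61 0xC5 0x93 0x62

// Simulate a chunk boundary that falls inside the two-byte sequence.
const splitAt = 2;
const firstChunk = bytes.subarray(0, splitAt); // ends with 0xC5
const secondChunk = bytes.subarray(splitAt);   // starts with 0x93

// Strict decoding: with `fatal: true` the decoder throws on malformed
// input instead of substituting U+FFFD.
function decodeStrict(chunk) {
  return new TextDecoder("utf-8", { fatal: true }).decode(chunk);
}

// Hypothetical helper that reports success or failure instead of throwing.
function tryDecode(chunk) {
  try {
    return { ok: true, text: decodeStrict(chunk) };
  } catch (err) {
    return { ok: false, text: null };
  }
}

const first = tryDecode(firstChunk);   // fails: truncated sequence
const second = tryDecode(secondChunk); // fails: stray continuation byte
const whole = tryDecode(bytes);        // succeeds when decoded in one piece
```

Decoding each chunk independently fails on both sides of the boundary, while the same bytes decode cleanly as a whole.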

      A workaround is to use readAsArrayBuffer instead, and do the decoding in JavaScript. However, the decoding can be quite slow on iOS where a native TextDecoder is not available.
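
A rough sketch of this workaround follows. The function names (`readFileAsText`, `decodeUtf8`, `decodeUtf8Fallback`) are hypothetical, `file` is assumed to be a File object such as the one passed to the `FileEntry#file` callback, and the fallback decoder assumes well-formed UTF-8 and performs no validation. It is meant to show the shape of the approach rather than a tuned implementation.

```javascript
// Plain-JavaScript decoder for environments without a native TextDecoder.
// Assumption: `bytes` holds complete, well-formed UTF-8 (no validation here).
function decodeUtf8Fallback(bytes) {
  let out = "";
  let i = 0;
  while (i < bytes.length) {
    const b0 = bytes[i];
    let cp;
    let len;
    if (b0 < 0x80) { cp = b0; len = 1; }              // 0xxxxxxx
    else if (b0 < 0xE0) { cp = b0 & 0x1F; len = 2; }  // 110xxxxx
    else if (b0 < 0xF0) { cp = b0 & 0x0F; len = 3; }  // 1110xxxx
    else { cp = b0 & 0x07; len = 4; }                 // 11110xxx
    // Each continuation byte (10xxxxxx) contributes six more bits.
    for (let k = 1; k < len; k++) {
      cp = (cp << 6) | (bytes[i + k] & 0x3F);
    }
    out += String.fromCodePoint(cp);
    i += len;
  }
  return out;
}

// Prefer the native decoder when the platform provides one.
function decodeUtf8(bytes) {
  if (typeof TextDecoder !== "undefined") {
    return new TextDecoder("utf-8").decode(bytes);
  }
  return decodeUtf8Fallback(bytes);
}

// Read the raw bytes first, then decode the complete buffer in one step,
// so no character is ever decoded from a partial chunk.
function readFileAsText(file) {
  return new Promise((resolve, reject) => {
    const reader = new FileReader();
    reader.onload = () => resolve(decodeUtf8(new Uint8Array(reader.result)));
    reader.onerror = () => reject(reader.error);
    reader.readAsArrayBuffer(file);
  });
}
```

Building the string one code point at a time is simple but slow for large files, which matches the performance concern noted above.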

      One solution would be to make the chunk size semi-flexible, so that each chunk ends on a character boundary (grow the chunk until decoding succeeds).
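
One way to picture a flexible chunk boundary is sketched below. It is a variant of the idea rather than the proposal itself: instead of retrying the decode, it inspects the bytes and grows the chunk while the next byte is a UTF-8 continuation byte (bit pattern 10xxxxxx). It works on an in-memory `Uint8Array` for illustration only and is not the plugin's code; `extendToCharBoundary` and `splitOnCharBoundaries` are hypothetical names.

```javascript
const CHUNK_SIZE = 256 * 1024; // nominal chunk size mentioned in the description

// Continuation bytes in UTF-8 have the form 10xxxxxx.
function isContinuationByte(b) {
  return (b & 0xC0) === 0x80;
}

// Move the chunk end forward until the byte that would start the next
// chunk is not a continuation byte. For well-formed UTF-8 this grows the
// chunk by at most three bytes.
function extendToCharBoundary(bytes, nominalEnd) {
  let end = Math.min(nominalEnd, bytes.length);
  while (end < bytes.length && isContinuationByte(bytes[end])) {
    end++;
  }
  return end;
}

// Split a buffer into chunks of roughly `chunkSize` bytes, each ending on
// a character boundary.
function splitOnCharBoundaries(bytes, chunkSize) {
  if (chunkSize < 1) {
    throw new RangeError("chunkSize must be at least 1");
  }
  const chunks = [];
  let start = 0;
  while (start < bytes.length) {
    const end = extendToCharBoundary(bytes, start + chunkSize);
    chunks.push(bytes.subarray(start, end));
    start = end;
  }
  return chunks;
}
```

For example, `splitOnCharBoundaries(data, CHUNK_SIZE)` would return chunks that can each be decoded independently, assuming `data` is a `Uint8Array` of well-formed UTF-8.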

          People

            Assignee: Unassigned
            Reporter: Ralf Kistner (rkistner)