[CB-13570] FileReader#readAsText fails with multi-byte UTF-8 characters - ASF JIRA

Details

Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 5.0.0, 4.2.0
Fix Version/s: None
Component/s: cordova-plugin-file
Labels:
None
Environment:

Tested on:

iOS 10.2

cordova-ios 4.3.0

UIWebView (not WKWebView)

cordova 6.4.0

cordova-plugin-file 4.2.0 and 5.0.0

(Slightly old cordova version, but the issue seems to be in the plugin)

Description

`FileReader#readAsText` reads the file in chunks of 256KB and decodes each chunk separately. If a multi-byte UTF-8 character is split across two chunks, reading fails with an encoding error (ENCODING_ERR: 5).

For many apps this is not an issue. However, if a file is larger than 256KB and contains many multi-byte characters, this is likely to happen.

I have not experienced this issue on Android yet.

Code that demonstrates the issue: https://gist.github.com/anonymous/0fdc1ec212be1e29309820477257a0c3

In the example, the chunk boundary splits the '\u0153' character, whose UTF-8 encoding is the two bytes 0xC5 0x93, into '...\xC5' and '\x93...'. Neither chunk is valid UTF-8 on its own, so decoding fails.
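The failure can be reproduced outside Cordova with a strict decoder. The sketch below assumes a WHATWG `TextDecoder` with `fatal: true` (as found in Node.js or a modern browser, not in the UIWebView tested here) and uses a hypothetical 3-byte `CHUNK_SIZE` in place of 256KB so that the cut lands inside '\u0153'.

```javascript
// Reproduces the failure with a strict (fatal) WHATWG TextDecoder.
// CHUNK_SIZE is a hypothetical stand-in for the plugin's 256KB chunks.
const encoder = new TextEncoder();

// '\u0153' (œ) is encoded in UTF-8 as the two bytes 0xC5 0x93.
const bytes = encoder.encode("ab\u0153cd");

// Cut after 3 bytes so the boundary falls inside œ.
const CHUNK_SIZE = 3;
const first = bytes.subarray(0, CHUNK_SIZE); // 'a', 'b', 0xC5
const second = bytes.subarray(CHUNK_SIZE);   // 0x93, 'c', 'd'

// Decode one chunk on its own, mimicking per-chunk decoding.
function tryDecode(chunk) {
  try {
    const text = new TextDecoder("utf-8", { fatal: true }).decode(chunk);
    return { ok: true, text };
  } catch (e) {
    return { ok: false, error: e.name };
  }
}

const whole = tryDecode(bytes);  // succeeds: "ab\u0153cd"
const part1 = tryDecode(first);  // fails: truncated two-byte sequence
const part2 = tryDecode(second); // fails: stray continuation byte 0x93
console.log(whole, part1, part2);
```

Decoding the full buffer works; only the separately decoded halves fail, which matches the behaviour described in this report.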

A workaround is to use `readAsArrayBuffer` instead and decode the bytes in JavaScript. However, decoding in JavaScript can be quite slow on iOS, where a native TextDecoder is not available.
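A minimal sketch of that workaround follows, assuming the file object comes from cordova-plugin-file (for example via `fileEntry.file(...)`). `decodeUtf8` is a hypothetical pure-JavaScript fallback for when no native `TextDecoder` exists, and `readAsTextViaArrayBuffer` is a hypothetical wrapper; neither is part of the plugin. Because `readAsArrayBuffer` returns raw bytes, chunk boundaries no longer matter: the whole buffer is decoded at once.

```javascript
// Minimal pure-JavaScript UTF-8 decoder, sketched as a fallback for
// environments without a native TextDecoder. It assumes mostly well-formed
// input: overlong forms and encoded surrogates are not rejected, while
// structurally broken sequences become U+FFFD.
function decodeUtf8(bytes) {
  const units = []; // UTF-16 code units of the result
  let i = 0;
  while (i < bytes.length) {
    const b0 = bytes[i];
    let cp;
    let need; // number of continuation bytes expected
    if (b0 < 0x80) { cp = b0; need = 0; }
    else if ((b0 & 0xE0) === 0xC0) { cp = b0 & 0x1F; need = 1; }
    else if ((b0 & 0xF0) === 0xE0) { cp = b0 & 0x0F; need = 2; }
    else if ((b0 & 0xF8) === 0xF0) { cp = b0 & 0x07; need = 3; }
    else { units.push(0xFFFD); i += 1; continue; } // stray continuation or invalid lead byte

    // Consume continuation bytes (10xxxxxx), stopping early on a bad one.
    let j = 1;
    for (; j <= need; j++) {
      const b = bytes[i + j];
      if (b === undefined || (b & 0xC0) !== 0x80) break;
      cp = (cp << 6) | (b & 0x3F);
    }
    if (j <= need || cp > 0x10FFFF) {
      units.push(0xFFFD);
      i += j; // resume at the byte that broke (or ended) the sequence
      continue;
    }
    if (cp > 0xFFFF) {
      // Code point outside the BMP: emit a UTF-16 surrogate pair.
      const v = cp - 0x10000;
      units.push(0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF));
    } else {
      units.push(cp);
    }
    i += need + 1;
  }
  // Build the string in slices to stay below engines' argument-count limits.
  let out = "";
  for (let k = 0; k < units.length; k += 0x2000) {
    out += String.fromCharCode.apply(null, units.slice(k, k + 0x2000));
  }
  return out;
}

// Hypothetical wrapper: read raw bytes, then decode them in JavaScript,
// preferring a native TextDecoder when one exists.
function readAsTextViaArrayBuffer(file) {
  return new Promise((resolve, reject) => {
    const reader = new FileReader();
    reader.onload = () => {
      const bytes = new Uint8Array(reader.result);
      resolve(typeof TextDecoder !== "undefined"
        ? new TextDecoder("utf-8").decode(bytes)
        : decodeUtf8(bytes));
    };
    reader.onerror = () => reject(reader.error);
    reader.readAsArrayBuffer(file);
  });
}
```

The manual loop is where the slowness mentioned above comes from: every byte is inspected in JavaScript rather than in native code.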

One solution would be to make the chunk sizes semi-flexible, so that each chunk ends on a character boundary (for example, make the chunk larger until decoding succeeds).
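The boundary logic behind this proposal can be sketched in a few lines. This is an illustration only, assuming the chunk is available as a byte array; the real change would belong in the plugin's chunked reading code, and `READ_CHUNK_SIZE`, `utf8ChunkEnd`, and `splitUtf8Chunks` are hypothetical names. Note that it swaps in a slightly different technique: instead of growing the chunk until decoding succeeds, it moves the cut back to the previous character boundary by skipping UTF-8 continuation bytes, which avoids repeated decode attempts.

```javascript
// Sketch of the "semi-flexible chunk" idea: pick each chunk's end so it never
// falls inside a multi-byte UTF-8 sequence, keeping every chunk decodable.
const READ_CHUNK_SIZE = 256 * 1024; // mirrors the 256KB chunks in this report

// Returns an end offset for the chunk starting at `start` that lies on a
// code point boundary and, whenever possible, is at most start + maxLen.
function utf8ChunkEnd(bytes, start, maxLen) {
  const limit = Math.min(start + maxLen, bytes.length);
  if (limit === bytes.length) return limit;
  let end = limit;
  // bytes[end] would open the next chunk; while it is a continuation byte
  // (10xxxxxx) the cut is mid-character, so move it back.
  while (end > start && (bytes[end] & 0xC0) === 0x80) end--;
  // If maxLen is shorter than a single character, extend forward instead so
  // that every chunk makes progress.
  if (end === start) {
    end = limit;
    while (end < bytes.length && (bytes[end] & 0xC0) === 0x80) end++;
  }
  return end;
}

// Splits a byte array into chunks that each end on a character boundary.
function splitUtf8Chunks(bytes, maxLen = READ_CHUNK_SIZE) {
  if (!(Number.isInteger(maxLen) && maxLen >= 1)) {
    throw new RangeError("maxLen must be a positive integer");
  }
  const chunks = [];
  let start = 0;
  while (start < bytes.length) {
    const end = utf8ChunkEnd(bytes, start, maxLen);
    chunks.push(bytes.subarray(start, end));
    start = end;
  }
  return chunks;
}
```

With a 3-byte limit, "ab\u0153cd" is split into "ab", "\u0153c", and "d", each of which decodes on its own. Where a native TextDecoder is available, passing `{ stream: true }` to `decode()` for every chunk except the last has a similar effect, because the decoder holds back an incomplete trailing sequence until the next call.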