[IO-781] CharSequenceInputStream.available() returns too large numbers in some cases - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 2.11.0
Fix Version/s: 2.16.0
Component/s: Streams/Writers
Labels: None

Description

The available() method of org.apache.commons.io.input.CharSequenceInputStream erroneously returns values larger than the actual number of available bytes in some cases.

The underlying issue is that CharSequenceInputStream makes incorrect assumptions about the relationship between chars and bytes. With CodingErrorAction.REPLACE, 2 chars (1 supplementary code point) can be encoded as a single byte (the replacement character ?). Additionally, if CharSequenceInputStream is ever extended to accept a CharsetEncoder, CodingErrorAction.IGNORE would probably cause similar issues. There might also be some uncommon charsets that can encode 2 chars as 1 byte, though I am not aware of any such charset.
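The char-to-byte relationship described above can be observed with the JDK alone. The following is a minimal sketch, not the library's code: the class name EncodedLengthDemo and the helper encodedLength are hypothetical, and US-ASCII is used instead of Big5 only because it is guaranteed to be present on every Java runtime.

```java
import java.io.UncheckedIOException;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of Commons IO.
public class EncodedLengthDemo {

    // Encodes the text using the given error action for both malformed and
    // unmappable input, and returns the number of bytes produced.
    public static int encodedLength(String text, Charset charset, CodingErrorAction action) {
        CharsetEncoder encoder = charset.newEncoder()
                .onMalformedInput(action)
                .onUnmappableCharacter(action);
        try {
            // encode(CharBuffer) encodes the whole input, flushes, and returns a flipped buffer
            return encoder.encode(CharBuffer.wrap(text)).remaining();
        } catch (CharacterCodingException e) {
            // Only reachable with CodingErrorAction.REPORT
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String supplementary = "\uD800\uDC00"; // U+10000: 1 code point, 2 chars
        // REPLACE: both chars collapse into a single '?' byte
        System.out.println(encodedLength(supplementary, StandardCharsets.US_ASCII, CodingErrorAction.REPLACE)); // 1
        // IGNORE: both chars are dropped
        System.out.println(encodedLength(supplementary, StandardCharsets.US_ASCII, CodingErrorAction.IGNORE)); // 0
    }
}
```

Any estimate of the form "remaining chars times some factor of at least 1" therefore overestimates for this input.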

This was originally mentioned in pull request #293. That PR also proposed replacing the underlying CharSequenceInputStream implementation with ReaderInputStream, because using CharsetEncoder is error-prone in general, so it might be good to avoid having two classes implement logic on top of it. (CharSequenceInputStream is potentially also missing a call to CharsetEncoder.flush; see also IO-714.)
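For reference, the encode-then-flush sequence that the CharsetEncoder documentation describes can be sketched as follows. This is an illustrative sketch only, not the code of CharSequenceInputStream or ReaderInputStream; the class name EncodeWithFlush, the helpers encodeAll and drain, and the 64-byte buffer size are assumptions made for the example.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; not part of Commons IO.
public class EncodeWithFlush {

    // Encodes the whole sequence, replacing malformed or unmappable input,
    // and calls flush() so that any trailing bytes held by the encoder are emitted.
    public static byte[] encodeAll(CharSequence text, Charset charset) {
        CharsetEncoder encoder = charset.newEncoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        CharBuffer in = CharBuffer.wrap(text);
        ByteBuffer out = ByteBuffer.allocate(64); // small on purpose, to exercise overflow handling
        ByteArrayOutputStream sink = new ByteArrayOutputStream();

        CoderResult result;
        // Step 1: encode with endOfInput = true until the output no longer overflows
        do {
            result = encoder.encode(in, out, true);
            drain(out, sink);
        } while (result.isOverflow());

        // Step 2: flush, again repeating while the output buffer overflows
        do {
            result = encoder.flush(out);
            drain(out, sink);
        } while (result.isOverflow());

        return sink.toByteArray();
    }

    // Copies the bytes written so far into the sink and resets the buffer for reuse.
    private static void drain(ByteBuffer out, ByteArrayOutputStream sink) {
        out.flip();
        sink.write(out.array(), out.arrayOffset() + out.position(), out.remaining());
        out.clear();
    }

    public static void main(String[] args) {
        System.out.println(encodeAll("\uD800\uDC00", StandardCharsets.US_ASCII).length);
    }
}
```

For stateless charsets such as US-ASCII the flush step produces nothing, but an encoder for a stateful charset may write final bytes there, which is why skipping it is risky.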

Example

In the example below, available() erroneously returns 2 even though only 1 byte can be read.

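import java.nio.charset.Charset;
import org.apache.commons.io.input.CharSequenceInputStream;

// "\uD800\uDC00" is a single supplementary code point (2 chars) that Big5 cannot encode
Charset charset = Charset.forName("Big5");
CharSequenceInputStream in = new CharSequenceInputStream("\uD800\uDC00", charset);
// BUG: available() returns 2 but only 1 byte is read afterwards
System.out.println("Available: " + in.available());
// Note: readAllBytes() is a method added in Java 9
System.out.println("Actually read: " + in.readAllBytes().length);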
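The condition violated in the example can be checked generically for any InputStream: available() should never report more bytes than can still be read. Below is a minimal sketch of such a check, assuming only the JDK; the class name AvailableContractCheck and the method availableNeverOverestimates are hypothetical and not part of Commons IO or its test suite.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; not part of Commons IO.
public class AvailableContractCheck {

    // Reads the stream byte by byte, recording available() before every read,
    // and returns true only if no recorded value exceeded the number of bytes
    // that were actually still readable at that point.
    public static boolean availableNeverOverestimates(InputStream in) {
        List<Integer> reported = new ArrayList<>();
        try {
            do {
                reported.add(in.available());
            } while (in.read() != -1);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // One entry per read attempt; the last attempt hit end of stream
        int total = reported.size() - 1;
        for (int i = 0; i < reported.size(); i++) {
            int actuallyRemaining = total - i;
            if (reported.get(i) > actuallyRemaining) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(availableNeverOverestimates(new ByteArrayInputStream(new byte[] {1, 2, 3})));
    }
}
```

Based on the behavior reported above, passing the CharSequenceInputStream from the example to this check on an affected version would be expected to return false.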

Issue Links

links to

GitHub Pull Request #525

GitHub Pull Request #537

People

Assignee: Unassigned

Reporter: Marcono1234

Votes: 0

Watchers: 4

Dates

Created: 11/Sep/22 14:21

Updated: 26/Dec/23 18:57

Resolved: 23/Dec/23 02:46