Commons IO / IO-781

CharSequenceInputStream.available() returns too large numbers in some cases

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.11.0
    • Fix Version/s: 2.16.0
    • Component/s: Streams/Writers
    • Labels: None

      Description

      The available() method of org.apache.commons.io.input.CharSequenceInputStream erroneously returns values larger than the actual number of available bytes in some cases.

      The underlying issue is that CharSequenceInputStream makes incorrect assumptions about the relationship between chars and bytes. With CodingErrorAction.REPLACE, the encoder can convert 2 chars (1 supplementary code point) into a single byte (the replacement character ?). Additionally, if CharSequenceInputStream is ever extended to accept a CharsetEncoder, CodingErrorAction.IGNORE would probably cause similar issues. There might also be some uncommon charsets that can encode 2 chars as 1 byte, though I am not aware of such a charset yet.
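
      The char-to-byte mismatch described above can be reproduced with the JDK alone. The following is a minimal sketch, not the CharSequenceInputStream implementation: the class name EncodedLengthSketch and the helper encodedLength are hypothetical, and US-ASCII is used instead of Big5 only because it is guaranteed to be present on every JVM and likewise cannot map a supplementary code point.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class EncodedLengthSketch {

    /** Hypothetical helper: encodes the whole text and returns the number of bytes produced. */
    public static int encodedLength(CharSequence text, Charset charset, CodingErrorAction action)
            throws CharacterCodingException {
        CharsetEncoder encoder = charset.newEncoder()
                .onMalformedInput(action)
                .onUnmappableCharacter(action);
        // encode(CharBuffer) runs the full encode-and-flush cycle and returns a buffer ready for reading
        ByteBuffer encoded = encoder.encode(CharBuffer.wrap(text));
        return encoded.remaining();
    }

    public static void main(String[] args) throws CharacterCodingException {
        // One supplementary code point (U+10000), represented by 2 chars
        String supplementary = "\uD800\uDC00";
        System.out.println("chars: " + supplementary.length());
        System.out.println("bytes with REPLACE: "
                + encodedLength(supplementary, StandardCharsets.US_ASCII, CodingErrorAction.REPLACE));
        System.out.println("bytes with IGNORE: "
                + encodedLength(supplementary, StandardCharsets.US_ASCII, CodingErrorAction.IGNORE));
    }
}
```

      Under these assumptions, the byte count for REPLACE is smaller than the char count, so any estimate that assumes at least one byte per char overstates what can be read.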

      This was originally mentioned in pull request #293. That PR also proposed replacing the underlying CharSequenceInputStream implementation with ReaderInputStream, because using CharsetEncoder is error-prone in general, so it might be good to avoid having two classes that implement logic on top of it. (CharSequenceInputStream is potentially missing a call to CharsetEncoder.flush; see also IO-714.)

      Example

      In the example below, available() erroneously returns 2 even though only 1 byte can be read.

      import java.nio.charset.Charset;
      import org.apache.commons.io.input.CharSequenceInputStream;

      Charset charset = Charset.forName("Big5");
      CharSequenceInputStream in = new CharSequenceInputStream("\uD800\uDC00", charset);
      // BUG: available() returns 2 but only 1 byte is read afterwards
      System.out.println("Available: " + in.available());
      // Note: readAllBytes() is a method added in Java 9
      System.out.println("Actually read: " + in.readAllBytes().length);
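
      The comparison made in the example can be generalized into a check that works on any InputStream and does not need Java 9. This is only a sketch under stated assumptions: AvailableCheckSketch and overestimatesAvailable are hypothetical names, and the "broken" stream in main is a stand-in that merely imitates the reported behavior without using Commons IO.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class AvailableCheckSketch {

    /**
     * Hypothetical helper: returns true if available() reported more bytes
     * than could actually be read before the end of the stream.
     */
    public static boolean overestimatesAvailable(InputStream in) throws IOException {
        int reported = in.available();
        int actual = 0;
        byte[] buffer = new byte[64];
        int n;
        // Drain the stream and count the bytes that were really delivered
        while ((n = in.read(buffer)) != -1) {
            actual += n;
        }
        return reported > actual;
    }

    public static void main(String[] args) throws IOException {
        // A well-behaved stream: 1 byte reported, 1 byte read
        InputStream correct = new ByteArrayInputStream(new byte[] {'?'});
        System.out.println("correct stream overestimates: " + overestimatesAvailable(correct));

        // Stand-in for the reported bug: claims 2 bytes although only 1 exists
        InputStream broken = new ByteArrayInputStream(new byte[] {'?'}) {
            @Override
            public synchronized int available() {
                return 2;
            }
        };
        System.out.println("broken stream overestimates: " + overestimatesAvailable(broken));
    }
}
```

      Passing a CharSequenceInputStream built as in the example to such a helper would be one way to turn the report into a regression check.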
      

          People

            Assignee: Unassigned
            Reporter: Marcono1234
            Votes: 0
            Watchers: 4