Details
- Bug
- Status: Open
- Major
- Resolution: Unresolved
- 0.14.2
- None
- None
Description
First: apologies if this is a false alarm, since I'm going by my reading of the C++ library source code.
To understand whether the new MaxMessageSize setting matters for our (Apache Parquet) use case, I went through the C++ library source code to see exactly how it is used (see the message I posted in THRIFT-5237).
My understanding is that there are two main facilities for checking against the max message size:
- TTransport::countConsumedMessageBytes(numBytes) raises if numBytes is greater than the remaining message size, otherwise decrements the remaining message size by numBytes
- TTransport::checkReadBytesAvailable(numBytes) also raises if numBytes is greater than the remaining message size, but doesn't otherwise update the remaining message size
In TBufferBase::read, the internal buffer pointer is advanced by len bytes, but only checkReadBytesAvailable is called, not countConsumedMessageBytes. Multiple calls to TBufferBase::read therefore walk through buffer memory without ever updating the remaining message size. As a result, the max message size limit is never enforced unless a single read is larger than that limit.
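To make the suspected behavior concrete, here is a minimal toy model of my reading of the code. It is not the Thrift source: ToyTransport, readWithCheckOnly and readWithCounting are hypothetical names, and std::length_error stands in for whatever exception the library actually throws. Only the two method names are taken from the description above.

```cpp
#include <cstddef>
#include <stdexcept>

// Toy stand-in for a transport with a message size budget (hypothetical,
// not the real Thrift class).
class ToyTransport {
 public:
  explicit ToyTransport(std::size_t maxMessageSize)
      : remaining_(maxMessageSize) {}

  // Throws if numBytes exceeds the remaining budget; never changes the budget.
  void checkReadBytesAvailable(std::size_t numBytes) const {
    if (numBytes > remaining_) {
      throw std::length_error("max message size reached");
    }
  }

  // Throws if numBytes exceeds the remaining budget; otherwise decrements it.
  void countConsumedMessageBytes(std::size_t numBytes) {
    if (numBytes > remaining_) {
      throw std::length_error("max message size reached");
    }
    remaining_ -= numBytes;
  }

  std::size_t remaining() const { return remaining_; }

 private:
  std::size_t remaining_;
};

// Models the behavior described for TBufferBase::read: each call only checks.
// Returns the total number of bytes "read" over all calls.
std::size_t readWithCheckOnly(ToyTransport& t, std::size_t calls,
                              std::size_t len) {
  std::size_t total = 0;
  for (std::size_t i = 0; i < calls; ++i) {
    t.checkReadBytesAvailable(len);  // budget is left untouched
    total += len;
  }
  return total;
}

// Models the behavior one would expect: each call consumes budget.
std::size_t readWithCounting(ToyTransport& t, std::size_t calls,
                             std::size_t len) {
  std::size_t total = 0;
  for (std::size_t i = 0; i < calls; ++i) {
    t.countConsumedMessageBytes(len);  // throws once the budget runs out
    total += len;
  }
  return total;
}
```

In this model, with a budget of 100 bytes, ten 50-byte reads through readWithCheckOnly all succeed and the remaining size stays at 100, whereas readWithCounting throws on the third read. If my reading of the library is right, TBufferBase::read behaves like the first function.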