Details
- Type: Sub-task
- Status: Open
- Priority: Major
- Resolution: Unresolved
- Affects Version/s: 3.3.4
- Fix Version/s: None
Description
AbfsOutputStream doesn't close the DataBlock object created for the upload.
The implications of not doing so:
DataBlocks has three implementations:
- ByteArrayBlock:
  - Creates a DataBlockByteArrayOutputStream (a child of ByteArrayOutputStream: a wrapper around a byte array for populating and reading the array).
  - This simply gets GCed.
- ByteBufferBlock:
  - There is a defined DirectBufferPool from which it requests a direct buffer.
  - If nothing is available in the pool, a new direct buffer is allocated.
  - The `close` method on this object has the responsibility of returning the buffer to the pool so it can be reused (see the sketch after this list).
  - Since we are not calling `close`:
    - The pool is rendered nearly useless, since each request allocates a new direct buffer from memory.
    - The objects can still be GCed, and the direct memory they hold may be released on GC; but if the process crashes first, the memory never goes back and can cause memory issues on the machine.
- DiskBlock:
  - Creates a file on disk to which the data-to-upload is written. This file gets deleted in startUpload().close().
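To make the pooling contract concrete, here is a minimal sketch using org.apache.hadoop.util.DirectBufferPool from hadoop-common (the block size and the surrounding flow are illustrative assumptions, not code from AbfsOutputStream):

```java
import java.nio.ByteBuffer;
import org.apache.hadoop.util.DirectBufferPool;

public class PoolContractSketch {
  public static void main(String[] args) {
    DirectBufferPool pool = new DirectBufferPool();
    int blockSize = 4 * 1024 * 1024; // illustrative block size

    // Reuses a pooled direct buffer if one is free; otherwise allocates a new one.
    ByteBuffer buffer = pool.getBuffer(blockSize);
    try {
      buffer.put(new byte[128]); // stand-in for writing upload data into the block
    } finally {
      // This return is what ByteBufferBlock.close() is responsible for.
      // If close() is never called, this line never runs, and every
      // subsequent block allocates fresh direct memory instead of reusing.
      pool.returnBuffer(buffer);
    }
  }
}
```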
startUpload() returns a BlockUploadData object, which provides a `toByteArray()` method that AbfsOutputStream uses to get the byte array backing the DataBlock.
Method that uses the DataBlock object: https://github.com/apache/hadoop/blob/fac7d26c5d7f791565cc3ab45d079e2cca725f95/hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsOutputStream.java#L298
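A minimal sketch of what the fix could look like, assuming (as in the hadoop-common DataBlocks implementation) that both DataBlock and BlockUploadData are Closeable; the method and helper names below are placeholders, not the actual patch:

```java
import java.io.IOException;
import org.apache.hadoop.fs.store.DataBlocks;

public class UploadSketch {
  /** Upload one block and always release its resources afterwards. */
  static void uploadAndRelease(DataBlocks.DataBlock block) throws IOException {
    // try-with-resources closes in reverse order: uploadData first
    // (deletes a DiskBlock's temp file), then the block itself
    // (returns a ByteBufferBlock's direct buffer to the pool).
    try (DataBlocks.DataBlock b = block;
         DataBlocks.BlockUploadData uploadData = b.startUpload()) {
      byte[] bytes = uploadData.toByteArray();
      doAppend(bytes); // placeholder for the ABFS append/flush call
    }
  }

  static void doAppend(byte[] bytes) {
    // stand-in for the actual service call
  }
}
```

With this pattern the buffer is returned (or the temp file deleted) even when the upload throws, which is exactly the cleanup the description says is currently skipped.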
Issue Links
- causes: HADOOP-18940 ABFS: Remove commons IOUtils.close() from AbfsOutputStream (Open)