SPARK-25704: Replication of > 2GB block fails due to bad config default


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.4.0
    • Fix Version/s: 2.4.0
    • Component/s: Spark Core
    • Labels: None

    Description

      Replicating a block > 2GB currently fails because it tries to allocate a ByteBuffer that is just a bit too large, due to a bad config default. This line:

      ChunkedByteBuffer.fromFile(tmpFile, conf.get(config.MEMORY_MAP_LIMIT_FOR_TESTS).toInt)
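
      For context, here's a minimal sketch (illustrative, not the actual Spark code) of how a chunk-size limit like this turns into heap allocations: the file is read into ByteBuffers of at most maxChunkSize bytes, so maxChunkSize itself must be an allocatable array size.

      import java.io.{File, FileInputStream}
      import java.nio.ByteBuffer

      // Sketch: split a file into heap ByteBuffers of at most maxChunkSize bytes.
      def readChunks(file: File, maxChunkSize: Int): Seq[ByteBuffer] = {
        val in = new FileInputStream(file)
        try {
          val chunks = Seq.newBuilder[ByteBuffer]
          var remaining = file.length()
          while (remaining > 0) {
            val size = math.min(remaining, maxChunkSize.toLong).toInt
            // This is the allocation that fails when size is Integer.MAX_VALUE:
            val chunk = ByteBuffer.allocate(size)
            while (chunk.hasRemaining) in.getChannel.read(chunk)
            chunk.flip()
            chunks += chunk
            remaining -= size
          }
          chunks.result()
        } finally {
          in.close()
        }
      }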
      

      MEMORY_MAP_LIMIT_FOR_TESTS defaults to Integer.MAX_VALUE, but unfortunately that is just a tiny bit too big. You'll see an exception like:

      18/10/09 21:21:54 WARN server.TransportChannelHandler: Exception in connection from /172.31.118.153:53534
      java.lang.OutOfMemoryError: Requested array size exceeds VM limit
              at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
              at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
              at org.apache.spark.util.io.ChunkedByteBuffer$$anonfun$8.apply(ChunkedByteBuffer.scala:199)
              at org.apache.spark.util.io.ChunkedByteBuffer$$anonfun$8.apply(ChunkedByteBuffer.scala:199)
              at org.apache.spark.util.io.ChunkedByteBufferOutputStream.allocateNewChunkIfNeeded(ChunkedByteBufferOutputStream.scala:87)
              at org.apache.spark.util.io.ChunkedByteBufferOutputStream.write(ChunkedByteBufferOutputStream.scala:75)
              at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:2315)
              at org.apache.commons.io.IOUtils.copy(IOUtils.java:2270)
              at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:2291)
              at org.apache.commons.io.IOUtils.copy(IOUtils.java:2246)
              at org.apache.spark.util.io.ChunkedByteBuffer$$anonfun$fromFile$1.apply$mcI$sp(ChunkedByteBuffer.scala:201)
              at org.apache.spark.util.io.ChunkedByteBuffer$$anonfun$fromFile$1.apply(ChunkedByteBuffer.scala:201)
              at org.apache.spark.util.io.ChunkedByteBuffer$$anonfun$fromFile$1.apply(ChunkedByteBuffer.scala:201)
              at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
              at org.apache.spark.util.io.ChunkedByteBuffer$.fromFile(ChunkedByteBuffer.scala:202)
              at org.apache.spark.util.io.ChunkedByteBuffer$.fromFile(ChunkedByteBuffer.scala:184)
              at org.apache.spark.storage.BlockManager$$anon$1.onComplete(BlockManager.scala:454)
      

      At least on my system, it's just 2 bytes too big:

      > scala -J-Xmx4G
      scala> import java.nio.ByteBuffer
      import java.nio.ByteBuffer
      scala> ByteBuffer.allocate(Integer.MAX_VALUE)
      java.lang.OutOfMemoryError: Requested array size exceeds VM limit
        at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
        at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
        ... 30 elided
      
      scala> ByteBuffer.allocate(Integer.MAX_VALUE - 1)
      java.lang.OutOfMemoryError: Requested array size exceeds VM limit
        at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
        at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
        ... 30 elided
      
      scala> ByteBuffer.allocate(Integer.MAX_VALUE - 2)
      res3: java.nio.ByteBuffer = java.nio.HeapByteBuffer[pos=0 lim=2147483645 cap=2147483645]
      

      Workaround: set "spark.storage.memoryMapLimitForTests" to something a bit smaller, e.g. 2147483135 (that's Integer.MAX_VALUE - 512, just in case it's a bit different on other systems).
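
      For example, a minimal sketch of setting this programmatically (the config key is the one above; the rest is illustrative):

      import org.apache.spark.SparkConf

      // Cap memory-mapped chunk sizes safely below the JVM's max array size.
      // 2147483135 == Integer.MAX_VALUE - 512.
      val conf = new SparkConf()
        .set("spark.storage.memoryMapLimitForTests", "2147483135")

      The same value can also be passed on the command line via spark-submit's --conf flag.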

      This was introduced by SPARK-25422. I'll file a PR shortly.
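
      For illustration, one plausible shape for such a fix (a sketch under my own naming, not the actual patch) is to leave headroom below Integer.MAX_VALUE wherever a chunk size is chosen:

      // MAX_SAFE_CHUNK_SIZE is a hypothetical constant, not a Spark name.
      // The exact per-array limit is VM-dependent, so leave generous headroom
      // instead of probing for the boundary.
      val MAX_SAFE_CHUNK_SIZE: Int = Int.MaxValue - 512

      def safeChunkSize(requested: Long): Int =
        math.min(requested, MAX_SAFE_CHUNK_SIZE.toLong).toInt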

            People

              Assignee: Imran Rashid (irashid)
              Reporter: Imran Rashid (irashid)
