Spark / SPARK-37781

Java Out-Of-Memory Error when retrieving value from dataframe


Details

    • Type: Question
    • Status: Resolved
    • Priority: Major
    • Resolution: Not A Problem
    • Affects Version/s: 3.1.2
    • Fix Version/s: None
    • Component/s: Java API, Spark Submit, SQL
    • Labels: None

    Description

      My submitted Spark application keeps running into the following error:

      Exception in thread "RemoteBlock-temp-file-clean-thread" java.lang.OutOfMemoryError: Java heap space
      	at org.apache.spark.storage.BlockManager$RemoteBlockDownloadFileManager$$Lambda$751/0x0000000840662040.get$Lambda(Unknown Source)
      	at java.base/java.lang.invoke.DirectMethodHandle$Holder.invokeStatic(DirectMethodHandle$Holder)
      	at java.base/java.lang.invoke.Invokers$Holder.linkToTargetMethod(Invokers$Holder)
      	at org.apache.spark.storage.BlockManager$RemoteBlockDownloadFileManager.org$apache$spark$storage$BlockManager$RemoteBlockDownloadFileManager$$keepCleaning(BlockManager.scala:2036)
      	at org.apache.spark.storage.BlockManager$RemoteBlockDownloadFileManager$$anon$2.run(BlockManager.scala:2002)
      Exception in thread "main" java.lang.reflect.InvocationTargetException
      	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
      	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
      	at org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:65)
      	at org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala)
      Caused by: java.lang.OutOfMemoryError: Java heap space
      	at scala.collection.immutable.HashSet$HashTrieSet.updated0(HashSet.scala:551)
      	at scala.collection.immutable.HashSet.$plus(HashSet.scala:84)
      	at scala.collection.immutable.HashSet.$plus(HashSet.scala:35)
      	at scala.collection.mutable.SetBuilder.$plus$eq(SetBuilder.scala:28)
      	at scala.collection.mutable.SetBuilder.$plus$eq(SetBuilder.scala:24)
      	at scala.collection.generic.Growable.$anonfun$$plus$plus$eq$1(Growable.scala:62)
      	at scala.collection.generic.Growable$$Lambda$9/0x0000000840063840.apply(Unknown Source)
      	at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
      	at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
      	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
      	at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62)
      	at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53)
      	at scala.collection.mutable.SetBuilder.$plus$plus$eq(SetBuilder.scala:24)
      	at scala.collection.TraversableLike.to(TraversableLike.scala:678)
      	at scala.collection.TraversableLike.to$(TraversableLike.scala:675)
      	at scala.collection.AbstractTraversable.to(Traversable.scala:108)
      	at scala.collection.TraversableOnce.toSet(TraversableOnce.scala:309)
      	at scala.collection.TraversableOnce.toSet$(TraversableOnce.scala:309)
      	at scala.collection.AbstractTraversable.toSet(Traversable.scala:108)
      	at org.apache.spark.sql.catalyst.trees.TreeNode.containsChild$lzycompute(TreeNode.scala:122)
      	at org.apache.spark.sql.catalyst.trees.TreeNode.containsChild(TreeNode.scala:122)
      	at org.apache.spark.sql.catalyst.trees.TreeNode.mapChild$1(TreeNode.scala:270)
      	at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$withNewChildren$4(TreeNode.scala:283)
      	at org.apache.spark.sql.catalyst.trees.TreeNode$$Lambda$2239/0x0000000840e8c040.apply(Unknown Source)
      	at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
      	at scala.collection.TraversableLike$$Lambda$17/0x000000084012e840.apply(Unknown Source)
      	at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
      	at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
      	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
      	at scala.collection.TraversableLike.map(TraversableLike.scala:238)
      	at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
      	at scala.collection.AbstractTraversable.map(Traversable.scala:108)
      12-29-2021 12:13:28 PM ERROR Utils: uncaught error in thread Spark Context Cleaner, stopping SparkContext
      java.lang.OutOfMemoryError: Java heap space
      12-29-2021 12:13:28 PM ERROR Utils: throw uncaught fatal error in thread Spark Context Cleaner
      java.lang.OutOfMemoryError: Java heap space
      Exception in thread "Spark Context Cleaner" java.lang.OutOfMemoryError: Java heap space

       

      A DataFrame is created from a JDBC query to a PostgreSQL database:

       

      var dataframeVariable = sparkSession.read
        .format("jdbc")
        .option("url", urlVariable)
        .option("driver", driverVariable)
        .option("user", usernameVariable)
        .option("password", passwordVariable)
        .option("query", "select max(timestamp) as timestamp from \"" + tableNameVariable + "\"")
        .load()
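      For what it's worth, the query option hands that SQL to PostgreSQL as-is (Spark wraps it as a subquery), so the max() aggregate runs database-side and Spark only ever receives one row. A sketch of the equivalent read using the dbtable option with a derived-table alias (dfAlt is an illustrative name):

      // Equivalent read via dbtable; PostgreSQL still evaluates the aggregate,
      // so Spark receives a single row.
      val dfAlt = sparkSession.read
        .format("jdbc")
        .option("url", urlVariable)
        .option("driver", driverVariable)
        .option("user", usernameVariable)
        .option("password", passwordVariable)
        .option("dbtable", "(select max(timestamp) as timestamp from \"" + tableNameVariable + "\") t")
        .load()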
      

       

      The error occurs when the program tries to extract a value from the DataFrame, which contains only a single row and column. Here are the methods I have used; each causes the application to hang and eventually fail with the OOM error.

      import org.apache.spark.sql.functions.col

      var lastTimestamp = dataframeVariable.first().getDouble(0)
      var timeStampVal = dataframeVariable.select(col("timestamp")).collect()
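      A guarded version of the same extraction, as a minimal sketch: it assumes the aggregate arrives as a SQL timestamp (row and lastTimestampOpt are illustrative names; if the column is stored as an epoch double, row.getDouble(0) applies instead).

      import org.apache.spark.sql.Row

      // first() triggers the job and returns the single aggregate row.
      val row: Row = dataframeVariable.first()

      // max(...) over an empty table comes back as NULL, so check before reading.
      val lastTimestampOpt: Option[java.sql.Timestamp] =
        if (row.isNullAt(0)) None else Some(row.getTimestamp(0))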

       

      After some looking around, several people suggested changing the Spark memory-management configuration to address this issue, but I am not sure where to start. Any guidance would be helpful.
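      For reference, the usual first knobs are the driver and executor heap sizes, passed at submit time. The sizes and jar name below are illustrative placeholders, not a tuned recommendation:

      # Illustrative sizes only; application.jar stands in for the real artifact.
      spark-submit \
        --driver-memory 8g \
        --executor-memory 8g \
        application.jar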

       

      Currently using: Spark 3.1.2, Scala 2.12, Java 11

      Spark Cluster Spec: 8 workers, 48 cores, 64 GB memory

      Submitted Application Spec: 1 worker, 4 cores each for the driver and executor, 4 GB memory each for the driver and executor

          People

            Assignee: Unassigned
            Reporter: Thinh Nguyen (thinhnguyen)
            Votes: 0
            Watchers: 1
