Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-28120

RocksDB state storage

    XMLWordPrintableJSON

Details

    • New Feature
    • Status: Resolved
    • Major
    • Resolution: Later
    • 3.0.0
    • None
    • Structured Streaming
    • None

    Description

      SPARK-13809 introduced a framework for state management for computing Streaming Aggregates. The default implementation was in-memory hashmap which was backed up in HDFS complaint file system at the end of every micro-batch.

      Current implementation suffers from Performance and Latency Issues. It uses Executor JVM memory to store the states. State store size is limited by the size of the executor memory. Also
      Executor JVM memory is shared by state storage and other tasks operations. State storage size will impact the performance of task execution

      Moreover, GC pauses, executor failures, OOM issues are common when the size of state storage increases which increases overall latency of a micro-batch

      RocksDb is an embedded DB which can provide major performance improvements. Other major streaming frameworks have rocksdb as default state storage.

      Attachments

        Issue Links

          Activity

            People

              Unassigned Unassigned
              itsvikramagr Vikram Agrawal
              Votes:
              1 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: