  1. Spark
  2. SPARK-28594 Allow event logs for running streaming apps to be rolled over
  3. SPARK-22783

event log directory(spark-history) filled by large .inprogress files for spark streaming applications

Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 1.6.0, 2.1.0
    • Fix Version/s: None
    • Component/s: Spark Core
    • Labels: None
    • Environment: Linux(Generic)

    Description

      When running long-running streaming applications, HDFS storage fills up with large *.inprogress files in the hdfs://spark-history/ directory.

      For example:

      hadoop fs -du -h /spark-history

      234 /spark-history/<Application_1_ID>.inprogress

      46.6 G /spark-history/<Application_2_ID>.inprogress
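As a rough illustration of how such files could be spotted automatically, the sketch below parses output in the shape shown above and lists the .inprogress files above a threshold. The function names and the threshold are hypothetical, and the parser assumes the size (with an optional separate unit letter) comes first and the path comes last on each line, with no spaces in the path.

```python
# Binary unit multipliers; assumed to match the unit letters printed by `-du -h`.
_UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4, "P": 1024**5}


def parse_du_line(line):
    """Return (size_in_bytes, path) for one line of `hadoop fs -du -h` output."""
    tokens = line.split()
    if len(tokens) < 2:
        raise ValueError(f"unrecognized du line: {line!r}")
    size = float(tokens[0])
    # A unit letter, when present, is a separate token right after the number.
    if tokens[1] in _UNITS:
        size *= _UNITS[tokens[1]]
    # The path is taken to be the last token (assumes no spaces in the path).
    return int(size), tokens[-1]


def large_inprogress_files(du_output, threshold_bytes):
    """List (size, path) for .inprogress files at or above the threshold, largest first."""
    hits = []
    for line in du_output.splitlines():
        if not line.strip():
            continue
        size, path = parse_du_line(line)
        if path.endswith(".inprogress") and size >= threshold_bytes:
            hits.append((size, path))
    return sorted(hits, reverse=True)


sample = """
234 /spark-history/<Application_1_ID>.inprogress
46.6 G /spark-history/<Application_2_ID>.inprogress
"""
# Report anything of 1 GiB or more (an arbitrary example threshold).
for size, path in large_inprogress_files(sample, threshold_bytes=1024**3):
    print(size, path)
```

With the two lines from the example, only the second file is reported.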

      Instead of continuing to write to a very large (multi-GB) .inprogress file, Spark should rotate the current log file when it reaches a size threshold (for example, 100 MB) or a time interval, and perhaps expose a configuration parameter for the size/interval.
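To make the request concrete, here is a minimal sketch of size-based rotation. It is an illustration only, not Spark's EventLoggingListener: the class name, the numbered file naming scheme, and the max_bytes parameter are all hypothetical, and it writes to a local directory rather than HDFS.

```python
import os
import tempfile


class RollingEventLog:
    """Append event lines to numbered files, starting a new file at a size limit.

    Hypothetical illustration; not Spark's implementation.
    """

    def __init__(self, directory, app_id, max_bytes):
        self.directory = directory
        self.app_id = app_id
        self.max_bytes = max_bytes
        self.index = 0
        self.written = 0
        self._file = None
        self._open_next()

    def _path(self, index):
        return os.path.join(self.directory, f"{self.app_id}.{index}.inprogress")

    def _open_next(self):
        # Close the current part and drop its ".inprogress" suffix so that it
        # can be archived or deleted independently of the live file.
        if self._file is not None:
            self._file.close()
            current = self._path(self.index)
            os.rename(current, current[: -len(".inprogress")])
            self.index += 1
        self._file = open(self._path(self.index), "ab")
        self.written = 0

    def write_event(self, line):
        data = line.encode("utf-8") + b"\n"
        # Roll before the write that would push the current part past the limit,
        # unless the part is still empty (a single oversized event stays whole).
        if self.written > 0 and self.written + len(data) > self.max_bytes:
            self._open_next()
        self._file.write(data)
        self.written += len(data)

    def close(self):
        self._file.close()


with tempfile.TemporaryDirectory() as tmp:
    log = RollingEventLog(tmp, "app-1", max_bytes=100)
    for i in range(20):
        log.write_event('{"Event": "Tick", "n": %d}' % i)
    log.close()
    print(sorted(os.listdir(tmp)))
```

Only the newest part keeps the .inprogress suffix, so completed parts could be moved or deleted to stay within quota while the application keeps logging.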

      This is also mentioned in SPARK-12140 as a concern.

      Support for rotating the log files is important because users may have a limited HDFS quota, and these large files consume it.

      Also, users do not have a viable workaround:

      1) The files cannot be moved to another location, because moving the file causes event logging to stop.

      2) Copying the .inprogress file to another location and then truncating the original fails, because the file is still open for writing by EventLoggingListener:

      hdfs dfs -truncate -w 0 /spark-history/<application_id>.inprogress
      truncate: Failed to TRUNCATE_FILE /spark-history/<application_id>.inprogress for DFSClient_NONMAPREDUCE_<#ID> on <IP> because this file lease is currently owned by DFSClient_NONMAPREDUCE_<#ID> on <IP>

      The only available workaround is to disable event logging for streaming applications by setting "spark.eventLog.enabled" to false.
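One way to apply this setting is per application, through spark-submit's --conf option. The helper below is a small sketch that builds such a command line; the function name and the application file name are hypothetical.

```python
import shlex


def spark_submit_command(app, extra_conf=None):
    """Build a spark-submit argument list with event logging turned off."""
    conf = dict(extra_conf or {})
    # Force the workaround setting even if extra_conf tried to enable logging.
    conf["spark.eventLog.enabled"] = "false"
    argv = ["spark-submit"]
    for key, value in conf.items():
        argv += ["--conf", f"{key}={value}"]
    argv.append(app)
    return argv


cmd = spark_submit_command("streaming_app.py")  # hypothetical application file
print(" ".join(shlex.quote(arg) for arg in cmd))
```

Note that with event logging disabled, the application no longer writes an event log to /spark-history at all, so this trades the quota problem for a loss of history.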

          People

            Assignee: Unassigned
            Reporter: omkar kankalapati (omkar.kankalapati)
            Votes: 5
            Watchers: 8
