  1. Spark
  2. SPARK-12427

spark builds filling up jenkins' disk

Details

    • Bug
    • Status: Resolved
    • Critical
    • Resolution: Fixed
    • None
    • None
    • Build
    • Important

    Description

      problem summary:

      a few spark builds are filling up the jenkins master's disk with millions of little log files as build artifacts.

      currently, we have a raid10 array set up with 5.4T of storage. we're using 4.0T of it, 99.9% of which is spark unit test and junit logs.

      the worst offenders, with more than 100G of disk usage per job, are:
      193G ./Spark-1.6-Maven-with-YARN
      194G ./Spark-1.5-Maven-with-YARN
      205G ./Spark-1.6-Maven-pre-YARN
      216G ./Spark-1.5-Maven-pre-YARN
      387G ./Spark-Master-Maven-with-YARN
      420G ./Spark-Master-Maven-pre-YARN
      520G ./Spark-1.6-SBT
      733G ./Spark-1.5-SBT
      812G ./Spark-Master-SBT
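To see how the listed jobs add up, here is a minimal Python sketch that parses the du-style lines above and totals them per branch. The helper names (`parse_report`, `usage_by_branch`) are hypothetical, and the branch is assumed to be the second dash-separated field of the job name.

```python
# Per-job usage copied from the list above, as du-style "SIZE ./job" lines.
REPORT = """\
193G ./Spark-1.6-Maven-with-YARN
194G ./Spark-1.5-Maven-with-YARN
205G ./Spark-1.6-Maven-pre-YARN
216G ./Spark-1.5-Maven-pre-YARN
387G ./Spark-Master-Maven-with-YARN
420G ./Spark-Master-Maven-pre-YARN
520G ./Spark-1.6-SBT
733G ./Spark-1.5-SBT
812G ./Spark-Master-SBT
"""


def parse_report(text):
    """Return {job_name: size_in_gb} from 'NNNG ./job' lines."""
    usage = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        size, path = line.split()
        name = path[2:] if path.startswith("./") else path
        usage[name] = int(size.rstrip("G"))
    return usage


def usage_by_branch(usage):
    """Sum usage per branch; assumes names look like 'Spark-<branch>-...'."""
    totals = {}
    for name, gb in usage.items():
        branch = name.split("-")[1]
        totals[branch] = totals.get(branch, 0) + gb
    return totals


usage = parse_report(REPORT)
by_branch = usage_by_branch(usage)
total_gb = sum(usage.values())
print(by_branch, total_gb)
```

With the figures above this gives 1143G for 1.5, 918G for 1.6, and 1619G for master, or 3680G across the nine jobs.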

      i have attached a full report w/all builds listed as well.

      each of these builds keeps its build history for 90 days.

      keep in mind that for each new matrix build, we're looking at another 200-500G each for the SBT/pre-YARN/with-YARN jobs.

      a straw man, back of napkin estimate for spark 1.7 is 2T of additional disk usage.
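A quick check of that estimate against the remaining space, as a rough Python sketch. It uses only the figures quoted in this report (5.4T capacity, 4.0T used, 2T estimated for spark 1.7); the function name `headroom_tb` is hypothetical.

```python
def headroom_tb(capacity_tb, used_tb, additional_tb=0.0):
    """Free space left after adding `additional_tb`; negative means it won't fit."""
    # Round to one decimal place to match the precision of the quoted figures.
    return round(capacity_tb - used_tb - additional_tb, 1)


free_now = headroom_tb(5.4, 4.0)             # space left today
free_after_17 = headroom_tb(5.4, 4.0, 2.0)   # space left after the 1.7 estimate
print(free_now, free_after_17)
```

On these numbers there is 1.4T free today, so a 2T addition would overshoot the array by about 0.6T.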

      on the hardware config side, we can move from raid10 to raid5 and get ~3T additional storage. if we ditch raid altogether and put in bigger disks, we can get a total of 16-20T storage on master. another option is to have an NFS mount to a deep storage server. all of these options will require significant downtime.

      questions:

      • can we lower the number of days that we keep build information?
      • there are other options in jenkins that we can set as well: max number of builds to keep, max # days to keep artifacts, max # of builds to keep w/artifacts
      • can we make the junit and unit test logs smaller? (probably not)
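To get a feel for what a shorter retention window might buy, here is a rough Python sketch. It assumes disk usage scales linearly with the number of days kept (a steady build rate and steady log size per build), which is an assumption and not something measured here. The function name `projected_usage_gb` and the 30-day window are hypothetical.

```python
def projected_usage_gb(current_gb, current_days, new_days):
    """Estimate usage after changing retention, assuming linear scaling with days kept."""
    if current_days <= 0 or new_days < 0:
        raise ValueError("retention windows must be positive")
    return current_gb * new_days / current_days


# Spark-Master-SBT currently uses 812G with 90 days of history.
master_sbt_30d = projected_usage_gb(812, 90, 30)
print(round(master_sbt_30d, 1))
```

Under that assumption, cutting Spark-Master-SBT from 90 to 30 days would leave roughly a third of its current 812G. Real savings depend on how build frequency and log volume vary over the window.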

      Attachments

        1. graph.png
          27 kB
          Shane Knapp
        2. jenkins_disk_usage.txt
          4 kB
          Shane Knapp

          People

            joshrosen Josh Rosen
            shaneknapp Shane Knapp