Description
Problem summary:
A few Spark builds are filling up the Jenkins master's disk with millions of small log files kept as build artifacts.
We currently have a RAID 10 array with 5.4T of storage, of which 4.0T is in use; 99.9% of that is Spark unit test and JUnit logs.
The worst offenders, with more than 100G of disk usage per job, are:
193G ./Spark-1.6-Maven-with-YARN
194G ./Spark-1.5-Maven-with-YARN
205G ./Spark-1.6-Maven-pre-YARN
216G ./Spark-1.5-Maven-pre-YARN
387G ./Spark-Master-Maven-with-YARN
420G ./Spark-Master-Maven-pre-YARN
520G ./Spark-1.6-SBT
733G ./Spark-1.5-SBT
812G ./Spark-Master-SBT
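A per-job report like the one above comes straight out of `du -s */ | sort -n` on the jobs directory; for repeatable reporting, here is a minimal Python sketch that does the same walk (the `jobs_root` path is whatever directory holds the Jenkins job folders):

```python
import os

def dir_size_bytes(path):
    """Total size of all regular files under path (symlinks skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            fp = os.path.join(root, name)
            if not os.path.islink(fp):
                total += os.path.getsize(fp)
    return total

def usage_report(jobs_root):
    """Return (bytes, job_name) pairs sorted ascending, like `du -s */ | sort -n`."""
    report = []
    for job in os.listdir(jobs_root):
        full = os.path.join(jobs_root, job)
        if os.path.isdir(full):
            report.append((dir_size_bytes(full), job))
    return sorted(report)
```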
I have attached a full report listing all builds.
Each of these jobs keeps its build history for 90 days.
Keep in mind that each new matrix build means another 200-500G apiece for the SBT/pre-YARN/with-YARN jobs.
A straw-man, back-of-the-napkin estimate for Spark 1.7 is 2T of additional disk usage.
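To make the straw man slightly less hand-wavy: taking the worst observed footprint per job type from the 1.5/1.6 numbers and padding with an assumed growth factor lands in the same ballpark. The growth factor below is an assumption, not a measurement:

```python
# Back-of-napkin estimate for a new Spark release line (all figures in GB).
# Per-job footprints come from the 1.5/1.6 numbers above; the 1.6x growth
# factor is an assumed pad, since each release's test suite keeps growing.
observed_gb = {
    "SBT": (733, 520),            # Spark-1.5-SBT, Spark-1.6-SBT
    "Maven-pre-YARN": (216, 205),
    "Maven-with-YARN": (194, 193),
}

def estimate_next_release(observed, growth_factor=1.6):
    """Worst observed footprint per job type, scaled by the growth factor."""
    return sum(max(sizes) * growth_factor for sizes in observed.values())

total_gb = estimate_next_release(observed_gb)  # roughly 1.8T, i.e. ~2T
```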
On the hardware side, we could move from RAID 10 to RAID 5 and gain ~3T of additional storage. If we ditch RAID altogether and install bigger disks, we could get 16-20T of total storage on master. Another option is an NFS mount to a deep-storage server. All of these options would require significant downtime.
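For reference, the RAID 10 vs RAID 5 gain is straight capacity arithmetic. The disk count and size below are assumptions chosen to match the current 5.4T usable figure, not the actual hardware inventory:

```python
def raid10_usable_tb(n_disks, disk_tb):
    # Mirrored pairs: usable capacity is half the raw capacity.
    return n_disks * disk_tb / 2

def raid5_usable_tb(n_disks, disk_tb):
    # One disk's worth of capacity goes to parity.
    return (n_disks - 1) * disk_tb

# Hypothetical layout: six 1.8T disks matches the 5.4T RAID 10 array.
current = raid10_usable_tb(6, 1.8)   # 5.4T
raid5 = raid5_usable_tb(6, 1.8)      # 9.0T
gain = raid5 - current               # ~3.6T, in line with the ~3T estimate
```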
Questions:
- Can we lower the number of days we keep build information? Jenkins exposes related settings too: max number of builds to keep, max number of days to keep artifacts, and max number of builds to keep with artifacts.
- Can we make the JUnit and unit test logs smaller? (Probably not.)
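All four retention knobs live in the `logRotator` element of each job's config.xml (class `hudson.tasks.LogRotator`, where -1 means "no limit"). A sketch of patching that element in a config.xml string, which could then be pushed back through the job's config.xml REST endpoint; the 30-day / 7-day values are examples, not recommendations:

```python
import xml.etree.ElementTree as ET

def set_build_discarder(config_xml, days_to_keep=30, num_to_keep=-1,
                        artifact_days_to_keep=7, artifact_num_to_keep=-1):
    """Return config.xml with a hudson.tasks.LogRotator build discarder set.

    -1 means "no limit" for that field, matching Jenkins' convention.
    """
    root = ET.fromstring(config_xml)
    old = root.find("logRotator")
    if old is not None:
        root.remove(old)  # replace any existing discarder settings
    rot = ET.SubElement(root, "logRotator")
    rot.set("class", "hudson.tasks.LogRotator")
    for tag, value in [("daysToKeep", days_to_keep),
                       ("numToKeep", num_to_keep),
                       ("artifactDaysToKeep", artifact_days_to_keep),
                       ("artifactNumToKeep", artifact_num_to_keep)]:
        ET.SubElement(rot, tag).text = str(value)
    return ET.tostring(root, encoding="unicode")
```

With per-job sizes of 200-800G and 90 days of history, even dropping artifact retention to a week or two should free a large fraction of the 4.0T in use.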