Feature #11009: Improve ISO building and testing throughput and latency
Decrease I/O load created by isotesters on lizard
Even with 23G of RAM per isotester, and contrary to what I believed earlier (#8681), our isotesters are the biggest consumers of write I/O, and their
/tmp/TailsToaster ext4 volume seems to be at fault. I'm trying to mount a tmpfs there; we'll see how it goes.
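For reference, a tmpfs mount like this can be expressed as an /etc/fstab entry. A minimal sketch, assuming a size cap; the exact size= value here is an assumption, not the actual lizard configuration:

```
# /etc/fstab on an isotester (illustrative; the size= cap is an assumption)
tmpfs  /tmp/TailsToaster  tmpfs  defaults,size=20g,mode=1777  0  0
```

Capping the size matters because a tmpfs without a limit defaults to half of physical RAM, and a runaway test suite filling it would compete with the test VMs for memory.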
#3 Updated by intrigeri almost 3 years ago
Now that the /tmp/TailsToaster case is closed, it will be interesting to look at I/O on:
- the other isotesterN-* LVs, in particular the one that hosts the 2 ISOs used for testing
Both ISOs are copied to
/var/lib/jenkins, which lives on the isotesterN-data LV. If we can afford to give each isotester 2.5 GB more RAM, then we can write these ISOs to
/tmp/TailsToaster and avoid the corresponding I/O load. I'm not convinced it's worth 8 * 2.5 GB = 20 GB of RAM, even though these volumes are at the top of our write I/O load. Spreading these volumes across all our (SSD-backed) RAID arrays would be nice, though: the 1 TB one is kinda overloaded compared to the 500 GB one, and half of our isotesterN-data LVs are on rotating drives.
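The RAM trade-off above is simple arithmetic; a quick sanity check of the figure quoted:

```python
# Back-of-the-envelope cost of holding both test ISOs in tmpfs on every
# isotester: 8 isotesters, each needing ~2.5 GB of extra RAM.
testers = 8
extra_gb_per_tester = 2.5
total_gb = testers * extra_gb_per_tester
print(total_gb)  # → 20.0
```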
- the system that serves the ISOs used for testing (giving it enough RAM to keep the last released ISO in memory might help)
For all branches, the ISO image being tested is copied from jenkins.lizard (using the copy artifacts Jenkins plugin). This takes about 1.6 minutes.
For non-release branches, an additional ISO image is retrieved over HTTP by the isotesters from www.lizard, which itself gets it via NFS from jenkins.lizard. This takes 45-60 seconds. Go figure why it's faster, but anyway, that's not the point here.
In both cases, the actual data ultimately comes from
/dev/lizard/jenkins-data. Indeed, that's our biggest consumer of read I/O (https://munin.riseup.net/tails.boum.org/lizard.tails.boum.org/diskstats_throughput/index.html) once we exclude isotesterN-tmp (which are being replaced by tmpfs) and bitcoin (which should be dropped IMO, but that's off-topic here).
So in most cases, the data that needs to be read from disk (jenkins-data) is either the latest released ISO (which we should probably always keep in the disk cache on jenkins.lizard, by giving the VM a bit more RAM), or an ISO that we just copied to jenkins.lizard and retrieve soon after (which could also still be in memory if we gave jenkins.lizard more RAM) => bumped jenkins.lizard from ~1.5G to ~2.7G of RAM, let's see how it goes.
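Bumping a libvirt guest's RAM allocation like this boils down to editing the domain's memory settings. A hedged sketch of what the relevant elements could look like (the exact KiB values below are illustrative, roughly 2.7 GiB, not the actual lizard settings):

```xml
<!-- Excerpt of `virsh edit` output for the Jenkins guest; values are
     illustrative (~2.7 GiB), not the actual lizard configuration. -->
<memory unit='KiB'>2831155</memory>
<currentMemory unit='KiB'>2831155</currentMemory>
```

The extra RAM is not allocated to any service: the point is simply to let the guest kernel's page cache keep recently read/written ISO images in memory, so repeated reads never hit jenkins-data on disk.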
#4 Updated by intrigeri almost 3 years ago
- % Done changed from 0 to 50
- Blueprint set to https://tails.boum.org/blueprint/hardware_for_automated_tests_take2/
Note to self: get rid of the LVs if we stick to tmpfs in the end.
Done. I'll post my benchmarking results to the blueprint soonish. These results plus Munin data convince me that backing
/tmp/TailsToaster with a tmpfs on the isotesters is a good thing.
#7 Updated by intrigeri almost 3 years ago
- Assignee deleted
- QA Check set to Ready for QA
Moved isotester[1-4]-data from rotating drives to an SSD-backed PV, and left isotester[5-8]-data on the other SSD-backed PV. This should make isotester[1-4]-data faster, and will lower the load on the rotating drives, which are now basically dedicated to jenkins-data; that's good, since jenkins-data was the other thing we wanted to optimize here.
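A migration like this can be done online with LVM's pvmove, which moves an LV's extents between physical volumes while the LV stays in use. An illustrative sketch; the VG, LV, and PV names below are assumptions, not the actual lizard layout:

```shell
# Illustrative only: move one isotester data LV off a rotating-disk PV
# onto an SSD-backed PV within the same VG, without unmounting anything.
# All names here are made up for the example.
pvmove -n isotester1-data /dev/md0 /dev/md2
```

Restricting the move with -n to one LV at a time keeps the extra I/O load of the migration itself bounded, which matters on arrays that are already busy.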
I'll check Munin data in a month or so to confirm that I'm done here.
#11 Updated by intrigeri almost 3 years ago
- Status changed from In Progress to Resolved
- Assignee deleted
- % Done changed from 80 to 100
- QA Check changed from Ready for QA to Pass
Since I made these changes ~1 month ago:
- disk throughput is substantially smaller on md1, md2 and md3
- isotesterN-* LVs are not among the top consumers of iops anymore
- isotesterN VMs are not among the top consumers of disk read/write throughput anymore (libvirt-blkstats Munin plugin)
So I call this a success.