Project

General

Profile

Bug #12599

Feature #5630: Reproducible builds

Bug #12531: ISO builds on Jenkins are fragile since the migration to vagrant-libvirt

/var/lib/libvirt/images gets filled on isobuilders

Added by intrigeri over 2 years ago. Updated over 2 years ago.

Status:
Resolved
Priority:
Elevated
Assignee:
-
Category:
Continuous Integration
Target version:
Start date:
05/25/2017
Due date:
% Done:

100%

Feature Branch:
bugfix/12599-rake-task-removing-tails-builder-volumes,jenkins-jobs:bugfix/12599-rake-task-removing-tails-builder-volumes
Type of work:
Sysadmin
Blueprint:
Starter:
Affected tool:

Description

On isobuilder3, right now:

total 2.8G
drwx------ 2 root         root          16K May  9 14:04 lost+found/
-rwxr--r-- 1 libvirt-qemu libvirt-qemu 741M May 25 16:45 tails-builder-amd64-jessie-20170510-d6ed806417_vagrant_box_image_0.img*
-rw------- 1 libvirt-qemu libvirt-qemu 1.3G May 25 18:27 tails-builder-amd64-jessie-20170524-8cc1ccbade_default.img
-rwxr--r-- 1 libvirt-qemu libvirt-qemu 743M May 25 17:37 tails-builder-amd64-jessie-20170524-8cc1ccbade_vagrant_box_image_0.img*

... while https://jenkins.tails.boum.org/job/reproducibly_build_Tails_ISO_feature-stretch/22/console is running.


Related issues

Related to Tails - Bug #12579: reproducibly_build_Tails_ISO_* Jenkins job are broken Resolved 05/22/2017
Related to Tails - Feature #12002: Estimate hardware cost of reproducible builds in Jenkins Resolved 11/28/2016
Related to Tails - Bug #13302: /var/lib/libvirt/images sometimes gets filled on isobuilders, take 2 Resolved 06/30/2017
Blocks Tails - Bug #12637: Deploy rake libvirt volumes clean up task on all Jenkins build jobs Resolved 06/04/2017

Associated revisions

Revision 04f253d1 (diff)
Added by anonym over 2 years ago

Rakefile: also clean up the base box disk volume when destroying the domain.

We'll have an exact copy in ~/vagrant.d that will be restored each
build. So this makes builds a bit slower, but wastes less disk space,
which we need on Jenkins.

Refs: #12599

Revision 4e954097 (diff)
Added by anonym over 2 years ago

Rakefile: try to clean up all tails-builder VMs.

... not only the one currently tracked by Vagrant, whose reference
(vagrant/.vagrant) easily can get lost, resulting in residual VM
images occupying disk space.

Refs: #12599

Revision d27bb28a (diff)
Added by bertagaz over 2 years ago

Rakefile: really cleanup any existing volume in the storage pool.

Let just extract this from the domain loop, and simply remove every
volume the libvirt storage pool is aware of. That sound more robust.

Refs: #12599

Revision fe00baad (diff)
Added by bertagaz over 2 years ago

Rakefile: Fix name of one of the VM volumes to be removed.

Refs: #12599

Revision e2d5817c (diff)
Added by bertagaz over 2 years ago

Add a rake task to remove all existing libvirt volumes.

This will be a safety net useful in the Tails CI infra, in case the code
to remove any Tails builder VM volumes with forcecleanup fails at some
point (like skipped due to a power outage), and some volumes don't get
deleted, leading to filling the libvirt volume partition.

Refs: #12599

Revision dc8ff0da (diff)
Added by bertagaz over 2 years ago

Remove only tails-builder-* libvirt volumes.

Just to be sure a developer won't remove all her libvirt volumes by
mistake, and that the apt-caher-ng disk is still kept.

refs: #12599

Revision 2f4442c2 (diff)
Added by bertagaz over 2 years ago

Fix typo.

Refs: #12599

Revision 93e0aab6 (diff)
Added by bertagaz over 2 years ago

Update build documentation about the forcecleanup option changes.

Refs: #12599

Revision de6e7567 (diff)
Added by bertagaz over 2 years ago

Rakefile: Add task removing all tails-builder-* Libvirt volumes in the default pool.

Refs: #12599

Revision 94e99417
Added by intrigeri over 2 years ago

Merge remote-tracking branch 'origin/bugfix/12599-rake-task-removing-tails-builder-volumes' into testing (Fix-committed: #12599).

History

#1 Updated by intrigeri over 2 years ago

/dev/vdc                   ext4      2.9G  2.8G     0 100% /var/lib/libvirt/images

#2 Updated by bertagaz over 2 years ago

  • Related to Bug #12579: reproducibly_build_Tails_ISO_* Jenkins job are broken added

#3 Updated by bertagaz over 2 years ago

  • Blocks Feature #12576: Have Jenkins use basebox:clean_old instead of basebox:clean_all added

#4 Updated by intrigeri over 2 years ago

  • Blocks deleted (Feature #12576: Have Jenkins use basebox:clean_old instead of basebox:clean_all)

#5 Updated by intrigeri over 2 years ago

  • Blocks Feature #12576: Have Jenkins use basebox:clean_old instead of basebox:clean_all added

#6 Updated by intrigeri over 2 years ago

  • Parent task changed from #5630 to #12531

#7 Updated by anonym over 2 years ago

To essentially halve the disk space usage per base box, see the end of #12595#note-13.

#8 Updated by anonym over 2 years ago

  • Status changed from Confirmed to In Progress

Applied in changeset commit:69faba0c1d5e7b517616103f8e1c14528bdb55e8.

#9 Updated by intrigeri over 2 years ago

  • Feature Branch set to feature/12599

Rebased the topic branch on testing (devel FTBFS at the moment, and we'll want to merge this into testing anyway) and forced-pushed.

#10 Updated by intrigeri over 2 years ago

  • Assignee changed from bertagaz to anonym
  • % Done changed from 0 to 10

Sorry, I have bad news: this branch works neither on my system, nor on Jenkins: in both cases, I see several tails-builder-amd64-jessie-*_vagrant_box_image_0.img files in /var/lib/libvirt/images/. Looking at the code around old_domain =, it seems to me we're only cleaning up 1 old VM, not "any residual VM", no?

#11 Updated by intrigeri over 2 years ago

  • QA Check set to Info Needed

intrigeri wrote:

Sorry, I have bad news: this branch works neither on my system, nor on Jenkins: in both cases, I see several tails-builder-amd64-jessie-*_vagrant_box_image_0.img files in /var/lib/libvirt/images/. Looking at the code around old_domain =, it seems to me we're only cleaning up 1 old VM, not "any residual VM", no?

Wait, actually perhaps it's merely a bootstrapping issue than an algorithmic one: in theory, assuming this code had been deployed on all branches forever, and it runs fine every time, then cleaning up only the previous default Vagrant VM is enough to ensure we never keep any old VM images around. So I'd be happy to merge this and manually clean up the VM images that are not tracked as the default one by Vagrant, and then in theory we should be good. Correct so far? If we agree on this, then I'll check that this branch does what it's meant to be (instead of what I believed it would do :) and merge it right away! That'll already be a nice start :)

Now, I don't think we can rely on these assumptions on the long term for a production deployment: sometimes a VM will be rebooted at the exact wrong time, or there will be a power issue and everything will suddenly go down, or something. So the only robust enough way to ensure we have no such VM image around when we start a build is to delete all these images before building, either with a Rake task (that we would perhaps run on Jenkins only) or an additional build step in jenkins-jobs.git.

Thoughts, both of you?

#12 Updated by anonym over 2 years ago

  • Assignee changed from anonym to bertagaz

intrigeri wrote:

intrigeri wrote:

Sorry, I have bad news: this branch works neither on my system, nor on Jenkins: in both cases, I see several tails-builder-amd64-jessie-*_vagrant_box_image_0.img files in /var/lib/libvirt/images/. Looking at the code around old_domain =, it seems to me we're only cleaning up 1 old VM, not "any residual VM", no?

Wait, actually perhaps it's merely a bootstrapping issue than an algorithmic one: in theory, assuming this code had been deployed on all branches forever, and it runs fine every time, then cleaning up only the previous default Vagrant VM is enough to ensure we never keep any old VM images around. So I'd be happy to merge this and manually clean up the VM images that are not tracked as the default one by Vagrant, and then in theory we should be good. Correct so far?

Correct! It's

If we agree on this, then I'll check that this branch does what it's meant to be (instead of what I believed it would do :) and merge it right away! That'll already be a nice start :)

Now, I don't think we can rely on these assumptions on the long term for a production deployment: sometimes a VM will be rebooted at the exact wrong time, or there will be a power issue and everything will suddenly go down, or something. So the only robust enough way to ensure we have no such VM image around when we start a build is to delete all these images before building, either with a Rake task (that we would perhaps run on Jenkins only) or an additional build step in jenkins-jobs.git.

Thoughts, both of you?

Yeah, I agree. And I rather fix this now when I have these things remotely fresh in my head, so I've implemented it in the branch, 4e954097ad348f3b1bc9f93a8dedc5a0848c8d1e. This should cover us forever except if really weird stuff happens.

#13 Updated by intrigeri over 2 years ago

anonym wrote:

Correct! It's

Missing bits?

So, anonym, what's the next step then? bertagaz reviews your stuff, i.e. Ready for QA? Or you want to first check it works fine, and then reassign to you?

#14 Updated by bertagaz over 2 years ago

  • Assignee changed from bertagaz to anonym
  • QA Check changed from Info Needed to Dev Needed

intrigeri wrote:

So, anonym, what's the next step then? bertagaz reviews your stuff, i.e. Ready for QA? Or you want to first check it works fine, and then reassign to you?

I had a look already, ran it both in Jenkins and locally. I have two feedbacks:

First I think there's a flaw in this part of Rakefile:490:

#{domain.name}_vagrant_box_image_0.img

This part does not work: domain.name contains the string default (probably the libvirt storage pool name), whereas this particular volume name does not, so it does not get deleted. I've seen that indeed locally and in Jenkins. This volume still exists after the build. But the snapshot of this volume is correctly removed.

Second, as I get the code, it will loop over existing domain names. That's fine if at some point a previous build failed badly and the domain used at that time has not been undefined. But if it has been, but for some reason the related volumes have not been removed from the pool, then this old orphan volumes won't get deleted. So I guess the idea is, to quote intrigeri, to have "an additional build step in jenkins-jobs.git"? I wonder if we should not just loop over the existing volumes in the pool to delete them all, no matter the domain name. What do you think?

#15 Updated by intrigeri over 2 years ago

I've found another, unrelated reason why we eat more space in /var/lib/libvirt/images than we could: at the end of a successful build, we do mv -f tails-* "${TAILS_GIT_DIR}/", and $TAILS_GIT_DIR is not a tmpfs, so the tails-builder-*_default.img that's in use suddenly grows by ~1.2GB. This also causes I/O load we could happily live without, and increases build time a little bit. I guess we're doing it this way in order to ease the "Retrieving artifacts from Vagrant build box" step, and anything else that relies on list_artifacts. I see two ways to solve this (perhaps minor?) issue: either mount a tmpfs on $TAILS_GIT_DIR before doing git clone /amnesia.git/.git "${TAILS_GIT_DIR}"; or drop the mv command, and teach list_artifacts to look in the most recent /tmp/tails-build.* directory, instead of in /home/vagrant/amnesia, e.g. something like (untested!):

--- a/Rakefile
+++ b/Rakefile
@@ -312,8 +312,11 @@ end

 def list_artifacts
   user = vagrant_ssh_config('User')
-  stdout = capture_vagrant_ssh("find '/home/#{user}/amnesia/' -maxdepth 1 " +
-                                        "-name 'tails-*.iso*'").first
+  build_dirs_stdout = capture_vagrant_ssh("ls -1 -d --sort=time " +
+                                          "/tmp/tails-build.*").first
+  build_dir = build_dirs_stdout.split("\n")[0]
+  stdout = capture_vagrant_ssh("find '#{build_dir}' -maxdepth 1 " +
+                               "-name 'tails-*.iso*'").first
   stdout.split("\n")
 rescue VagrantCommandError
   return Array.new
--- a/vagrant/provision/assets/build-tails
+++ b/vagrant/provision/assets/build-tails
@@ -110,5 +110,3 @@ cd "${BUILD_DIR}" 
 as_root_do lb config --cache false

 as_root_do lb build
-
-mv -f tails-* "${TAILS_GIT_DIR}/" 

What do you folks think?

#16 Updated by bertagaz over 2 years ago

intrigeri wrote:

What do you folks think?

I think that's a good idea. :)

#17 Updated by anonym over 2 years ago

intrigeri wrote:

I've found another, unrelated reason why we eat more space in /var/lib/libvirt/images than we could: [...]

I'm aware of this but sadly your fix is not enough; build-tails will clean up the tmpfs used for the build before we return to the Rakefile and realize that there are no build artifacts. I once actually started working on a branch where build-tails was modularized enough so we could fetch the artifacts directly from the tmpfs before its teardown, but I never got finished it and now it's too outdated. Now I'm wondering if we perhaps should go a step further by moving translating all of build-tails into tasks in the Rakefile.

#18 Updated by intrigeri over 2 years ago

I'm aware of this but sadly your fix is not enough; build-tails will clean up the tmpfs used for the build before we return to the Rakefile and realize that there are no build artifacts.

OK, let's forget about this issue for now then: it's slightly OT on this ticket as the required space is a constant.
We can come back to it later, on another ticket, if needed (but as far as I'm concerned, I'll try hard not to spend any time on such optimizations before having looked at the big picture and identified low-hanging fruits).

#19 Updated by intrigeri over 2 years ago

  • Related to Feature #12002: Estimate hardware cost of reproducible builds in Jenkins added

#20 Updated by bertagaz over 2 years ago

intrigeri wrote:

I'm aware of this but sadly your fix is not enough; build-tails will clean up the tmpfs used for the build before we return to the Rakefile and realize that there are no build artifacts.

Oooh, good catch. :)

OK, let's forget about this issue for now then: it's slightly OT on this ticket as the required space is a constant.
We can come back to it later, on another ticket, if needed (but as far as I'm concerned, I'll try hard not to spend any time on such optimizations before having looked at the big picture and identified low-hanging fruits).

So what should we do about this ticket? Remove the blocker on #12576? Remove the Deliverable for and/or 3.0 target? Or just wait a bit before deciding?

#21 Updated by intrigeri over 2 years ago

So what should we do about this ticket?

I'm sorry I apparently confused you with my "another, unrelated reason why we eat more space". The comment you're replying to was about this slightly OT sub-discussion, not at all about the general problem this ticket is about. This ticket is high priority as it's currently the main cause of irrelevant build failures (#12531). I see no reason to delay fixing it. Once the topic branch is fixed and merged, case closed, no? bertagaz, would you be ready to take this topic branch over? anonym seems to be plenty busy with other high-priority matters.

Remove the blocker on #12576?

I don't know why this ticket blocks #12576 (because I don't know how clean_old and clean_all relate to /var/lib/libvirt/images, on current testing and on feature/12599).

#22 Updated by bertagaz over 2 years ago

  • Assignee changed from anonym to bertagaz

intrigeri wrote:

So what should we do about this ticket?

I'm sorry I apparently confused you with my "another, unrelated reason why we eat more space". The comment you're replying to was about this slightly OT sub-discussion, not at all about the general problem this ticket is about. This ticket is high priority as it's currently the main cause of irrelevant build failures (#12531). I see no reason to delay fixing it. Once the topic branch is fixed and merged, case closed, no?

Ooch right, sorry, I should have finished my coffee before replying.

bertagaz, would you be ready to take this topic branch over? anonym seems to be plenty busy with other high-priority matters.

Yes, let's do that.

Remove the blocker on #12576?

I don't know why this ticket blocks #12576 (because I don't know how clean_old and clean_all relate to /var/lib/libvirt/images, on current testing and on feature/12599).

Hmm, right, I've been confused here, will remove this.

#23 Updated by bertagaz over 2 years ago

  • Blocks deleted (Feature #12576: Have Jenkins use basebox:clean_old instead of basebox:clean_all)

#24 Updated by anonym over 2 years ago

bertagaz wrote:

intrigeri wrote:

So, anonym, what's the next step then? bertagaz reviews your stuff, i.e. Ready for QA? Or you want to first check it works fine, and then reassign to you?

I had a look already, ran it both in Jenkins and locally. I have two feedbacks:

First I think there's a flaw in this part of Rakefile:490:

[...]

This part does not work: domain.name contains the string default (probably the libvirt storage pool name), whereas this particular volume name does not, so it does not get deleted. I've seen that indeed locally and in Jenkins. This volume still exists after the build. But the snapshot of this volume is correctly removed.

Whoops! This explains intrigeri's problems reported in #12599#note-10, and we can rule out his "bootstrap" theory in #12599#note-10. I see you already have fixed this, and I had in parallel prepared this fix:

--- a/Rakefile
+++ b/Rakefile
@@ -487,7 +487,8 @@ def clean_up_builder_vms
     rescue Libvirt::RetrieveError
       # Expected if the pool does not exist
     else
-      for disk in ["#{domain.name}.img", "#{domain.name}_vagrant_box_image_0.img"] do
+      box_name = domain.name.sub(/_default$/, '')
+      for disk in ["#{domain.name}.img", "#{box_name}_vagrant_box_image_0.img"] do
         begin
           pool.lookup_volume_by_name(disk).delete
         rescue Libvirt::RetrieveError

As I noted during our CI meeting today, your fix, d27bb28a2474748123b5c61158657e779ba229bd, needs to match disk names to /^tails-builder/ or similar before deleting, otherwise it will delete all my VMs' disks! :)

Second, as I get the code, it will loop over existing domain names. That's fine if at some point a previous build failed badly and the domain used at that time has not been undefined. But if it has been, but for some reason the related volumes have not been removed from the pool, then this old orphan volumes won't get deleted.

Correct. My reasoning is that orphan volumes should be impossible; since we deal with the volume removal immediately after we undefine the domain there's not really anything that can go wrong except that we fail to remove the volumes. Put a different way: such a failure would affect any other approach we take (e.g. the one you mention below) just the same.

So I guess the idea is, to quote intrigeri, to have "an additional build step in jenkins-jobs.git"?

I think this is moot now. We want to do the same cleanup on all builders, whether on Jenkins or any contributor's system.

I wonder if we should not just loop over the existing volumes in the pool to delete them all, no matter the domain name. What do you think?

So this is what you d27bb28a2474748123b5c61158657e779ba229bd does. For the reasons I mentioned above I don't think this will be more reliable than my fix for the old approach, but I'm not against this approach (as long as it matches only tails-builder disks, and not all disks).

#25 Updated by bertagaz over 2 years ago

  • Assignee changed from bertagaz to anonym
  • QA Check changed from Dev Needed to Info Needed

anonym wrote:

As I noted during our CI meeting today, your fix, d27bb28a2474748123b5c61158657e779ba229bd, needs to match disk names to /^tails-builder/ or similar before deleting, otherwise it will delete all my VMs' disks! :)

Ooch, right. I've reverted my commit, and adapted yours as discussed on the chat yesterday.

Correct. My reasoning is that orphan volumes should be impossible; since we deal with the volume removal immediately after we undefine the domain there's not really anything that can go wrong except that we fail to remove the volumes. Put a different way: such a failure would affect any other approach we take (e.g. the one you mention below) just the same.

So I guess the idea is, to quote intrigeri, to have "an additional build step in jenkins-jobs.git"?

I think this is moot now. We want to do the same cleanup on all builders, whether on Jenkins or any contributor's system.

I wonder if we should not just loop over the existing volumes in the pool to delete them all, no matter the domain name. What do you think?

So this is what you d27bb28a2474748123b5c61158657e779ba229bd does. For the reasons I mentioned above I don't think this will be more reliable than my fix for the old approach, but I'm not against this approach (as long as it matches only tails-builder disks, and not all disks).

Well, that's a bit conflicting with what intrigeri pointed at the end of #12599#note-11. I agree we need a way to cleanup which is the same in our infra than for the devs, but OTOH, I think we need a safety net in case something goes wrong in Jenkins. The situation is not really the same: while a developer will simply notice it and remove the unnecessary volumes by hand if it ever happens, as sysadmins I think we'd be glad if they get deleted automatically at the beginning of the next build because something like a power outage came in play at the wrong moment.

So I've also added a rake task in e2d5817c7c03fe0aaa994aacac01cff57fa8d99a that removes all libvirt volumes, that we'll be able to use in our infra. I think with both code we'd have now, we'll be far more at ease on the Jenkins side.

Does it make sense to you? Care to have a quick look if that rake task code looks good to you? Note that I've already tested it locally.

#26 Updated by anonym over 2 years ago

  • Assignee changed from anonym to bertagaz
  • QA Check deleted (Info Needed)

Sure, I didn't think of power outages... :)

But let's not ship a rake task that can accidentally destroy all VM images! Please only delete disks matching /^tails-builder-/!

In fact, why making this a rake task at all when it is code that no one but Jenkins needs? Can't we just run the following one-liner on Jenkins before each build?

 
virsh -q vol-list default | awk '{ print $1 }' | grep '^tails-builder-' | xargs -n1 virsh vol-delete --pool default

#27 Updated by bertagaz over 2 years ago

  • Assignee changed from bertagaz to anonym
  • QA Check set to Info Needed

anonym wrote:

Sure, I didn't think of power outages... :)

:)

But let's not ship a rake task that can accidentally destroy all VM images! Please only delete disks matching /^tails-builder-/!

Fair enough, I'll add this condition. I thought about adding one that would avoid doing anything if not ran by Jenkins, but in the end for people like me that use a dedicated computer, it'll be probably useful.

In fact, why making this a rake task at all when it is code that no one but Jenkins needs? Can't we just run the following one-liner on Jenkins before each build? [...]

I think I'm more confident to have this code in the Rakefile rather than done by a shitty "${shell}", don't know why... :) Also because it can be reused elsewhere.

#28 Updated by bertagaz over 2 years ago

  • Assignee changed from anonym to bertagaz
  • QA Check changed from Info Needed to Dev Needed

bertagaz wrote:

anonym wrote:

But let's not ship a rake task that can accidentally destroy all VM images! Please only delete disks matching /^tails-builder-/!

Fair enough, I'll add this condition. I thought about adding one that would avoid doing anything if not ran by Jenkins, but in the end for people like me that use a dedicated computer, it'll be probably useful.

Done, just pushed a commit implementing that. I noticed that it may be helpful also in case people are using the internal apt-cacher-ng partition. In that case, they'll be able to use it AND use this rake task to clean their Libvirt storage from any remaining tails-build-* volume.

Not sure we need another QA round, given how simple it is. I've already tested it locally, and will see how it goes in Jenkins. If it goes well, I'll merge that into testing.

#29 Updated by intrigeri over 2 years ago

Not sure we need another QA round, given how simple it is. I've already tested it locally, and will see how it goes in Jenkins. If it goes well, I'll merge that into testing.

Given the destructive effects simple mistakes / lack of robustness in this code may have, I'd rather see someone (anonym or I) review it before it's run on innocent bystanders' systems. (I understood the new code won't be intentionally run by default, but rake<TAB><TAB>… mistakes are very easy to make – pun intended –, at least with zsh.)

#30 Updated by intrigeri over 2 years ago

  • It seems to me that this branch changes the behavior of forcecleanup, so its doc is now outdated, no? Or are (some of) the modifications to clean_up_builder_vms leftovers that are now useless given we have clean_up_libvirt_volumes? I'll now re-read the discussion above to better understand the context.
  • Typo in tails-buidler

#31 Updated by intrigeri over 2 years ago

After re-reading the conversation, it seems to me that the clean_up_libvirt_volumes new Rake target is enough to address the problem we're facing, and it superseds the changes made earlier in clean_up_builder_vms on the topic branch. Correct?

So I'd rather merge a branch that does only that, so unless there's a good reason I've missed, let's please create a new branch with only the needed commits (good time to rebase -i and clean up history a bit BTW?). And if the changes done in clean_up_builder_vms are useful to solve another problem, please create another ticket about it, pointing to feature/12599. What do you think about this plan?

#32 Updated by bertagaz over 2 years ago

  • Assignee changed from bertagaz to anonym
  • QA Check changed from Dev Needed to Info Needed

intrigeri wrote:

After re-reading the conversation, it seems to me that the clean_up_libvirt_volumes new Rake target is enough to address the problem we're facing, and it superseeds the changes made earlier in clean_up_builder_vms on the topic branch. Correct?

Well, yes and no. clean_up_builder_vms is also there to ensure that the VM is destroyed and undefined, and is targeting both the Jenkins and the developer use case (the one that use @forcecleanup at least). The new rake task is just there to ensure that no volumes are left overs and the Libvirt images partition of our isobuilders doesn't get filled because of some weird failures.

So I'd rather merge a branch that does only that, so unless there's a good reason I've missed, let's please create a new branch with only the needed commits (good time to rebase -i and clean up history a bit BTW?). And if the changes done in clean_up_builder_vms are useful to solve another problem, please create another ticket about it, pointing to feature/12599. What do you think about this plan?

Not sure. As said above I think both code have their reasons, so I think we can keep both of them, or maybe merge the volume removal in the clean_up_builder_vms function (closer to how I implemented it at first in d27bb28a2474748123b5c61158657e779ba229bd).

I'll assign to anonym, as he has written a good deal of this code and he may have relevant input about this discussion.

#33 Updated by intrigeri over 2 years ago

  • Assignee changed from anonym to bertagaz
  • QA Check changed from Info Needed to Dev Needed

The new rake task is just there to ensure that no volumes are left overs and the Libvirt images partition of our isobuilders doesn't get filled because of some weird failures.

Which is exactly what this ticket is about, right? :)

Summing up what I replied bertagaz elsewhere:

  • I like the current semantics of forcecleanup, that we've had for a while and got used to.
  • I like the current semantics of TAILS_BUILD_OPTIONS, i.e. it (mostly?) affects the current build, rather than all future builds of any other branch. I don't think TAILS_BUILD_OPTIONS is the best place to trigger that broad and potentially destructive changes.
  • If we need a way to properly remove all build VMs (including unmounting, destroying and undefining them before deleting storage volumes), which can be achieved thanks to the changes this topic branch bring in clean_up_builder_vms: fine by me. Let's file another ticket about it, discuss there how it should be called, refactor/merge the code with the one added to clean_up_libvirt_volumes. This is definitely not High priority IMO though: the problem this ticket is about has been hurting us too much for too long, so I really would rather not block on lower-priority improvements to get it fixed.

Fair enough?

#34 Updated by bertagaz over 2 years ago

  • Assignee changed from bertagaz to intrigeri
  • % Done changed from 10 to 50
  • QA Check changed from Dev Needed to Ready for QA

intrigeri wrote:

The new rake task is just there to ensure that no volumes are left overs and the Libvirt images partition of our isobuilders doesn't get filled because of some weird failures.

Which is exactly what this ticket is about, right? :)

Summing up what I replied bertagaz elsewhere:

  • I like the current semantics of forcecleanup, that we've had for a while and got used to.
  • I like the current semantics of TAILS_BUILD_OPTIONS, i.e. it (mostly?) affects the current build, rather than all future builds of any other branch. I don't think TAILS_BUILD_OPTIONS is the best place to trigger that broad and potentially destructive changes.
  • If we need a way to properly remove all build VMs (including unmounting, destroying and undefining them before deleting storage volumes), which can be achieved thanks to the changes this topic branch bring in clean_up_builder_vms: fine by me. Let's file another ticket about it, discuss there how it should be called, refactor/merge the code with the one added to clean_up_libvirt_volumes. This is definitely not High priority IMO though: the problem this ticket is about has been hurting us too much for too long, so I really would rather not block on lower-priority improvements to get it fixed.

Fair enough?

Pushed a new branch (bugfix/12599-rake-task-removing-tails-builder-volumes) containing only the rake task, in the Tails repo as well as the jjb repo.

#35 Updated by intrigeri over 2 years ago

  • Feature Branch changed from feature/12599 to bugfix/12599-rake-task-removing-tails-builder-volumes,jenkins-jobs:bugfix/12599-rake-task-removing-tails-builder-volumes

#36 Updated by intrigeri over 2 years ago

Code review passes. I'll deploy that and we'll see what happens. If we're unlucky and it breaks stuff I'll revert quickly so we don't have to fix the branch(es) under stress :)

#37 Updated by bertagaz over 2 years ago

  • Blocks Bug #12637: Deploy rake libvirt volumes clean up task on all Jenkins build jobs added

#38 Updated by intrigeri over 2 years ago

  • Status changed from In Progress to Fix committed
  • % Done changed from 50 to 100

#39 Updated by intrigeri over 2 years ago

  • Assignee deleted (intrigeri)
  • QA Check changed from Ready for QA to Pass

Deployed and merged into testing + devel. Optimistically closing, even though only time will tell if this works well in all cases. Congrats!

#40 Updated by intrigeri over 2 years ago

Renamed feature/12599 to wip/feature/12599.

#41 Updated by intrigeri over 2 years ago

  • Status changed from Fix committed to Resolved

#42 Updated by intrigeri over 2 years ago

  • Related to Bug #13302: /var/lib/libvirt/images sometimes gets filled on isobuilders, take 2 added

#43 Updated by intrigeri over 2 years ago

  • Priority changed from High to Elevated

Also available in: Atom PDF