Bug #12725

Feature #5630: Reproducible builds

Sort out the apt-snapshots-disk partition situation on apt.lizard

Added by bertagaz over 2 years ago. Updated over 2 years ago.

Status: Resolved
Priority: High
Assignee: -
Category: Infrastructure
Target version:
Start date: 06/16/2017
Due date:
% Done: 100%
Feature Branch:
Type of work: Sysadmin
Blueprint:
Starter:
Affected tool:

Description

Because the Vagrant builds were deployed in Jenkins before #12002 was done, we could not grow this partition as much as planned and needed. On top of that, we had to freeze a few more APT snapshots than usual for #5630. As a result, our monitoring is rightfully complaining that the free disk space on the apt-snapshots-disk partition of apt.lizard is critical.


Related issues

Related to Tails - Bug #12829: FTBFS due to buggy APT sources set in Vagrant build box by the provision script Resolved 06/21/2017
Related to Tails - Feature #12002: Estimate hardware cost of reproducible builds in Jenkins Resolved 11/28/2016
Related to Tails - Bug #13526: apt-snapshots partition lacks disk space Resolved 07/27/2017

History

#1 Updated by intrigeri over 2 years ago

Given this operation is destructive and hard to revert (we don't back up time-based snapshots), I strongly suggest you give me a list of snapshots you want to force garbage collection for (and how you built the list) so I can review it before you delete stuff. Thanks!

#2 Updated by bertagaz over 2 years ago

  • Assignee changed from bertagaz to intrigeri
  • % Done changed from 0 to 20
  • QA Check set to Info Needed

intrigeri wrote:

Given this operation is destructive and hard to revert (we don't back up time-based snapshots), I strongly suggest you give me a list of snapshots you want to force garbage collection for (and how you built the list) so I can review it before you delete stuff. Thanks!

True. So here's what I found after searching for APT snapshots whose Valid-Until is not in June:

  • 2017040603, Valid-Until set to Sat, 15 Jul 2017
  • 2017042704, Valid-Until set to Sat, 28 Oct 2017

The former is the 2.12 snapshot and will disappear soon. The latter is one we bumped while working on the Vagrant builds. It's not used anymore AFAIK, since the basebox snapshot serials have been bumped with the 3.0 release, so it seems a good candidate for garbage collection.
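
As a rough illustration of the kind of search described above (a sketch only, not necessarily how the list was actually built): the snippet below takes a mapping from snapshot serial to Valid-Until date and keeps the serials that do not expire in a given month. The helper name `not_expiring_in` and the serial 2017060101 are made up for the example; the other two entries are the ones quoted in this comment.

```python
from datetime import datetime

# Valid-Until dates as quoted in this comment, plus one hypothetical
# June entry (2017060101 is NOT a real serial) to show the filtering.
snapshots = {
    "2017040603": "Sat, 15 Jul 2017",
    "2017042704": "Sat, 28 Oct 2017",
    "2017060101": "Sat, 10 Jun 2017",
}


def not_expiring_in(snapshots, year, month):
    """Return the serials whose Valid-Until is outside the given month, sorted."""
    keep = []
    for serial, valid_until in snapshots.items():
        # Parse dates written like "Sat, 15 Jul 2017" (English day/month names).
        date = datetime.strptime(valid_until, "%a, %d %b %Y")
        if (date.year, date.month) != (year, month):
            keep.append(serial)
    return sorted(keep)


# Snapshots that will not expire on their own in June 2017.
candidates = not_expiring_in(snapshots, 2017, 6)
print(candidates)
```

On a real server the mapping would have to be filled in from wherever the snapshots' Valid-Until values are stored; that part is left out here because it depends on the local setup.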

#3 Updated by intrigeri over 2 years ago

  • Assignee changed from intrigeri to bertagaz
  • QA Check changed from Info Needed to Dev Needed

  • 2017040603, Valid-Until set to Sat, 15 Jul 2017
  • 2017042704, Valid-Until set to Sat, 28 Oct 2017

The former is the 2.12 snapshot and will disappear soon.

OK, so this one is fully expected, and I don't see any reason to remove it manually.
(If we need to remove it manually, fine, but then something's seriously wrong and the root cause should be tracked elsewhere.)

The latter is one we bumped while working on the Vagrant builds. It's not used anymore AFAIK, since the basebox snapshot serials have been bumped with the 3.0 release, so it seems a good candidate for garbage collection.

OK, let's force early expiration for this one then.

#4 Updated by bertagaz over 2 years ago

  • Status changed from Confirmed to In Progress
  • % Done changed from 20 to 50

intrigeri wrote:

  • 2017040603, Valid-Until set to Sat, 15 Jul 2017
  • 2017042704, Valid-Until set to Sat, 28 Oct 2017

The former is the 2.12 snapshot and will disappear soon.

OK, so this one is fully expected, and I don't see any reason to remove it manually.
(If we need to remove it manually, fine, but then something's seriously wrong and the root cause should be tracked elsewhere.)

Yes.

The latter is one we bumped while working on the Vagrant builds. It's not used anymore AFAIK, since the basebox snapshot serials have been bumped with the 3.0 release, so it seems a good candidate for garbage collection.

OK, let's force early expiration for this one then.

Done, it will be garbage collected tomorrow. Let's see if it fixes the situation.

#5 Updated by intrigeri over 2 years ago

  • Subject changed from Clean up apt-snapshots-disk partition on apt.lizard to Sort out the apt-snapshots-disk partition situation on apt.lizard
  • Priority changed from Normal to High

bertagaz wrote:

Done, it will be garbage collected tomorrow. Let's see if it fixes the situation.

Time is running out: today our snapshot system can't do its job anymore:

NOT ENOUGH FREE SPACE on filesystem 0xfe20 (the filesystem '/srv/apt-snapshots/time-based/repositories/debian/db' is on)
available blocks 832090, needed blocks 1084079, block size is 4096.
"/usr/bin/reprepro" unexpectedly returned exit value 255 at /usr/local/bin/tails-update-time-based-apt-snapshots line 40.
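
To put the block counts in that error message into more familiar units, here is a small back-of-the-envelope calculation. It uses only the three numbers from the message; the helper name `blocks_to_gib` is made up for the example.

```python
# Numbers taken from the reprepro error message above.
BLOCK_SIZE = 4096          # bytes per block
available_blocks = 832090
needed_blocks = 1084079


def blocks_to_gib(blocks, block_size=BLOCK_SIZE):
    """Convert a number of filesystem blocks to GiB (2**30 bytes)."""
    return blocks * block_size / 2**30


missing_blocks = needed_blocks - available_blocks

print(f"available: {blocks_to_gib(available_blocks):.2f} GiB")
print(f"needed:    {blocks_to_gib(needed_blocks):.2f} GiB")
print(f"missing:   {blocks_to_gib(missing_blocks):.2f} GiB")
```

So this particular run was short by a bit less than 1 GiB: roughly 3.2 GiB were available where about 4.1 GiB were needed.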

#6 Updated by intrigeri over 2 years ago

  • Related to Bug #12829: FTBFS due to buggy APT sources set in Vagrant build box by the provision script added

#7 Updated by intrigeri over 2 years ago

We're far from having allocated everything planned for this partition, so I'm growing it by 10GB in the hope that's enough to unbreak this temporarily. But a real solution is needed, and we don't have enough space available to grow this partition as much as we planned, due to the Vagrant thing having been deployed and #12002 not being done yet.

#8 Updated by intrigeri over 2 years ago

  • Related to Feature #12002: Estimate hardware cost of reproducible builds in Jenkins added

#9 Updated by intrigeri over 2 years ago

  • Description updated (diff)

#10 Updated by intrigeri over 2 years ago

intrigeri wrote:

  • 2017040603, Valid-Until set to Sat, 15 Jul 2017

The former is the 2.12 snapshot and will disappear soon.

OK, so this one is fully expected, and I don't see any reason to remove it manually.
(If we need to remove it manually, fine, but then something's seriously wrong and the root cause should be tracked elsewhere.)

Now that the root cause is clear, please do force early expiration of this one too, as yet another temporary mitigation of the problematic situation we're in until the root cause is solved.

#11 Updated by bertagaz over 2 years ago

intrigeri wrote:

Now that the root cause is clear, please do force early expiration of this one too, as yet another temporary mitigation of the problematic situation we're in until the root cause is solved.

Ack, done. 2017040603 will expire tomorrow, we'll see how much disk space it frees.

#12 Updated by bertagaz over 2 years ago

bertagaz wrote:

Ack, done. 2017040603 will expire tomorrow, we'll see how much disk space it frees.

This 2.12 snapshot has freed around 38G, we're back to green here for now with 54G left.

#13 Updated by intrigeri over 2 years ago

  • Status changed from In Progress to Resolved

This 2.12 snapshot has freed around 38G, we're back to green here for now with 54G left.

Great! Closing then: this ticket was about the short-term emergency situation we were in, and the long-term fix is tracked by #11806 (which is itself blocked by #12002).

#14 Updated by intrigeri over 2 years ago

  • Assignee deleted (bertagaz)
  • % Done changed from 50 to 100
  • QA Check changed from Dev Needed to Pass

#15 Updated by intrigeri over 2 years ago

  • Related to Bug #13526: apt-snapshots partition lacks disk space added
