Bug #16020

acngtool shrink is insufficient to maintain acng cache size

Added by anonym over 1 year ago. Updated 10 months ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: Build system
Target version:
Start date: 10/02/2018
Due date:
% Done: 100%
Feature Branch: bugfix/16020-fix-cache-shrinking
Type of work: Code
Blueprint:
Starter:
Affected tool:

Description

In build-tails we call acngtool shrink 10G before each build to prevent the cache from running out of disk space. From what I can tell, it doesn't clean up APT indices correctly: e.g. in my /var/cache/apt-cacher-ng/time-based.snapshots.deb.tails.boum.org/debian I have snapshots dating back to January 2017. Each such snapshot takes 30-120 MB (the old ones with multiarch are especially large), so it adds up: for me, to 8 GB. :S

Either we need to improve acngtool (for everyone's benefit) or we manually find snapshots older than six months (or whatever) and purge them from acng's cache.
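
For the manual option, here is a rough sketch of what the purge could look like. It assumes each snapshot is a directory named after its serial (YYYYMMDDNN) directly under the archive directory, which should be checked against the actual cache layout first. purge_old_snapshots is a hypothetical helper, not something that exists in build-tails.

```sh
# Hypothetical helper (not part of build-tails): remove snapshot
# directories whose serial is older than a cutoff serial.
# Assumption: snapshots are directories named YYYYMMDDNN (10 digits)
# directly under archive_dir; anything else is left alone.
purge_old_snapshots() {
    archive_dir="$1"   # e.g. .../time-based.snapshots.deb.tails.boum.org/debian
    cutoff="$2"        # e.g. 2018040100: older serials get removed
    for dir in "${archive_dir}"/[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]; do
        # Skip the unexpanded pattern (no match) and non-directories.
        [ -d "${dir}" ] || continue
        serial="${dir##*/}"
        if [ "${serial}" -lt "${cutoff}" ]; then
            rm -rf -- "${dir}"
        fi
    done
}
```

With GNU date, a six-month cutoff could be computed as "$(date -d '6 months ago' +%Y%m%d)00". The function would have to run with enough privileges to delete files owned by apt-cacher-ng.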


Related issues

Related to Tails - Bug #17288: ENOSPC during build, while upgrading the Vagrant box (Resolved)
Blocks Tails - Feature #16209: Core work: Foundations Team (Confirmed)

Associated revisions

Revision f68a97af (diff)
Added by CyrilBrulebois 11 months ago

Fix apt-cacher-ng cache shrinking (refs: #16020).

The “acngtool shrink” command never fails (at least with apt-cacher-ng
2-2, in stretch), which might explain why the missing sudo call was
missed for so long: insufficient rights as an unprivileged user lead
neither to a non-zero exit code nor to a warning message being
printed.

Revision 3269cb58
Added by anonym 11 months ago

Merge remote-tracking branch 'origin/bugfix/16020-fix-cache-shrinking' into stable

Fix-committed: #16020

History

#1 Updated by bertagaz over 1 year ago

I've noticed that too locally: acng is quite sloppy at shrinking to the maximum size it is given. It always shrinks to a somewhat higher value. I "solved" that by just lowering the number to get closer to my needs.

#2 Updated by anonym over 1 year ago

That didn't work for me -- no matter what size argument I gave, nothing was removed. It is as if too high a ratio of non-debs (APT indices, TBB tarballs) makes its calculations flip out, so nothing is freed.

#3 Updated by anonym over 1 year ago

intri suggested that acng's daily cronjob should be able to clean it up, but that it takes a long time: "(it fetches all dists again to identify obsolete packages) so I doubt we can do that at every build".

#4 Updated by segfault over 1 year ago

I had the same issue (see #16032)

#5 Updated by anonym over 1 year ago

anonym wrote:

intri suggested that acng's daily cronjob should be able to clean it up, but that it takes a long time: "(it fetches all dists again to identify obsolete packages) so I doubt we can do that at every build".

Or maybe not:

(17:25:53) intrigeri: ah ah on lizard, we do
"rm -rf /var/cache/apt-cacher-ng/*.tails.boum.org" 
weekly, so no, the cronjob is not what saves us there.

#6 Updated by intrigeri 12 months ago

#7 Updated by intrigeri 12 months ago

  • Assignee deleted (anonym)

After segfault & anonym, @CyrilBrulebois was affected by this problem yesterday ⇒ added to the FT's radar.

No progress here for a while ⇒ deassigning anonym for now. Let's make it clear that this ticket is up for grabs and could be tackled by whoever else has time for it :)

#8 Updated by CyrilBrulebois 12 months ago

  • Assignee set to CyrilBrulebois

Hit this yesterday or the day before, it's next on my list.

#9 Updated by CyrilBrulebois 11 months ago

Let's look at our code calling acngtool shrink (vagrant/provision/assets/build-tails):

if [ "${TAILS_PROXY_TYPE}" = "vmproxy" ]; then
    # The apt-cacher-ng cache disk is 15G, so let's ensure at most 10G
    # of it is used, so that there is 5G free before each build, which
    # should be enough for any build, even if we have to download a
    # complete set of new packages for a new Debian release.
    /usr/lib/apt-cacher-ng/acngtool shrink 10G -f || \
        echo "The clean-up of apt-cacher-ng's cache failed: this is" \
             "not fatal and most likely just means that some disk" \
             "space could not be reclaimed -- in order to fix that" \
             "situation you need to manually investigate " \
             "/var/cache/apt-cacher-ng/apt-cacher-ng-log/main_*.html" >&2
fi

It seems pretty straightforward, catching issues and displaying an error message when that happens (while carrying on).

But upstream code (source/acngtool.cc in apt-cacher-ng 2-2) has:

        if(verbose)
                cout << "Found " << totalSize << " bytes of relevant data, reducing to " << wantedSize << endl;
        while(!delQ.empty())
        {
                bool todel = (totalSize > wantedSize);
                totalSize -= delQ.top().size;
                const char *msg = 0;
                if(verbose || dryrun)
                        msg = (todel ? "Delete: " : "Keep: " );
                auto& delpath(delQ.top().path);
                if(msg)
                        cout << msg << delpath << endl << msg << delpath << ".head" << endl;
                if(todel && apply)
                {
                        unlink(delpath.c_str());
                        unlink(mstring(delpath + ".head").c_str());
                }
                delQ.pop();
        }
        return 0;

Notice the utter lack of error handling and the mandatory return 0;? That's why we have been missing this for so long: that command never fails. In verbose mode, it seems there's much work going on, with many “Delete:” and a few “Keep:” entries. But the filesystem is left untouched.

Prepending the command with as_root_do fixes the shrinking…
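
Since the exit code tells us nothing, one way to notice a silent no-op in the future would be to measure the cache after shrinking. A rough sketch, assuming du -sk is a good enough estimate of the cache size; cache_exceeds_limit is a hypothetical helper and not part of build-tails.

```sh
# Hypothetical helper (not part of build-tails): succeed (exit 0) if
# cache_dir uses more than limit_kib KiB according to du, fail otherwise.
cache_exceeds_limit() {
    cache_dir="$1"
    limit_kib="$2"
    # du -sk prints "<size in KiB><TAB><path>"; keep only the size.
    used_kib="$(du -sk -- "${cache_dir}" | cut -f1)"
    [ "${used_kib}" -gt "${limit_kib}" ]
}

# Possible use right after the shrink call (not executed here):
#   as_root_do /usr/lib/apt-cacher-ng/acngtool shrink 10G -f
#   if cache_exceeds_limit /var/cache/apt-cacher-ng $((10 * 1024 * 1024)); then
#       echo "apt-cacher-ng cache is still above 10G after shrinking" >&2
#   fi
```

Given the comment above about acng always ending up somewhat above the requested size, the limit passed to such a check would probably need some slack to avoid spurious warnings.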

#10 Updated by CyrilBrulebois 11 months ago

  • Status changed from Confirmed to In Progress

#11 Updated by CyrilBrulebois 11 months ago

  • Status changed from In Progress to Confirmed
  • Assignee deleted (CyrilBrulebois)
  • QA Check set to Ready for QA
  • Feature Branch set to bugfix/16020-fix-cache-shrinking

#12 Updated by intrigeri 11 months ago

  • Assignee set to segfault
  • Target version set to Tails_3.14

Thanks a lot, kibi, for your work here :)

Hi @segfault! Last time I checked you used the apt-cacher-ng maintained by our build system. Could you please review this branch?

#13 Updated by anonym 11 months ago

  • Status changed from Confirmed to 11
  • % Done changed from 0 to 100

#14 Updated by anonym 11 months ago

  • Assignee deleted (segfault)
  • % Done changed from 100 to 0
  • QA Check changed from Ready for QA to Pass

I tested it, and it worked perfectly for me! Woohoo!

I've merged into stable → devel but skipped feature/buster, so as not to force us all to build a new basebox given our bandwidth limitations during the sprint.

#15 Updated by anonym 11 months ago

  • % Done changed from 0 to 100

#16 Updated by intrigeri 10 months ago

  • Target version changed from Tails_3.14 to Tails_3.13.2

#17 Updated by anonym 10 months ago

  • Status changed from 11 to Resolved

#18 Updated by anonym 10 months ago

  • Target version changed from Tails_3.13.2 to Tails_3.14

#19 Updated by intrigeri 10 months ago

  • Target version changed from Tails_3.14 to Tails_3.13.2

#20 Updated by intrigeri 3 months ago

  • Related to Bug #17288: ENOSPC during build, while upgrading the Vagrant box added
