Bug #15695
Avoid breaking automatic upgrades to Tails 3.9
100%
Description
As noticed thanks to #15419, our devel branch changes the GID of the debian-tor
user:
-vboxsf:x:112:
-monkeysphere:x:113:
-debian-tor:x:114:tor-launcher
-lpadmin:x:115:
+monkeysphere:x:112:
+debian-tor:x:113:tor-launcher
+lpadmin:x:114:
+vboxsf:x:115:
Related issues
Associated revisions
Factor-out fixing of tails-persistent-setup UID/GID settings into utility methods. (refs: #15695)
v2 by Cyril Brulebois:
- Use the correct command to adjust group ID.
Signed-off-by: Cyril Brulebois <cyril@debamax.com>
Prevent change of gid of debian-tor user breaking automatic upgrades. (Closes: #15695)
As outlined in a774ba2aba7012f3f5bc630b0f3dab8ac696dbd7, since the
virtualbox 5.2.8-dfsg-7 upload the virtualbox-guest-utils package depends
on virtualbox-guest-dkms which in-turn creates a "vboxsf" group later than
before.
This results in the resulting debian-tor group ID being assigned a
different number due to change of relative ordering and, accordingly,
breaks automatic upgrades.
We therefore assign a number of groups, fresh, unreserved group IDs to
avoid collisions.
Ensure GIDs are stable across releases (refs: #15695).
Fixup on top of 3682fde35080381eb97d649966b584fc9c777998: we should not
"assign a number of groups, fresh, unreserved group IDs" because then these
groups would have a different GID than in Tails 3.8, which is precisely
what we're trying to avoid here :)
Acknowledge expected innocuous change to /etc/group (refs: #15695)
Only order changes, which does not matter.
History
#1 Updated by intrigeri over 1 year ago
As explained in a774ba2aba7012f3f5bc630b0f3dab8ac696dbd7, "Since the virtualbox 5.2.8-dfsg-7 upload, the virtualbox-guest-utils package depends on virtualbox-guest-dkms". I think this new dependency explains why virtualbox-guest-utils
, that creates the vboxsf
group, is installed later than before.
#2 Updated by intrigeri over 1 year ago
- Related to Bug #15419: Detect earlier in the dev process if we're breaking automatic upgrades added
#3 Updated by intrigeri over 1 year ago
- Blocks Feature #15334: Core work 2018Q3: Foundations Team added
#4 Updated by intrigeri over 1 year ago
- Status changed from Confirmed to In Progress
- % Done changed from 0 to 10
Brainstorming our options:
- ship a patched
virtualbox-guest-utils
package that removes the added dependency and hope that's enough to fix this instance of the more general problem (while we're at it, we can as well backport the latest version from sid, that adds compatibility with Linux 4.17; too bad, virtualbox is now in stretch-backports which could have given us the opportunity to stop shipping it via our custom APT repo); that's relatively cheap and easy, and surely an acceptable way to postpone the problem, hoping we have more time to address its root cause next time another instance of this problem pops up. - go back to the hacks from #15424: change UIDs/GIDs via a chroot hook, killing processes as needed; service startup is disabled in the chroot, so in theory only processes started via maintainer scripts and left behind should have to be killed; in practice, that probably boils down to
gpg-agent
; so while my concern about killing processes outside of the chroot remains, limiting the kill spree togpg-agent
processes might be safe enough; in our current Vagrant VM, I see no user that has obvious reasons to havegpg-agent
running, and indeed I see no such process running during a build. That's also relatively cheap and easy, and most of the work has been done already. It'l be wasted work at some point if overlayfs is not affected by the bug, but well, at least we're covered until #8415 is done. - find some way to stabilize all UIDs/GIDs (#15407): looks like more work than the two first options, and that work will be wasted if overlayfs is not affected by the bug, which makes this option very unappealing to me.
- complete #8415 (overlayfs) in time for Tails 3.9… assuming overlayfs is not affected by the bug that makes us worry about changing UIDs/GIDs (to be verified on #15689): there's not that much work left to do and we have to do it soon anyway, so clearly it would be the best place to invest our time in; but it's not clear whether it's realistic to do that in the next 1.5 months
I'm currently leaning towards the 2nd option (stabilize just the UIDs/GIDs we want), with the 1st one as a fallback. But depending on how our first shots at #15091 and #15023 go, and on how we distribute the work on these two big tasks, who knows, #8415 might be doable. I'll come back here in a few weeks and depending on our progress on these other fronts, I'll pick the best approach among the options that will be realistic at this point.
#5 Updated by intrigeri over 1 year ago
- Related to Bug #15407: Prevent system user uid:s and gid:s from changing between releases added
#6 Updated by intrigeri over 1 year ago
- Related to Bug #15424: Use fixed UID and GID for debian-tor added
#7 Updated by intrigeri over 1 year ago
- Related to deleted (Bug #15419: Detect earlier in the dev process if we're breaking automatic upgrades)
#8 Updated by intrigeri over 1 year ago
- Blocks Bug #15419: Detect earlier in the dev process if we're breaking automatic upgrades added
#9 Updated by intrigeri over 1 year ago
- Related to Bug #13426: Tor does not start on Tails 3.0.1 automatically upgraded from 3.0 added
#10 Updated by intrigeri over 1 year ago
intrigeri wrote:
I'm currently leaning towards the 2nd option (stabilize just the UIDs/GIDs we want), with the 1st one as a fallback. But depending on how our first shots at #15091 and #15023 go, and on how we distribute the work on these two big tasks, who knows, #8415 might be doable.
If I magically have time to do #8415 in time for the freeze, great, but let's not count on it => I think we should go for the 2nd option, aiming to have something merged by August 10. I'll try to find a FT member who can handle this.
#11 Updated by intrigeri over 1 year ago
lamby will take a look and tell me by Aug 11 whether he wants it (tentative budget: 2h).
#12 Updated by intrigeri over 1 year ago
- Assignee changed from intrigeri to lamby
#13 Updated by intrigeri over 1 year ago
intrigeri wrote:
lamby will take a look and tell me by Aug 11 whether he wants it (tentative budget: 2h).
Ping? The freeze is on the 15th so if you don't want it, I'll work on it tomorrow or Tuesday; and if you want it, great :)
#14 Updated by lamby over 1 year ago
- Assignee changed from lamby to intrigeri
Thanks for the ping. I've spent about 1h looking at this issue and trying to understand it.
In my attempt to properly understand this issue, I dived into the code, resulting in the following entirely-untested commits: https://github.com/lamby/tails/commits/15695-avoid-breaking-automatic-upgrades-to-tails-3-9. These might be worth merging anyway as it's essentially "better commenting"
However, I got somewhat stuck on:
change UIDs/GIDs via a chroot hook, killing processes as needed; service startup is disabled in the chroot, so in theory only processes started via maintainer scripts and left behind should have to be killed; in practice, that probably boils down to gpg-agent
Three questions arise for me here: why would anything be running at all? Why would that be gpg-agent and not, well, something running as debian-tor? Why do we not do (or need?) the kill-process dance for the tails-persistent-setup chroot hack right now?
(Some quick personal notes for this bug: See #13426 for the background on why this breaks upgrades, the root cause being probably in aufs which will eventually be replaced with overlayfs in #8415.)
#15 Updated by intrigeri over 1 year ago
- Assignee changed from intrigeri to lamby
In my attempt to properly understand this issue, I dived into the code, resulting in the following entirely-untested commits: https://github.com/lamby/tails/commits/15695-avoid-breaking-automatic-upgrades-to-tails-3-9. These might be worth merging anyway as it's essentially "better commenting"
Good idea! Two potential issues that prevent me from merging it though:
- In
Change_gid
I think you meant-exec chgrp
instead of-exec chown
. Change_gid 112 150
feels strange because we hadTPS_GROUP_STEALER=$(getent group 122 | awk -F ':' '{print $1}')
(note 112 vs. 122)
By the way, basing your work on top of my branch for #15419 will help you validate it :) I've updated bugfix/15419-detect-uid-and-gid-changes
so our CI builds it and I'll share the FTBFS message with you as soon as a build has failed, which will demonstrate better what the practical problem we need to solve now is.
However, I got somewhat stuck on:
change UIDs/GIDs via a chroot hook, killing processes as needed; service startup is disabled in the chroot, so in theory only processes started via maintainer scripts and left behind should have to be killed; in practice, that probably boils down to gpg-agent
Three questions arise for me here: why would anything be running at all?
Why would that be gpg-agent and not, well, something running as debian-tor?
On #15424 we noticed that the monkeysphere user had a running process. I've assumed this was gpg-agent
because on our ISO build CI jobs we have to kill those on exit in order to be able to tear down a mountpoint. IIRC that's because the monkeysphere postinst performs GnuPG operations, that trigger gpg-agent
startup (in the background, that's how GnuPG 2 works). The tor package does no such thing which is why there's no running process left behind running as the debian-tor
user. Now, my hunch might be wrong (I don't recall whether I reality-checked it) but we'll know as soon as the workaround this ticket is now about is implemented :)
Why do we not do (or need?) the kill-process dance for the tails-persistent-setup chroot hack right now?
Do you mean: why don't we do it yet (in current devel
branch)? Or why shouldn't we do it for 3.9 as part of this ticket? I guess because we've never needed it so far :)
#16 Updated by lamby over 1 year ago
- Assignee changed from lamby to intrigeri
In Change_gid I think you meant -exec chgrp instead of -exec chown.
Fixed.
Change_gid 112 150 feels strange because we had TPS_GROUP_STEALER=$(getent group 122 | awk -F ':' '{print $1}') (note 112 vs. 122)
Fixed. That'll teach me to work in the sun...
By the way, basing your work on top of my branch for #15419 will help you validate it
Rebased.
#17 Updated by intrigeri over 1 year ago
- Assignee changed from intrigeri to lamby
I'll review the (side-)changes you've done already (and will likely merge them!) but let's keep this assigned to you for tracking the main problem this ticket is about.
#18 Updated by intrigeri over 1 year ago
I've merged your changes into bugfix/15419-detect-uid-and-gid-changes
, now building.
And here's the current problem we're trying to "solve" here:
Checking UIDs and GIDs stability /usr/share/tails/build/passwd /etc/passwd differ: char 1310, line 25 /etc/passwd and/or /etc/group differs from expected: --- /usr/share/tails/build/passwd 2018-06-09 15:22:28.000000000 +0000 +++ /etc/passwd 2018-08-13 09:48:10.437902256 +0000 @@ -22,8 +22,8 @@ _apt:x:104:65534::/nonexistent:/bin/false messagebus:x:103:105::/var/run/dbus:/bin/false memlockd:x:105:110:memlockd system account,,,:/usr/lib/memlockd:/bin/false -monkeysphere:x:106:113:monkeysphere authentication user,,,:/var/lib/monkeysphere:/bin/bash -debian-tor:x:107:114::/var/lib/tor:/bin/false +monkeysphere:x:106:112:monkeysphere authentication user,,,:/var/lib/monkeysphere:/bin/bash +debian-tor:x:107:113::/var/lib/tor:/bin/false speech-dispatcher:x:108:29:Speech Dispatcher,,,:/var/run/speech-dispatcher:/bin/false colord:x:109:117:colord colour management daemon,,,:/var/lib/colord:/bin/false saned:x:110:118::/var/lib/saned:/bin/false Content of '/etc/passwd': root:x:0:0:root:/root:/bin/bash daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin bin:x:2:2:bin:/bin:/usr/sbin/nologin sys:x:3:3:sys:/dev:/usr/sbin/nologin sync:x:4:65534:sync:/bin:/bin/sync games:x:5:60:games:/usr/games:/usr/sbin/nologin man:x:6:12:man:/var/cache/man:/usr/sbin/nologin lp:x:7:7:lp:/var/spool/lpd:/usr/sbin/nologin mail:x:8:8:mail:/var/mail:/usr/sbin/nologin news:x:9:9:news:/var/spool/news:/usr/sbin/nologin uucp:x:10:10:uucp:/var/spool/uucp:/usr/sbin/nologin proxy:x:13:13:proxy:/bin:/usr/sbin/nologin www-data:x:33:33:www-data:/var/www:/usr/sbin/nologin backup:x:34:34:backup:/var/backups:/usr/sbin/nologin list:x:38:38:Mailing List Manager:/var/list:/usr/sbin/nologin irc:x:39:39:ircd:/var/run/ircd:/usr/sbin/nologin gnats:x:41:41:Gnats Bug-Reporting System (admin):/var/lib/gnats:/usr/sbin/nologin nobody:x:65534:65534:nobody:/nonexistent:/usr/sbin/nologin systemd-timesync:x:100:102:systemd Time Synchronization,,,:/run/systemd:/bin/false systemd-network:x:101:103:systemd Network Management,,,:/run/systemd/netif:/bin/false systemd-resolve:x:102:104:systemd Resolver,,,:/run/systemd/resolve:/bin/false _apt:x:104:65534::/nonexistent:/bin/false messagebus:x:103:105::/var/run/dbus:/bin/false memlockd:x:105:110:memlockd system account,,,:/usr/lib/memlockd:/bin/false monkeysphere:x:106:112:monkeysphere authentication user,,,:/var/lib/monkeysphere:/bin/bash debian-tor:x:107:113::/var/lib/tor:/bin/false speech-dispatcher:x:108:29:Speech Dispatcher,,,:/var/run/speech-dispatcher:/bin/false colord:x:109:117:colord colour management daemon,,,:/var/lib/colord:/bin/false saned:x:110:118::/var/lib/saned:/bin/false pulse:x:111:119:PulseAudio daemon,,,:/var/run/pulse:/bin/false hplip:x:112:7:HPLIP system user,,,:/var/run/hplip:/bin/false Debian-gdm:x:113:121:Gnome Display Manager:/var/lib/gdm3:/bin/false tails-persistence-setup:x:115:122::/home/tails-persistence-setup:/bin/false clearnet:x:114:123::/home/clearnet:/bin/false htp:x:116:124::/home/htp:/bin/false tails-iuk-get-target-file:x:117:125::/home/tails-iuk-get-target-file:/bin/false tails-upgrade-frontend:x:118:126::/home/tails-upgrade-frontend:/bin/false tor-launcher:x:119:127::/home/tor-launcher:/bin/false tails-install-iuk:x:120:128::/home/tails-install-iuk:/bin/false --- /usr/share/tails/build/group 2018-06-09 15:22:28.000000000 +0000 +++ /etc/group 2018-08-13 09:48:10.505901231 +0000 @@ -48,10 +48,10 @@ ssh:x:109: memlockd:x:110: ssl-cert:x:111: -vboxsf:x:112: -monkeysphere:x:113: -debian-tor:x:114:tor-launcher -lpadmin:x:115: +monkeysphere:x:112: +debian-tor:x:113:tor-launcher +lpadmin:x:114: +vboxsf:x:115: scanner:x:116:saned colord:x:117: saned:x:118: Content of '/etc/group': root:x:0: daemon:x:1: bin:x:2: sys:x:3: adm:x:4: tty:x:5: disk:x:6: lp:x:7: mail:x:8: news:x:9: uucp:x:10: man:x:12: proxy:x:13: kmem:x:15: dialout:x:20: fax:x:21: voice:x:22: cdrom:x:24: floppy:x:25: tape:x:26: sudo:x:27: audio:x:29:pulse dip:x:30: www-data:x:33: backup:x:34: operator:x:37: list:x:38: irc:x:39: src:x:40: gnats:x:41: shadow:x:42: utmp:x:43: video:x:44: sasl:x:45: plugdev:x:46: staff:x:50: games:x:60: users:x:100: nogroup:x:65534: systemd-journal:x:101: systemd-timesync:x:102: systemd-network:x:103: systemd-resolve:x:104: input:x:106: crontab:x:107: netdev:x:108: messagebus:x:105: ssh:x:109: memlockd:x:110: ssl-cert:x:111: monkeysphere:x:112: debian-tor:x:113:tor-launcher lpadmin:x:114: vboxsf:x:115: scanner:x:116:saned colord:x:117: saned:x:118: pulse:x:119: pulse-access:x:120: Debian-gdm:x:121: tails-persistence-setup:x:122: clearnet:x:123: htp:x:124: tails-iuk-get-target-file:x:125:tails-install-iuk tails-upgrade-frontend:x:126: tor-launcher:x:127: tails-install-iuk:x:128: If these changes are innocuous, update these files in config/chroot_local-includes/usr/share/tails/build/. See #15407 and #13426 for more context. E: config/chroot_local-hooks/99-zzz_check_uids_and_gids failed (exit non-zero). You should check for errors.
#19 Updated by lamby over 1 year ago
- Assignee changed from lamby to intrigeri
let's keep this assigned to you for tracking the main problem this ticket is about.
Thanks, good idea — as mentioned, I only really wrote and sent over these patches so I ensured I was understanding the issue, kinda like how making your own notes in a lecture is not the same as just reading someone else's! :) Do appreciate they are somewhat distracting however.
Here's the current problem we're trying to "solve" here:
Thanks for sending that over.
So, my reading of that diff of that we need to move the uid of vboxdrv
from 112 → 151 and its gid from 112 → 151. That your quick assessment too? :)
#20 Updated by intrigeri over 1 year ago
- Assignee changed from intrigeri to lamby
So, my reading of that diff of that we need to move the uid of
vboxdrv
from 112 → 151 and its gid from 112 → 151. That your quick assessment too? :)
I don't see vboxdrv
involved anywhere in the error message I've pasted.
My quick assessment is that UIDs are all right (unchanged in the diff), only GIDs are problematic. Wrt. GIDs, 4 groups are involved and they're fighting for a stable set of 4 GIDs (112-115). What we need to do is to restore the state from Tails 3.8 (captured in /usr/share/tails/build/group
) i.e.:
- vboxsf: GID 115 → 112
- monkeysphere: GID 112 → 113
- debian-tor: GID 113 → 114
- lpadmin: GID 114 → 115
The "fun" part is that I think we can't do this in place with atomic groupmod
operations. So likely we need to temporarily give one of these groups another, free GID that's not in the 112-115 range, and then change the GID for the 3 other groups in the right order, and finally give its correct GID to the group that was temporarily moved out of the way.
Needless to say, all this is ugly and I don't expect it to get any better in the future… until we fix the root cause of the problem or devise a nicer solution to set all these IDs in stone.
#21 Updated by lamby over 1 year ago
- File 0001-Prevent-change-of-gid-of-debian-tor-user-breaking-au.patch View added
- Assignee changed from lamby to intrigeri
I don't see vboxdrv involved anywhere in the error message I've pasted.
Ah, copy-paste fail. vboxsf
.. :)
[…]
How about the attached? :)
#22 Updated by intrigeri over 1 year ago
- Assignee changed from intrigeri to lamby
How about the attached? :)
I think it will work but:
- Can you confirm you tested it and the new check for changed UIDs/GIDs does not abort the build?
- The first part of the patch hard-codes the current (as in, on our devel branch today) GID allocated to these 4 groups. As the mere existence of this very ticket shows, it's a risky assumption. So next time these GIDs start dancing around, IMO this patch will make it harder to fix the problem than it could, because if we apply it as-is we may be changing the GID of unrelated groups in the future. So I'd rather see us do things like
Change_gid $(gid_of monkeysphere) 1120
. Makes sense?
#23 Updated by lamby over 1 year ago
- File 0002-Prevent-change-of-gid-of-debian-tor-user-breaking-au.patch View added
- File 0001-Correct-call-to-usermod-to-pass-gid-not-uid.-refs-15.patch View added
- Assignee changed from lamby to intrigeri
I'd rather see us do thingws like Change_gid $(gid_of monkeysphere) 1120
Oh, good idea. This is not only a more stable solution, it is also much cleaner in the code (self-documenting!). Please see the attached 2 patches which I have also pushed to https://github.com/lamby/tails/commits/15695-avoid-breaking-automatic-upgrades-to-tails-3-9.
Can you confirm you tested it and the new check for changed UIDs/GIDs does not abort the build?
Unfortunately I have not tested it (except in a local chroot actually, but that doesn't count). Unfortunately, I am semi-VAC today and tomorrow (first time off since 12th July...) so I won't be able to commit to testing this before the 15th unless it happens to rain this afternoon (!) which would mean that I would not need to be entertaining my dear mother :)
#24 Updated by intrigeri over 1 year ago
I'd rather see us do thingws like Change_gid $(gid_of monkeysphere) 1120
Oh, good idea. This is not only a more stable solution, it is also much cleaner in the code (self-documenting!). Please see the attached 2 patches which I have also pushed to https://github.com/lamby/tails/commits/15695-avoid-breaking-automatic-upgrades-to-tails-3-9.
I like it! I'll take it over from here. I had a quick look and the only major problem I've noticed is that you're giving these groups different GIDs than in Tails 3.8, which is precisely what we want to avoid. But the machinery is in place on your branch so I should be able to easily use it to fix the problem we're tackling here :)
[…] unless it happens to rain this afternoon (!) which would mean that I would not need to be entertaining my dear mother :)
Indeed, rain is entertaining enough in itself :)
#25 Updated by intrigeri over 1 year ago
- Assignee changed from intrigeri to segfault
- % Done changed from 10 to 50
- Estimated time set to 0.50 h
- QA Check set to Ready for QA
- Feature Branch set to bugfix/15695-avoid-breaking-automatic-upgrades-to-tails-3-9
This branch includes the check brought for #15419 so if it builds (and it did build at least until I've merged current devel into it), that means it does its job :) And indeed, until my last fixes that check did abort the build, which is confidence-inspiring.
#26 Updated by intrigeri over 1 year ago
- Assignee changed from segfault to CyrilBrulebois
#27 Updated by lamby over 1 year ago
Damn, I just came back to this issue after I've been (somewhat failing) to build this on my local machine for the past hour due to reasons unrelated to the changes. I should build Tails more often, basically.
you're giving these groups different GIDs than in Tails 3.8
What's the problem there?
#28 Updated by intrigeri over 1 year ago
you're giving these groups different GIDs than in Tails 3.8
What's the problem there?
This would trigger another instance of the problem we had with the upgrade to 3.0.1 (#13426) and again with the upgrade to 3.6 (which made us create #15407 and #15419, which in turn made me notice we would have the same problem again for 3.9, which finally made me file this very ticket): due to what looks like an aufs bug, on devices upgraded via our incremental (automatic) update mechanism, the tor daemon won't be allowed to write to its /var/lib/tor
because the GID of the group owning that directory changed between the different levels of the SquashFS stack we build with aufs.
#29 Updated by intrigeri over 1 year ago
- Status changed from In Progress to 11
- Assignee deleted (
CyrilBrulebois) - % Done changed from 50 to 100
- QA Check changed from Ready for QA to Pass
As per #15407#note-31.
#30 Updated by lamby over 1 year ago
- Status changed from 11 to In Progress
Applied in changeset 1d3d7854b310ae77761fbef22af59458450801ba.
#31 Updated by lamby over 1 year ago
- Status changed from In Progress to Resolved
Applied in changeset 3682fde35080381eb97d649966b584fc9c777998.
#32 Updated by intrigeri over 1 year ago
- Status changed from Resolved to 11
#33 Updated by intrigeri over 1 year ago
- Status changed from 11 to Resolved