Memory erasure tests regression on the devel branch
The memory erasure tests have been failing on the devel branch for a few weeks.
- First failure: https://jenkins.tails.boum.org/job/test_Tails_ISO_devel/1442/ (Oct 15, 2018 11:07:16 PM; f1d545b1dc0d972076cfb7cba0e74622880da082), triggered by https://jenkins.tails.boum.org/job/build_Tails_ISO_devel/3183/
- Last success: https://jenkins.tails.boum.org/job/test_Tails_ISO_devel/1441/ (Oct 9, 2018 7:56:14 AM; d1abfbaf9e585fa9e68dda279208677c941b65e9), triggered by https://jenkins.tails.boum.org/job/build_Tails_ISO_devel/3173/
Significant changes between these commits:
- Linux upgraded from 4.17.0-3 to 4.18.0-2 and, accordingly, aufs4-standalone upgraded from 01543e47eae7653c7e9a35a7204301f8a0b3ca50 to bdda97c749604bb9ea3f19e0c1ffac9042e79f77 → I doubt that matters, because our stable branch also includes this change and the tests pass there
- debian APT snapshot updated from 2018100901 to 2018101503, which includes at least the systemd 237-3~bpo9+1 → 239-7~bpo9+1 upgrade
Sadly, the build artifacts for these 2 ISO build jobs and most of the relevant APT snapshots have already been garbage-collected. Still, after comparing the .build-manifest files of the last successful builds of stable and devel, and filtering out packages that were upgraded in Debian after 20181015, the only potential culprit I see is the systemd 237-3~bpo9+1 → 239-7~bpo9+1 upgrade. That upgrade happened on 20181009, but strictly after the 2018100901 debian APT snapshot.
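The manifest comparison above can be sketched roughly as follows. This assumes each .build-manifest has first been reduced to a plain package → version mapping (the real file format is not reproduced here). The `candidate_culprits` helper and the package "foo" are hypothetical, and only the systemd versions come from this issue.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical, simplified view of a .build-manifest: package name -> version.
Manifest = Dict[str, str]
Change = Tuple[Optional[str], Optional[str]]


def candidate_culprits(stable: Manifest, devel: Manifest,
                       upgraded_later: Set[str]) -> Dict[str, Change]:
    """Return packages whose version differs between stable and devel,
    ignoring those that were only upgraded in Debian after the first
    failing APT snapshot (they cannot explain the regression)."""
    culprits: Dict[str, Change] = {}
    for package in sorted(stable.keys() | devel.keys()):
        old, new = stable.get(package), devel.get(package)
        if old != new and package not in upgraded_later:
            culprits[package] = (old, new)
    return culprits


if __name__ == "__main__":
    # Illustrative data only; "foo" is a made-up package.
    stable = {"systemd": "237-3~bpo9+1", "foo": "1.0"}
    devel = {"systemd": "239-7~bpo9+1", "foo": "1.1"}
    print(candidate_culprits(stable, devel, upgraded_later={"foo"}))
```

A package present in only one manifest shows up with `None` on the missing side, so additions and removals are flagged too.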
Fix memory erasure on shutdown with systemd v239 (refs: #16097).
Remounting /run with the "exec" option in /lib/systemd/system-shutdown/tails
does not work anymore with systemd v239, while it worked at least until systemd
v237. I could not find out why by reading systemd's NEWS file.
So let's instead do it in these two places:
- For clean shutdown: in a new, dedicated service, started immediately before
final.target. final.target is a synchronization point: it ensures this service
is started before the transition to systemd-shutdown and, in turn, to the
initramfs, where we finish the unmounting and the other cleanups needed to
erase the memory.
- For emergency shutdown: in the udev watchdog script, before calling the
unclean shutdown code. That code bypasses final.target and thus won't run
tails-remount-run-exec.service. Too bad we have to duplicate this mount
command, but it seems that both instances will become unnecessary soon
enough, once systemd DTRT™. Another way would be to manually start
tails-remount-run-exec.service from the udev watchdog script, but I'm
concerned it would be unreliable once the boot medium has been unplugged.
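The ordering argument for the clean shutdown path can be modelled, very roughly, as a dependency graph. This is a simplified sketch and not systemd's real transaction logic. Apart from final.target and tails-remount-run-exec.service, the node names are labels chosen for illustration, and the sketch assumes the new service is ordered with Before=final.target.

```python
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Dict, List, Set

# Rough model of the clean shutdown path: each key runs only after every
# node in its set. Illustrative labels, not real systemd unit dependencies.
CLEAN_SHUTDOWN: Dict[str, Set[str]] = {
    # Assumed: the new service is ordered Before=final.target.
    "final.target": {"tails-remount-run-exec.service"},
    # Roughly: systemd-shutdown only takes over once final.target is reached.
    "systemd-shutdown": {"final.target"},
    # systemd-shutdown then hands over to the initramfs, if it can.
    "/run/initramfs/shutdown": {"systemd-shutdown"},
}


def shutdown_order(graph: Dict[str, Set[str]]) -> List[str]:
    """Return one order in which the modelled steps can run."""
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    print(" -> ".join(shutdown_order(CLEAN_SHUTDOWN)))
```

The emergency path bypasses final.target, so nothing in such a graph pulls the service in. That is why the mount command is duplicated in the udev watchdog script.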
Mount a dedicated tmpfs on /run/initramfs instead of trying to remount /run with the "exec" option (refs: #16097).
My previous approach, i.e. "let's remount /run with the exec option via a unit
file started as part of the shutdown procedure", worked just fine for clean
shutdown. But it does not work for emergency shutdown, i.e. when the boot medium
is physically removed: for some reason (possibly missing bits in the memlockd
configuration), this service is not started, and then systemd-shutdown won't
return to the initramfs because /run/initramfs/shutdown is not executable.
So let's instead disregard /run and extract the initramfs into a dedicated
tmpfs, which we mount on /run/initramfs (where systemd-shutdown will look for
it) without the "noexec" option.
Also, remove manual calls to eject(1):
- They increase chances that the shutdown process breaks due to missing
files locked in memory by memlockd.
- Their sole benefit is to ensure we physically eject the DVD. It's unclear
whether this code is still needed nowadays. Regardless, starting with Tails
3.12, the only supported use case for ISO and DVD is virtual machines, which
the emergency shutdown feature does not target: that feature is about
removing the physical boot medium.
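A dedicated tmpfs helps because of how the mount governing /run/initramfs/shutdown is resolved. My understanding is that systemd-shutdown only returns to the initramfs when that file passes an access(2) X_OK check, and on Linux such a check fails for regular files on a noexec mount. Treat that as an assumption, not something verified here. The helpers below are hypothetical. They parse a simplified /proc/mounts format that does not handle escaped characters such as \040, and the sample mount tables are illustrative.

```python
from typing import List, Optional, Tuple

Mount = Tuple[str, List[str]]  # (mount point, mount options)


def parse_mounts(text: str) -> List[Mount]:
    """Parse /proc/mounts-style lines: device, mount point, type, options, ..."""
    mounts: List[Mount] = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 4:
            mounts.append((fields[1], fields[3].split(",")))
    return mounts


def governing_mount(path: str, mounts: List[Mount]) -> Optional[Mount]:
    """Return the mount whose mount point is the longest prefix of path.
    Later entries win ties, since they are stacked on top."""
    best: Optional[Mount] = None
    for point, options in mounts:
        prefix = point.rstrip("/") + "/"
        if path == point or path.startswith(prefix):
            if best is None or len(point) >= len(best[0]):
                best = (point, options)
    return best


def exec_allowed(path: str, mounts_text: str) -> bool:
    """True if path lies on a known mount that is not mounted noexec."""
    mount = governing_mount(path, parse_mounts(mounts_text))
    return mount is not None and "noexec" not in mount[1]


# Illustrative mount tables, before and after this commit.
BEFORE = (
    "aufs / aufs rw,relatime 0 0\n"
    "tmpfs /run tmpfs rw,nosuid,noexec,relatime,mode=755 0 0\n"
)
AFTER = BEFORE + "tmpfs /run/initramfs tmpfs rw,nosuid,nodev,relatime,mode=755 0 0\n"

if __name__ == "__main__":
    for name, table in (("before", BEFORE), ("after", AFTER)):
        print(name, exec_allowed("/run/initramfs/shutdown", table))
```

On a live system, you could pass the contents of /proc/mounts as `mounts_text`. /run itself stays noexec, but the more specific /run/initramfs mount now governs the shutdown script.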
Interestingly, most of the "system memory erasure on shutdown" scenarios pass: only the one about the aufs read-write branch fails.
https://github.com/systemd/systemd/issues/8221 might be relevant for emergency shutdown, although it's reported to happen with v237, which works fine for us.
- Look at the videos from Jenkins; maybe they'll give some hints.
- Reproduce locally, perhaps with nosplash and with systemd logging more verbosely (and to the console).
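For the local reproduction idea, here is a hypothetical helper that derives a more verbose kernel command line. It assumes the original command line contains quiet and splash, and it relies on systemd's systemd.log_level= and systemd.log_target= kernel command line options. The helper name and the sample command line are made up.

```python
from typing import List

# Assumption: the boot command line contains "quiet" and "splash".
DROP = {"quiet", "splash"}
# "nosplash" comes from the note above; the two systemd.* options are
# systemd kernel command line parameters.
ADD: List[str] = ["nosplash", "systemd.log_level=debug", "systemd.log_target=console"]


def debug_cmdline(cmdline: str) -> str:
    """Return a kernel command line tweaked for debugging shutdown."""
    args = [arg for arg in cmdline.split() if arg not in DROP]
    for extra in ADD:
        if extra not in args:
            args.append(extra)
    return " ".join(args)


if __name__ == "__main__":
    print(debug_cmdline("boot=live quiet splash"))
```

The helper is idempotent, so applying it to an already-tweaked command line changes nothing.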
#9 Updated by intrigeri about 1 month ago
- Status changed from Confirmed to In Progress
/lib/systemd/system-shutdown/tails is run, but I see no trace of returning to the initramfs. That nicely explains why exactly the tests that rely on this mechanism fail (with "FindFailed: can not find MemoryWipeCompleted.png"), while the tests that verify that memory is erased on unmount and when processes are killed work just fine. Manually running /bin/mount -o remount,exec /run before halt fixes that, and doing this trick in the test suite fixes "Scenario: Erasure of the aufs read-write branch on shutdown". But /lib/systemd/system-shutdown/tails should do that itself, so something is wrong.
#13 Updated by intrigeri about 1 month ago
- Assignee changed from intrigeri to lamby
- % Done changed from 20 to 50
- QA Check set to Ready for QA
Fixed! All memory erasure tests pass locally. Please review. A build with my current proposal was just started on Jenkins. FTR: our review guidelines.
Note that I've left my first approach in the Git history. Its commit message explains why things are broken and need fixing. I was tempted to rewrite history and merge this explanation into the commit message that implements the approach that works. But this time I figured it would be useful to keep a trace of something I tried that does not work, not so much for posterity as for whoever might be tempted to try it in the future.
Thanks in advance :)
#14 Updated by intrigeri about 1 month ago
https://jenkins.tails.boum.org/view/Tails_ISO/job/test_Tails_ISO_bugfix-16097-memory-erasure-on-shutdown/3/cucumberTestReport/ confirms the fix (compared to job 2 where emergency shutdown tests failed).
#16 Updated by lamby about 1 month ago
- File tails-amd64-bugfix_16097-memory-erasure-on-shutdown-3.12-20190113T2209Z-4c051c657b.buildlog.xz added
- Assignee changed from lamby to intrigeri
- QA Check changed from Ready for QA to Pass
- Checked out
- Booted in QEMU:
  - Logged in
  - Shut down from the normal menu. Did not see any errors, delays or timeouts.
  - Booted again.
  - Shut down from the greeter. Did not see any errors, delays or timeouts.
- Burnt to a USB stick
- Repeated the above on an X230.