Detect SquashFS errors (bad medium or optical drives) while Tails is running
Usually, when the medium is in bad shape, or if the optical drive has issues reading it properly, the live system starts to respond in strange ways (e.g. applications crash or don't start at all).
When this happens the kernel issues messages like the following:
[ 55.816599] SQUASHFS error: xz_dec_run error, data probably corrupt
[ 55.816603] SQUASHFS error: squashfs_read_data failed to read block 0x7e05a9b
[ 55.816606] SQUASHFS error: Unable to read data cache entry [7e05a9b]
[ 55.816608] SQUASHFS error: Unable to read page, block 7e05a9b, size 6cac
Let's add to the wishlist an application that would monitor for these messages and, when they appear, display a warning notification stating something like:
The system is having trouble reading Tails CD. Either the medium has defects or the optical drive is not able to read it properly.
Please try another CD or use Tails from a USB stick.
This is about code and should be pretty easy for someone who knows a bit of either Perl or Python and GTK+ (no need for deep knowledge of Tails internals).
Ideally, the files needed by this application should be closer to the start of the CD and (maybe) locked into memory, in the hope that the application itself is then less likely to be unreadable.
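A minimal sketch of the detection part, in Python, assuming the kernel log lines are available as an iterable of strings (for instance the output of dmesg). The names SQUASHFS_ERROR, find_squashfs_errors and warning_for are made up for illustration, and the actual desktop notification is left out; this is not an existing Tails component.

```python
import re

# Matches kernel lines such as
# "[ 55.816599] SQUASHFS error: xz_dec_run error, data probably corrupt".
SQUASHFS_ERROR = re.compile(r"SQUASHFS error: (?P<detail>.+)")

# Warning text taken from the proposal in this ticket.
WARNING = (
    "The system is having trouble reading Tails CD. Either the medium has "
    "defects or the optical drive is not able to read it properly.\n"
    "Please try another CD or use Tails from a USB stick."
)


def find_squashfs_errors(lines):
    """Return the detail text of every SquashFS error found in `lines`."""
    errors = []
    for line in lines:
        match = SQUASHFS_ERROR.search(line)
        if match:
            errors.append(match.group("detail").strip())
    return errors


def warning_for(lines):
    """Return the warning text if any line reports a SquashFS error, else None."""
    return WARNING if find_squashfs_errors(lines) else None
```

A caller would pass the kernel log lines to warning_for() and hand any returned text to whatever notification mechanism is available.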
#5 Updated by elouann almost 4 years ago
I faced this problem again (the Tails device became read-only).
Here is what the logs looked like:
Mar 15 11:12:43 amnesia kernel: EXT4-fs warning (device dm-0): ext4_end_bio:317: I/O error -5 writing to inode 1048582 (offset
Mar 15 11:12:43 amnesia kernel: buffer_io_error: 502 callbacks suppressed
Mar 15 11:12:43 amnesia kernel: Buffer I/O error on device dm-0, logical block 6343843
Mar 15 11:17:34 amnesia kernel: EXT4-fs error (device dm-0): ext4_wait_block_bitmap:494: comm icedove: Cannot read block bitma
Mar 15 11:17:34 amnesia kernel: EXT4-fs (dm-0): previous I/O error to superblock detected
Mar 15 11:17:34 amnesia kernel: Buffer I/O error on device dm-0, logical block 0
Mar 15 11:17:34 amnesia kernel: lost page write due to I/O error on dm-0
#8 Updated by intrigeri about 2 years ago
- Priority changed from Low to Normal
I doubt that we can do something about it -> lowering priority.
It's actually quite easy to do and many other live systems and installation media have something to address this problem (e.g. Fedora automatically does a full checksum during boot)… for a reason: at least a few times each year, help desk forwards me a report about a weird bug that's easy to diagnose once one notices these squashfs errors in the logs.
#9 Updated by sajolida over 1 year ago
- Subject changed from Detect bad medium or optical drives to Detect squashfs errors (bad medium or optical drives) while Tails is running
#11 Updated by intrigeri over 1 year ago
- Subject changed from Detect squashfs errors (bad medium or optical drives) while Tails is running to Detect SquashFS errors (bad medium or optical drives) while Tails is running
- Category set to Hardware support
I don't think GTK+ will fly here because such errors can prevent GDM from starting. At first glance I can think of 3 ways to do that:
- Check file integrity in early boot, like the Fedora Live CD does by default (they have 2 boot menu entries: "Start Fedora" and "Test this media & start Fedora", the 2nd one being the default, and then while the check is being done a "Press [Esc] to abort check" message is displayed above the progress percentage info). I don't know how hard it would be to integrate into Tails; it might be that live-boot or live-config supports this already. One major drawback is that this would slow down boot substantially, which conflicts with our "Make it easier to switch between a Tails contextual identity and another identity outside of Tails" goal.
- Check files integrity continuously with dm-verity (used by Android) or similar. I don't know how verification errors are reported to users.
- Handle this in tails-gdm-failed-to-start.service as I'm going to propose on #16030. Drawback: if the boot medium is buggy/broken enough it might not even reach that point.
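To illustrate the first option (a full read-and-checksum pass), here is a rough Python sketch. It is not how Fedora or live-boot implement their media check; check_file is a hypothetical helper, and where the expected SHA-256 would come from is left open. The point is only that reading the whole file surfaces both silent corruption (checksum mismatch) and outright read failures (I/O errors).

```python
import hashlib


def check_file(path, expected_sha256, chunk_size=1024 * 1024):
    """Read `path` fully and compare its SHA-256 with `expected_sha256`.

    Returns "ok", "mismatch", or "unreadable" (the file could not be
    opened or read, e.g. because of an I/O error on a damaged medium).
    """
    digest = hashlib.sha256()
    try:
        with open(path, "rb") as handle:
            while True:
                chunk = handle.read(chunk_size)
                if not chunk:
                    break
                digest.update(chunk)
    except OSError:
        return "unreadable"
    return "ok" if digest.hexdigest() == expected_sha256 else "mismatch"
```

Reading in fixed-size chunks keeps memory use constant, but the pass still has to read every byte of the file, which is where the boot-time cost mentioned above comes from.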