Bug #16389

Feature #15292: Distribute a USB image

Feature #15293: Creating & preparing the disk image

Bug #15992: Post-release bugfixing for creating & preparing the disk image

Some USB sticks become unbootable in legacy BIOS mode after first boot

Added by intrigeri 26 days ago. Updated 14 days ago.

Status: In Progress
Priority: High
Assignee:
Category: Installation
Target version:
Start date: 01/25/2019
Due date:
% Done: 20%
QA Check: Dev Needed
Feature Branch: bugfix/16389-recompute-chs
Type of work: Code
Blueprint:
Starter:
Affected tool:

Description

As per "[Tails-testers] tails 3.12rc1 becomes unbootable on bios after first use" and "[Tails-testers] Testing the 3.12 image". One of the two OPs "fixed" the problem by opening the drive in gdisk and rebuilding the protective MBR; they also shared an image of the MBR + GPT header of the broken stick, plus the diff between that one and the fixed one.

gpt.diff (150 Bytes) intrigeri, 01/25/2019 08:30 AM

gpt.img (17 KB) intrigeri, 01/25/2019 08:30 AM


Related issues

Related to Tails - Feature #16397: Write release notes for 3.12 Resolved 01/29/2019

Associated revisions

Revision 6d15ad06 (diff)
Added by intrigeri 25 days ago

Recompute CHS values for the hybrid MBR after first-boot repartitioning (refs: #16389)

Some legacy BIOS systems won't boot otherwise.

Revision 80af70ce (diff)
Added by intrigeri 9 days ago

Recompute CHS values for the hybrid MBR after first-boot repartitioning (refs: #16389)

Some legacy BIOS systems won't boot otherwise.

History

#1 Updated by intrigeri 26 days ago

Attaching the aforementioned MBR+GPT header image + diff.

#2 Updated by intrigeri 26 days ago

Dear segfault,

Context: we need a fix ready, tested, reviewed and merged by the end of the week. That's going to be intense, especially if none of us manage to reproduce the problem (I guess our MBR+GPT headers are affected just as well as the OP's, but for some reason they boot anyway on all the hardware we've tested). I'm confident the OPs will be happy to test a tentative fix, but that'll likely take some time, so the earlier we have a tentative fix, the better.

I can make time for this over the week-end, be it to work on a fix alone, sprinting towards a fix with you, or reviewing. I'd like to know by tomorrow morning to what extent I need to change my week-end plans though. Thanks in advance!

#3 Updated by intrigeri 25 days ago

  • Assignee changed from segfault to intrigeri

I'll start working on it now. If you can help or take over, cool: then let's talk on XMPP :)

#4 Updated by intrigeri 25 days ago

The bytes affected by the diff sent by the OP are all about the partition entry for the system partition in the hybrid MBR. It's not 100% clear to me whether it's about the partition type or the CHS of the last sector: the diff and the gpt.img don't match, I suspect some byte ordering / endianness mismatch.

If it's about the CHS of the last sector, then the size of the system partition matters, and thus the size of the USB stick matters. So I'll be testing with a 32GB USB stick, which is more likely than my other (8GB) ones to exhibit the problem.
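To illustrate why the partition size would matter here, below is a minimal Python sketch of the conventional LBA-to-CHS translation. It assumes the common 255 heads × 63 sectors/track geometry, which is an assumption on my side: nothing in this ticket states which geometry sgdisk or the affected BIOSes actually use. The function names are hypothetical.

```python
# Sketch only: conventional LBA -> CHS translation for an MBR partition entry.
# ASSUMPTION: 255 heads and 63 sectors per track (a common translated geometry).
HEADS = 255
SECTORS_PER_TRACK = 63
MAX_CYLINDER = 1023  # the cylinder field is only 10 bits wide


def lba_to_chs(lba, heads=HEADS, spt=SECTORS_PER_TRACK):
    """Return (cylinder, head, sector) for a 0-based LBA; sector is 1-based."""
    cylinder, remainder = divmod(lba, heads * spt)
    head, sector0 = divmod(remainder, spt)
    if cylinder > MAX_CYLINDER:
        # Not representable: saturate to the maximum CHS tuple.
        return (MAX_CYLINDER, heads - 1, spt)
    return (cylinder, head, sector0 + 1)


def pack_chs(cylinder, head, sector):
    """Pack a CHS tuple into the 3-byte on-disk form used by MBR entries."""
    return bytes([
        head,
        ((cylinder >> 8) & 0x03) << 6 | (sector & 0x3F),  # high cylinder bits + sector
        cylinder & 0xFF,                                   # low 8 cylinder bits
    ])
```

Under these assumptions, any partition whose last sector lies at or beyond LBA 1024 × 255 × 63 = 16,450,560 (about 8.4 GB with 512-byte sectors) gets the saturated value, which packs to the bytes FE FF FF; smaller partitions get a "real" CHS value that depends on exactly where they end.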

I've compared the 1st sector of the same USB stick a) installed with Tails Installer; b) installed with the USB image in GNOME Disks and then booted once; the MBR partition entry for the 2 resulting system partitions is identical. So I would assume that a given legacy BIOS system should equally succeed, or fail, to boot both. In this sense, the problem this ticket is about might not be a regression.
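For anyone who wants to repeat this kind of comparison, here is a rough Python sketch that decodes the four partition entries of a first sector and reports which ones differ between two sticks. The on-disk layout (four 16-byte entries at offset 446, 0x55 0xAA signature at offset 510) is the classic MBR layout; the function names are hypothetical and this is not the tooling that was actually used for the comparison above.

```python
import struct

MBR_TABLE_OFFSET = 446  # start of the partition table in the first sector
MBR_ENTRY_SIZE = 16     # each of the four entries is 16 bytes


def parse_mbr_entries(sector):
    """Decode the four primary partition entries of a 512-byte MBR sector."""
    if len(sector) < 512 or sector[510:512] != b"\x55\xaa":
        raise ValueError("missing 0x55AA boot signature")
    entries = []
    for index in range(4):
        start = MBR_TABLE_OFFSET + index * MBR_ENTRY_SIZE
        status, chs_first, ptype, chs_last, lba_first, count = struct.unpack(
            "<B3sB3sII", sector[start:start + MBR_ENTRY_SIZE]
        )
        entries.append({
            "status": status,        # 0x80 means "bootable"
            "chs_first": chs_first,  # raw 3-byte CHS of the first sector
            "type": ptype,           # partition type byte
            "chs_last": chs_last,    # raw 3-byte CHS of the last sector
            "lba_first": lba_first,  # little-endian 32-bit start LBA
            "sectors": count,        # little-endian 32-bit sector count
        })
    return entries


def differing_entries(sector_a, sector_b):
    """Return the indexes of the partition entries that differ."""
    a = parse_mbr_entries(sector_a)
    b = parse_mbr_entries(sector_b)
    return [i for i in range(4) if a[i] != b[i]]
```

The first sector of each stick can be read with something like `open(device, "rb").read(512)` (as root), where `device` is whatever block device the stick shows up as on the system at hand.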

#5 Updated by intrigeri 25 days ago

  • Status changed from Confirmed to In Progress

#6 Updated by intrigeri 25 days ago

  • % Done changed from 0 to 10
  • Feature Branch set to bugfix/16389-recompute-chs

The code added in this branch will apply, on first boot, part of the changes present in the provided gpt.diff, but not all of them. Hopefully that will be sufficient to fix boot on their system.

I'm starting to hope that the fix I'm preparing will not only fix this problem, but more generally fix a bunch of "Tails installed with Tails Installer does not start in legacy BIOS mode on $computer XYZ" (we have quite a few of these known issues documented), which could actually be instances of this exact problem. And my main fear is that this fix breaks stuff for systems that previously booted fine…

#7 Updated by intrigeri 25 days ago

Asked the 2 OPs to test the fix and on top of that, sent a broader call for testing to check whether this fix breaks Tails startup on other computers.

#8 Updated by intrigeri 25 days ago

  • % Done changed from 10 to 20

(Partial) test suite passed on our shared Jenkins and on my local one. USB sticks installed from a USB image built from this branch, onto a 32GB USB stick, boot fine twice on the two spare laptops I have around.

I'll wait until some point tomorrow for reports from my call for testing and then I'll decide if this is worth the risk: if I have a confirmation that this does fix the problem for the OPs, I might be tempted to say it's worth it, even if I get few confirmations that this does not break anything (on top of my own testing). We'll see.

#9 Updated by anonym 24 days ago

I have tested an image built from this branch using four different USB sticks on four laptops (all fairly different from each other except they all use Intel CPUs), and saw no regressions.

#10 Updated by intrigeri 24 days ago

  • Assignee changed from intrigeri to anonym
  • QA Check set to Ready for QA

The 2 OPs have not replied yet, so I can't say for sure that this fixes the problem they reported; I bet it does. Very few people tested the new image but they all confirmed it does not break stuff for them. So I'm really unsure what's best: we can merge this, and risk breaking systems that booted just fine during the USB image call for testing and/or on 3.12~rc1; or we can postpone, and risk discovering that the problem reported by these 2 people actually affects way more users (whose usual installation method is not supported/documented anymore).

I'm leaning towards merging, so here is a PR. But I would understand if you decide to take the other risk over this one.

#11 Updated by segfault 23 days ago

Hi,

I'm unsure about merging this. IIUC, the man page of sgdisk says that --recompute-chs sets a CHS value that violates GPT specification. This sounds like it could break booting on some systems. Could we maybe do another call for testing and then merge this in 3.13? Only two of many testers reported this problem, so I'm quite confident not too many users will be affected.

@intrigeri: I wrote you an email explaining why I went AWOL. Sorry again and thanks so much for stepping in!

#12 Updated by intrigeri 23 days ago

segfault wrote:

I'm unsure about merging this. IIUC, the man page of sgdisk says that --recompute-chs sets a CHS value that violates GPT specification. This sounds like it could break booting on some systems. Could we maybe do another call for testing and then merge this in 3.13? Only two of many testers reported this problem, so I'm quite confident not too many users will be affected.

Wrt. "Only two of many testers reported this problem", it could be because most people who answered our USB image call for testing started it only once, which would not expose the bug. And then only a few people installed 3.12~rc1 from scratch to use it in production (and thus more than once): I expect most people who are ready to use an RC in production upgraded an existing stick, which again would not expose the bug. But yeah, I hear your concerns.

My best bet/guess at this point, given #16389#note-4, is that without this branch merged, the users who will face a regression are those who satisfy these two conditions:

  • Their hardware is affected by the problem. Risk: probably low, because I currently think that a Tails installed by Tails Installer would have the exact same problem, so if it was this widespread, I hope we would have noticed earlier.
  • They were previously using an installation method that is not affected, i.e. users who've been creating an "intermediate" Tails (from macOS, Windows, or any Linux except Debian) and running it forever, without following the Installation Assistant's last steps, which instruct them to create a "final" Tails with Tails Installer. I don't know how widespread this practice is.

So yeah, it feels relatively safe to skip merging this.

Either way, we should be ready to deal with the fallout if things go wrong, that is, if we don't merge:

  • Explain to help desk & technical writers which symptoms they should look for, and what they should tell affected users, e.g.:
    • How to manually fix their broken 3.12 USB stick.
    • How to try the proposed branch and check if it fixes the bug for them.
  • If the problem affects a significant number of users:
    • Update the call for testing I've sent and give it a higher profile (blog post, Twitter?).
    • Be ready to release an emergency Tails 3.12.1 a couple weeks max after 3.12 (i.e. ASAP after collecting data to gain confidence the candidate fix does not break more than it fixes).

And if we do merge, the list of work items is similar but not quite the same.

I'm not able to lead this work (I'll be AFK for a week after 3.12). It should probably be a USB image team effort.

#13 Updated by intrigeri 23 days ago

  • Assignee changed from anonym to segfault
  • QA Check changed from Ready for QA to Dev Needed

Please provide the bits our tech writers (for release notes) and help desk will need. anonym suggests the easiest way for affected users is to reinstall, boot once, set an admin password, and then run the magic command (assuming that works from inside a running Tails, this needs to be tested).

#14 Updated by segfault 23 days ago

intrigeri wrote:

Please provide the bits our tech writers (for release notes) and help desk will need. anonym suggests the easiest way for affected users is to reinstall, boot once, set an admin password, and then run the magic command (assuming that works from inside a running Tails, this needs to be tested).

OK, I will try to do that tonight or tomorrow

#15 Updated by u 22 days ago

Hi segfault: as the release is supposed to be today, it would be super cool if you could get to that as early as possible :)

#16 Updated by u 22 days ago

Made release writers and help desk aware of the issue by email.

#20 Updated by u 22 days ago

intrigeri wrote:

anonym suggests the easiest way for affected users is to reinstall, boot once, set an admin password, and then run the magic command (assuming that works from inside a running Tails, this needs to be tested).

The magic command is `sudo sgdisk --recompute-chs /dev/sda`

#21 Updated by intrigeri 22 days ago

Better use `sudo sgdisk --recompute-chs /dev/bilibop`: that avoids the need to find out the name of the boot device, since /dev/bilibop should always be a symlink to it.
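In case help desk wants to double-check which device the symlink points to before running the command, here is a small Python sketch. The helper names are hypothetical; the only things taken from this ticket are the /dev/bilibop path and the `sgdisk --recompute-chs` invocation. The sketch only builds the command and does not run it.

```python
import os


def resolve_boot_device(symlink="/dev/bilibop"):
    """Return the real path the symlink points to (e.g. the boot device)."""
    return os.path.realpath(symlink)


def recompute_chs_command(device="/dev/bilibop"):
    """Build the argument list for the manual fix; nothing is executed here."""
    return ["sudo", "sgdisk", "--recompute-chs", device]
```

From inside a running Tails with an administration password set, the resulting list could then be passed to `subprocess.run(..., check=True)`, or simply typed in a terminal as in the command above.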

#22 Updated by anonym 21 days ago

  • Target version changed from Tails_3.12 to Tails_3.13

#23 Updated by intrigeri 15 days ago

One of the OPs (Phredo) reported that sgdisk --recompute-chs /dev/bilibop fixed the problem for them.

#24 Updated by mercedes508 14 days ago

For the record, since 3.12, we haven't received any report about this bug.
