Bug #10720

Bug #10988: Tails Installer workarounds for UDisks 2 bugs are not robust enough

Tails Installer freezes when calling system_partition.call_set_name_sync in partition_device

Added by intrigeri over 3 years ago. Updated almost 3 years ago.

Status: Resolved
Priority: Elevated
Assignee: -
Category: Installation
Target version:
Start date: 12/07/2015
Due date:
% Done: 100%
Feature Branch: bugfix/10720-installer-freezes-on-jenkins
Type of work: Code
Blueprint:
Starter:
Affected tool: Installer

Description

This ticket is now superseded by #11590 and #11588.

As reported on #10717#note-1:

The root cause might live in UDisks, in QEMU, or in the Linux kernel, but realistically our best option is probably to act as if it were merely a race condition in UDisks, and add yet another workaround in Tails Installer. Doing it on the 4.x branch should be enough.

Attachment: 01_15_59_Installing_Tails_to_a_pristine_USB_drive.png (87.7 KB), added by intrigeri, 07/20/2016 01:42 AM


Related issues

Related to Tails - Bug #10717: Concerning amount of test suite runs aborted on Jenkins due to timeout Rejected 12/06/2015
Related to Tails - Bug #9691: Tails Installer has to workaround race conditions in UDisks2 Resolved 07/05/2015
Blocked by Tails - Bug #10907: usb_install.feature fails when run as part of the entire test suite Resolved 01/12/2016

Associated revisions

Revision 0163c64d
Added by intrigeri over 3 years ago

Mark as fragile all tests that rely on Tails Installer.

Refs: #10720

Revision d40505b3
Added by intrigeri over 3 years ago

Test suite: send Tails Installer's debug log to the Cucumber debug log on failure.

This is meant to debug refs: #10720 since I can't reproduce it locally.

Revision 1b84fbcf
Added by intrigeri over 3 years ago

Test suite: run usb_{install,upgrade}.feature first.

This is meant to debug the strange behaviour I see on Jenkins
(refs: #10720).

Revision 718d76f9
Added by intrigeri over 3 years ago

Test suite: mark "Scenario: Tails shuts down on USB boot medium removal" as fragile.

... for the same reason as all scenarios that use Tails Installer.

refs: #10720

Revision bfb67b1e
Added by intrigeri over 3 years ago

Revert "Test suite: mark "Scenario: Tails shuts down on USB boot medium removal" as fragile."

This reverts commit 718d76f97eafe2a42b76578ece891fba0a47c2a8.

refs: #10720

Revision 5107c485
Added by anonym about 3 years ago

Pin tails-installer version with the potential fix for #10720.

There have been tails-installer releases since then, so they get
installed instead. They only contain new translations, dependency
fixes, and other things irrelevant for #10720.

Refs: #10720
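The kind of pinning described in this commit message can be illustrated with the minimal Python sketch below. It renders an APT preferences stanza that pins one package to one exact version. It is not the pinning file used in the Tails build, and `apt_pin_stanza` is a hypothetical helper. The version string is borrowed from the WIP package mentioned later in this ticket, and the priority of 1001 is an illustrative choice. Per apt_preferences(5), a priority of 1000 or more makes APT select that version even when a newer one is available, including as a downgrade. Each `Pin:` line takes a single kind of selector (`version`, `release` or `origin`); this sketch only uses `version`.

```python
def apt_pin_stanza(package, version, priority=1001):
    """Return an APT preferences stanza pinning `package` to exactly `version`.

    Hypothetical helper for illustration only. A priority >= 1000 makes APT
    pick this version even if a newer one is available, and even if that
    means downgrading an already installed package (see apt_preferences(5)).
    """
    # One stanza = "Package", "Pin" and "Pin-Priority" fields, one per line.
    # The Pin field holds a single selector; here: an exact version.
    return (
        f"Package: {package}\n"
        f"Pin: version {version}\n"
        f"Pin-Priority: {priority}\n"
    )


if __name__ == "__main__":
    # Version string taken from the WIP package mentioned in this ticket.
    print(apt_pin_stanza("tails-installer",
                         "4.4.10+dfsg-0tails1+bugfix.10720~1.gbp1473f2"))
```

Stanzas written this way go into a file under /etc/apt/preferences.d/, separated by blank lines if there are several.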

Revision 3a243660
Added by intrigeri almost 3 years ago

Fix pinning for tails-installer experimental version.

Apparently one cannot combine conditions the way it was done in
commit 5107c48.

refs: #10720

Revision 34edae24
Added by intrigeri almost 3 years ago

Bump APT pinning to install newer tails-installer WIP package.

refs: #10720

Revision 17844a84
Added by intrigeri almost 3 years ago

Bump APT pinning to install newer tails-installer WIP package.

refs: #10720

Revision 1cc7209b
Added by intrigeri almost 3 years ago

Bump APT pinning to install newer tails-installer WIP package.

refs: #10720

Revision 590ff198
Added by intrigeri almost 3 years ago

Bump APT pinning to install newer tails-installer WIP package.

refs: #10720

Revision 96ca12b9
Added by intrigeri almost 3 years ago

Unmark all scenarios that use Tails Installer as fragile.

bugfix/11590-installer-robustness was based on this branch
(bugfix/10720-installer-freezes-on-jenkins), and flagged the tests as
fragile again, so we have to revert that on this branch for these tests
to run on Jenkins again.

refs: #10720

This reverts the following commits:

3c02ea74788cffe0b4db21181f46778eab0689b5
20d07239a8e2c3706827edcf3b17266899c37a64
2c4c6434c0c6bdcf2534f966434bf92230af3698

History

#1 Updated by intrigeri over 3 years ago

  • Related to Bug #10717: Concerning amount of test suite runs aborted on Jenkins due to timeout added

#2 Updated by intrigeri over 3 years ago

  • Related to Bug #9691: Tails Installer has to workaround race conditions in UDisks2 added

#3 Updated by intrigeri over 3 years ago

  • Feature Branch set to bugfix/10720-installer-freezes-on-jenkins

#4 Updated by intrigeri over 3 years ago

I think I should tell the test suite to run the Installer with DEBUG=1, and to gather the debug log as part of the Jenkins artifacts somehow.
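The idea can be sketched in Python, the language of the Tails Installer code quoted further down. The sketch runs a program with DEBUG=1 and keeps its output only when the program fails. It assumes the program enables debug logging when DEBUG=1 is in its environment. `run_with_debug_log` and the log path are hypothetical, and this is not the test suite's actual code.

```python
import os
import subprocess


def run_with_debug_log(cmd, log_path):
    """Run `cmd` with DEBUG=1 set and, if it fails, append its output to `log_path`.

    Assumes the program enables debug logging when DEBUG=1 is in its
    environment. Returns the program's exit code.
    """
    env = dict(os.environ, DEBUG="1")
    result = subprocess.run(cmd, env=env, capture_output=True, text=True)
    if result.returncode != 0:
        # Keep the verbose debug output only when something went wrong,
        # so it can be collected along with the other test artifacts.
        with open(log_path, "a", encoding="utf-8") as log:
            log.write(f"--- {' '.join(cmd)} exited with {result.returncode} ---\n")
            log.write(result.stdout)
            log.write(result.stderr)
    return result.returncode
```

Writing the log only on failure keeps successful runs' artifacts small. On Jenkins, the resulting file would then need to be archived as a build artifact.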

#5 Updated by intrigeri over 3 years ago

  • Status changed from Confirmed to In Progress

#6 Updated by intrigeri over 3 years ago

intrigeri wrote:

I think I should tell the test suite to run the Installer with DEBUG=1, and to gather the debug log as part of the Jenkins artifacts somehow.

Done, so next runs of https://jenkins.tails.boum.org/view/Raw/job/test_Tails_ISO_bugfix-10720-installer-freezes-on-jenkins/ should hopefully tell me more about this problem :)

#7 Updated by intrigeri over 3 years ago

Strangely, I don't see this failure anymore. But all recent builds fail "Scenario: Booting Tails from a USB drive without a persistent partition and creating one" and the next scenarios that use the "I have started Tails without network from a USB drive without a persistent partition and stopped at Tails Greeter's login screen" snapshot: the video only shows "No bootable device". What's interesting is that "I can view and print a PDF file stored in persistent /home/amnesia/Persistent" and "Watching MP4 videos stored on the persistent volume should work as expected given our AppArmor confinement" pass, while they use a snapshot that is a child of the one that we fail to restore. I suspect there is a problem with the platform, possibly triggered only when we run the entire test suite at once.

#8 Updated by intrigeri over 3 years ago

Next debugging step: reorder features to run usb_install.feature first.

#9 Updated by intrigeri over 3 years ago

intrigeri wrote:

Next debugging step: reorder features to run usb_install.feature first.

Done in the topic branch. The USB install tests then passed for the first time in a while, but a couple of other persistence-using tests started failing again, the way they used to fail a few weeks ago, before these "No bootable device" issues started to show up.

#10 Updated by intrigeri over 3 years ago

  • Blocked by Bug #10907: usb_install.feature fails when run as part of the entire test suite added

#11 Updated by intrigeri over 3 years ago

  • Target version changed from Tails_2.0 to Tails_2.2

I still don't see this bug anymore. I'll look at the results again during the 2.2 cycle.

#13 Updated by intrigeri over 3 years ago

The line that triggers this error is: system_partition.call_set_name_sync(self.label, GLib.Variant('a{sv}', None))

#14 Updated by intrigeri over 3 years ago

  • Related to Bug #10987: Tails Installer sometimes fails with: No support for modifying a partition a table of type `PMBR' added

#15 Updated by intrigeri over 3 years ago

The line that triggers this error is: [...]

I've asked Alan, who added this line in a commit that didn't really document why, if he had any idea what it is useful for.

#16 Updated by intrigeri over 3 years ago

  • Parent task set to #10988

#17 Updated by intrigeri over 3 years ago

  • Subject changed from Tails Installer freezes on Jenkins to Tails Installer freezes in partition_device on Jenkins

#18 Updated by intrigeri over 3 years ago

  • Related to deleted (Bug #10987: Tails Installer sometimes fails with: No support for modifying a partition a table of type `PMBR')

#19 Updated by intrigeri over 3 years ago

  • Subject changed from Tails Installer freezes in partition_device on Jenkins to Tails Installer freezes when calling system_partition.call_set_name_sync in partition_device

#20 Updated by intrigeri over 3 years ago

The affected code is:

        # XXX: sometimes fails (https://labs.riseup.net/code/issues/10987)
        system_partition.call_set_type_sync(ESP_GUID, GLib.Variant('a{sv}', None))
        # XXX: sometimes fails (https://labs.riseup.net/code/issues/10720)
        system_partition.call_set_name_sync(self.label, GLib.Variant('a{sv}', None))

... and given the error message, I wonder if we need to wait for something between these two statements.
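One way to "wait for something" between the two calls is to poll until the state set by the first call is visible before issuing the second. The generic helper below is a minimal sketch of that idea; `wait_until` is a hypothetical name. The predicate to pass is left to the caller, because which property lags behind is exactly what is unknown here. For example, it could re-read the partition's type and compare it with ESP_GUID.

```python
import time


def wait_until(predicate, timeout=10.0, interval=0.1):
    """Call predicate() until it returns a truthy value, or give up after `timeout` seconds.

    Returns that truthy value; raises TimeoutError if the deadline passes first.
    """
    deadline = time.monotonic() + timeout
    while True:
        value = predicate()
        if value:
            return value
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition still false after {timeout} s")
        time.sleep(interval)
```

Polling with a deadline keeps the installer from hanging forever. If the state never converges, the caller gets an explicit TimeoutError instead of a freeze.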

#21 Updated by intrigeri over 3 years ago

intrigeri wrote:

The line that triggers this error is: [...]

I've asked Alan, who added this line in a commit that didn't really document why, if he had any idea what it is useful for.

... and he tells me that this line was added just in case it might be useful, but that after reading the UDisks code, he doesn't think it is. So I think I'll remove that line, and upload a snapshot package to the topic branch's APT suite, so we can see on Jenkins if it helps or not.

#22 Updated by intrigeri over 3 years ago

  • Target version changed from Tails_2.2 to Tails_2.3

#23 Updated by intrigeri over 3 years ago

  • Target version changed from Tails_2.3 to Tails_2.4

#24 Updated by intrigeri about 3 years ago

  • Target version changed from Tails_2.4 to Tails_2.5

#25 Updated by anonym about 3 years ago

I've refreshed the feature branch. I also pinned the fixed tails-installer version so we actually test this with intrigeri's fix, so we should revert 5107c485ff29e48bf983c1432e84089ed706e6ab before potentially merging this branch.

#26 Updated by BitingBird about 3 years ago

  • % Done changed from 0 to 40

#27 Updated by intrigeri about 3 years ago

anonym wrote:

I've refreshed the feature branch. I also pinned the fixed tails-installer version so we actually test this with intrigeri's fix, so we should revert 5107c485ff29e48bf983c1432e84089ed706e6ab before potentially merging this branch.

I tried to fix that pinning with 3a243660ed3dc67b389fe57b6ca621807ef0f156, which should therefore be reverted as well.

#28 Updated by intrigeri almost 3 years ago

So I think I'll remove that line, and upload a snapshot package to the topic branch's APT suite, so we can see on Jenkins if it helps or not.

With the system_partition.call_set_name_sync call removed, the next statement (_set_partition_flags) fails (see attached screenshot). It's interesting that this one can fail after call_set_type_sync has worked.

Next steps:

  • sync/settle/wait and retrieve a fresh partition object after call_set_type_sync;
  • fix my tweak to include Tails Installer's debug log in the test suite's, as it does not work.
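The first step above could look like the sketch below. It makes some assumptions:

* "Settle" means waiting for udev's event queue via `udevadm settle`.
* `lookup` is a caller-supplied (hypothetical) function that asks UDisks again for the partition object.
* The one-second delay is a guess.

This is not the code that went into the WIP package. It only illustrates the sync/settle/wait/re-fetch sequence.

```python
import os
import subprocess
import time


def udev_settle(timeout_s=10):
    """Block until udev has processed queued events (or the timeout expires)."""
    # check=False: a settle timeout is not fatal here, the state is re-read anyway.
    subprocess.run(["udevadm", "settle", f"--timeout={timeout_s}"], check=False)


def run_then_refresh(step, lookup, settle=udev_settle, delay=1.0):
    """Run step(), then sync, settle and wait before returning a fresh object.

    `lookup` is a caller-supplied (hypothetical) function that queries UDisks
    again, so the next call does not go through a possibly stale proxy.
    """
    step()                # e.g. the call_set_type_sync() call
    os.sync()             # flush pending writes to the block devices
    settle()              # let udev (and thus UDisks) catch up with the change
    time.sleep(delay)     # extra grace period; the value is a guess
    return lookup()
```

Taking `settle` as a parameter keeps the sequence testable without root privileges or a real udev.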

#29 Updated by intrigeri almost 3 years ago

Next steps:

  • sync/settle/wait and retrieve a fresh partition object after call_set_type_sync;

Done in 4.4.10+dfsg-0tails1+bugfix.10720~1.gbp1473f2, building on Jenkins.

  • fix my tweak to include Tails Installer's debug log in the test suite's, as it does not work.

Still the case.

#30 Updated by intrigeri almost 3 years ago

intrigeri wrote:

  • sync/settle/wait and retrieve a fresh partition object after call_set_type_sync;

Done in 4.4.10+dfsg-0tails1+bugfix.10720~1.gbp1473f2, building on Jenkins.

This makes partition_device exit successfully, but then switch_drive_to_system_partition is called and calls _set_drive, which fails with "Cannot find device /dev/sda1"; that happens whenever self.drives.has_key(drive) is false.
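One possible explanation for "Cannot find device /dev/sda1" is a stale cache. If the drive dictionary was filled before partitioning and not refreshed since, the new partition would not be a key in it yet. The sketch below illustrates that hypothesis and one possible mitigation: rescan once on a cache miss. `DriveRegistry` and `scan` are hypothetical names, not Tails Installer's API. `drive in self.drives` is the Python 3 equivalent of `self.drives.has_key(drive)`.

```python
class DriveRegistry:
    """Hypothetical device-path -> drive-info cache.

    `scan` is a caller-supplied function returning a fresh {path: info} dict.
    """

    def __init__(self, scan):
        self._scan = scan
        self.drives = scan()

    def set_drive(self, drive):
        if drive not in self.drives:
            # A partition created after the last scan (e.g. /dev/sda1 right
            # after partitioning) is not cached yet: rescan once.
            self.drives = self._scan()
        if drive not in self.drives:
            raise KeyError(f"Cannot find device {drive}")
        return self.drives[drive]
```

Rescanning only on a miss avoids querying UDisks on every lookup. Still-unknown devices fail the same way as before.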

#31 Updated by intrigeri almost 3 years ago

  • fix my tweak to include Tails Installer's debug log in the test suite's, as it does not work.

Done in 4.4.10+dfsg-0tails1+bugfix.10720~3.gbpe9be10.

#32 Updated by intrigeri almost 3 years ago

  • Related to Bug #11582: Some upgrade test scenarios fail due to lack of disk space on Jenkins added

#33 Updated by intrigeri almost 3 years ago

  • Related to deleted (Bug #11582: Some upgrade test scenarios fail due to lack of disk space on Jenkins)

#34 Updated by intrigeri almost 3 years ago

  • Blocks Bug #11582: Some upgrade test scenarios fail due to lack of disk space on Jenkins added

#35 Updated by intrigeri almost 3 years ago

Status: I've seen a couple successful test suite runs from that branch on Jenkins! I'd like to get the installer robustness improvements produced here into 2.6, so that they benefit human users even though they might not be good enough to mark the tests as non-fragile on Jenkins yet.

#36 Updated by intrigeri almost 3 years ago

  • Blocks deleted (Bug #11582: Some upgrade test scenarios fail due to lack of disk space on Jenkins)

#37 Updated by intrigeri almost 3 years ago

  • Target version changed from Tails_2.5 to Tails_2.6
  • % Done changed from 40 to 50

intrigeri wrote:

Status: I've seen a couple successful test suite runs from that branch on Jenkins! I'd like to get the installer robustness improvements produced here into 2.6, so that they benefit human users even though they might not be good enough to mark the tests as non-fragile on Jenkins yet.

Merging most of these bits is now tracked as #11590. I have not seen Tails Installer fail on Jenkins yet with these changes in, so technically this ticket should be closed at some point. I'll do that once we have new tickets tracking the next set of robustness issues, which have surfaced now that the Installer works (and then the comments about the #10720 fragile tags in *.feature will need an update). I'll wait until I have a bit more data before I start creating these tickets.

#38 Updated by intrigeri almost 3 years ago

  • Blocks Bug #11588: Sometimes fails to boot from USB on Jenkins with I/O errors added

#39 Updated by intrigeri almost 3 years ago

  • Blocks deleted (Bug #11588: Sometimes fails to boot from USB on Jenkins with I/O errors)

#40 Updated by intrigeri almost 3 years ago

  • Description updated
  • Status changed from In Progress to Resolved
  • Target version changed from Tails_2.6 to Tails_2.5
  • % Done changed from 50 to 100

This ticket is now superseded by #11590 and #11588.

#41 Updated by intrigeri almost 3 years ago

  • Assignee deleted (intrigeri)
