Project

General

Profile

Bug #11729

Emergency shutdown triggered after resuming from suspend

Added by segfault over 3 years ago. Updated 7 months ago.

Status:
Resolved
Priority:
Normal
Assignee:
-
Category:
-
Target version:
Start date:
08/26/2016
Due date:
% Done:

100%

Feature Branch:
feature/14556-show-suspend-button
Type of work:
Code
Blueprint:
Starter:
Affected tool:

Description

Sometimes Tails will immediately shut down after resuming from suspend.


Related issues

Related to Tails - Feature #14556: Show a suspend to RAM button in the status menu Resolved 08/30/2017
Related to Tails - Bug #16787: Trigger emergency shutdown on resume when the boot device is missing Confirmed

Associated revisions

Revision 0f9a9a04 (diff)
Added by segfault 9 months ago

Disable emergency shutdown during suspend (refs: #11729)

Revision d6b6302e (diff)
Added by segfault 9 months ago

Disable emergency shutdown during suspend (refs: #11729)

Revision e4e7f812 (diff)
Added by segfault 9 months ago

Disable emergency shutdown during suspend (refs: #11729)

Revision 268474a4 (diff)
Added by segfault 9 months ago

Document that emergency shutdown is disabled during suspend (refs: #11729)

Revision a7c35d0f (diff)
Added by segfault 9 months ago

Document that emergency shutdown is disabled during suspend (refs: #11729)

Revision 8f4bf911
Added by intrigeri 9 months ago

Merge remote-tracking branch 'origin/feature/14556-show-suspend-button' into stable (Fix-committed: #11729, #14556)

Revision 6ea13262 (diff)
Added by intrigeri 9 months ago

Design doc: add link to implementation (refs: #11729)

History

#2 Updated by segfault over 3 years ago

This happens to me once in a while. I think this might be a blocker for #11395, because it could lead to loss of unsaved work.

#3 Updated by intrigeri over 3 years ago

  • Target version deleted (2017)

(Generally, our roadmap should reflect collective decisions. Now, sometimes some of us add to it stuff that they intend to do themselves, which is fine. But adding to the roadmap without any assignee feels wrong to me.)

#4 Updated by elouann over 3 years ago

  • Status changed from New to Confirmed

Confirmed by several reports and by myself: after suspend, Tails shuts down when running from USB stick but not when running from SD card, on a Thinkpad X201

#5 Updated by intrigeri over 3 years ago

At this point, I feel the need to point out that we started polishing the support of suspend-to-RAM on the grounds that it was already mostly working and could therefore be advertised a bit better. The previous status quo was that developers (at least myself) thought that suspend-to-RAM did not work and was not supported, but power users started using it and relying on it. And now we're discovering serious issues with this feature, one after the other, some of them being potentially hard to fix. So I'd like us to take a step back and double-check that it still seems reasonable to add this feature on the list of things we need to make work nicely and keep working in the foreseeable future. At least I need to know if "we" is "the Foundations Team" in this case; I'm not convinced it should be.

#6 Updated by anonym over 3 years ago

I agree with intrigeri's concern, but I'm just gonna drop a suggestion for how we can fix this issue: we simply hook into the suspend action to kill the udev watchdog, and restart it when we resume. This assumes that the cause of this issue is that some hardware detaches the USB drive on suspend, and reattaches it on resume, and that that doesn't cause other issues.

#9 Updated by sajolida over 3 years ago

  • Assignee set to intrigeri
  • QA Check set to Info Needed

Yeap, we shouldn't allow users to suspend from the desktop if resuming shuts down their session.

intrigeri: could you be more explicit regarding "discovering serious issues [...] one after the other". Which issues other than this one are you referring to? Maybe we should add Redmine relationships or discuss this outside of this ticket if these issues go beyond the scope of this ticket only or if the discussion gets complicated.

#10 Updated by intrigeri over 3 years ago

  • Assignee changed from intrigeri to sajolida

intrigeri: could you be more explicit regarding "discovering serious issues [...] one after the other". Which issues other than this one are you referring to?

In retrospect, I think I was erroneously mixing together things related to locking screen and things related to suspend. Sorry for the confusion.

#11 Updated by sajolida about 3 years ago

  • Assignee deleted (sajolida)
  • QA Check deleted (Info Needed)

#12 Updated by u over 2 years ago

  • Assignee set to elouann
  • QA Check set to Info Needed

Is this still the case with 3.0?

#13 Updated by intrigeri over 2 years ago

  • Related to Feature #14556: Show a suspend to RAM button in the status menu added

#14 Updated by mercedes508 almost 2 years ago

  • Assignee changed from elouann to mercedes508

I'll try to reproduce with Tails 3.4.

#15 Updated by mercedes508 almost 2 years ago

  • Status changed from Confirmed to Resolved
  • Assignee deleted (mercedes508)

can't reproduce with Tails 3.4, closing it.

#16 Updated by mercedes508 almost 2 years ago

  • Status changed from Resolved to In Progress
  • Assignee set to segfault

Do you still experience this with 3.4?

#17 Updated by segfault almost 2 years ago

  • Assignee changed from segfault to mercedes508

Do you still experience this with 3.4?

This didn't happen every time, but only now and then. And I don't use Tails regularly enough currently to tell if this is fixed or not. Maybe we can close it if there are no bug reports since Tails 3.0.

#18 Updated by mercedes508 almost 2 years ago

  • Status changed from In Progress to Resolved

segfault wrote:

Do you still experience this with 3.4?

This didn't happen every time, but only now and then. And I don't use Tails regularly enough currently to tell if this is fixed or not. Maybe we can close it if there are no bug reports since Tails 3.0.

Marking it as resolved accordingly.

#19 Updated by segfault 9 months ago

  • Status changed from Resolved to Confirmed
  • Assignee changed from mercedes508 to segfault
  • QA Check deleted (Info Needed)

I can still reproduce this issue. On my test hardware, it happens in about 1 out of 3 times I resume from suspend. It prints the following log messages:

brcmsmac bcma0:1: brcms_ops_bss_info_changed: qos enabled: false (implement)
brcmsmac bcma0:1: brcms_ops_config: change power-save mode: false (implement)
brcmsmac bcma0:1: brcms_ops_bss_info_changed: qos enabled: false (implement)
brcmsmac bcma0:1: brcms_ops_config: change power-save mode: false (implement)
systemd-udevd: Process 'lmt-udev force' terminated by signal TERM.
systemd-udevd: Failed to wait for spawned command 'lmt-udev force': Input/output error

Followed by a lot of squashfs error messages:

SQUASHFS error: Unable to read data cache entry
SQUASHFS error: Unable to read page, block X, size Y

#20 Updated by segfault 9 months ago

This seems to be indeed caused by our udev-watchdog-wrapper, because I don't see this issue after stopping the watchdog, I suspended and resumed 10 times in a row now.

Next step: Try to stop the watchdog before suspending (and restarting it afterwards), as already suggested by anonym in #note-6.

#21 Updated by segfault 9 months ago

So we could use systemd's inhibitor locks for this:

https://www.freedesktop.org/wiki/Software/systemd/inhibit/

This would certainly make the suspend button in the system menu work without causing this issue. But the documentation includes this note:

Note that this will only be sent out for suspend/resume cycles done via logind, i.e. generally only for high-level user-induced suspend cycles, and not automatic, low-level kernel induced ones which might exist on certain devices with more aggressive power management.

Now I'm wondering whether we could hook into somewhere more low level to also prevent this issue for these "devices with more aggressive power management". I will do some more research.

#22 Updated by intrigeri 9 months ago

Now I'm wondering whether we could hook into somewhere more low level to also prevent this issue for these "devices with more aggressive power management".

AFAIK computers that would auto-suspend via low-level (driver, hardware) means are basically smartphones and possibly some tablets. While it's great that the systemd developers and doc are so forward-thinking, at this point I don't think Tails runs on any such device in a remotely usable way. (FWIW, I'm testing Tails on a tablet regularly and despite continuous improvements, it's not remotely usable yet; I've never seen that device auto-suspend anyway.) So I would suggest you don't need to do more research :)

#23 Updated by segfault 9 months ago

The kernel supports registering suspend notifier functions which are then called before the suspend, see register_pm_notifier in https://github.com/torvalds/linux/blob/master/Documentation/driver-api/pm/notifiers.rst.

Unfortunately, I can't find a kernel module which uses these notifiers and exposes them to the user space.

logind seems not to make use of this kernel functionality, it only sends the PrepareForSleep signal when the suspend is triggered via logind itself: https://github.com/systemd/systemd/blob/e62a7fea757f259eb330da5b6d3ab4ede46400a2/src/login/logind-dbus.c#L1659

#24 Updated by segfault 9 months ago

intrigeri wrote:

Now I'm wondering whether we could hook into somewhere more low level to also prevent this issue for these "devices with more aggressive power management".

AFAIK computers that would auto-suspend via low-level (driver, hardware) means are basically smartphones and possibly some tablets. While it's great that the systemd developers and doc are so forward-thinking, at this point I don't think Tails runs on any such device in a remotely usable way. (FWIW, I'm testing Tails on a tablet regularly and despite continuous improvements, it's not remotely usable yet; I've never seen that device auto-suspend anyway.) So I would suggest you don't need to do more research :)

Ah OK, I thought there might also be laptops which do this. Well then I would just go on and implement it the systemd way.

#25 Updated by segfault 9 months ago

Well then I would just go on and implement it the systemd way.

I think the right place to do this is in the tails-shutdown-on-media-removal.service / udev-watchdog-wrapper itself, which should just stop watching for the udev signals during suspend. But I would very much like to do this in Python instead of bash, so I'm tempted to rewrite the udev-watchdog-wrapper in Python (this would also allow us to use udisks via the D-Bus API instead of parsing udevadm output).

Alternatively, I could write a udev-watchdog-wrapper-wrapper which simply kills / restarts the udev-watchdog-wrapper before / after suspend.

#26 Updated by intrigeri 9 months ago

I think the right place to do this is in the tails-shutdown-on-media-removal.service / udev-watchdog-wrapper itself, which should just stop watching for the udev signals during suspend.

Another implementation idea would be to use sleep.target, somehow, to toggle on/off its reaction to udev events. For example:

  • a /lib/systemd/system-sleep/udev-watchdog script (systemd-sleep(8)):
    • if passed "pre", runs touch /run/something-clever/sleeping
    • if passed "post", runs rm /run/something-clever/sleeping
  • udev-watchdog does not react to udev events if /run/something-clever/sleeping exists

This should guarantee that udev events are not acted upon by udev-watchdog while suspending or resuming. It's less nice than some other options, such as reacting to D-Bus stuff sent by logind but it probably requires much less work to get it right, and avoids the need to rewrite code into a different language.

But I would very much like to do this in Python instead of bash,

Indeed, I'm not looking forward to reviewing code that takes an inhibitor lock via the logind D-Bus API in shell.

so I'm tempted to rewrite the udev-watchdog-wrapper in Python (this would also allow us to use udisks via the D-Bus API instead of parsing udevadm output).

Why not, but keep in mind that this is a persistent daemon and Python will eat more RAM. (I guess this would be a perfect use case for Go or Rust, in passing.)

#27 Updated by segfault 9 months ago

  • Status changed from Confirmed to In Progress

#28 Updated by segfault 9 months ago

  • Assignee deleted (segfault)
  • QA Check set to Ready for QA

a /lib/systemd/system-sleep/udev-watchdog script (systemd-sleep(8)):

that's a great idea, thanks for the hint to systemd-sleep that makes things so much simpler :)

I put a trivial script in /lib/systemd/system-sleep which simply stops the tails-shutdown-on-media-removal.service before suspend and starts it again. I tested it and it seems to work.

so I'm tempted to rewrite the udev-watchdog-wrapper in Python (this would also allow us to use udisks via the D-Bus API instead of parsing udevadm output).

Why not, but keep in mind that this is a persistent daemon and Python will eat more RAM. (I guess this would be a perfect use case for Go or Rust, in passing.)

Indeed, Go or Rust would be better than Python for a daemon process. I could rewrite it in Go at some point, but I think the system-sleep script already solves the problem at hand.

#29 Updated by segfault 9 months ago

  • Feature Branch set to feature/14556-show-suspend-button

#30 Updated by intrigeri 9 months ago

  • Assignee set to segfault
  • Target version set to Tails_3.14
  • QA Check changed from Ready for QA to Dev Needed

Regarding /lib/systemd/system-sleep/stop-tails-shutdown-on-media-removal.sh:

  • It's worth mentioning it in wiki/src/contribute/design/memory_erasure.mdwn
  • I suggest s/stop/toggle/ in the script's name, to better express what it does.

Other than this, @segfault: code review passes and it works fine on a X200.

#31 Updated by segfault 9 months ago

  • Assignee changed from segfault to intrigeri
  • QA Check changed from Dev Needed to Ready for QA

intrigeri wrote:

Regarding /lib/systemd/system-sleep/stop-tails-shutdown-on-media-removal.sh:

  • It's worth mentioning it in wiki/src/contribute/design/memory_erasure.mdwn
  • I suggest s/stop/toggle/ in the script's name, to better express what it does.

Done

#32 Updated by intrigeri 9 months ago

Added a link in the design doc to the implementation, code review passes, will now test.

#33 Updated by intrigeri 9 months ago

  • Status changed from In Progress to 11
  • % Done changed from 0 to 100

#34 Updated by intrigeri 9 months ago

  • Status changed from 11 to In Progress

#35 Updated by intrigeri 9 months ago

  • Status changed from In Progress to 11
  • Assignee deleted (intrigeri)
  • QA Check changed from Ready for QA to Pass

#36 Updated by intrigeri 7 months ago

  • Target version changed from Tails_3.14 to Tails_3.13.2

#37 Updated by anonym 7 months ago

  • Status changed from 11 to Resolved

#38 Updated by anonym 7 months ago

  • Target version changed from Tails_3.13.2 to Tails_3.14

#39 Updated by intrigeri 7 months ago

  • Target version changed from Tails_3.14 to Tails_3.13.2

#40 Updated by intrigeri 6 months ago

  • Related to Bug #16787: Trigger emergency shutdown on resume when the boot device is missing added

Also available in: Atom PDF