Bug #10288: Fix newly identified issues to make our test suite more robust and faster
"all notifications have disappeared" step is fragile when network is unplugged
It relies on the fact that the notification applet is there. When we boot with network plugged, the (persistent) Tor is ready has to be here so we're fine. However, when booting with network unplugged, we can't rely on that, so instead we rely on subtle timing between the previous step ("the Tails desktop is ready", where we do more and more things) and the timeout of the "started in VM" notification. The changes brought to fix #11398 trigger this latent bug, that could hit us any time we add more stuff to "the Tails desktop is ready".
Test suite: fix TOCTOU race condition in "all notifications have disappeared" step.
The icon we're looking for may disappear between the time when we check if it's
there, and the time when we're clicking on it.
Test suite: fix the 'all notifications have disappeared' step.
#2 Updated by intrigeri over 3 years ago
- File 00_28_15_Booting_Tails_does_not_automount_untrusted_fat32_partitions.mkv added
- Priority changed from Normal to Elevated
There's one more cause for unreliability here: there's a TOCTU race condition in:
next if not(@screen.exists("GnomeNotificationApplet.png")) @screen.click("GnomeNotificationApplet.png") @screen.wait("GnomeNotificationAppletOpened.png", 10)
... that triggers:
[log] CLICK on (1007,762) And I start Tails from DVD with network unplugged and I login # features/step_definitions/common_steps.rb:193 FindFailed: can not find GnomeNotificationAppletOpened.png on the screen. Line ?, in File ? (RuntimeError) ./features/step_definitions/common_steps.rb:210:in `/^I start Tails( from DVD)?( with network unplugged)?( and I login)?$/' features/untrusted_partitions.feature:78:in `And I start Tails from DVD with network unplugged and I login'
Maybe we should try clicking the image in a block that catches exceptions, and do
next iff any exception is raised? Sounds like this would easily eliminate the race condition, no?
Here again, I can't really mark scenarios as fragile (yet) since so many are affected, so bumping priority a bit.
#3 Updated by intrigeri over 3 years ago
- Status changed from Confirmed to In Progress
- Assignee set to intrigeri
- Target version set to Tails_2.4
- % Done changed from 0 to 10
- Feature Branch set to test/11398-florence-hides-other-windows
Pushed a tentative fix to test/11398-florence-hides-other-windows, since I think that branch makes it more likely to hit such problems.
#6 Updated by intrigeri over 3 years ago
Even with my fix in, this still sometimes fails: https://jenkins.tails.boum.org/job/test_Tails_ISO_test-10376-fix-startup-page-roadmap-test-is-fragile/28/. I think this shows that Sikuli's
screen.click is racy, and not reliable when used on a fast-changing UI. I'll try to thing of a workaround, but really IMO we should try to find another solution (would dogtail be less racy?) to make the whole notifications thing more robust.
#7 Updated by intrigeri over 3 years ago
I think that another problem I've just fixed (eab5505) might have made this problem more likely to happen, because previously we were often running "all notifications have disappeared" twice, which at least doubles the chances it fails (I say "at least" because the 2nd call has more chances to be close to the time when the "running in a VM" notification times out, due to how the whole timing of our session initialization vs. test suite works).
#9 Updated by anonym over 3 years ago
- Status changed from In Progress to 11
- Assignee changed from anonym to intrigeri
- % Done changed from 50 to 100
Seems to be pretty robust now!
Indeed, which is the reason I merged it, but you might want to look at this commit I pushed on top:
761f3df You cannot `return` in a step definition block.
Is there anything we can do to cleanly deal with this (unlikely?) error case? IMHO we can just forget about this and revisit if we see issues -- AFAICT the real fix of this branch is the other stuff.
#11 Updated by intrigeri over 3 years ago
Indeed, which is the reason I merged it, but you might want to look at this commit I pushed on top:> 761f3df You cannot `return` in a step definition block. >
Indeed, right. IIRC I was not convinced either, but copied it from some other place that I can't find now, so probably I'm just wrong :)
Is there anything we can do to cleanly deal with this (unlikely?) error case?
Indeed we still have a TOCTOU problem here, but apparently it's less bad than the previous exists+click. I don't know. We could assume that a failure
screen.wait("GnomeNotificationAppletOpened.png", 10) means that the notification applet has disappeared, and then we need to unclick what we clicked, but it feels potentially fragile & premature optimization, so:
IMHO we can just forget about this and revisit if we see issues -- AFAICT the real fix of this branch is the other stuff.
#15 Updated by bertagaz over 3 years ago
- File devel_419.log View added
- File 00_02_43_Anti-test__no_memory_erasure_on_a_modern_computer.mkv added
- Status changed from Resolved to Confirmed
- Target version changed from Tails_2.4 to Tails_2.6
- % Done changed from 100 to 0
- QA Check changed from Pass to Dev Needed
Sadly we've seen this bug several time again while referencing the test suite failure in Jenkins for 2016 June (see #11087#note-9). Clearly Sikuli clicks too late on the notification area, which has already disappeared, and then ends up opening the workspace selection applet.
#16 Updated by Anonymous over 3 years ago
- Assignee set to intrigeri
While checking "Volunteers to handle important tickets flagged for next release, but without assignee" during the current contributor meeting we saw this ticket. Assigning to you, tentatively, because I don't know what to do with it exactly otherwise.
#17 Updated by intrigeri over 3 years ago
- Assignee changed from intrigeri to anonym
- Target version changed from Tails_2.6 to Tails_3.0
Thank you for putting it back onto our radar! Indeed it was a mistake to assign this to 2.6 with no assignee. I think that anonym has some improvements in this area on the feature/stretch branch, so moving it to 3.0.