Project

General

Profile

Bug #11464

Bug #10288: Fix newly identified issues to make our test suite more robust and faster

"all notifications have disappeared" step is fragile when network is unplugged

Added by intrigeri over 3 years ago. Updated almost 3 years ago.

Status:
Resolved
Priority:
Elevated
Assignee:
-
Category:
Test suite
Target version:
Start date:
05/23/2016
Due date:
% Done:

100%

Feature Branch:
test/11398-florence-hides-other-windows
Type of work:
Code
Blueprint:
Starter:
Affected tool:

Description

It relies on the fact that the notification applet is there. When we boot with network plugged, the (persistent) Tor is ready has to be here so we're fine. However, when booting with network unplugged, we can't rely on that, so instead we rely on subtle timing between the previous step ("the Tails desktop is ready", where we do more and more things) and the timeout of the "started in VM" notification. The changes brought to fix #11398 trigger this latent bug, that could hit us any time we add more stuff to "the Tails desktop is ready".

00_28_21_Booting_Tails_does_not_automount_untrusted_fat32_partitions.mkv (520 KB) intrigeri, 05/23/2016 03:40 PM

00_28_15_Booting_Tails_does_not_automount_untrusted_fat32_partitions.mkv (487 KB) intrigeri, 05/24/2016 01:43 PM

devel_419.log View (705 KB) bertagaz, 07/22/2016 03:00 AM

00_02_43_Anti-test__no_memory_erasure_on_a_modern_computer.mkv (2.28 MB) bertagaz, 07/22/2016 03:01 AM


Related issues

Related to Tails - Bug #11398: Florence sometimes hides other windows, which breaks tests Resolved 05/05/2016

Associated revisions

Revision 492df175 (diff)
Added by intrigeri over 3 years ago

Test suite: fix TOCTOU race condition in "all notifications have disappeared" step.

The icon we're looking for may disappear between the time when we check if it's
there, and the time when we're clicking on it.

refs: #11464

Revision 06634d46
Added by anonym over 3 years ago

Merge remote-tracking branches 'origin/test/10376-fix_startup-page_roadmap-test-is-fragile' and 'origin/test/11401-notification-wait-is-fragile' into testing

Fix-committed: #10376, #10497, #11398, #11401, #11464

Revision 358eed2c (diff)
Added by anonym almost 3 years ago

Test suite: fix the 'all notifications have disappeared' step.

Fix-committed: #11464

Revision 9552d635 (diff)
Added by anonym almost 3 years ago

Test suite: robustness improvements when dealing with notifications.

Refs: #11464

History

#1 Updated by intrigeri over 3 years ago

  • Related to Bug #11398: Florence sometimes hides other windows, which breaks tests added

#2 Updated by intrigeri over 3 years ago

There's one more cause for unreliability here: there's a TOCTU race condition in:

  next if not(@screen.exists("GnomeNotificationApplet.png"))
  @screen.click("GnomeNotificationApplet.png")
  @screen.wait("GnomeNotificationAppletOpened.png", 10)

... that triggers:

[log] CLICK on (1007,762)
    And I start Tails from DVD with network unplugged and I login                # features/step_definitions/common_steps.rb:193
      FindFailed: can not find GnomeNotificationAppletOpened.png on the screen.
      Line ?, in File ? (RuntimeError)
      ./features/step_definitions/common_steps.rb:210:in `/^I start Tails( from DVD)?( with network unplugged)?( and I login)?$/'
      features/untrusted_partitions.feature:78:in `And I start Tails from DVD with network unplugged and I login'

(video attached).

Maybe we should try clicking the image in a block that catches exceptions, and do next iff any exception is raised? Sounds like this would easily eliminate the race condition, no?

Here again, I can't really mark scenarios as fragile (yet) since so many are affected, so bumping priority a bit.

#3 Updated by intrigeri over 3 years ago

  • Status changed from Confirmed to In Progress
  • Assignee set to intrigeri
  • Target version set to Tails_2.4
  • % Done changed from 0 to 10
  • Feature Branch set to test/11398-florence-hides-other-windows

Pushed a tentative fix to test/11398-florence-hides-other-windows, since I think that branch makes it more likely to hit such problems.

#4 Updated by intrigeri over 3 years ago

  • File 00_11_51_Tails_will_not_enable_disk_swap.mkv added

My tentative fix has a gross bug, pushed a fixup.

#5 Updated by intrigeri over 3 years ago

  • File deleted (00_11_51_Tails_will_not_enable_disk_swap.mkv)

#6 Updated by intrigeri over 3 years ago

Even with my fix in, this still sometimes fails: https://jenkins.tails.boum.org/job/test_Tails_ISO_test-10376-fix-startup-page-roadmap-test-is-fragile/28/. I think this shows that Sikuli's screen.click is racy, and not reliable when used on a fast-changing UI. I'll try to thing of a workaround, but really IMO we should try to find another solution (would dogtail be less racy?) to make the whole notifications thing more robust.

#7 Updated by intrigeri over 3 years ago

I think that another problem I've just fixed (eab5505) might have made this problem more likely to happen, because previously we were often running "all notifications have disappeared" twice, which at least doubles the chances it fails (I say "at least" because the 2nd call has more chances to be close to the time when the "running in a VM" notification times out, due to how the whole timing of our session initialization vs. test suite works).

#8 Updated by intrigeri over 3 years ago

  • Assignee changed from intrigeri to anonym
  • % Done changed from 10 to 50
  • QA Check set to Ready for QA

Seems to be pretty robust now!

#9 Updated by anonym over 3 years ago

  • Status changed from In Progress to 11
  • Assignee changed from anonym to intrigeri
  • % Done changed from 50 to 100

intrigeri wrote:

Seems to be pretty robust now!

Indeed, which is the reason I merged it, but you might want to look at this commit I pushed on top:

761f3df You cannot `return` in a step definition block.

Is there anything we can do to cleanly deal with this (unlikely?) error case? IMHO we can just forget about this and revisit if we see issues -- AFAICT the real fix of this branch is the other stuff.

#10 Updated by anonym over 3 years ago

anonym wrote:

Indeed, which is the reason I merged it, but you might want to look at this commit I pushed on top:

And:

cc1051e Make try_for less race-prone.

#11 Updated by intrigeri over 3 years ago

Indeed, which is the reason I merged it, but you might want to look at this commit I pushed on top:

> 761f3df You cannot `return` in a step definition block.
> 

Indeed, right. IIRC I was not convinced either, but copied it from some other place that I can't find now, so probably I'm just wrong :)

Is there anything we can do to cleanly deal with this (unlikely?) error case?

Indeed we still have a TOCTOU problem here, but apparently it's less bad than the previous exists+click. I don't know. We could assume that a failure screen.wait("GnomeNotificationAppletOpened.png", 10) means that the notification applet has disappeared, and then we need to unclick what we clicked, but it feels potentially fragile & premature optimization, so:

IMHO we can just forget about this and revisit if we see issues -- AFAICT the real fix of this branch is the other stuff.

Yes.

#12 Updated by intrigeri over 3 years ago

  • Assignee deleted (intrigeri)
  • QA Check changed from Ready for QA to Pass

#13 Updated by intrigeri over 3 years ago

And:

> cc1051e Make try_for less race-prone.
> 

Well, I had lowered the timeout to 2s in 7c4bd26 because a 5s timeout was triggering #11464. We'll see how it fares in practice, but let's be ready to lower it back.

#14 Updated by anonym over 3 years ago

  • Status changed from 11 to Resolved

#15 Updated by bertagaz over 3 years ago

Sadly we've seen this bug several time again while referencing the test suite failure in Jenkins for 2016 June (see #11087#note-9). Clearly Sikuli clicks too late on the notification area, which has already disappeared, and then ends up opening the workspace selection applet.

#16 Updated by u over 3 years ago

  • Assignee set to intrigeri

While checking "Volunteers to handle important tickets flagged for next release, but without assignee" during the current contributor meeting we saw this ticket. Assigning to you, tentatively, because I don't know what to do with it exactly otherwise.

#17 Updated by intrigeri over 3 years ago

  • Assignee changed from intrigeri to anonym
  • Target version changed from Tails_2.6 to Tails_3.0

Thank you for putting it back onto our radar! Indeed it was a mistake to assign this to 2.6 with no assignee. I think that anonym has some improvements in this area on the feature/stretch branch, so moving it to 3.0.

#18 Updated by anonym almost 3 years ago

  • Status changed from Confirmed to 11
  • % Done changed from 0 to 100

#19 Updated by anonym almost 3 years ago

  • Assignee deleted (anonym)
  • QA Check changed from Dev Needed to Pass

#20 Updated by anonym almost 3 years ago

  • Status changed from 11 to Resolved

Also available in: Atom PDF