Bug #10288

Fix newly identified issues to make our test suite more robust and faster

Added by anonym over 3 years ago. Updated 5 months ago.

Status: In Progress
Priority: Elevated
Assignee:
Category: Test suite
Target version: -
Start date: 02/26/2015
Due date:
% Done: 51%
QA Check:
Feature Branch:
Type of work: Code
Blueprint:
Starter:
Affected tool:

Description

Our initial plan is to mark any scenario that we ever see fail in the automated test suite run by Jenkins (not locally, or anywhere else) with the @fragile tag. On Jenkins we will run the test suite with the Cucumber option --tags ~@fragile, which makes it skip these scenarios.

Whenever we find a robustness issue for a Scenario $SCENARIO we do the following:

  1. Add the @fragile tag to $SCENARIO and commit it to a suitable base branch B. Often B will be stable, but it could be devel if stable is not affected, or e.g. feature/jessie (or another long-term integration branch) if only that one is affected. Let's say this became commit DEADBEE. Then merge B into all base branches where it makes sense (e.g. if B == stable we'd merge into devel, and possibly then merge devel into feature/jessie).
  2. File a ticket (let's say the number becomes #NNNNN) with subject "$SCENARIO is fragile" (or similar) and
    1. reference commit DEADBEE in the description.
    2. make it block this ticket, i.e. #10288.
  3. Create a branch test/NNNNN-fix-${SCENARIO} (or a similar, shortened name) from B, and
    1. commit a revert of DEADBEE.
    2. set this branch as the Feature Branch in ticket #NNNNN.

Iterating this process (and merging the base branches into all feature/bugfix/test branches, including those created by this procedure) should make us converge to a state where we have isolated all robustness issues to individual branches, and all base branches are green on Jenkins.

Note: There may be more than one reason to tag a scenario @fragile, which breaks the above scheme a bit. We do not want the revert in one branch to remove the @fragile tag from the base branches when it's merged while there is still at least one unmerged branch with another reason for the scenario to be marked @fragile. I think the best way to track this is on the ticket, and by adding a comment next to the @fragile tag that lists each ticket tracking the scenario's fragility. In each branch you remove only its own ticket, so we get a merge conflict as a "notification" when merging, and we only remove the tag in the base branch when the last ticket is removed from the comment.
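
The comment-based tracking described in the note above could be checked mechanically. The following is a minimal sketch, assuming a hypothetical convention (not defined in this ticket) where the line directly above the tag line reads `# fragile: #NNNNN, #MMMMM`. The helper name `drop_fragile_ticket` is likewise made up for illustration.

```ruby
# Hypothetical convention (not from the ticket): the line directly above a
# tag line reads "# fragile: #NNNNN, #MMMMM", listing every ticket that
# tracks a reason for the @fragile tag.
FRAGILE_COMMENT = /\A(\s*)# fragile:\s*(.*)\z/

# Returns the feature text with `ticket` removed from each fragile comment.
# The @fragile tag itself is dropped only when no ticket remains.
def drop_fragile_ticket(feature_text, ticket)
  lines = feature_text.lines.map(&:chomp)
  out = []
  i = 0
  while i < lines.length
    m = FRAGILE_COMMENT.match(lines[i])
    if m
      indent = m[1]
      remaining = m[2].strip.split(/\s*,\s*/).reject { |t| t == "##{ticket}" }
      if remaining.empty?
        # Last ticket gone: drop the comment and strip @fragile from the
        # tag line that follows it.
        tags = (lines[i + 1] || '').split.reject { |t| t == '@fragile' }
        out << indent + tags.join(' ') unless tags.empty?
        i += 2
        next
      end
      out << "#{indent}# fragile: #{remaining.join(', ')}"
    else
      out << lines[i]
    end
    i += 1
  end
  out.join("\n") + "\n"
end
```

With two tickets listed, removing the first one only shortens the comment; the tag disappears when the second one is removed as well. Two branches that each edit the same comment line would still conflict on merge, which is the "notification" effect the note relies on.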

Creating a summary of failures

1. Clone the puppet-tails Git repo and get the attached json-analysis script.

2. Get the `jobResults-*.xml` files you want from jenkins.lizard:/var/lib/jenkins/global-build-stats/jobresults/

3. Download all the JSON test result files you're interested in (you can pass an epoch to ISO-test-suite-runs; that will be the starting point), e.g.:

cd "$PUPPET_TAILS_REPO"
# iterate over the result URLs printed by ISO-test-suite-runs
for url in $(./files/jenkins/master/ISO-test-suite-runs /tmp/jobResults-2015-12.xml) ; do
    dest=$(mktemp --tmpdir=. tailstester-XXXXXXXXXX.json)
    wget -O "$dest" "$url"
    # discard empty downloads
    [ -s "$dest" ] || rm -f "$dest"
done

4. Do the analysis on JSON files:

json-analysis --steps *.json
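
For readers without the attachment, the kind of aggregation this step performs can be sketched as follows. This is not the attached json-analysis script. It is a minimal illustration that assumes the result files use the standard Cucumber JSON formatter layout (features containing `elements`, each with `steps` that carry a `result.status`). The function names are made up.

```ruby
require 'json'

# Counts failed steps per step name and per scenario across Cucumber JSON
# reports. Assumes the standard Cucumber JSON formatter layout:
# [ { "elements" => [ { "name" => ..., "steps" => [ { "name" => ...,
#     "result" => { "status" => ... } } ] } ] } ]
def failure_breakdown(reports)
  counts = Hash.new { |h, step| h[step] = Hash.new(0) }
  reports.each do |features|
    features.each do |feature|
      (feature['elements'] || []).each do |scenario|
        (scenario['steps'] || []).each do |step|
          next unless step.dig('result', 'status') == 'failed'
          counts[step['name']][scenario['name']] += 1
        end
      end
    end
  end
  counts
end

# Renders the breakdown, most frequently failing step first.
def format_breakdown(counts)
  total = counts.values.sum { |by_scenario| by_scenario.values.sum }
  lines = ["Step failure breakdown (total: #{total}):"]
  counts.sort_by { |_, by| -by.values.sum }.each do |step, by|
    lines << format('* %-5d Step: %s', by.values.sum, step)
    by.sort_by { |_, n| -n }.each do |scenario, n|
      lines << format('  - %-5d Scenario: %s', n, scenario)
    end
  end
  lines.join("\n")
end

if __FILE__ == $PROGRAM_NAME
  # Each command-line argument is the path of one JSON result file.
  reports = ARGV.map { |path| JSON.parse(File.read(path)) }
  puts format_breakdown(failure_breakdown(reports))
end
```

Grouping by step first and by scenario second makes a step that fails across many scenarios stand out, which is what matters when deciding whether to fix a shared helper or tag a single scenario.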

json-analysis (3.59 KB) anonym, 10/04/2015 12:39 PM

full-summary.txt (39.6 KB) anonym, 10/12/2015 03:53 AM

summary-20151201-to-20151207.txt (431 KB) intrigeri, 12/07/2015 07:44 AM


Subtasks

Bug #10375: Increase the number of Tor circuit retries in the test suite (Resolved)
Bug #10376: The "the Tor Browser loads the (startup page|Tails roadmap)" step is fragile (Resolved)
Feature #10379: Check that we do not see any error pages in the "I open the address" step. (Rejected)
Bug #10380: gobby tests are fragile (Confirmed)
Bug #10378: The "Tails OpenPGP keys" scenario is fragile (Resolved)
Bug #10381: The "I open the address" steps are fragile (Resolved)
Bug #8961: The automated test suite doesn't fetch Tor relays from unverified-microdesc-consensus.bak (Resolved)
Bug #9654: "IPv4 TCP non-Tor Internet hosts were contacted" during the test suite (Resolved)
Bug #10440: Time syncing scenarios are fragile (Resolved)
Bug #10441: Synaptic test is fragile (Resolved)
Bug #10442: Totem "Watching a WebM video over HTTPS" test is fragile (In Progress; assignee: CyrilBrulebois)
Bug #10444: Git tests are fragile (Resolved)
Bug #10474: Scenario "Connecting to the #i2p IRC channel" is fragile (Rejected)
Bug #10475: Scenario "Using a persistent Electrum configuration" is fragile (In Progress)
Bug #10493: The "I see "WindowsSysTraySound.png"" step is fragile (Resolved)
Bug #10495: The 'the time has synced' step is fragile (In Progress)
Bug #10496: apt-get scenarios are fragile (Resolved)
Bug #10497: wait_until_tor_is_working helper is fragile (Resolved)
Bug #10498: SSH tests are fragile (Resolved)
Bug #10499: The ICMP Tor enforcement test is fragile (Confirmed)
Bug #10500: Monitor failure modes of Seahorse (Rejected)
Bug #10501: Step 'the "10CC5BC7" key is in the live user's public keyring' is fragile (Resolved)
Bug #10502: The test suite sometimes cannot connect to the remote shell (Resolved)
Feature #10503: Run erase_memory.feature first to optimize test suite performance (Resolved)
Bug #10504: boot_device method in features/step_definitions/usb.rb is broken (Resolved)
Bug #10523: whois test is fragile (Resolved)
Bug #10718: Lower waiting time for USB installation in the test suite (Resolved)
Bug #10774: MAC address spoofing failure notifications are not always displayed (In Progress; assignee: CyrilBrulebois)
Bug #10775: "I can view and print a PDF file stored in /usr/share" scenario is fragile (Resolved)
Bug #10776: Step "I shutdown and wait for Tails to finish wiping the memory" fails when memory wiping causes a freeze (Resolved)
Bug #10777: The test suite machinery sometimes misses the boot splash (Resolved)
Bug #10783: Test that clicks the roadmap URL in Pidgin is fragile (Resolved)
Feature #10900: "I should be able to install a package using Synaptic" step is fragile (Resolved)
Bug #10991: The "I both encrypt and sign the message using my OpenPGP key" step is fragile (Confirmed)
Bug #10992: Fragile test: the OpenPGP applet key selection window is moved partly off-screen instead of selecting a key (Confirmed; assignee: anonym)
Bug #10994: "I can view and print a PDF file" scenarios are fragile (Confirmed)
Bug #11114: I2P tests are fragile (Confirmed)
Bug #11394: "Symmetric encryption and decryption using OpenPGP Applet" is fragile (Confirmed)
Bug #11398: Florence sometimes hides other windows, which breaks tests (Resolved)
Bug #11400: "I test Torbirdy's proxy settings" test sometimes fails due to missing favicon in "congratulations" tab (In Progress; assignee: anonym)
Bug #11401: robust_notification_wait sometimes opens the Applications menu which breaks tests (Resolved)
Bug #11409: Deal with the 'Dogtail: warning: application may be hanging' bug (Confirmed; assignee: anonym)
Bug #11413: Test suite: newly added XMPP account is not persisted, and "Pidgin has the expected persistent accounts configured" doesn't notice (Resolved)
Bug #11414: The "Chatting with some friend over XMPP in a multi-user chat" scenario is fragile (Confirmed)
Bug #11452: "I2P displays a notice when bootstrapping fails" test is fragile (Confirmed)
Bug #11453: "Chatting with some friend over XMPP" test is fragile (Confirmed)
Bug #11457: "I close the Unsafe Browser" step is fragile (In Progress; assignee: spriver)
Bug #11458: "I see the Unsafe Browser start notification and wait for it to close" step is fragile (Rejected)
Bug #11462: "I2P is running" test is fragile: may fail when the time has not sync'ed yet (Confirmed)
Bug #11463: robust_notification_wait sometimes does not recognize the notification it's looking for (Confirmed)
Bug #11464: "all notifications have disappeared" step is fragile when network is unplugged (Resolved)
Bug #11465: focus_window uses select_virtual_desktop in a racy way (Confirmed)
Bug #11479: "the Tails desktop is ready" step is fragile due to buggy display of Florence systray icon (Resolved)
Bug #11508: Firewall leak detector makes bad assumptions about PacketFu parsing (Resolved)
Bug #11521: The check_tor_leaks hook is fragile (Confirmed; assignee: anonym)
Bug #11558: Step a Tails persistence partition exists on USB drive is fragile (Resolved)
Bug #11563: Git over HTTPS scenario is fragile (Confirmed; assignee: bertagaz)
Bug #11582: Some upgrade test scenarios fail due to lack of disk space on Jenkins (Resolved)
Bug #11583: UEFI boot tests fail on Jenkins (Resolved)
Bug #11584: "Using a persistent Pidgin configuration" is fragile (Confirmed)
Bug #11585: "Persistent browser bookmarks" is fragile (Confirmed)
Bug #11588: Sometimes fails to boot from USB on Jenkins with I/O errors (Resolved)
Bug #11589: Time syncing over bridge is fragile (Confirmed)
Bug #11591: Step "the Tor Browser shows the [...] error" is fragile (Confirmed)
Bug #11592: Step "[...] has loaded in the Tor Browser" is fragile (Confirmed)
Bug #11606: "Tor Launcher uses all expected TBB shared libraries" is fragile (Confirmed)
Bug #11616: "The emergency shutdown applet can .." scenarios are fragile (Resolved)
Bug #11617: Clicking "Yes" for "More options?" in the Greeter sometimes fails in Jenkins (Resolved)
Bug #11697: Step "Electrum successfully connects to the network" is fragile (In Progress)
Bug #11698: Test suite calls undefined save_pcap_file method in "the network device has its default MAC address configured" (Resolved)
Bug #11711: "The Unsafe Browser can be used in all languages supported in Tails" test is broken for locales that have a translated homepage (Resolved)
Bug #11816: Test suite often freezes after clicking "Login" in the Greeter (Resolved)
Bug #11865: In the test suite we sometimes boot from the isohybrid when we intended to boot from the DVD (Confirmed)
Bug #11890: Checking credentials in Icedove autoconfig wizard sometimes fails in the test suite (In Progress; assignee: anonym)
Bug #11892: Sometimes the remote shell doesn't start because of missing initial Space when modifying the kernel cmdline (In Progress; assignee: anonym)
Bug #11901: Adjust test suite to take into account that MAT does not clean PDF files anymore (Resolved)
Bug #11906: Icedove "Only the expected addons are installed" scenario fails since "amnesia branding" is not installed (Resolved)
Bug #12040: Test suite cannot sometimes connect to the remote shell: "Dropped out-of-order remote shell response: got id but expected id NNNN" (Confirmed; assignee: anonym)
Bug #12041: Spurious reboot breaks test suite which cannot connect to the remote shell (Confirmed; assignee: anonym)
Bug #12042: Icedove email sending test sometimes fails due to the Attachment Reminder (Confirmed; assignee: anonym)
Bug #12043: Test failure in "Fetching OpenPGP keys using Seahorse via the OpenPGP Applet should work and be done over Tor" due to weird interaction with GNOME Shell tiling features (Confirmed)
Bug #12044: Step "only the expected files are present on the persistence partition" sometimes fails: guestfs fails to find partition (Confirmed)
Bug #12045: Step 'I try a "Clone & Upgrade" Tails to USB drive "isohybrid"' sometimes fails: no target drive listed (Confirmed)
Bug #12047: Step 'I temporarily create a 100 MiB disk named "swap"' timeouts (Confirmed)
Bug #12131: Step 'I double-click the Report an Error launcher on the desktop' sometimes fails (Resolved)
Bug #12132: Step 'I shutdown Tails and wait for the computer to power off' sometimes fails by rebooting instead (Confirmed)
Bug #12558: The "Chatting with some friend over XMPP in a multi-user chat" scenario is broken on Riseup MUCs (Confirmed)
Bug #12586: Synaptic test is fragile on Stretch (Rejected)
Bug #13458: Step "a screenshot is saved to the live user's Pictures directory" is fragile (Confirmed)
Bug #13459: Scenario "Booting Tails from a USB drive in UEFI mode" is fragile (Confirmed)
Bug #13460: Virt-viewer fails to start (Confirmed; assignee: anonym)
Bug #13461: The Desktop icons are sometimes not displayed since the upgrade to Stretch (Resolved)
Bug #13469: Starting applications "via GNOME Activities Overview" step is fragile (Confirmed)
Bug #13470: Step "Tails Greeter has applied all settings" is fragile (Confirmed)
Bug #13541: Tor still sometimes fails to bootstrap in the test suite (In Progress)
Bug #14770: "Fetching OpenPGP keys" scenarios are fragile (In Progress; assignee: anonym)
Bug #14771: Retrying mechanism for the "I open the address" step is buggy in the Unsafe Browser (Resolved)
Bug #15321: "The Report an Error launcher will…" test suite step is fragile (Confirmed)
Bug #15514: The "The Tails documentation launcher on the desktop works…" scenarios are fragile (Confirmed)


Related issues

Related to Tails - Feature #10287: Set up limited email notification on automatic test failure for the initial deployment (Resolved, 09/27/2015)
Related to Tails - Bug #10096: Fix newly identified issues to make our test suite more robust and faster, phase 2 (Rejected, 08/26/2015)
Related to Tails - Feature #11355: Re-enable Jenkins notifications on ISO build/test failure (In Progress, 08/28/2017)

History

#1 Updated by anonym over 3 years ago

  • Description updated (diff)

#4 Updated by anonym over 3 years ago

  • Related to Feature #10287: Set up limited email notification on automatic test failure for the initial deployment added

#5 Updated by anonym over 3 years ago

  • File analysis-summary.txt added
  • File tailstester1-json.tar.bz2 added
  • File json-analysis added
  • Assignee changed from anonym to kytv
  • QA Check set to Dev Needed

Edit: Removing comment. So much was wrong with these early isotester1 runs that it's just confusing the discussion on this ticket.

#6 Updated by anonym over 3 years ago

  • File deleted (analysis-summary.txt)

#7 Updated by anonym over 3 years ago

  • File deleted (tailstester1-json.tar.bz2)

#8 Updated by anonym over 3 years ago

Here's a summary for the last 12 runs on isotester1 (simply json-analysis *.json):

Step failure breakdown (total: 55):
* 24    Step: the Unsafe Browser works in all supported languages
  - 24    Scenario: The Unsafe Browser can be used in all languages supported in Tails
* 10    Step: Tor is ready
  - 5     Scenario: Clock way in the future
  - 2     Scenario: Using obfs2 pluggable transports
  - 1     Scenario: Clock with host's time
  - 1     Scenario: The tor process should be confined with Seccomp
  - 1     Scenario: I2P is enabled when the "i2p" boot parameter is added
* 6     Step: Pidgin successfully connects to the "irc.oftc.net" account
  - 4     Scenario: Connecting to the #tails IRC channel with the pre-configured account
  - 2     Scenario: Using a persistent Pidgin configuration
* 4     Step: the OpenPGP keys shipped with Tails will be valid for the next 3 months
  - 4     Scenario: The shipped Tails OpenPGP keys are up-to-date
* 2     Step: the Tor Browser has started and loaded the Tails roadmap
  - 2     Scenario: Connecting to the #tails IRC channel with the pre-configured account
* 1     Step: I see "SSHAuthVerification.png" after at most 60 seconds
  - 1     Scenario: SSH is using the default SocksPort
* 1     Step: I configure some Bridge pluggable transports in Tor Launcher
  - 1     Scenario: Clock way in the future in bridge mode
* 1     Step: I create a new bitcoin wallet
  - 1     Scenario: Using a persistent Electrum configuration
* 1     Step: I open "/home/amnesia/Persistent/default-testpage.pdf" with Evince
  - 1     Scenario: I can view and print a PDF file stored in persistent /home/amnesia/Persistent but not /home/amnesia/.gnupg
* 1     Step: I click the blocked video icon
  - 1     Scenario: Watching a WebM video
* 1     Step: I fetch the "10CC5BC7" OpenPGP key using the GnuPG CLI without any signatures
  - 1     Scenario: Syncing OpenPGP keys using Seahorse started from the Tails OpenPGP Applet should work and be done over Tor.
* 1     Step: I connect Gobby to "gobby.debian.org" 
  - 1     Scenario: Gobby is using the default SocksPort
* 1     Step: the Tor Browser has started and loaded the startup page
  - 1     Scenario: Importing an OpenPGP key from a website
* 1     Step: I see "WindowsStartMenu.png" after at most 10 seconds
  - 1     Scenario: The panel menu should look like Microsoft Windows's start menu

Some more detailed analysis:

* 24    Step: the Unsafe Browser works in all supported languages
  - 24    Scenario: The Unsafe Browser can be used in all languages supported in Tails

This is because isotester1 doesn't set a UTF-8 locale. See #10359.

Also, we apparently have duplicated the 'The Unsafe Browser can be used in all languages supported in Tails' scenario in both localization.feature and unsafe_browser.feature.

* 10    Step: Tor is ready
  - 5     Scenario: Clock way in the future
  - 2     Scenario: Using obfs2 pluggable transports
  - 1     Scenario: Clock with host's time
  - 1     Scenario: The tor process should be confined with Seccomp
  - 1     Scenario: I2P is enabled when the "i2p" boot parameter is added

Seven of these happen in wait_until_tor_is_working, indicating that #9516 didn't solve everything. In fact, all those errors occur in the 'Tor is ready' step, implying that none seem to occur when we restore from a snapshot. My impression is that before #9516 the Tor bootstrapping issues happened just as often when restoring from a snapshot. Moreover, we run wait_until_tor_is_working after restoring a snapshot much more often than in the 'Tor is ready' step. Together this suggests that #9516 has issues with the initial bootstrap, e.g. when combined with tordate.

The other failures happened in the (sub)step 'the time has synced', and specifically in htpdate. We should add some improved error logging (e.g. dump the contents of /var/log/htpdate.log) and possibly logic for retrying htpdate on failure. That should be thought about carefully, though, since these errors affect users just as much and hence are real issues.

Anyway, we cannot mark all scenarios using this step (incl. those depending on snapshots that use it) as @fragile, since that would essentially disable all tests using the network.

* 6     Step: Pidgin successfully connects to the "irc.oftc.net" account
  - 4     Scenario: Connecting to the #tails IRC channel with the pre-configured account
  - 2     Scenario: Using a persistent Pidgin configuration

All are of the type: "The operation failed (despite forcing 5 new Tor circuits) with: RuntimeError: Connecting to account irc.oftc.net failed. (TorFailure)".

OFTC blocking is still a problem. We could try bumping the retries from 5 to 10. Thoughts?

* 4     Step: the OpenPGP keys shipped with Tails will be valid for the next 3 months
  - 4     Scenario: The shipped Tails OpenPGP keys are up-to-date

Expected. We need to be more proactive about updating the key... :)

* 2     Step: the Tor Browser has started and loaded the Tails roadmap
  - 2     Scenario: Connecting to the #tails IRC channel with the pre-configured account

The 'I see the Tails roadmap URL' step doesn't use the retrying-magic we have in the 'I open the address ...' step. We should refactor out that code from the latter so it can be used in the former step.

* 1     Step: I see "SSHAuthVerification.png" after at most 60 seconds
  - 1     Scenario: SSH is using the default SocksPort

I'm unsure what's wrong here. For some reason isotester1 doesn't have any artifacts except the json log for run 41, so I cannot investigate further.

* 1     Step: I configure some Bridge pluggable transports in Tor Launcher
  - 1     Scenario: Clock way in the future in bridge mode

Known to be very fragile. We should probably mark all the "way in the past/future" scenarios as @fragile right away.

* 1     Step: I create a new bitcoin wallet
  - 1     Scenario: Using a persistent Electrum configuration

This was on run 45, and the error screenshot shows an image of a prompt saying "You are offline". Interesting.

* 1     Step: I open "/home/amnesia/Persistent/default-testpage.pdf" with Evince
  - 1     Scenario: I can view and print a PDF file stored in persistent /home/amnesia/Persistent but not /home/amnesia/.gnupg

Here we failed to find GnomeTerminalWindow.png. Can't investigate further due to missing artifacts in run 38.

* 1     Step: I click the blocked video icon
  - 1     Scenario: Watching a WebM video

Run 35. Looking at the trace is interesting:

    And I open the address "https://webm.html5.org/test.webm" in the Tor Browser      # features/step_definitions/common_steps.rb:550
    And I click the blocked video icon                                                # features/step_definitions/common_steps.rb:838
      FindFailed: can not find TorBrowserBlockedVideo.png on the screen.

In the error screenshot I can see that the Tor Browser is showing the "The connection has timed out" page. Something must be wrong with our recent improvements in the 'I open the address ...' step since it accepted this page. Possibly the existing condition isn't enough, but we should also check that we do not see this particular error page, or perhaps any error page. Thoughts?

* 1     Step: I fetch the "10CC5BC7" OpenPGP key using the GnuPG CLI without any signatures
  - 1     Scenario: Syncing OpenPGP keys using Seahorse started from the Tails OpenPGP Applet should work and be done over Tor.

Another "The operation failed (despite forcing 5 new Tor circuits)". Either we have to bump the retries, or start investigating the OpenPGP server issue again.

* 1     Step: I connect Gobby to "gobby.debian.org" 
  - 1     Scenario: Gobby is using the default SocksPort

Run 43. The error screenshot shows that Gobby is still trying to resolve gobby.debian.net. I guess we need Tor retry magic here.

* 1     Step: the Tor Browser has started and loaded the startup page
  - 1     Scenario: Importing an OpenPGP key from a website

Happened in run 39, which lacks artifacts so I cannot investigate. Our recent retry magic should have fixed it, but maybe it's another instance of the "The connection has timed out" page messing things up, like above?

* 1     Step: I see "WindowsStartMenu.png" after at most 10 seconds
  - 1     Scenario: The panel menu should look like Microsoft Windows's start menu

Happened in run 39, which lacks artifacts so I cannot investigate.

#9 Updated by anonym over 3 years ago

  • Status changed from Confirmed to In Progress
  • Target version changed from Tails_1.8 to Tails_1.7
  • QA Check changed from Dev Needed to Info Needed

Actually, we need to deal with the @fragile tagging and branch creation ASAP since the Jenkins deployment is imminent.

kytv, can you please go through my analysis above? Could you create tickets where you agree that there is a problem? I think we should skip those where we didn't get artifacts (and hence have no clue what the issue is) so they will be re-run and fail again soon with artifacts. I can then do the @fragile tagging + branch creation since that's easier with commit rights.

#10 Updated by kytv over 3 years ago

anonym wrote:

> Here's a summary for the last 12 runs on isotester1 (simply json-analysis *.json):
> [...]
>
> Some more detailed analysis:
>
> [...]
>
> All are of the type: "The operation failed (despite forcing 5 new Tor circuits) with: RuntimeError: Connecting to account irc.oftc.net failed. (TorFailure)".
>
> OFTC blocking is still a problem. We could try bumping the retries from 5 to 10. Thoughts?

I'm fine with that. We could even go higher since it's a configurable option.

> The 'I see the Tails roadmap URL' step doesn't use the retrying-magic we have in the 'I open the address ...' step. We should refactor out that code from the latter so it can be used in the former step.

+1 for refactoring

> This was on run 45, and the error screenshot shows an image of a prompt saying "You are offline". Interesting.

Hmm. Perhaps something to do with our Electrum being "too old" to connect to the servers? Granted, [ "too old" != "offline" ], but maybe the error is wrong.

> [...]
>
> In the error screenshot I can see that the Tor Browser is showing the "The connection has timed out" page. Something must be wrong with our recent improvements in the 'I open the address ...' step since it accepted this page. Possibly the existing condition isn't enough, but we should also check that we do not see this particular error page, or perhaps any error page. Thoughts?

+1 for checking for an error.

> Run 43. The error screenshot shows that Gobby is still trying to resolve gobby.debian.net. I guess we need Tor retry magic here.

Agreed.

#11 Updated by anonym over 3 years ago

anonym wrote:

> The other failures happened in the (sub) step 'the time has synced', and specifically it was htpdate. We should add some improved error logging (e.g. dump contents of /var/log/htpdate.log) and possibly logic for retrying htpdate on failure. Although that should be thought about carefully since these errors affect users as much and hence are real issues.

I'm seeing this more and more, possibly because I'm seeing fewer and fewer Tor bootstrapping errors. I think we need some retry_tor love for htpdate only in that step. Note: tordate is part of the Tor bootstrapping, which we already have fixed, more or less. So we need a ticket for this, but we cannot do anything about this with the @fragile tag.
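
To make the htpdate idea concrete, here is a minimal sketch of a retry loop that dumps the log when it finally gives up. Every name in it is hypothetical: `restart_htpdate`, `time_synced` and `read_log` stand in for whatever the test suite would use to restart htpdate, check whether the time has synced, and fetch /var/log/htpdate.log from the system under test. It is not the existing retry_tor helper.

```ruby
# Polls `time_synced`; each time it reports false, `restart_htpdate` is
# called and the check is repeated, up to `max_retries` restarts. If the
# time still has not synced, the error message carries the log text
# returned by `read_log` so the failure can be diagnosed from the report.
# All three callables are hypothetical stand-ins.
def sync_time_with_retries(max_retries:, restart_htpdate:, time_synced:, read_log:)
  max_retries.times do
    return true if time_synced.call
    restart_htpdate.call
  end
  # One last check after the final restart.
  return true if time_synced.call
  raise "Time syncing failed after #{max_retries} retries\n" \
        "--- htpdate log ---\n#{read_log.call}"
end
```

Keeping the log dump in the failure path means that even if retrying hides most failures, the remaining ones arrive with enough context to tell a test problem from a real htpdate problem that also affects users.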

Next: another scenario that needs a @fragile tag is 'Install packages using Synaptic'. I've been running the test suite on another, weaker machine that I have, and I've seen two instances of:

    And I update APT using Synaptic                           # features/step_definitions/apt.rb:31
[log] Ctrl+TYPE "f" 
    Then I should be able to install a package using Synaptic # features/step_definitions/apt.rb:45
      FindFailed: can not find SynapticSearch.png on the screen.

So, again, we're having issues with keyboard shortcuts. While it could be a window focus issue, or solvable with retrying, I think the proper solution is to make this scenario completely image + mouse click driven (there are more than just Ctrl+f).

#12 Updated by anonym over 3 years ago

kytv wrote:

> anonym wrote:
>
> > Here's a summary for the last 12 runs on isotester1 (simply json-analysis *.json):
> > [...]
> >
> > Some more detailed analysis:
> >
> > [...]
> >
> > All are of the type: "The operation failed (despite forcing 5 new Tor circuits) with: RuntimeError: Connecting to account irc.oftc.net failed. (TorFailure)".
> >
> > OFTC blocking is still a problem. We could try bumping the retries from 5 to 10. Thoughts?
>
> I'm fine with that. We could even go higher since it's a configurable option.

IMHO it being configurable only makes this easier to test (even without a branch). However, the defaults should be sane, so that's what we're aiming to increase here if it will improve the situation.

> > This was on run 45, and the error screenshot shows an image of a prompt saying "You are offline". Interesting.
>
> Hmm. Perhaps something to do with our Electrum being "too old" to connect to the servers? Granted, [ "too old" != "offline" ], but maybe the error is wrong.

Could be. I've never seen that before, so perhaps it's only worth creating the ticket, but skip the @fragile and branch dance.

#13 Updated by kytv over 3 years ago

anonym wrote:

> kytv wrote:
>
> > anonym wrote:
> >
> > > Here's a summary for the last 12 runs on isotester1 (simply json-analysis *.json):
> > > [...]
> > >
> > > Some more detailed analysis:
> > >
> > > [...]
> > >
> > > All are of the type: "The operation failed (despite forcing 5 new Tor circuits) with: RuntimeError: Connecting to account irc.oftc.net failed. (TorFailure)".
> > >
> > > OFTC blocking is still a problem. We could try bumping the retries from 5 to 10. Thoughts?
> >
> > I'm fine with that. We could even go higher since it's a configurable option.
>
> IMHO it being configurable only makes this easier to test (even without a branch). However, the defaults should be sane, so that's what we're aiming to increase here if it will improve the situation.

Of course. :) Since I didn't know what would be deemed "sane" I went with 5 when I implemented it.

Before #9653 we were doomed to wait a minute for the #tails channel image to appear while a Reconnect button was waiting to be clicked. Retrying 15 times would mean waiting at least 15 minutes but now it could go through all 15 retries in half a minute. Raising it to 10 or 15 seems reasonable.
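
The retry behaviour discussed in this thread can be illustrated with a generic sketch. This is not the test suite's actual helper. `retry_with_recovery` and its parameters are made-up names, and the `TorFailure` class here only mirrors the "(TorFailure)" in the quoted error message. The `recovery` callable stands in for whatever forces new Tor circuits.

```ruby
# Raised when every attempt failed. The name mirrors the "(TorFailure)"
# in the error message quoted above; the class itself is a stand-in.
class TorFailure < StandardError; end

# Runs the block up to `max_retries` times. After each failed attempt the
# `recovery` callable runs (e.g. something that forces new Tor circuits)
# before the block is tried again.
def retry_with_recovery(max_retries:, recovery: -> {})
  attempts = 0
  begin
    attempts += 1
    yield
  rescue StandardError => e
    if attempts >= max_retries
      raise TorFailure, "The operation failed (despite #{attempts} attempts) " \
                        "with: #{e.class}: #{e.message}"
    end
    recovery.call
    retry
  end
end
```

Because the limit is a parameter, trying 10 or 15 instead of 5 is a one-argument change, e.g. `retry_with_recovery(max_retries: 15, recovery: force_new_circuits) { connect }` with both `force_new_circuits` and `connect` supplied by the caller. The total cost is dominated by how long a single failed attempt takes, which is why the fix in #9653 makes a higher limit affordable.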

#14 Updated by kytv over 3 years ago

  • Blocked by Bug #10378: The "Tails OpenPGP keys" scenario is fragile added

#15 Updated by kytv over 3 years ago

  • Blocked by Bug #10380: gobby tests are fragile added

#16 Updated by kytv over 3 years ago

  • Blocked by Bug #10381: The "I open the address" steps are fragile added

#17 Updated by anonym over 3 years ago

General remark/clarification: in some instances it's not really our tests that are fragile, but rather our features in Tails that actually are fragile with the tests being correct. Hence I think that @fragile actually means that the test is fragile and/or that the Tails feature being tested is fragile.

Note that there is some relationship with disabling known broken tests, i.e. #7233. However, I think we must distinguish tests that are fragile, i.e. fail with some probability, from tests that always fail. Only the latter are suitable to be marked @known_broken à la #7233, because when we have the proper solution, the tests will not be disabled, but be run and they will be expected to fail (i.e. success/failure are reversed)! IMHO this also supports my reasoning for the extended usage of @fragile, above.
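
The difference between the two tags can be summarised in a few lines. This is only a sketch of the semantics described in this comment: the tag names come from this ticket, but how #7233 actually implements @known_broken is an assumption here, and `reported_outcome` is a made-up name.

```ruby
# Maps a scenario's raw result (:passed or :failed) to what the run reports,
# following the distinction drawn above. Tag names come from this ticket;
# the exact behaviour of @known_broken in #7233 is an assumption.
def reported_outcome(tags, raw_result)
  # Fragile scenarios are excluded from the run (--tags ~@fragile).
  return :skipped if tags.include?('@fragile')
  # Known-broken scenarios still run, with success and failure reversed.
  if tags.include?('@known_broken')
    return raw_result == :failed ? :passed : :failed
  end
  raw_result
end
```

A scenario that fails only some of the time would flip between both outcomes under the reversed rule, which is why only tests that always fail fit @known_broken.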

anonym wrote:

> OFTC blocking is still a problem. We could try bumping the retries from 5 to 10. Thoughts?

Now we have #10375 which might improve the situation, but clearly OFTC is to blame here, not the test suite. #7874 seems like the actual fix, but we'll see how the retries bump works out. That should be tried in the branch we create for removing the @fragile tag.

> [...] Scenario: Clock way in the future in bridge mode
>
> Known to be very fragile. We should probably mark all the "way in the past/future" scenarios as @fragile right away.

Again, the problem is not the tests, but that our time syncing is not robust (especially the "tordate" component), so the real fix is #5774, not anything related to the tests. Marking all these four scenarios as @fragile is completely necessary, though.

#18 Updated by intrigeri over 3 years ago

  • Subject changed from Fix newly identified robustness issues in the automated test suite to Fix newly identified issues to make our test suite more robust and faster, phase 1

anonym, kytv: we actually have a deliverable on this for milestone III, and another for milestone VI. The second one already had a ticket (#10096) so I'm tweaking this one to track the milestone III goals. In practice, of course you'll be continuously fixing stuff as part of intermediary milestones as well. That's a big pile of work (whose size is not known yet) to be done over the course of 9.5 months, so it'll need to be split into smaller batches, so that it's clear which ones you want to fix during the current cycle, etc. I suggest you two draft that in two weeks, when we know more about how large the problem is, but still in time for the next CI team meeting.

#19 Updated by intrigeri over 3 years ago

  • Related to Bug #10096: Fix newly identified issues to make our test suite more robust and faster, phase 2 added

#20 Updated by intrigeri over 3 years ago

  • Blocked by deleted (Bug #10380: gobby tests are fragile)

#21 Updated by intrigeri over 3 years ago

  • Blocked by deleted (Bug #10378: The "Tails OpenPGP keys" scenario is fragile)

#22 Updated by intrigeri over 3 years ago

  • Blocked by deleted (Bug #10381: The "I open the address" steps are fragile)

#23 Updated by anonym over 3 years ago

  • Description updated (diff)
  • Status changed from In Progress to Confirmed

#24 Updated by anonym over 3 years ago

  • Description updated (diff)

#25 Updated by intrigeri over 3 years ago

intrigeri wrote:

anonym, kytv: we actually have a deliverable on this for milestone III, and another for milestone VI. The second one already had a ticket (#10096) so I'm tweaking this one to track the milestone III goals. In practice, of course you'll be continuously fixing stuff as part of intermediary milestones as well. That's a big pile of work (whose size is not known yet) to be done over the courses of 9.5 months, so it'll need to be split into smaller batches, so that it's clear what are the ones you want to fix during the current cycle, etc. I suggest you two draft that in two weeks, when we know more about how large the problem is, but still in time for the next CI team meeting.

I think we've reached about the time when this should be done (but if you want to do it after the 1.7 release and before the CI meeting, it's fine with me). Note that since my last comment the target version / reality inconsistency has expanded to new tickets (#10440, #10441, #10442, #10443 and #10444).

#26 Updated by anonym over 3 years ago

intrigeri wrote:

intrigeri wrote:

anonym, kytv: we actually have a deliverable on this for milestone III, and another for milestone VI. The second one already had a ticket (#10096), so I'm tweaking this one to track the milestone III goals. In practice, of course, you'll be continuously fixing stuff as part of intermediary milestones as well. That's a big pile of work (whose size is not known yet) to be done over the course of 9.5 months, so it'll need to be split into smaller batches, so that it's clear which ones you want to fix during the current cycle, etc. I suggest you two draft that in two weeks, when we know more about how large the problem is, but still in time for the next CI team meeting.

I think we've reached about the time when this should be done (but if you want to do it after the 1.7 release and before the CI meeting, it's fine with me). Note that since my last comment the target version / reality inconsistency has expanded to new tickets (#10440, #10441, #10442, #10443 and #10444).

Whatever you expect from this ticket is not what I intended. To me, the idea was never that we'd be able to fix even the first batch of fragile tests in time for milestone III.

I want this to be the master ticket collecting all fragile tests we currently have. So imho we're still in the first phase (so I don't get what #10096 is for now). (BTW, the target version I set was only to make sure kytv would see this ticket in his view -- sorry for not treating the Redmine state as holy scripture and being pragmatic instead :)). Individual tickets can be assigned specific deliverable milestones, or we can create some tracking tickets (and there we can call them phases) that are blocked by some of these tickets. At least I feel like we'll have a better overview of what has to be fixed to make jenkins work well enough for us by having this single ticket tracking all the robustness issues.

#27 Updated by intrigeri over 3 years ago

  • Subject changed from Fix newly identified issues to make our test suite more robust and faster, phase 1 to Fix newly identified issues to make our test suite more robust and faster
  • Target version changed from Tails_1.7 to Tails_2.5
  • QA Check deleted (Info Needed)

#28 Updated by intrigeri over 3 years ago

  • File summary-20151201-to-20151207.txt added

Here's the output of json-analysis for the test suite runs since the beginning of December.

#29 Updated by intrigeri over 3 years ago

  • Description updated (diff)

(Documenting how to create such a summary without too many painful manual operations.)

#30 Updated by intrigeri over 3 years ago

  • File deleted (summary-20151201-to-20151207.txt)

#31 Updated by intrigeri over 3 years ago

(Updated summary + doc to include details requested by bertagaz.)

#32 Updated by intrigeri about 3 years ago

  • Subject changed from Fix newly identified issues to make our test suite more robust and faster to Fix newly identified issues to make our test suite more robust and faster
  • Assignee changed from kytv to anonym

#33 Updated by sajolida almost 3 years ago

  • Blocks Feature #10394: Identify which of the remaining manual tests have the best cost/benefit to automate added

#34 Updated by BitingBird over 2 years ago

  • Status changed from Confirmed to In Progress

#35 Updated by anonym over 2 years ago

  • Target version changed from Tails_2.5 to Tails_2.7

#38 Updated by intrigeri over 2 years ago

(Add a wip/ prefix to the newly created branches, see tails-ci@.)

#39 Updated by intrigeri over 2 years ago

  • Assignee deleted (anonym)
  • Target version deleted (Tails_2.7)

#40 Updated by intrigeri over 2 years ago

  • Related to Feature #11355: Re-enable Jenkins notifications on ISO build/test failure added

#41 Updated by sajolida over 2 years ago

  • Blocks deleted (Feature #10394: Identify which of the remaining manual tests have the best cost/benefit to automate)

#42 Updated by u over 1 year ago

  • Assignee set to anonym

Assigning the parent ticket to anonym for tracking.
