Project

General

Profile

Feature #9518

Feature #9515: Improve test suite robustness vs transient network errors

Retry with new OpenPGP key server pool member when they misbehave

Added by anonym over 4 years ago. Updated about 4 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
-
Category:
Test suite
Target version:
Start date:
06/02/2015
Due date:
% Done:

100%

Feature Branch:
kytv:test/9518-gpg-retry-with-new-circuit
Type of work:
Code
Blueprint:
Starter:
Affected tool:

Description

That is, when we pick a pool member that doesn't have the key, returns weird SSL errors, or other annoying things.

To force a new member, it may be enough to simply force new Tor circuits, e.g. by using Vidalia's New Identity feature. We may have to retry multiple times in a row.


Related issues

Related to Tails - Bug #9478: How to deal with transient network errors in the test suite? Resolved 05/27/2015
Related to Tails - Bug #9632: restart-vidalia appears to occasionally fail to actually restart Vidalia Rejected 06/20/2015
Blocked by Tails - Feature #9517: Retry connecting to OFTC when it fails Resolved 06/02/2015
Blocks Tails - Feature #9668: Retry whois request on "WHOIS LIMIT EXCEEDED" Resolved 07/03/2015

Associated revisions

Revision a7f6a9b4 (diff)
Added by kytv about 4 years ago

Request a new Tor circuit and re-run the Seahorse tests on failure

Will-fix: #9518

Revision 3be9ea5f (diff)
Added by kytv about 4 years ago

Request a new Tor circuit and re-run the GPG cli tests on failure

Will-fix: #9518

Revision 6c125cbf
Added by intrigeri about 4 years ago

Merge branch 'test/9518-gpg-retry-with-new-circuit' into devel

Fix-committed: #9518

History

#2 Updated by anonym over 4 years ago

  • Related to Bug #9478: How to deal with transient network errors in the test suite? added

#3 Updated by intrigeri over 4 years ago

  • Related to deleted (Bug #9478: How to deal with transient network errors in the test suite?)

#5 Updated by intrigeri over 4 years ago

  • Related to Bug #9478: How to deal with transient network errors in the test suite? added

#7 Updated by kytv over 4 years ago

  • Status changed from Confirmed to In Progress

#8 Updated by kytv over 4 years ago

Do we want to do anything with regards to Seahorse's segfaulting? (please say "NO!")

I have found that setting the keyserver to localhost and initiating key synchronization will lead to the following:

If Seahorse is started from a terminal:
  • An error pops up: "Couldn't publish keys to server" (expected and desired, of course)
  • After clicking close on that error, the Synchronizing Keys progress window remains. Clicking cancel or waiting will lead to a segfault.
If Seahorse is started from the GPG Applet or the Gnome menu:
  • Immediate segfault of Seahorse

Fetching keys in Seahorse does not seem to have this problem with segfaulting.

#9 Updated by kytv over 4 years ago

  Scenario: Fetching OpenPGP keys using Seahorse should work and be done over Tor.         # features/torified_gnupg.feature:29
Forcing new Tor circuit    When I fetch the "10CC5BC7" OpenPGP key using Seahorse                                 # features/step_definitions/torified_gnupg.rb:139
    Then the "10CC5BC7" key is in the live user's public keyring after at most 120 seconds # features/step_definitions/torified_gnupg.rb:80

  Scenario: Fetching OpenPGP keys using Seahorse via the Tails OpenPGP Applet should work and be done over Tor. # features/torified_gnupg.feature:33

The Forcing new Tor circuit text (added with STDERR.print or STDERR.puts, I don't remember ATM and it's not terribly important) will be replaced or removed, but it seems to be working. :)

I found, however, that in previous runs this step failed because vidalia wasn't running, but I don't know why it wasn't running. I have been testing this a lot and it's only happened once so far, but unfortunately I have no information about it.

#10 Updated by kytv over 4 years ago

kytv wrote:

I found, however, that in previous runs this step failed because vidalia wasn't running, but I don't know why it wasn't running. I have been testing this a lot and it's only happened once so far, but unfortunately I have no information about it.

Once more.

calling as root: . /usr/local/lib/tails-shell-library/tor.sh; tor_is_working
call returned: [0, "", ""]
spawning as root: restart-vidalia
spawn returned: [0, "", ""]
calling as amnesia: gpg --batch --keyserver-options import-clean --recv-key '10CC5BC7'
call returned: [2, "gpgkeys: HTTP fetch error 35: gnutls_handshake() failed: A TLS fatal alert has been received.\n", "gpg: requesting key 0x10CC5BC7 from hkps server hkps.pool.sks-keyservers.net\ngpg: no valid OpenPGP data found.\ngpg: Total number processed: 0\n"]
Forcing new Tor circuit
calling as root: pidof -x -o '%PPID' vidalia
call returned: [1, "", ""]

I'll work around it for now but it would be good to figure out why vidalia isn't running.

#11 Updated by kytv over 4 years ago

  • Related to Bug #9632: restart-vidalia appears to occasionally fail to actually restart Vidalia added

#12 Updated by kytv over 4 years ago

  • % Done changed from 0 to 20

My work in progress:

calling as root: . /usr/local/lib/tails-shell-library/tor.sh; tor_is_working
call returned: [0, "", ""]
spawning as root: restart-vidalia
spawn returned: [0, "", ""]
calling as amnesia: gpg --batch --keyserver-options import-clean --recv-key '10CC5BC7'
call returned: [2, "gpgkeys: HTTP fetch error 35: gnutls_handshake() failed: A TLS fatal alert has been received.\n", "gpg: requesting key 0x10CC5BC7 from hkps server hkps.pool.sks-keyservers.net\ngpg: no valid OpenPGP data found.\ngpg: Total number processed: 0\n"]
Forcing new Tor circuit... (attempt #1)
calling as root: pidof -x -o '%PPID' vidalia
call returned: [1, "", ""]
Vidalia wasn't running. Starting Vidalia...
spawning as root: restart-vidalia
spawn returned: [0, "", ""]
calling as root: pidof -x -o '%PPID' vidalia
call returned: [1, "", ""]
calling as root: pidof -x -o '%PPID' vidalia
call returned: [1, "", ""]
calling as root: pidof -x -o '%PPID' vidalia
call returned: [0, "5875\n", ""]
[log] RIGHT CLICK on (728,11)
[log] CLICK on (782,105)
calling as amnesia: gpg --batch --keyserver-options import-clean --recv-key '10CC5BC7'
call returned: [2, "gpgkeys: HTTP fetch error 35: gnutls_handshake() failed: A TLS fatal alert has been received.\n", "gpg: requesting key 0x10CC5BC7 from hkps server hkps.pool.sks-keyservers.net\ngpg: no valid OpenPGP data found.\ngpg: Total number processed: 0\n"]
Forcing new Tor circuit... (attempt #2)
calling as root: pidof -x -o '%PPID' vidalia
call returned: [0, "5875\n", ""]
[log] RIGHT CLICK on (728,11)
[log] CLICK on (782,105)
calling as amnesia: gpg --batch --keyserver-options import-clean --recv-key '10CC5BC7'
call returned: [2, "gpgkeys: HTTP fetch error 35: gnutls_handshake() failed: A TLS fatal alert has been received.\n", "gpg: requesting key 0x10CC5BC7 from hkps server hkps.pool.sks-keyservers.net\ngpg: no valid OpenPGP data found.\ngpg: Total number processed: 0\n"]
Forcing new Tor circuit... (attempt #3)
calling as root: pidof -x -o '%PPID' vidalia
call returned: [0, "5875\n", ""]
[log] RIGHT CLICK on (728,11)
[log] CLICK on (782,105)
calling as amnesia: gpg --batch --keyserver-options import-clean --recv-key '10CC5BC7'
call returned: [0, "", "gpg: requesting key 0x10CC5BC7 from hkps server hkps.pool.sks-keyservers.net\ngpg: key 0x1D84CCF010CC5BC7: public key \"anonym <anonym@riseup.net>\" imported\ngpg: no ultimately trusted keys found\ngpg: Total number processed: 1\ngpg:               imported: 1  (RSA: 1)\n"]
    Given I fetch the "10CC5BC7" OpenPGP key using the GnuPG CLI without any signatures                                 # features/step_definitions/torified_gnupg.rb:46
    And the GnuPG fetch is successful                                                                                   # features/step_definitions/torified_gnupg.rb:68

#13 Updated by kytv about 4 years ago

  • % Done changed from 20 to 40

At this point I'm satisfied that my test is doing what it should. After a quick self-review of the branch I'll submit for review'n'merge action.

I duplicated the contents of torified_gnupg.feature until there were 66 scenarios and started a test run:

^[[31mFailing Scenarios:^[[0m
^[[31mcucumber features/torified_gnupg.feature:296^[[0m^[[90m # Scenario: Fetching OpenPGP keys using GnuPG should work and be done over Tor.^[[0m

66 scenarios (^[[31m1 failed^[[0m, ^[[32m65 passed^[[0m)
957 steps (^[[32m957 passed^[[0m)
126m44.073s

Investigating the failure, it was unrelated to my work.

Full network capture available at: /home/kytv/git/tails/tmp/torified_gnupg_sniffer.pcap-2015-06-21T03:54:00+00:00
^[[31m      The following IPv4 TCP non-Tor Internet hosts were contacted:^[[0m
^[[31m      93.104.209.61 (RuntimeError)^[[0m
^[[31m      /home/kytv/git/tails/features/support/helpers/firewall_helper.rb:115:in `assert_no_leaks'^[[0m
^[[31m      /home/kytv/git/tails/features/support/hooks.rb:152:in `After'^[[0m
Scenario failed at time 01:25:32
Took screenshot "/home/kytv/git/tails/tmp/torified_gnupg-2015-06-21T03:54:00+00:00.png" 

The tests had to be retried a few times:

# grep 'Forcing\|abort'  gpg.log 
Forcing new Tor circuit... (attempt #1)
Forcing new Tor circuit... (attempt #1)
Forcing new Tor circuit... (attempt #1)
Forcing new Tor circuit... (attempt #2)
Forcing new Tor circuit... (attempt #3)
Forcing new Tor circuit... (attempt #4)
Forcing new Tor circuit... (attempt #1)
Forcing new Tor circuit... (attempt #2)
Forcing new Tor circuit... (attempt #3)
Forcing new Tor circuit... (attempt #1)
Forcing new Tor circuit... (attempt #1)
Forcing new Tor circuit... (attempt #1)

and were successful. :)

#14 Updated by kytv about 4 years ago

  • Assignee changed from kytv to intrigeri
  • QA Check set to Ready for QA
  • Feature Branch set to kytv:test/9518-gpg-retry-with-new-circuit

Note: This branch depends on the one in #9517 (kytv:test/9517-retry-oftc).

#15 Updated by kytv about 4 years ago

  • Blocked by Feature #9517: Retry connecting to OFTC when it fails added

#16 Updated by kytv about 4 years ago

kytv wrote:

Note: This branch depends on the one in #9517 (kytv:test/9517-retry-oftc).

It specifically requires 4f99f6b.

#17 Updated by kytv about 4 years ago

  • Assignee changed from intrigeri to anonym

#18 Updated by intrigeri about 4 years ago

  • Assignee changed from anonym to kytv

kytv, please reassign to me once this branch can actually be merged (that is, once #9517 is completed and merged).

#19 Updated by kytv about 4 years ago

I made (and pushed) changes to this branch due to changes in #9517's. Nothing more should need changing here as a result of further changes in #9517.

#20 Updated by kytv about 4 years ago

  • Target version changed from Tails_1.4.1 to Tails_2.3

#21 Updated by intrigeri about 4 years ago

  • Target version changed from Tails_2.3 to Tails_1.5

#9517 was merged => please merge current stable branch into the topic branch here, test and then reassign to me once you deem it ready.

#22 Updated by kytv about 4 years ago

  • Target version changed from Tails_1.5 to Tails_1.4.1

OK. Merged and doing another test run before pushing/re-assigning.

#23 Updated by kytv about 4 years ago

  • Target version changed from Tails_1.4.1 to Tails_1.5

#24 Updated by kytv about 4 years ago

Target version changed from Tails_1.4.1 to Tails_1.8

Mis-click :S

#25 Updated by kytv about 4 years ago

  • Assignee changed from kytv to intrigeri

stable merged and the branch still does what I expect.

#26 Updated by kytv about 4 years ago

  • Target version changed from Tails_1.5 to Tails_1.4.1

#27 Updated by intrigeri about 4 years ago

  • Target version changed from Tails_1.4.1 to Tails_1.5

kytv wrote:

Investigating the failure, it was unrelated to my work.
[...]

Was a ticket filed about this? If not, please do so.

#28 Updated by kytv about 4 years ago

intrigeri wrote:

kytv wrote:

Investigating the failure, it was unrelated to my work.
[...]

Was a ticket filed about this? If not, please do so.

Just now done over at #9654.

#29 Updated by intrigeri about 4 years ago

  • Assignee changed from intrigeri to kytv
  • QA Check changed from Ready for QA to Info Needed

In 3be9ea5 it's not clear to me if we're re-throwing Test::Unit::AssertionFailedError after failing more than MAX_NEW_TOR_CIRCUIT_RETRIES tries. Are we missing a final assert like you added in a7f6a9b, perhaps?

Also, if we're starting catching exceptions thrown by vm.execute_successfully, perhaps it should throw its own exception class, instead of the generic Test::Unit::AssertionFailedError? Not sure if it's a blocker.

Otherwise, looks good! Untested for now.

#30 Updated by kytv about 4 years ago

  • QA Check changed from Info Needed to Dev Needed

intrigeri wrote:

In 3be9ea5 it's not clear to me if we're re-throwing Test::Unit::AssertionFailedError after failing more than MAX_NEW_TOR_CIRCUIT_RETRIES tries. Are we missing a final assert like you added in a7f6a9b, perhaps?

An earlier revision had an assertion here but I removed it because it felt redundant due to the the GnuPG fetch is successful step. Looking at it again I think it should indeed have the assertion.

Also, if we're starting catching exceptions thrown by vm.execute_successfully, perhaps it should throw its own exception class, instead of the generic Test::Unit::AssertionFailedError? Not sure if it's a blocker.

This would probably be better.

#31 Updated by kytv about 4 years ago

  • Assignee changed from kytv to intrigeri
  • QA Check changed from Dev Needed to Ready for QA

Both issues have been addressed so maybe it's acceptable now.

#32 Updated by kytv about 4 years ago

  • Blocks Feature #9668: Retry whois request on "WHOIS LIMIT EXCEEDED" added

#33 Updated by intrigeri about 4 years ago

  • Assignee changed from intrigeri to kytv
  • % Done changed from 40 to 50
  • QA Check changed from Ready for QA to Dev Needed

kytv wrote:

Both issues have been addressed so maybe it's acceptable now.

Well, thanks for introducing a new exception class, but then it needs to be used consistently: at first glance, there are other places that seem to catch errors from execute_successfully by rescue'ing Test::Unit::AssertionFailedError. Do you want to go back to the drawing board for this, or should I instead merge everything except 2f25a195?

#34 Updated by kytv about 4 years ago

  • Assignee changed from kytv to intrigeri
  • QA Check changed from Dev Needed to Info Needed

intrigeri wrote:

kytv wrote:

Both issues have been addressed so maybe it's acceptable now.

Well, thanks for introducing a new exception class, but then it needs to be used consistently: at first glance, there are other places that seem to catch errors from execute_successfully by rescue'ing Test::Unit::AssertionFailedError.

Interesting. I looked for other instances and didn't see any. Looking again I still don't see any.

Running git grep -B7 'rescue.*Test::Unit::AssertionFailedError' -- features in this branch (after temporarily merging in stable) I see

features/step_definitions/checks.rb-
features/step_definitions/checks.rb-Then /^the OpenPGP keys shipped with Tails will be valid for the next (\d+) months$/ do |months|
features/step_definitions/checks.rb-  next if @skip_steps_while_restoring_background
features/step_definitions/checks.rb-  invalid = Array.new
features/step_definitions/checks.rb-  shipped_openpgp_keys.each do |key|
features/step_definitions/checks.rb-    begin
features/step_definitions/checks.rb-      step "the shipped OpenPGP key #{key} will be valid for the next #{months} months" 
features/step_definitions/checks.rb:    rescue Test::Unit::AssertionFailedError
--
features/step_definitions/common_steps.rb-end
features/step_definitions/common_steps.rb-
features/step_definitions/common_steps.rb-Then /^I force Tor to use a new circuit( in Vidalia)?$/ do |with_vidalia|
features/step_definitions/common_steps.rb-  if with_vidalia
features/step_definitions/common_steps.rb-    assert_equal('gnome', @theme, "Vidalia is not available in the #{@theme} theme.")
features/step_definitions/common_steps.rb-    begin
features/step_definitions/common_steps.rb-      step 'process "vidalia" is running'
features/step_definitions/common_steps.rb:    rescue Test::Unit::AssertionFailedError
--
features/step_definitions/pidgin.rb-    account = $config["Pidgin"]["Accounts"]["XMPP"][account_name]
features/step_definitions/pidgin.rb-    check_keys = ["username", "domain", "password"] + required_options
features/step_definitions/pidgin.rb-    for key in check_keys do
features/step_definitions/pidgin.rb-      assert(account.has_key?(key))
features/step_definitions/pidgin.rb-      assert_not_nil(account[key])
features/step_definitions/pidgin.rb-      assert(!account[key].empty?)
features/step_definitions/pidgin.rb-    end
features/step_definitions/pidgin.rb:  rescue NoMethodError, Test::Unit::AssertionFailedError
--
features/step_definitions/pidgin.rb-  next if @skip_steps_while_restoring_background
features/step_definitions/pidgin.rb-  expected_channel_entry = chan_image(account, default_chan(account), 'roster')
features/step_definitions/pidgin.rb-  tries = 0
features/step_definitions/pidgin.rb-  until tries == $config["MAX_NEW_TOR_CIRCUIT_RETRIES"] do
features/step_definitions/pidgin.rb-    # Sometimes the OFTC welcome notice window pops up over the buddy list one...
features/step_definitions/pidgin.rb-    begin
features/step_definitions/pidgin.rb-      @vm.focus_window('Buddy List')
features/step_definitions/pidgin.rb:    rescue Test::Unit::AssertionFailedError
--
features/step_definitions/tor.rb-  next if @skip_steps_while_restoring_background
features/step_definitions/tor.rb-  bridge_type.downcase!
features/step_definitions/tor.rb-  bridge_type.capitalize!
features/step_definitions/tor.rb-  begin
features/step_definitions/tor.rb-    @bridges = $config["Tor"]["Transports"][bridge_type]
features/step_definitions/tor.rb-    assert_not_nil(@bridges)
features/step_definitions/tor.rb-    assert(!@bridges.empty?)
features/step_definitions/tor.rb:  rescue NoMethodError, Test::Unit::AssertionFailedError
--
features/support/helpers/vm_helper.rb-    return VMCommand.new(self, cmd, { :user => user, :spawn => false })
features/support/helpers/vm_helper.rb-  end
features/support/helpers/vm_helper.rb-
features/support/helpers/vm_helper.rb-  def execute_successfully(cmd, user = "root")
features/support/helpers/vm_helper.rb-    p = execute(cmd, user)
features/support/helpers/vm_helper.rb-    begin
features/support/helpers/vm_helper.rb-      assert_vmcommand_success(p)
features/support/helpers/vm_helper.rb:    rescue Test::Unit::AssertionFailedError => e

The last line there is from my change in 2f25a195.

Do you want to go back to the drawing board for this, or should I instead merge everything except 2f25a195?

If you can direct me to what I'm missing I'll be happy to do it correctly across the board but I'm just not seeing it ATM. :(

#35 Updated by kytv about 4 years ago

Maybe

features/step_definitions/pidgin.rb-      @vm.focus_window('Buddy List')
features/step_definitions/pidgin.rb:    rescue Test::Unit::AssertionFailedError

#36 Updated by kytv about 4 years ago

  • QA Check changed from Info Needed to Ready for QA

Yes, I think that pidgin step was missed earlier.

Branch updated again.

#37 Updated by intrigeri about 4 years ago

  • % Done changed from 50 to 60

Code now looks good! Will test and hopefully merge :)

I'm just wondering if ExecutionFailed should perhaps be called ExecutionFailedInVM, or something that indicates more clearly that it's about a failed execution in the system under test. What do you think?

#38 Updated by kytv about 4 years ago

intrigeri wrote:

Code now looks good! Will test and hopefully merge :)

Woohoo

I'm just wondering if ExecutionFailed should perhaps be called ExecutionFailedInVM, or something that indicates more clearly that it's about a failed execution in the system under test. What do you think?

Completely reasonable suggestion. Changed & pushed.

#39 Updated by kytv about 4 years ago

#40 Updated by kytv about 4 years ago

#41 Updated by intrigeri about 4 years ago

  • Status changed from In Progress to Fix committed
  • % Done changed from 60 to 100

#42 Updated by intrigeri about 4 years ago

  • Assignee deleted (intrigeri)
  • QA Check changed from Ready for QA to Pass

Merged!

#43 Updated by intrigeri about 4 years ago

Hint: in the future, please retitle the ticket when the chosen implementation differs significantly, which is the case here. This would help writing changelog, reports, etc. Thanks in advance!

#44 Updated by BitingBird about 4 years ago

  • Status changed from Fix committed to Resolved

Also available in: Atom PDF