Project

General

Profile

Feature #9517

Feature #9515: Improve test suite robustness vs transient network errors

Retry connecting to OFTC when it fails

Added by anonym about 4 years ago. Updated about 4 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
-
Category:
Test suite
Target version:
Start date:
06/02/2015
Due date:
% Done:

100%

Feature Branch:
kytv:test/9517-retry-oftc
Type of work:
Code
Blueprint:
Starter:
Affected tool:

Description

That is when we get the SSL handshake error, and when the server actually inform us that Tor is block at the moment.

It may be enough to simply force new Tor circuits, e.g. by using Vidalia's New Identity feature and then click the "Reconnect" button in Pidgin, but I'm not sure. We may have to retry multiple times in a row.


Related issues

Related to Tails - Bug #9632: restart-vidalia appears to occasionally fail to actually restart Vidalia Rejected 06/20/2015
Blocks Tails - Bug #9072: Pidgin IRC tests often fail due to OFTC Tor blocking Resolved 03/18/2015
Blocks Tails - Feature #9518: Retry with new OpenPGP key server pool member when they misbehave Resolved 06/02/2015

Associated revisions

Revision fc9b3cdb (diff)
Added by kytv about 4 years ago

Retry connecting to OFTC on failure

If we fail to connect to OFTC we'll use Vidalia to force a new Tor
circuit then attempt to reconnect.

Will-fix: #9517

Revision 5011ea2a (diff)
Added by kytv about 4 years ago

Close message tabs until the channel tab can be found

If several connection attempts are required Sikuli may not be able to
find the #tails channel because several server notice tabs are to the
left of it. To work around this we'll focus the conversation window and
close tabs until the channel tab appears.

Will-fix: #9517

Revision 4098c671 (diff)
Added by kytv about 4 years ago

Don't use Vidalia to force a new Tor circuit

This should speed up the process by not having to interact with the GUI

Will-fix: #9517

Revision d9800fc7
Added by intrigeri about 4 years ago

Merge remote-tracking branch 'kytv/test/9517-retry-oftc' into stable

Fix-committed: #9517

History

#1 Updated by anonym about 4 years ago

  • Blocks Bug #9072: Pidgin IRC tests often fail due to OFTC Tor blocking added

#3 Updated by anonym about 4 years ago

  • Related to Bug #9478: How to deal with transient network errors in the test suite? added

#5 Updated by intrigeri about 4 years ago

  • Related to deleted (Bug #9478: How to deal with transient network errors in the test suite?)

#6 Updated by intrigeri about 4 years ago

  • Blocks deleted (Bug #9072: Pidgin IRC tests often fail due to OFTC Tor blocking)

#7 Updated by intrigeri about 4 years ago

  • Blocks Bug #9072: Pidgin IRC tests often fail due to OFTC Tor blocking added

#9 Updated by anonym about 4 years ago

anonym wrote:

That is when we get the SSL handshake error, and when the server actually inform us that Tor is block at the moment.

It may be enough to simply force new Tor circuits, e.g. by using Vidalia's New Identity feature and then click the "Reconnect" button in Pidgin, but I'm not sure. We may have to retry multiple times in a row.

Now I've verified that new circuit => a new pool member is picked on Pidgin reconnect. However, today (2015-06-17) the blocking is particularly severe. I haven't been able to connect to irc.oftc.net via Tor for three hours while actively issuing a NEWNYM every minute (and manually clicking reconnect every 1-5 minutes, so I've done 100+ connections), getting both "Banned" and "No more connections permitted from your host" as the explanation.

So the retry strategy will not solve #9072 but I guess it will slightly improve it when only some pool members refuse connections via Tor. YMMV. It may still be worth implementing,

#10 Updated by anonym about 4 years ago

And now I've had 24 hours of continuous blocking...

#11 Updated by intrigeri about 4 years ago

So the retry strategy will not solve #9072 but I guess it will slightly improve it when only some pool members refuse connections via Tor. YMMV. It may still be worth implementing,

Yep, I think it'll fix the most common cases.

#12 Updated by kytv about 4 years ago

  • Related to Bug #9632: restart-vidalia appears to occasionally fail to actually restart Vidalia added

#13 Updated by kytv about 4 years ago

  • Status changed from Confirmed to In Progress
  • % Done changed from 0 to 30

intrigeri wrote:

So the retry strategy will not solve #9072 but I guess it will slightly improve it when only some pool members refuse connections via Tor. YMMV. It may still be worth implementing,

Yep, I think it'll fix the most common cases.

Agreed on both accounts.

I've got this implemented and am testing it now. It seems to be doing "the right thing" now that the blocking is quite severe. Once OFTC relents I'll find out if I inadvertently broke anything.

(Note: I have MAX_RETRIES normally set to 5, but I want to take advantage of OFTC's Tor blocking to test code robustness so I bumped the max retries to 10000)

calling as amnesia: xdotool search --name 'Buddy List' windowactivate --sync
call returned: [0, "", ""]
Forcing new Tor circuit... (attempt #157)
calling as root: pidof -x -o '%PPID' vidalia
...........

#14 Updated by kytv about 4 years ago

  • % Done changed from 30 to 40

I'm quite pleased with how this is going:

calling as amnesia: xdotool search --name 'Buddy List' windowactivate --sync
call returned: [0, "", ""]
Forcing new Tor circuit... (attempt #14)
calling as root: pidof -x -o '%PPID' vidalia
call returned: [0, "5777\n", ""]
calling as root: pidof -x -o '%PPID' vidalia
call returned: [0, "5777\n", ""]
[log] TYPE " 
[log] RIGHT CLICK on (728,11)
[log] CLICK on (782,105)
[log] CLICK on (364,509)
calling as amnesia: xdotool search --name 'Buddy List' windowactivate --sync
call returned: [0, "", ""]
    Then Pidgin successfully connects to the "irc.oftc.net" account              # features/step_definitions/pidgin.rb:298
[log] DOUBLE CLICK on (195,236)
calling as amnesia: xdotool search --name '.*.oftc.net$' windowactivate --sync
call returned: [0, "", ""]
[log] Ctrl+TYPE "w" 
[log] CLICK on (806,215)
    And I can join the "#tails" channel on "irc.oftc.net"                        # features/step_definitions/pidgin.rb:342
[log] TYPE "/topic" 
    When I type "/topic"                                                         # features/step_definitions/common_steps.rb:876
[log] TYPE " 
" 
    And I press the "ENTER" key                                                  # features/step_definitions/common_steps.rb:881
    Then I see the Tails roadmap URL                                             # features/step_definitions/pidgin.rb:435
[log] CLICK on (451,375)
    When I click on the Tails roadmap URL                                        # features/step_definitions/pidgin.rb:446
    Then the Tor Browser has started and loaded the Tails roadmap                # features/step_definitions/common_steps.rb:372

1 scenario (1 passed)
15 steps (15 passed)
18m25.940s

#15 Updated by kytv about 4 years ago

  • Assignee changed from kytv to intrigeri
  • QA Check set to Ready for QA
  • Feature Branch set to kytv:test/9517-retry-oftc

Note: I wanted to use waitAny in fc9b3cdbac6d but I ran into the problem in #9633. Perhaps the extra wait won't so bad in this particular case, especially at those times (like now) when OFTC is relentlessly blocking Tor.

Intrigeri, apologies in advance if I'm wrong to assign this to you. (Hopefully you're as pleased with this branch as I am.)

#16 Updated by kytv about 4 years ago

Another note: By default this branch will retry with a new Tor circuit 5 times, but this can be overridden. While testing I had it set to a ridiculously high number. After 51 retries it was able to connect.

calling as amnesia: xdotool search --name 'Buddy List' windowactivate --sync
call returned: [0, "", ""]
Forcing new Tor circuit... (attempt #51)
calling as root: pidof -x -o '%PPID' vidalia
call returned: [0, "6989\n", ""]
calling as root: pidof -x -o '%PPID' vidalia
call returned: [0, "6989\n", ""]
[log] TYPE " 
[log] RIGHT CLICK on (728,11)
[log] CLICK on (782,105)
[log] CLICK on (364,509)
calling as amnesia: xdotool search --name 'Buddy List' windowactivate --sync
call returned: [0, "", ""]
    Then Pidgin successfully connects to the "irc.oftc.net" account
[log] DOUBLE CLICK on (195,236)
calling as amnesia: xdotool search --name '.*.oftc.net$' windowactivate --sync
call returned: [0, "", ""]
[log] Ctrl+TYPE "w" 
[log] Ctrl+TYPE "w" 
[log] Ctrl+TYPE "w" 
[log] CLICK on (806,215)
    And I can join the "#tails" channel on "irc.oftc.net" 

#17 Updated by kytv about 4 years ago

  • Blocks Feature #9518: Retry with new OpenPGP key server pool member when they misbehave added

#18 Updated by kytv about 4 years ago

  • Assignee changed from intrigeri to anonym

#19 Updated by anonym about 4 years ago

  • Assignee changed from anonym to kytv
  • QA Check changed from Ready for QA to Dev Needed

Nice! I've run it a few times now when OFTC is blocking Tor, and it works as intended.

Review:

+Then /^(in Vidalia )?I force Tor to use a new circuit$/ do |with_vidalia|

Although grammatically correct, I think it would be a better flow if the (in Vidalia )? part was moved to the end.

+ # Sometimes Sikuli gets confused and recognizes the yellow-colored vidalia systray
+ # icon as the green one. This has been seen when Vidalia needed to be
+ # restarted in the above 'begin' block.

Wow, this I find highly surprising and a bit scary.

Just to be clear, is that why we need the try_for in the first place? If so, perhaps the comment can be clarified a bit?

+ try_for(3 * 60) do
+ # Let's be sure that vidalia is still running. I'd hate to spend up to
+ # three minutes waiting for an icon that isn't there because Vidalia, for
+ # whatever reason, is no longer running...
+ step 'process "vidalia" is running'
+ @screen.type(Sikuli::Key.ESC)
+ @screen.wait_and_right_click('VidaliaSystrayReady.png', 10)
+ @screen.wait_and_click('VidaliaMenuNewIdentity.png', 10)
+ @screen.wait('VidaliaNewIdentityNotification.png', 20)
+ end
+ @screen.wait('VidaliaNewIdentityNotification.png', 20)
+ @screen.waitVanish('VidaliaNewIdentityNotification.png', 60)

1. Why do we press escape? A comment explaining why is definitely needed in a case like this. However, I do not believe it's safe for us to do that at all: upon entry to this step we do nothing to change the window focus so (at least in the first iteration of the try_for) that escape will be sent to whatever application has the focus. If that was a Pidgin conversation window the current tab would be closed, for instance.

2. Why do we wait twice for the notification?

3. Can't we close the notification instead of waitVanish? That would save a couple of seconds. No big deal, though. (You can definitely skip this if you agree that we only should use Vidalia to force new circuits in a dedicated scenario, see below.)

+ else
+ @vm.execute_successfully('restart-tor')
+ end

Restarting Tor to force new circuits feels a bit drastic. Let's send a NEWNYM signal instead. AFAICT it'd be easy:

@vm.execute_successfully('. /usr/local/lib/tails-shell-library/tor.sh; tor_control_send "signal NEWNYM"')

Then /^Pidgin successfully connects to the "([^"]+)" account$/ do |account|

[...]

+ begin
+ @screen.wait(expected_channel_entry, 60)
+ break
+ rescue FindFailed
+ tries = 1
STDERR.puts "Forcing new Tor circuit... (attempt ##{tries})" if $config["DEBUG"]
+ step "in Vidalia I force Tor to use a new circuit"
[...]

On second thought, I think we actually should force new circuits without Vidalia to save some time (due to no GUI interaction). However, we should add a dedicated scenario (in some other feature) for Vidalia's "New identity" feature so your code is still needed! If you agree, let's open a ticket for that (no need to do it now IMHO).

def focus_pidgin_conversation_window(account)
case account
+ when /.*\.oftc\.net$/
+ @vm.focus_window(".*\.oftc\.net$")
+ end
+end

It's a bit confusing that we in other parts are using @vm.focus_window directly to focus Pidgin's conversation window. Instead of doing this on a case-by-case basis I think we could:

def focus_pidgin_conversation_window(account)
  @vm.focus_window(".*#{Regexp.escape(account)}.*")
end

and then use it in the other places as well. We could also add a function focus_pidgin_buddy_list.

def close_pidgin_conversation_window(account)
focus_pidgin_conversation_window(account)
+ @screen.type(Sikuli::Key.F4, Sikuli::KeyModifier.ALT)

I'm afraid this is racy but let's see how it works out. It looks good so far at least.

I'm gonna look into #9633.

#20 Updated by kytv about 4 years ago

anonym wrote:

Nice! I've run it a few times now when OFTC is blocking Tor, and it works as intended.

Review:

+Then /^(in Vidalia )?I force Tor to use a new circuit$/ do |with_vidalia|

Although grammatically correct, I think it would be a better flow if the (in Vidalia )? part was moved to the end.

OK.

+ # Sometimes Sikuli gets confused and recognizes the yellow-colored vidalia systray
+ # icon as the green one. This has been seen when Vidalia needed to be
+ # restarted in the above 'begin' block.

Wow, this I find highly surprising and a bit scary.

Agreed.

Just to be clear, is that why we need the try_for in the first place? If so, perhaps the comment can be clarified a bit?

Yes, this is exactly why I used try_for there.

+ try_for(3 * 60) do
+ # Let's be sure that vidalia is still running. I'd hate to spend up to
+ # three minutes waiting for an icon that isn't there because Vidalia, for
+ # whatever reason, is no longer running...
+ step 'process "vidalia" is running'
+ @screen.type(Sikuli::Key.ESC)
+ @screen.wait_and_right_click('VidaliaSystrayReady.png', 10)
+ @screen.wait_and_click('VidaliaMenuNewIdentity.png', 10)
+ @screen.wait('VidaliaNewIdentityNotification.png', 20)
+ end
+ @screen.wait('VidaliaNewIdentityNotification.png', 20)
+ @screen.waitVanish('VidaliaNewIdentityNotification.png', 60)

1. Why do we press escape? A comment explaining why is definitely needed in a case like this. However, I do not believe it's safe for us to do that at all: upon entry to this step we do nothing to change the window focus so (at least in the first iteration of the try_for) that escape will be sent to whatever application has the focus. If that was a Pidgin conversation window the current tab would be closed, for instance.

It was added for those cases in which Sikuli right-clicked the Vidalia icon and the menu was still displayed from the "misclick". Without the ESC it would resolve itself when the try_for loop ran again.

2. Why do we wait twice for the notification?

Good question. To be investigated and maybe removed.

3. Can't we close the notification instead of waitVanish? That would save a couple of seconds. No big deal, though. (You can definitely skip this if you agree that we only should use Vidalia to force new circuits in a dedicated scenario, see below.)

Could close it and/or make it a separate Vidalia step.

+ else
+ @vm.execute_successfully('restart-tor')
+ end

Restarting Tor to force new circuits feels a bit drastic. Let's send a NEWNYM signal instead. AFAICT it'd be easy:
[...]

Sounds good. :)

Then /^Pidgin successfully connects to the "([^"]+)" account$/ do |account|

[...]

+ begin
+ @screen.wait(expected_channel_entry, 60)
+ break
+ rescue FindFailed
+ tries = 1
STDERR.puts "Forcing new Tor circuit... (attempt ##{tries})" if $config["DEBUG"]
+ step "in Vidalia I force Tor to use a new circuit"
[...]

On second thought, I think we actually should force new circuits without Vidalia to save some time (due to no GUI interaction). However, we should add a dedicated scenario (in some other feature) for Vidalia's "New identity" feature so your code is still needed! If you agree, let's open a ticket for that (no need to do it now IMHO).

This is fine with me.

def focus_pidgin_conversation_window(account)
case account
+ when /.*\.oftc\.net$/
+ @vm.focus_window(".*\.oftc\.net$")
+ end
+end

It's a bit confusing that we in other parts are using @vm.focus_window directly to focus Pidgin's conversation window. Instead of doing this on a case-by-case basis I think we could:
[...]
and then use it in the other places as well. We could also add a function focus_pidgin_buddy_list.

def close_pidgin_conversation_window(account)
focus_pidgin_conversation_window(account)
+ @screen.type(Sikuli::Key.F4, Sikuli::KeyModifier.ALT)

I'm afraid this is racy but let's see how it works out. It looks good so far at least.

I think it may need some tweaking but it'll be easier to test when the nearly relentless OFTC blocking of Tor relents.

I'm gonna look into #9633.

#21 Updated by kytv about 4 years ago

anonym wrote:

Restarting Tor to force new circuits feels a bit drastic. Let's send a NEWNYM signal instead. AFAICT it'd be easy:
[...]

Excellent suggestion.

Forcing new Tor circuit... (attempt #21)
calling as root: . /usr/local/lib/tails-shell-library/tor.sh; tor_control_send "signal NEWNYM" 
call returned: [0, "250 OK\n250 OK\n250 closing connection\n", ""]
[log] CLICK on (364,509)
calling as amnesia: xdotool search --name 'Buddy List' windowactivate --sync
call returned: [0, "", ""]

I've already run it a few times with success. Now to try applying/testing the proposed fix from #9633...

#22 Updated by kytv about 4 years ago

  • Assignee changed from kytv to anonym
  • QA Check changed from Dev Needed to Ready for QA

Changes pushed.

I switched to forcing a new circuit without vidalia which works very well. :)

Once 9633 is fixed in the tree I can optimize the tests with waitAny. (I already have that branch ready to go)

#23 Updated by kytv about 4 years ago

  • Assignee changed from anonym to intrigeri

#24 Updated by intrigeri about 4 years ago

  • Target version changed from Tails_1.4.1 to Tails_1.5

#25 Updated by kytv about 4 years ago

#26 Updated by kytv about 4 years ago

#27 Updated by kytv about 4 years ago

#28 Updated by intrigeri about 4 years ago

  • Status changed from In Progress to Fix committed
  • % Done changed from 40 to 100

#29 Updated by intrigeri about 4 years ago

  • Assignee deleted (intrigeri)
  • Target version changed from Tails_1.5 to Tails_1.4.1
  • QA Check changed from Ready for QA to Pass

#30 Updated by intrigeri about 4 years ago

  • Status changed from Fix committed to Resolved

Also available in: Atom PDF