Project

General

Profile

Bug #10537

Fix restart-tor and tor-has-bootstrapped semantics vs. test suite on Jessie

Added by intrigeri over 3 years ago. Updated over 3 years ago.

Status:
Resolved
Priority:
High
Assignee:
-
Category:
-
Target version:
Start date:
11/12/2015
Due date:
% Done:

100%

QA Check:
Feature Branch:
feature/jessie
Type of work:
Code
Blueprint:
Starter:
Affected tool:

Description

On feature/jessie, the "Tor has bootstrapped" code has been revamped and ported to systemd; some optimization that has gone into it apparently breaks assumptions that the automated test suite was relying on already, and/or that code added for #9516 started relying on. We should identify what exactly the test suite needs, and then check whether we'd better change the semantics of restart-tor/tor-has-bootstrapped to satisfy the test suite's needs, or instead have the test suite take into account the changed semantics somehow.


Related issues

Related to Tails - Feature #9516: Restart Tor if bootstrapping stalls for too long Resolved 06/02/2015
Blocks Tails - Feature #7563: Update the automated test suite for Jessie ISO images Resolved 11/26/2014 01/15/2016

History

#1 Updated by intrigeri over 3 years ago

  • Description updated (diff)

#2 Updated by intrigeri over 3 years ago

  • Assignee set to anonym
  • QA Check set to Info Needed

anonym/kytv, can you please sum up to me what semantics the test suite expects out of restart-tor / tor-has-bootstrapped / whatever is broken from the test suite's PoV currently? Ideally, please also point me to the exact place/context in the test suite's code that relies on this.

And then I'll check if such needs+context are test suite -specific (and thus I can add whatever code is needed in the test suite itself, e.g. when post-snapshot-restore or something), or if instead we have a regression for supported use cases (in which case I'll fix the restart-tor/tor-has-bootstrapped semantics).

#3 Updated by intrigeri over 3 years ago

  • Blocks Feature #7563: Update the automated test suite for Jessie ISO images added

#4 Updated by intrigeri over 3 years ago

  • Priority changed from Normal to High
  • Target version changed from Tails_2.0 to Tails_1.8

(Blocks #7563 if I got it right.)

#5 Updated by intrigeri over 3 years ago

  • Feature Branch set to feature/jessie

#7 Updated by kytv over 3 years ago

intrigeri wrote:

anonym/kytv, can you please sum up to me what semantics the test suite expects out of restart-tor / tor-has-bootstrapped / whatever is broken from the test suite's PoV currently? Ideally, please also point me to the exact place/context in the test suite's code that relies on this.

And then I'll check if such needs+context are test suite -specific (and thus I can add whatever code is needed in the test suite itself, e.g. when post-snapshot-restore or something), or if instead we have a regression for supported use cases (in which case I'll fix the restart-tor/tor-has-bootstrapped semantics).

I might be stating the obvious. If so, apologies for my useless text below.

In features/step_definitions/common.rb (lines 108-114) we have:

  if $vm.has_network?
    if $vm.execute("systemctl --quiet is-active tor@default.service").success?
      $vm.execute("systemctl stop tor@default.service")
      $vm.execute("rm -f /var/log/tor/log")
      $vm.host_to_guest_time_sync
      $vm.spawn("restart-tor")
      wait_until_tor_is_working

wait_until_tor_is_working lives in features/support/helpers/misc_helpers.rb at line 139.

def wait_until_tor_is_working
  try_for(270) { $vm.execute('/usr/local/sbin/tor-has-bootstrapped').success? }
rescue Timeout::Error => e
  c = $vm.execute("grep restart-tor /var/log/syslog")
  if c.success?
    debug_log("From syslog:\n" + c.stdout.sub(/^/, "  "))
  else
    debug_log("Nothing was syslog:ed about 'restart-tor'")
  end 
  raise e
end

As can be seen at line 142, this part of the test suite needs to be migrated to jessie since we'll want to use the journal instead.

WRT to the test suite, the problem I experienced was that after restoring a network-enabled snapshot we restart tor using restart-tor. We check that tor is ready with /usr/local/sbin/tor-has-bootstrapped which returns 0, e.g., Tor is ready, the pre-snapshot status of Tor.

I don't know how much more than this I can provide since I've done almost no investigating of the problem, having only "worked around" it by raising the number of Tor retries while working on #9791, but I'm happy to try.

#8 Updated by intrigeri over 3 years ago

  • Assignee changed from anonym to intrigeri
  • QA Check deleted (Info Needed)

After chatting with kytv I managed to read between the lines enough to conclude that the semantics you seem to want is "tor-has-bootstrapped gives info about the current status of Tor, not about whether Tor has ever successfully bootstrapped since the system was started". Thanks. So this is back on my plate to check where we should update things.

#9 Updated by intrigeri over 3 years ago

IOW the test suite needs this behaviour:

Given a Tails system
And Tor has bootstrapped
When I stop and restart the Tor daemon
Then tor-has-bootstrapped returns false until the currently running Tor daemon has itself bootstrapped

(or it needs to do something around stop/restart so that this becomes true).

#10 Updated by kytv over 3 years ago

  • Assignee changed from intrigeri to anonym
  • QA Check set to Info Needed

The problem that I mentioned in the XMPP chat a few days ago has been resolved in recent commits. :)

calling as root: systemctl stop tor@default.service
call returned: [0, "", ""]
calling as root: rm -f /var/log/tor/log
call returned: [0, "", ""]
calling as root: date -s '@1447349674'
call returned: [0, "Thu Nov 12 17:34:34 UTC 2015\n", ""]
spawning as root: restart-tor
calling as root: /usr/local/sbin/tor-has-bootstrapped
call returned: [1, "", ""]
calling as root: /usr/local/sbin/tor-has-bootstrapped
call returned: [1, "", ""]
calling as root: /usr/local/sbin/tor-has-bootstrapped
call returned: [1, "", ""]
calling as root: /usr/local/sbin/tor-has-bootstrapped
call returned: [1, "", ""]
calling as root: /usr/local/sbin/tor-has-bootstrapped
call returned: [1, "", ""]
calling as root: /usr/local/sbin/tor-has-bootstrapped
call returned: [1, "", ""]
calling as root: /usr/local/sbin/tor-has-bootstrapped
call returned: [1, "", ""]
calling as root: /usr/local/sbin/tor-has-bootstrapped
call returned: [1, "", ""]
calling as root: /usr/local/sbin/tor-has-bootstrapped
call returned: [1, "", ""]
calling as root: /usr/local/sbin/tor-has-bootstrapped
call returned: [1, "", ""]
calling as root: /usr/local/sbin/tor-has-bootstrapped
call returned: [0, "", ""]
calling as root: cat /proc/cmdline
call returned: [0, "BOOT_IMAGE=/live/vmlinuz2 initrd=/live/initrd2.img boot=live config live-m
calling as amnesia: gpg --batch --list-keys '10CC5BC7'
call returned: [2, "", "gpg: error reading key: public key not found\n"]
calling as amnesia: gpg --batch  --recv-key '10CC5BC7'
call returned: [0, "", "gpg: requesting key 0x10CC5BC7 from hkps server hkps.pool.sks-keyserve
    When I fetch the "10CC5BC7" OpenPGP key using the GnuPG CLI                           # fe
    Then GnuPG uses the configured keyserver                                              # fe
    And the GnuPG fetch is successful                                                     # fe
calling as amnesia: gpg --batch --list-keys '10CC5BC7'
call returned: [0, "pub   4096R/0x1D84CCF010CC5BC7 2012-08-21 [expires: 2016-08-24]\n      Key
    And the "10CC5BC7" key is in the live user's public keyring after at most 120 seconds # fe
calling as root: awk "/^r/ { print \$6 }" /var/lib/tor/cached-microdesc-consensus

Previously tor-has-bootstrapped reported success immediately after a snapshot was restored. Now I'm "happy" with respect to the problem I mentioned in the XMPP channel. Yay ♥

What anonym mentioned in the description may still be a problem, e.g.

Given the network is plugged
And Tor has started
But Tor's bootstrapping process is stuck for more than $however_long_we_decide_is_reasonable
Then Tor should be restarted this time and every time the bootstrapping doesn't progress
     during $however_long_we_decide_is_reasonable
But if Tor hasn't successfully bootstrapped within 270 seconds (or a reasonable timeframe
     that we decide on)
Then we stop trying to restart Tor
And we notify the user that there's a problem

Like #9516, this isn't a problem with just the test suite but one that users also experience.

#11 Updated by intrigeri over 3 years ago

  • Assignee changed from anonym to intrigeri
  • QA Check deleted (Info Needed)

kytv wrote:

The problem that I mentioned in the XMPP chat a few days ago has been resolved in recent commits. :)

Well, I believe that you've been lucky:

Previously tor-has-bootstrapped reported success immediately after a snapshot was restored.

... and now it reports success immediately after the Tor service has started, as long as it has bootstrapped in the past (e.g. when creating the snapshot), even though it hasn't bootstrapped again. In the case you're reporting about, Tor (re-)bootstrap time won the race and thus it worked for you. But last time I checked the code, nothing ensured it'll always win the race.

So my understanding is that I still have to implement what was described in #10537#note-9.

What anonym mentioned in the description

We had no input from anonym on this ticket so far.

may still be a problem, e.g.

This is likely. Next step on that front is to express the semantics needed by the code added for #9516. I think it'll cost me (and probably us) less energy if I do it myself instead of trying to communicate about clear interfaces requirements. And then I'll port it to feature/jessie as needed, I suppose.

Adding back to my plate.

#12 Updated by intrigeri over 3 years ago

  • Related to Feature #9516: Restart Tor if bootstrapping stalls for too long added

#13 Updated by intrigeri over 3 years ago

intrigeri wrote:

So my understanding is that I still have to implement what was described in #10537#note-9.

There was a misunderstanding when we discussed this with kytv, apparently: on Tails 1.7, as long as Tor has ever bootstrapped, tor-has-bootstrapped always returns true, even if I stop Tor or restart it. So we hadn't these semantics on 1.7, so we can't be lacking them on feature/jessie I guess.

Back to #9516 code then.

#14 Updated by intrigeri over 3 years ago

Note (mostly to myself): 27edb1a might help debugging this.

#15 Updated by intrigeri over 3 years ago

  • Status changed from Confirmed to Resolved
  • Assignee deleted (intrigeri)
  • % Done changed from 0 to 100

intrigeri wrote:

Back to #9516 code then.

I've looked at restart-tor and my conclusion is that I don't see what has changed in feature/jessie that can break this script, so it looks like kytv was totally right that I've fixed the problem without knowing in 1390e8b, before I had heard about it or understood what it was about.

Closing, please open tickets about specific problems if there are any left :)

Also available in: Atom PDF