Project

General

Profile

Bug #5394

Time syncing can break Tor bootstrap when using several bridges with different certificate lifespans

Added by Tails over 6 years ago. Updated about 1 month ago.

Status:
In Progress
Priority:
Normal
Assignee:
-
Category:
Tor configuration
Target version:
-
Start date:
Due date:
% Done:

0%

Feature Branch:
bugfix/5394-time_syncing_in_bridge_mode+force-all-tests
Type of work:
Code
Blueprint:
Starter:
No
Affected tool:

Description

See #5394#note-16 for the problem this ticket currently tracks: the initial analysis was incorrect.


Related issues

Related to Tails - Bug #11589: Time syncing over bridge is fragile Confirmed 07/22/2016

Associated revisions

Revision 4e47f200 (diff)
Added by Tails developers almost 5 years ago

Make sure we use the highest cert vadlid-after when setting the date.

Otherwise we may end up picking the lower, and sometimes Tor seems to
stop retrying some bridges, and if it's the one we set the date to, we
get stuck.

Will-fix: #5394

History

#1 Updated by intrigeri over 6 years ago

  • Parent task set to #5479
  • Starter set to No

#2 Updated by intrigeri over 6 years ago

  • Parent task changed from #5479 to #5920

#3 Updated by intrigeri about 6 years ago

  • Assignee set to anonym

#4 Updated by intrigeri almost 6 years ago

  • Assignee changed from anonym to sajolida
  • % Done changed from 0 to 50
  • QA Check set to Ready for QA

anonym says: "shouldn't be relevant since Tor Launcher (started by the NM hook before the one starting the time syncing) blocks until settings that actually makes Tor build a circuit are set."

Reassigning to sajolida who's testing that stuff, then, so that he can confirm.

#5 Updated by intrigeri almost 6 years ago

  • Subject changed from bridge mode vs tordate timeouts to bridge mode vs. time syncing
  • Assignee changed from sajolida to anonym
  • % Done changed from 50 to 10
  • QA Check changed from Ready for QA to Dev Needed

In 0.23~rc1, time syncing in bridge mode still has quite some issues.

#6 Updated by intrigeri over 5 years ago

  • Tracker changed from Feature to Bug
  • Subject changed from bridge mode vs. time syncing to Fix time syncing in bridge mode

#7 Updated by anonym over 5 years ago

  • Assignee changed from anonym to intrigeri
  • QA Check changed from Dev Needed to Info Needed

intrigeri wrote:

In 0.23~rc1, time syncing in bridge mode still has quite some issues.

Can you clarify on how "bridge mode still has quite some issues"?

It works quite well for me in 0.23, and I've done some extensive testing with different time skews: +/- 10 years, +/- 1 year, +/- 1 month, +/- 3 days, +/- 6 hours, and no skew. Tor always manages to bootstrap and build a circuit for me. I even tested at different times of the hour to take into account the age of the consensus (since I have vague suspicions that there are corner cases related to this in general, not just in bridge mode). In total I saw around 40 successful Tails boots + Tor bootstraps and no failure.

The only issue I've found is the following, which we knew about even before 0.23-rc1: if the clock is way off (outside the picked authority's certificate lifetime), we reload Tor (i.e. send a `SIGHUP` to it). Tor Launcher isn't designed to handle this, so it'll show an error stating that "Tor failed to establish a Tor network connection". In contrast to that error, Tor will bootstrap just fine, but the error will be shown until Tor Launcher is killed by the 60-tor-ready-notification.sh hook, which definitely should be enough time for the user to read it and get confused. There's no really good fix for this in our time syncing mess. I suppose it could be made less confusing by killing Tor Launcher before reloading Tor, but that's still inconsistent with the behaviour in the other cases, and hence potentially confusing. Still "clock is way off" is a (supposedly) rare corner case, so I wonder if we should care.

Similarly, when the time syncing script restarts Tor (e.g. if the clock is not way off, but outside the consensus' validity interval), Tor Launcher's bootstrap progress meter will get stuck until it's killed by the 60-tor-ready-notification.sh hook later. While not pretty, I don't think there's much room for confusion since there will be two other indicators that Tor is working (Vidalia icon and GNOME notification).

#8 Updated by intrigeri over 5 years ago

anonym wrote:

Can you clarify on how "bridge mode still has quite some issues"?

I'm sorry I was not clearer a month ago (when I assumed this would be worked on quickly and fixed for 0.23), and I now don't remember what exactly happened.

I don't think I introduced time skew artificially to see this bug a month ago. I can't reproduce it now.

I suspect we have a weird race condition here. Perhaps try with a different number of cores, with a slower or faster Internet connection?

Worst case, well, I guess we'll ignore this for 1.0 and hope someone is able to file a more detailed bug report some day, that what I was able to provide. Sorry again.

#9 Updated by intrigeri over 5 years ago

  • Assignee changed from intrigeri to anonym
  • QA Check deleted (Info Needed)

#10 Updated by intrigeri over 5 years ago

  • QA Check set to Info Needed

#11 Updated by intrigeri over 5 years ago

anonym, can you reproduce this in the same kind of CPU-starved VM environment that you used to reproduce #5499?

#12 Updated by anonym over 5 years ago

  • Target version changed from Tails_1.0 to Tails_1.1

#13 Updated by intrigeri over 5 years ago

  • Assignee deleted (anonym)
  • Target version deleted (Tails_1.1)

#14 Updated by intrigeri over 5 years ago

  • Category set to Tor configuration
  • Parent task deleted (#5920)

#15 Updated by intrigeri over 5 years ago

(Last changes decided with my co-RM, anonym.)

#16 Updated by anonym almost 5 years ago

  • Assignee set to anonym
  • QA Check changed from Info Needed to Dev Needed

I believe I've found the issue while developing the automated bridge tests (yay!), where I also introduce time syncing tests while in bridge mode. To explain why I couldn't reproduce it, it may have been because I only tested with one bridge, but those who could see this tested with >1. Yeah... 20-date.sh is good stuff.

When the time is so wrong that tor_cert_lifetime_invalid fails, we set the time according to tor_cert_valid_after. The latter will parse Tor's log and extract the time from the last certificate lifetime runs from $DATE line it finds. If multiple bridges are given, we will see several of these lines, but only one, the last, will be picked for setting the date. If that one has a valid-after date which is before all of the other ones, we may end up with a problem: it seems Tor will not necessarily retry all of the bridges, and may only end up retrying the bridges we didn't set our date according to. In that case our system clock may be before the other bridges' certificates' valid-after, and we're stuck.

The solution is to some how wait until we've gotten the valid-after for all our bridges certificates, and then pick the highest date. I'm gonna give this a try...

#17 Updated by Tails almost 5 years ago

  • Status changed from Confirmed to In Progress

Applied in changeset commit:4b52e279833b75346009c4d8eeb3ac1ebca6442f.

#18 Updated by anonym almost 5 years ago

  • % Done changed from 10 to 30
  • Feature Branch set to bugfix/5394-time_syncing_in_bridge_mode

Pushed something that's not been tested at all. Could even be syntax errors in it.

#19 Updated by anonym almost 5 years ago

  • Assignee deleted (anonym)
  • % Done changed from 30 to 50
  • QA Check changed from Dev Needed to Ready for QA

The branch seems to fix the problem as I understand it. It helps to run time_syncing.feature on an image built from this branch (bugfix/5394-time_syncing_in_bridge_mode) with the test/6305-tor-bridges branch (not merged) checked out. To verify the error, also try running some other ISO with 3 or so branches configured in features/config/local.yml.

#20 Updated by intrigeri almost 5 years ago

  • Assignee set to intrigeri
  • Target version set to Tails_1.3

#21 Updated by intrigeri almost 5 years ago

  • Assignee changed from intrigeri to anonym
  • QA Check changed from Ready for QA to Dev Needed

Here's a quick initial code review:

  • typo in set_date_accoring_to_tor_certs
  • maybe it's worth updating the corresponding design doc?
  • nr_bridges=$(grep '^Bridge' "${TOR_RC}" | wc -l) would be more robust if the regexp had \s+ appended (needs -E option), as there are other torrc options whose name starts with "Bridge"

#22 Updated by anonym almost 5 years ago

  • Assignee changed from anonym to intrigeri
  • QA Check changed from Dev Needed to Ready for QA

intrigeri wrote:

Here's a quick initial code review:

  • typo in set_date_accoring_to_tor_certs
  • maybe it's worth updating the corresponding design doc?

Fixed.

  • nr_bridges=$(grep '^Bridge' "${TOR_RC}" | wc -l) would be more robust if the regexp had \s+ appended (needs -E option), as there are other torrc options whose name starts with "Bridge"

Fixed, but I didn't see the need for the +.

#23 Updated by intrigeri almost 5 years ago

  • % Done changed from 50 to 60

Code review now passes.

#24 Updated by intrigeri almost 5 years ago

  • Assignee changed from intrigeri to anonym
  • QA Check changed from Ready for QA to Info Needed

anonym wrote:

The branch seems to fix the problem as I understand it. It helps to run time_syncing.feature on an image built from this branch (bugfix/5394-time_syncing_in_bridge_mode) with the test/6305-tor-bridges branch (not merged) checked out.

I did that and it passed.

To verify the error, also try running some other ISO with 3 or so branches configured in features/config/local.yml.

I did that, with an ISO built from testing branch 2 days ago, and it passed too. It shouldn't, right?

#25 Updated by intrigeri almost 5 years ago

intrigeri wrote:

To verify the error, also try running some other ISO with 3 or so branches configured in features/config/local.yml.

I did that, with an ISO built from testing branch 2 days ago, and it passed too. It shouldn't, right?

Same with 1.3~rc1, so there's definitely a problem somewhere. Are there any requirements about these bridges? I've used 3 regular bridges, and didn't check if they are all working right now.

#26 Updated by anonym almost 5 years ago

  • Assignee deleted (anonym)
  • Target version deleted (Tails_1.3)
  • % Done changed from 60 to 0
  • QA Check changed from Info Needed to Dev Needed
  • Type of work changed from Code to Research

intrigeri wrote:

anonym wrote:

To verify the error, also try running some other ISO with 3 or so branches configured in features/config/local.yml.

I did that, with an ISO built from testing branch 2 days ago, and it passed too. It shouldn't, right?

Well, it seemed like there was an element of randomness to it, e.g. that we happened to pick a bridge whose certificate had the lowest valid-after date, and that Tor chose to not retry that particular bridge for whatever reason. It did happen quite often for me with a certain set of (3) bridges, though. However, when I retry the exact same thing today, I cannot reproduce it. I do see completely different valid-after dates for these bridges today, though. Hmm.

I did that, with an ISO built from testing branch 2 days ago, and it passed too. It shouldn't, right?

Same with 1.3~rc1, so there's definitely a problem somewhere.

Sadly, yes...

Are there any requirements about these bridges? I've used 3 regular bridges, and didn't check if they are all working right now.

As I understood it, you'd need >1 bridges with distinct valid-after dates on their certs. That's all, but it seems incorrect.

#27 Updated by anonym almost 5 years ago

I just managed to reproduce it with a certain set of bridges (emailed to you, intrigeri, subject "#5394 bridge set"). This was in the test suite (Scenario: Clock way in the future in bridge mode, run on the Tails 1.3 image), and I just had time to run one command (see below) before the scenario failed and the VM was killed:

$ grep "certificate lifetime" /var/log/tor/log
Jan 01 12:35:27.000 [info] (certificate lifetime runs from Nov  5 00:00:00 2014 GMT through Aug 13 23:59:59 2015 GMT. Your time is Jan 01 12:35:27 2020 UTC.)
Nov 05 00:00:00.000 [info] (certificate lifetime runs from Jan  9 00:00:00 2015 GMT through Jan  9 00:00:00 2016 GMT. Your time is Nov 05 00:00:00 2014 UTC.)
Nov 05 00:02:17.000 [info] (certificate lifetime runs from Jan  9 00:00:00 2015 GMT through Jan  9 00:00:00 2016 GMT. Your time is Nov 05 00:02:17 2014 UTC.)
Nov 05 00:02:39.000 [info] (certificate lifetime runs from Jan  9 00:00:00 2015 GMT through Jan  9 00:00:00 2016 GMT. Your time is Nov 05 00:02:39 2014 UTC.)
Nov 05 00:02:58.000 [info] (certificate lifetime runs from Jan  9 00:00:00 2015 GMT through Jan  9 00:00:00 2016 GMT. Your time is Nov 05 00:02:58 2014 UTC.)
Nov 05 00:03:18.000 [info] (certificate lifetime runs from Jan  9 00:00:00 2015 GMT through Jan  9 00:00:00 2016 GMT. Your time is Nov 05 00:03:18 2014 UTC.)
Nov 05 00:03:38.000 [info] (certificate lifetime runs from Jan  9 00:00:00 2015 GMT through Jan  9 00:00:00 2016 GMT. Your time is Nov 05 00:03:38 2014 UTC.)
Nov 05 00:03:58.000 [info] (certificate lifetime runs from Jan  9 00:00:00 2015 GMT through Jan  9 00:00:00 2016 GMT. Your time is Nov 05 00:03:58 2014 UTC.)
Nov 05 00:04:18.000 [info] (certificate lifetime runs from Jan  9 00:00:00 2015 GMT through Jan  9 00:00:00 2016 GMT. Your time is Nov 05 00:04:18 2014 UTC.)
Nov 05 00:04:38.000 [info] (certificate lifetime runs from Jan  9 00:00:00 2015 GMT through Jan  9 00:00:00 2016 GMT. Your time is Nov 05 00:04:38 2014 UTC.)
Nov 05 00:04:58.000 [info] (certificate lifetime runs from Jan  9 00:00:00 2015 GMT through Jan  9 00:00:00 2016 GMT. Your time is Nov 05 00:04:58 2014 UTC.)
Nov 05 00:05:18.000 [info] (certificate lifetime runs from Jan  9 00:00:00 2015 GMT through Jan  9 00:00:00 2016 GMT. Your time is Nov 05 00:05:18 2014 UTC.)
Nov 05 00:05:38.000 [info] (certificate lifetime runs from Jan  9 00:00:00 2015 GMT through Jan  9 00:00:00 2016 GMT. Your time is Nov 05 00:05:38 2014 UTC.)
Wed Nov  5 00:05:40 UTC 2014

Note that we set the date to "Nov 05 00:00:00.000" (in 2014), which is the cert valid-after of the first bridge in the above. After that the log only mentions another bridge with cert valid-after "Jan 9 00:00:00 2015", and the first one is never re-tried.

#28 Updated by Tails almost 5 years ago

#29 Updated by intrigeri almost 4 years ago

Apparently this branch is still merged into experimental regularly. Is it still somewhat current?

#30 Updated by intrigeri over 3 years ago

  • Related to Bug #11589: Time syncing over bridge is fragile added

#31 Updated by u over 2 years ago

  • Assignee set to anonym

Hi @anonym, could you please check how relevant this ticket still is? Has this branch been merged? Do we still need it? Thanks!

#32 Updated by intrigeri about 2 months ago

  • Assignee changed from anonym to intrigeri
  • Target version set to Tails_4.1
  • Type of work changed from Research to Code

This branch was never merged in the end. My understanding is that it's because the branch was never sent back to the reviewer after anonym provided extra info about if/how one could reproduce the problem this ticket was originally about. AFAICT this problem can still happen and it would be nice to fix it, both for our users and our test suite (#11589, which is one of the last reasons why we need +force-all-tests for every branch that affects Tor Launcher, which made me land here: our developer & reviewer experience is seriously harmed these days by the fact developers need to reason about "do I need to add +force-all-tests and sometimes forget).

I'll resurrect this branch and will try to lead this to completion. If that's enough to fix #11589, I'll call this very ticket done too, and then we'll keep our ears open to see if there are other cases when time sync'ing is broken in bridge mode for actual users (outside of our test suite).

#33 Updated by intrigeri about 1 month ago

  • Feature Branch changed from bugfix/5394-time_syncing_in_bridge_mode to bugfix/5394-time_syncing_in_bridge_mode+force-all-tests

#34 Updated by intrigeri about 1 month ago

  • Subject changed from Fix time syncing in bridge mode to Time syncing can break Tor bootstrap when using several bridges with different certificate lifespans
  • Description updated (diff)

#35 Updated by intrigeri about 1 month ago

Note that I think our test suite will not help much validating this branch: while it does configure Tails to use several bridges, they're all run by Chutney at about the same time, so there's a good chance their certificates' lifespan is about the same and we don't hit the bug that anonym identified earlier here, and that the branch aims to fix. If my original assumption is correct, then it follows that this branch is unlikely to fix #11589.

#36 Updated by intrigeri about 1 month ago

  • Assignee deleted (intrigeri)
  • Target version deleted (Tails_4.1)

I've tried to reproduce the bug described in #5394#note-16 with Tails 4.0, using:

  • 6 different bridges (2 I'm used to using + 4 ones I freshly got from BridgeDB) each time
  • a VM with a hardware clock set 24h after the current UTC time

I've started Tails 8 times in this configuration. Every time, the clock was corrected and Tor bootstrapped just fine. So I don't see how I can make progress here without looking at the certificate lifetime of each of these bridges. Either way, this problem seems unlikely to affect real-world users often ⇒ I'm tempted to classify this as a test suite problem, and to reject this ticket in favor of #11589 (which might itself be solved by #16792). Other opinions welcome wrt. what to do with this old ticket.

Anyway, given my initial motivation for working on this is moot (see previous comment), so I'm giving up.

Also available in: Atom PDF