Project

General

Profile

Bug #16471

Drop time synchronization hacks that tor 0.3.5 and 0.4.x made obsolete

Added by hefee 5 months ago. Updated 12 days ago.

Status:
In Progress
Priority:
Normal
Assignee:
Category:
Time synchronization
Target version:
Start date:
02/17/2019
Due date:
% Done:

0%

Estimated time:
8.00 h
Feature Branch:
hefee/bugfix/16471-drop-time-synchronization-hacks+force-all-tests, https://salsa.debian.org/tails-team/tails/merge_requests/21
Type of work:
Research
Blueprint:
Starter:
Affected tool:

Description

And here's a list of little-t-tor changes that might be relevant, e.g. they may break or time sync hacks or instead make some of them obsolete:

Some of them were fixed in 0.3.5, some will be in 0.4.x but might be backported to 0.3.5 when we come back to this ticket. As part of this ticket, I expect we will:

  • look for breakage and fix it (subtasks of this ticket + report to Tor Network Team if the problem is on their side)
  • look for changes that make our code useless but not harmful and file new FT tickets for that (not blockers to close this ticket)
  • if it looks like 0.4.x will make a bigger difference, file a ticket about upgrading to 0.4.x and explain your findings there, so we don't do all the exact same work again once we're there

Screenshot_20190516_202417.png View - date -s 'now -60days' (50.8 KB) hefee, 05/16/2019 06:28 PM


Related issues

Related to Tails - Feature #16348: Upgrade to tor 0.3.5 Resolved 01/12/2019
Blocks Tails - Feature #16209: Core work: Foundations Team Confirmed 03/22/2019
Blocked by Tails - Feature #16687: Upgrade to tor 0.4.x Resolved
Blocks Tails - Bug #9256: Don't restart Tor after setting the right clock Confirmed 04/17/2015
Blocked by Tails - Bug #16792: Upgrade our Chutney fork and make configuration more similar to the real Tor network In Progress

Associated revisions

Revision 44d0d25c (diff)
Added by Sandro Knauß 3 months ago

Remove time sychronization hacks (refs: #16471)

With new tor 0.3.5, tor get's more moderate in terms of connection to
tor network with a clock, that is out of sync. This makes it possible to
cleanup out time sync script.

Revision 3de159c9 (diff)
Added by Sandro Knauß about 2 months ago

Remove maybe_set_time_from_tor_consensus completly. (refs: #16471)

Revision 5de2390a (diff)
Added by Sandro Knauß about 1 month ago

Remove time sychronization hacks (refs: #16471)

With new tor 0.3.5, tor get's more moderate in terms of connection to
tor network with a clock, that is out of sync. This makes it possible to
cleanup out time sync script.

Revision 433d7f1f (diff)
Added by Sandro Knauß about 1 month ago

Remove maybe_set_time_from_tor_consensus completly. (refs: #16471)

History

#1 Updated by hefee 5 months ago

#2 Updated by hefee 5 months ago

#3 Updated by hefee 5 months ago

  • Feature Branch set to hefee-bugfix-16349-tor-0.3.5-force-all-tests

As I still have no test setup I can't really check, what will break if I remove time sync hacks.

Properly we can cleanup config/chroot_local-includes/etc/NetworkManager/dispatcher.d/20-time.sh as now Tor does not fail that often anymore. Maybe the function is_clock_way_off, and unverified-microdesc-consensus may also been not needed anymore. But form the tickets in the description it is not clea, if tor now never falls with such errors or if it just not fail that often into this error.

Jenkins fails for "Time syncing ǂ Clock is one day in the future in bridge mode".

https://jenkins.tails.boum.org/job/test_Tails_ISO_hefee-bugfix-16349-tor-0.3.5-force-all-tests/lastCompletedBuild/cucumberTestReport/time-syncing/clock-is-one-day-in-the-future-in-bridge-mode/

#4 Updated by intrigeri 5 months ago

  • Subject changed from Cleanup time sync hacks. to Drop time synchronization hacks that tor 0.3.5 made obsolete (if any)
  • Category set to Time synchronization
  • Status changed from New to Confirmed

#5 Updated by intrigeri 5 months ago

#6 Updated by intrigeri 5 months ago

#7 Updated by intrigeri 5 months ago

  • Assignee deleted (intrigeri)
  • Target version changed from Tails_3.13 to Tails_3.14
  • Feature Branch deleted (hefee-bugfix-16349-tor-0.3.5-force-all-tests)
  • Type of work changed from Code to Research

(There's very little chance I tackle this during the 3.13 dev cycle, hence why I had left #16348 unassigned until last FT meeting. And there's a chance that @anonym finds this interesting enough to take it ⇒ let's discuss this at the next FT meeting :)

#8 Updated by intrigeri 3 months ago

  • Target version deleted (Tails_3.14)

#9 Updated by hefee 3 months ago

  • Assignee set to hefee

#10 Updated by Anonymous 3 months ago

  • Status changed from Confirmed to In Progress

#11 Updated by hefee 3 months ago

  • Feature Branch set to hefee/bugfix/16471-drop-time-synchronization-hacks+force-all-tests

#12 Updated by intrigeri 3 months ago

  • Subject changed from Drop time synchronization hacks that tor 0.3.5 made obsolete (if any) to Drop time synchronization hacks that tor 0.3.5 and 0.4.x made obsolete (if any)

0.4.0.5 brings:

  o Minor bugfixes (client, clock skew):
    - Bootstrap successfully even when Tor's clock is behind the clocks
      on the authorities. Fixes bug 28591; bugfix on 0.2.0.9-alpha.
    - Select guards even if the consensus has expired, as long as the
      consensus is still reasonably live. Fixes bug 24661; bugfix
      on 0.3.0.1-alpha.

#13 Updated by hefee 2 months ago

  • Assignee deleted (hefee)
  • Target version set to Tails_3.14
  • QA Check set to Ready for QA
  • Feature Branch changed from hefee/bugfix/16471-drop-time-synchronization-hacks+force-all-tests to hefee/bugfix/16471-drop-time-synchronization-hacks+force-all-tests, https://salsa.debian.org/tails-team/tails/merge_requests/21

Jenkins status looks fine. We may can even strip more stuff with 0.4.0.5.

#14 Updated by intrigeri 2 months ago

  • Assignee set to intrigeri

#15 Updated by intrigeri 2 months ago

  • Assignee changed from intrigeri to hefee
  • QA Check changed from Ready for QA to Dev Needed

Reviewed on Salsa! Once we have test coverage for the cases that this branch affects, I'm happy to merge it.

But that won't fully resolve #16471 as we still need to take benefit, in the more common cases (hardware clock is correct but in local timezone instead of UTC) from:

To be clear, ideally we would be able to get rid of maybe_set_time_from_tor_consensus() entirely: that's the part of our time sync' system that is problematic from a security PoV). I suspect we'll need 0.4.x to do that.

In any case, even if we can't remove maybe_set_time_from_tor_consensus() yet, we should adjust 20-time.sh so that we only change the system clock when it's actually needed, i.e. expose our users to a dangerous situation less often: currently we do that all the time, as long as time_is_in_valid_tor_range returns false, which nowadays includes cases when tor will have actually judged the system clock was good enough, and accepted using the downloaded consensus, even if the system clock might not be in the expected range. So I guess that when time_is_in_valid_tor_range returns false, we should check (probably several times, waiting a little bit between each try) whether Tor has actually bootstrapped. If it has, we return without setting the system clock; if it hasn't, we set the system clock to the middle of the consensus validity range, just like we've been doing so far.

#16 Updated by hefee 2 months ago

intrigeri wrote:

Reviewed on Salsa! Once we have test coverage for the cases that this branch affects, I'm happy to merge it.

As I created tests, but those are failing also for released versions 3.13.2 and 3.12.1. After investigating the issue with a manual test. I see that 20-time.sh expects the timerage for a certificate to be 3h and not 5min.
@anonym where we can change the certificate time rage for our test tor network?

Btw. the manual test was successful, so it is "only" an issue of our test network.

But that won't fully resolve #16471 as we still need to take benefit, in the more common cases (hardware clock is correct but in local timezone instead of UTC) from:

To be clear, ideally we would be able to get rid of maybe_set_time_from_tor_consensus() entirely: that's the part of our time sync' system that is problematic from a security PoV). I suspect we'll need 0.4.x to do that.

My tests show, that we can't get rid of it completely atm. I removed the maybe_set_time_from_tor_consensus() and end up with the attached screenshot. I'm unsure if the mentioned tickets will help to get rid of it completely as they "only extend" the range of then a consensus is accepted. But if we have completely broken clock, that is reset to timestamp=0 we still need maybe_set_time_from_tor_consensus(). But we will se, if we have a working test for it, we can investigate further.

In any case, even if we can't remove maybe_set_time_from_tor_consensus() yet, we should adjust 20-time.sh so that we only change the system clock when it's actually needed, i.e. expose our users to a dangerous situation less often: currently we do that all the time, as long as time_is_in_valid_tor_range returns false, which nowadays includes cases when tor will have actually judged the system clock was good enough, and accepted using the downloaded consensus, even if the system clock might not be in the expected range. So I guess that when time_is_in_valid_tor_range returns false, we should check (probably several times, waiting a little bit between each try) whether Tor has actually bootstrapped. If it has, we return without setting the system clock; if it hasn't, we set the system clock to the middle of the consensus validity range, just like we've been doing so far.

That sounds like a very reasonable approach, but this would make the script even longer not shorter and this ticket is about removing code ;D

What I don't understand, why you think, that the date -s "{vmid}", is that dangerous. A user will need to wait till htpdate updated the clock anyways, before they are able to access the internet. Or is this about, that it makes Tails unique behavior visible withing the tor network? And that it makes everything more slow?

#17 Updated by hefee 2 months ago

@intrigeri - I spent 2h for this ticket - maybe another 2h to getting the tests working, some manual testing,...

#18 Updated by anonym 2 months ago

hefee wrote:

As I created tests, but those are failing also for released versions 3.13.2 and 3.12.1. After investigating the issue with a manual test. I see that 20-time.sh expects the timerage for a certificate to be 3h and not 5min.
@anonym where we can change the certificate time rage for our test tor network?

nickm?
You'll have to mess with the values in submodules/chutney/torrc_templates/authority.i. The tor man page is your friend. The little-t-tor people (perhaps Nick) can probably help.

#19 Updated by intrigeri 2 months ago

  • Assignee changed from intrigeri to hefee

anonym where we can change the certificate time rage for our test tor network?

You've got a reply so next step here is to use this info and make the tests pass.

Btw. the manual test was successful, so it is "only" an issue of our test network.

Great!

That sounds like a very reasonable approach, but this would make the script even longer not shorter and this ticket is about removing code ;D

Sorry I've been unclear: the goal here is to remove dangerous behavior. We can hope it allows us to remove code at the end of the day (once we have Tor 0.4.x) but it's OK if it actually adds code, if it protects our users better :)

What I don't understand, why you think, that the date -s "{vmid}", is that dangerous. A user will need to wait till htpdate updated the clock anyways, before they are able to access the internet. Or is this about, that it makes Tails unique behavior visible withing the tor network? And that it makes everything more slow?

It essentially disables Tor's check for consensus freshness, which is there for a reason. Sure, htpdate might fix this after the fact, but htpdate can be run while tor is using a replayed (obsolete) consensus. The consequences are not well understood but:

  • At the very least, indeed that behavior is unique to Tails.
  • It's been pointed out by Tor developers that our security analysis is incorrect. I can't find the corresponding discussion anymore. We never took time to check this properly, because for years we've been operating under the assumption that #5774 would be solved soon.

#20 Updated by intrigeri 2 months ago

  • Related to Bug #9256: Don't restart Tor after setting the right clock added

#21 Updated by intrigeri 2 months ago

  • Assignee changed from hefee to intrigeri
  • Estimated time set to 8.00 h

#22 Updated by intrigeri 2 months ago

  • Assignee changed from intrigeri to hefee

#23 Updated by intrigeri 2 months ago

You'll have to mess with the values in submodules/chutney/torrc_templates/authority.i. The tor man page is your friend. The little-t-tor people (perhaps Nick) can probably help.

hefee, I'm happy to help decipher this before you ask Tor folks, if you have a hard time with it.

#24 Updated by intrigeri 2 months ago

Also, note that we have a feature/tor-nightly-master branch that can be used as a basis to try stuff based on a more recent tor. If it turns out that tor master allows us to get rid of maybe_set_time_from_tor_consensus() entirely, we can skip the intermediary solution I suggested earlier here.

#25 Updated by intrigeri 2 months ago

Taking another step back, given this likely won't make it into 3.14, and there's a good chance 3.15 (or worst case 3.16) ships with tor 0.4.x, I'm starting to think our time would be better spend doing this directly with 0.4.x, using https://deb.torproject.org/torproject.org/dists/tor-experimental-0.4.0.x-stretch/ for now. A branch for #16687 could upgrade to that version of tor (that will already teach us something) and the branch for this ticket would be based on it. In other words: I recommend you forget about 0.3.5 and focus on 0.4.x. What do you think?

#26 Updated by anonym about 2 months ago

intrigeri wrote:

You'll have to mess with the values in submodules/chutney/torrc_templates/authority.i. The tor man page is your friend. The little-t-tor people (perhaps Nick) can probably help.

hefee, I'm happy to help decipher this before you ask Tor folks, if you have a hard time with it.

Me too!

#27 Updated by hefee about 2 months ago

intrigeri wrote:

You'll have to mess with the values in submodules/chutney/torrc_templates/authority.i. The tor man page is your friend. The little-t-tor people (perhaps Nick) can probably help.

hefee, I'm happy to help decipher this before you ask Tor folks, if you have a hard time with it.

I found the solution to make the validity check in 20time.sh pass, also for 3.13.2. But I'm unsure how I can propose an update, as it is in chutney submodule and I do not have write permissions.

@intrigeri: should I extract the additional tests + chutney updates for stable branch? As those tests should pass also for current branch?

#28 Updated by hefee about 2 months ago

#29 Updated by hefee about 2 months ago

  • Related to deleted (Bug #9256: Don't restart Tor after setting the right clock)

#30 Updated by hefee about 2 months ago

  • Blocks Bug #9256: Don't restart Tor after setting the right clock added

#31 Updated by hefee about 2 months ago

intrigeri wrote:

Taking another step back, given this likely won't make it into 3.14, and there's a good chance 3.15 (or worst case 3.16) ships with tor 0.4.x, I'm starting to think our time would be better spend doing this directly with 0.4.x, using https://deb.torproject.org/torproject.org/dists/tor-experimental-0.4.0.x-stretch/ for now. A branch for #16687 could upgrade to that version of tor (that will already teach us something) and the branch for this ticket would be based on it. In other words: I recommend you forget about 0.3.5 and focus on 0.4.x. What do you think?

I use now feature/tor-nightly-master and whoo, we can get rid of maybe_set_time_from_tor_consensus() entirely. As we have now tests to set the system clock +40days and -10 years and they pass. I updated blocks/blocked by relations. To make it visible, that this is blocked by tor 4.x in tails.

#32 Updated by hefee about 2 months ago

  • Assignee changed from hefee to intrigeri

#33 Updated by intrigeri about 2 months ago

  • Target version changed from Tails_3.14 to Tails_3.15

#34 Updated by intrigeri about 2 months ago

  • QA Check changed from Dev Needed to Ready for QA

@hefee, I assume you meant to set this "Ready for QA"; otherwise, please clarify what dev you expect me to do :)

#35 Updated by hefee about 2 months ago

intrigeri wrote:

@hefee, I assume you meant to set this "Ready for QA"; otherwise, please clarify what dev you expect me to do :)

yes "ready for QA" is the correct status, forgotten to update the status.

#36 Updated by intrigeri about 2 months ago

  • Assignee changed from intrigeri to hefee
  • QA Check changed from Ready for QA to Dev Needed

Great job! See review on Salsa :)

#37 Updated by intrigeri about 2 months ago

  • Status changed from In Progress to Needs Validation
  • Assignee changed from hefee to intrigeri

(The MR was reassigned to me.)

#38 Updated by intrigeri about 1 month ago

  • Status changed from Needs Validation to In Progress
  • Assignee changed from intrigeri to hefee

#39 Updated by intrigeri about 1 month ago

FWIW I've pushed bugfix/16471-drop-time-synchronization-hacks+force-all-tests that differs from your branch in two ways:

  • I've rebased it onto the branch for #16687 (itself based on stable), so that this can indeed be a candidate for 3.15. I think you should do the same, i.e. hard reset your branch to this one, modulo perhaps:
  • I've reverted the Chutney change, in order to see what happens in the test suite without it: as explained on the corresponding MR, I still don't understand why we still need these changes with the code updates you're proposing for tails.git. I'm curious, we'll see what CI thinks :)

#40 Updated by hefee about 1 month ago

@intrigeri I think we should schedule a short meeting about issue to clarify things about this issue, what we should concentrate on etc. and what new issues we may need to create. As it seems for me, we are not on the same page and this ping-pong over comments on salsa takes for ever and I starting to get frustrated about the progress. That's why I want to try, if a jabber discussion may help to solve this.

#41 Updated by intrigeri about 1 month ago

intrigeri wrote:

FWIW I've pushed bugfix/16471-drop-time-synchronization-hacks+force-all-tests

Unsurprisingly, the tests fail. This made me notice #16793. So I've merged my branch for #16793 into bugfix/16471-drop-time-synchronization-hacks+force-all-tests, in order to see what fails without the Chutney changes :)

#42 Updated by intrigeri about 1 month ago

hefee wrote:

@intrigeri I think we should schedule a short meeting about issue to clarify things about this issue, what we should concentrate on etc. and what new issues we may need to create. As it seems for me, we are not on the same page and this ping-pong over comments on salsa takes for ever and I starting to get frustrated about the progress. That's why I want to try, if a jabber discussion may help to solve this.

I totally agree in principle but my availability this month makes it very hard to schedule (more) meetings. Worst case, let's discuss this during the upcoming Buster sprint?

#43 Updated by intrigeri about 1 month ago

  • Blocked by Bug #16792: Upgrade our Chutney fork and make configuration more similar to the real Tor network added

#44 Updated by intrigeri about 1 month ago

After a meeting between hefee, anonym and I, the plan is:

  1. do manual tests on bugfix/16471-drop-time-synchronization-hacks+force-all-tests (see https://salsa.debian.org/tails-team/tails/merge_requests/21#note_90775 where I documented how to do such tests), for +/- 12h, 24h, 48h, 1 week, 1 month, 1 year (stop when it starts failing) [intrigeri]
  2. automatic tests for the same scenarios using Chutney (we have that already): https://jenkins.tails.boum.org/view/Tails_ISO/job/test_Tails_ISO_bugfix-16471-drop-time-synchronization-hacks-force-all-tests/
  3. refresh our chutney sources (#16792) + update network configuration to be more similar to the real Tor network [anonym]
  4. compare (1) and (2) and (3) so we can finally have a good idea of whether we can rely on Chutney for these things [anonym]
  5. from (1), infer what tor currently allows in terms of clock skew
  6. ask tor devs to confirm our findings from (5)
  7. schedule a meeting where we'll discuss our findings and will try to reach a conclusion wrt. what part of the "set clock from tor consensus" dirty hack we can remove.

#45 Updated by intrigeri about 1 month ago

  • Assignee changed from hefee to intrigeri

(Next step is on my plate.)

#46 Updated by intrigeri about 1 month ago

  • Assignee changed from intrigeri to anonym

intrigeri wrote:

  1. do manual tests on bugfix/16471-drop-time-synchronization-hacks+force-all-tests (see https://salsa.debian.org/tails-team/tails/merge_requests/21#note_90775 where I documented how to do such tests), for +/- 12h, 24h, 48h, 1 week, 1 month, 1 year (stop when it starts failing) [intrigeri]

Test results with an ISO built at b13a5f18cb, that has tor 0.4.0.5-1~d90.stretch+1 installed (OK means that tor bootstraps successfully, FAIL means that tor fails to b:

  • direct connection (no bridge):
    • -12h: OK
    • +12h: OK
    • -24h: OK
    • +24h: OK
    • -48h: FAIL
    • +48h: FAIL
  • using a bridge:
    • -12h: initially FAIL (Tor Launcher displays an error about clock skew), but OK after retrying config + connect in Tor Launcher → bug in Tor, in Tor Launcher, or in their interaction?
    • +12h: OK
    • -24h: initially FAIL (Tor Launcher displays an error about clock skew), but OK after retrying config + connect in Tor Launcher → bug in Tor, in Tor Launcher, or in their interaction?
    • +24h: OK
    • -48h: FAIL
    • +48h: FAIL

Apart of the weird behavior when using bridges on a client whose clock is in the past, which should be investigated and reported upstream, at first glance this essentially means that:

  • We don't need to set the clock according to the tor consensus (dangerous) anymore, for clocks that are a bit off (up to 24h is fine, maybe a bit more); this is great as it covers the "clock is set to local time, not to UTC" big problem, i.e. anyone with an accurate hardware clock, regardless of what timezone it's in, will not be exposed to dangerous code paths anymore. Except it's more complicated, see below.
  • As expected, tor 0.4.0.x does not magically fix clocks that are very wrong. So until we do #5774 (i.e. ask the user what time it is), we have to either drop support for these systems (I'm not looking forward to debating this) or to keep setting time according to whatever tor consensus we receive.

Problem is, as long as we do trust time from tor consensus in some cases, we still expose all users to replay attacks: we can't differentiate between "our clock is correct but a 1-week consensus is being replayed to us" and "the clock is one week in the future". But at least we could:

  • Avoid triggering the dangerous code path when the clock appears to be in the future: I assume that most hardware clocks that are very wrong (i.e. more than due to "hardware clock is set to local time", which won't prevent tor from bootstrapping anymore) are in the past. This should protect against consensus replay attacks.
  • Ensure we don't trigger the dangerous code path in cases when we don't need to, i.e. only trigger it when tor indeed fails to bootstrap due to clock issues. I believe tor now exposes this kind of info on the control port (so that Tor Launcher can tell the user something hopefully useful).

Reassigning to anonym, because the next step is on his plate.

#47 Updated by intrigeri 13 days ago

  • Target version changed from Tails_3.15 to Tails_3.16

I doubt we'll finish all this by July 9 given #16792 is not done yet => postponing :)

#48 Updated by intrigeri 12 days ago

  • Subject changed from Drop time synchronization hacks that tor 0.3.5 and 0.4.x made obsolete (if any) to Drop time synchronization hacks that tor 0.3.5 and 0.4.x made obsolete

Also available in: Atom PDF