Bug #16792

Upgrade our Chutney fork

Added by intrigeri 3 months ago. Updated about 4 hours ago.

Status: In Progress
Priority: Normal
Assignee: intrigeri
Category: Test suite
Target version: Tails_4.0
Start date:
Due date:
% Done: 20%

Feature Branch: feature/16792-only-update-chutney+force-all-tests
Type of work: Code
Blueprint:
Starter:
Affected tool:

Description

We haven't updated our Chutney fork since 2016. Current diffstat is: 139 files changed, 3401 insertions(+), 712 deletions(-).
Using configuration and code that was developed for a somewhat antique version of tor will likely cause trouble at some point.

I noticed this because our test suite sets deprecated torrc options for its Chutney usage:

Jun 10 19:38:59 amnesia tor[8575]: Jun 10 19:38:59.126 [warn] The configuration option 'TestingBridgeDownloadSchedule' is deprecated; use 'TestingBridgeDownloadInitialDelay' instead.
Jun 10 19:38:59 amnesia tor[8575]: Jun 10 19:38:59.126 [warn] The configuration option 'TestingClientConsensusDownloadSchedule' is deprecated; use 'TestingClientConsensusDownloadInitialDelay' instead.
Jun 10 19:38:59 amnesia tor[8575]: Jun 10 19:38:59.126 [warn] The configuration option 'TestingClientDownloadSchedule' is deprecated; use 'TestingClientDownloadInitialDelay' instead.

Upstream Chutney has removed all DownloadSchedule torrc options from its templates, so in this case it's no big deal.
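
For reference, each warning maps to a one-to-one rename, so if we did need to keep these settings in our own torrc templates, the fix would be a simple rename along these lines (the new option names come straight from the warnings above; the values shown are placeholders, not necessarily what our templates set):

# Deprecated spellings:
#   TestingBridgeDownloadSchedule, TestingClientConsensusDownloadSchedule,
#   TestingClientDownloadSchedule
# Current spellings (placeholder values):
TestingBridgeDownloadInitialDelay          0
TestingClientConsensusDownloadInitialDelay 0
TestingClientDownloadInitialDelay          0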


Related issues

Blocks Tails - Feature #16209: Core work: Foundations Team (Confirmed)
Blocks Tails - Feature #15211: Reduce our Chutney network (In Progress, started 01/22/2018)

Associated revisions

Revision 3ef18965 (diff)
Added by anonym 3 months ago

Chutney: update to current upstream master (Will-fix: #16792).

Revision c09577e4 (diff)
Added by anonym 3 months ago

Chutney: use normal consensus cycle interval (Will-fix: #16792).

Revision 068dcb91 (diff)
Added by anonym about 1 month ago

Revert "Chutney: use normal consensus cycle interval (Will-fix: #16792)."

This reverts commit c09577e4fd9ee8f0cfe12aea3dc5b072e556bcda.

Revision b5439e0c (diff)
Added by anonym about 1 month ago

Chutney: use normal consensus cycle interval (Will-fix: #16792).

Revision b4109f93 (diff)
Added by anonym 14 days ago

Chutney: update to current upstream master (Will-fix: #16792).

Revision 00da865a (diff)
Added by anonym 14 days ago

Chutney: use normal consensus cycle interval (Will-fix: #16792).

Revision 3addffea (diff)
Added by anonym 14 days ago

Chutney: update to current upstream master (Will-fix: #16792).

Revision ae6e0753 (diff)
Added by anonym 14 days ago

Chutney: use normal consensus cycle interval (Will-fix: #16792).

Revision 8bb166dd (diff)
Added by intrigeri 4 days ago

Test suite: fix Chutney exit relays tagging (refs: #16792)

In all Chutney's networks/*, exit relays are tagged "r" (AFAICT only the first
letter of the tag is taken into account, so "relay" would be equivalent).
No tag="e" or tag="exit" can be found in Chutney's Git history, so I believe
our usage of tag="exit" is incorrect. I don't know what the practical impact
of using an unsupported tag is, though.

Revision 73e5ae37 (diff)
Added by intrigeri 4 days ago

Test suite: fix Chutney bridge authorities tagging (refs: #16792)

In all Chutney's networks/*, bridge authorities are tagged "ba".
Let's do the same to avoid having to wonder whether using anything
else can cause trouble.

Revision b5ba531c (diff)
Added by intrigeri 4 days ago

Test suite: set client=1 for the Chutney clients (refs: #16792)

This was added to Chutney's networks/*. Let's do the same to avoid having to
wonder whether not doing so can cause trouble.

Revision 79f86a31 (diff)
Added by intrigeri 4 days ago

WIP: Test suite: save Tor and Chutney logs on any scenario failure (refs: #16792)

This commit should not be merged into our base branches.

Goal: investigate the "nc: connection failed, SOCKSv5 error: General SOCKS
server failure" problem we see with SSH since we updated Chutney.

Revision 70a1712c (diff)
Added by intrigeri 1 day ago

Test suite: wait for all Chutney nodes to have bootstrapped before assuming the simulated Tor network is ready.

refs: #16792

Revision 05ce3ed4 (diff)
Added by intrigeri about 9 hours ago

Revert "WIP: Test suite: save Tor and Chutney logs on any scenario failure (refs: #16792)"

This reverts commit 79f86a310a3568a5ef2f4646b216b1be1fab7144.

I've seen failures to archive build artifacts on Jenkins since I made this
change, which makes it harder to investigate issues, even though the change
was meant to make that easier.

Revision 4e326584 (diff)
Added by intrigeri about 5 hours ago

Revert "WIP: Test suite: save Tor and Chutney logs on any scenario failure (refs: #16792)"

This reverts commit 79f86a310a3568a5ef2f4646b216b1be1fab7144, which
causes failures to save artifacts:

Then I see that SSH is properly stream isolated
  1. features/step_definitions/tor.rb:289
    cannot handle socket (RuntimeError)
    /usr/lib/ruby/2.3.0/fileutils.rb:1379:in `copy'
    /usr/lib/ruby/2.3.0/fileutils.rb:472:in `block in copy_entry'
    /usr/lib/ruby/2.3.0/fileutils.rb:1498:in `wrap_traverse'
    /usr/lib/ruby/2.3.0/fileutils.rb:1501:in `block in wrap_traverse'
    /usr/lib/ruby/2.3.0/fileutils.rb:1500:in `each'
    /usr/lib/ruby/2.3.0/fileutils.rb:1500:in `wrap_traverse'
    /usr/lib/ruby/2.3.0/fileutils.rb:1501:in `block in wrap_traverse'
    /usr/lib/ruby/2.3.0/fileutils.rb:1500:in `each'
    /usr/lib/ruby/2.3.0/fileutils.rb:1500:in `wrap_traverse'
    /usr/lib/ruby/2.3.0/fileutils.rb:1501:in `block in wrap_traverse'
    /usr/lib/ruby/2.3.0/fileutils.rb:1500:in `each'
    /usr/lib/ruby/2.3.0/fileutils.rb:1500:in `wrap_traverse'
    /usr/lib/ruby/2.3.0/fileutils.rb:469:in `After'
    Scenario failed at time 03:39:50

Revision 4b35d25b (diff)
Added by intrigeri about 5 hours ago

Test suite: only save Chutney logs on failure (refs: #16792)

commit 79f86a310a3568a5ef2f4646b216b1be1fab7144 causes failures to save
artifacts because log.tor does not always exist.
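
For illustration only (this is not the actual commit): a defensive artifact-copying helper that avoids both failure modes seen above, namely FileUtils choking on tor's control sockets ("cannot handle socket") and logs like log.tor not existing, could look roughly like this in the test suite's Ruby. The function name and arguments are made up for the sketch:

require 'fileutils'

# Hypothetical sketch: copy a Chutney/tor state directory for archiving,
# keeping only regular files (so sockets and FIFOs are skipped) and
# tolerating files that simply do not exist, such as a missing log.tor.
def save_tor_artifacts(src_dir, dest_dir)
  FileUtils.mkdir_p(dest_dir)
  Dir.glob(File.join(src_dir, '**', '*')).each do |path|
    next unless File.file?(path)   # skips sockets, FIFOs, directories
    relative = path.sub(%r{\A#{Regexp.escape(src_dir)}/?}, '')
    dest = File.join(dest_dir, relative)
    FileUtils.mkdir_p(File.dirname(dest))
    FileUtils.cp(path, dest)
  end
end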

History

#1 Updated by intrigeri 3 months ago

#2 Updated by intrigeri 3 months ago

  • Blocks Bug #16471: Drop time synchronization hacks that tor 0.3.5 and 0.4.x made obsolete added

#3 Updated by anonym 3 months ago

  • Assignee set to anonym

#4 Updated by anonym 3 months ago

  • Subject changed from Upgrade our Chutney fork to Upgrade our Chutney fork and make configuratoin more similar to the real Tor network

#5 Updated by hefee 3 months ago

Here is another patch for Chutney to make it more like the current Tor network, to let the tests that do time skews pass.
https://salsa.debian.org/hefee/chutney/merge_requests/1

#6 Updated by anonym 3 months ago

  • Subject changed from Upgrade our Chutney fork and make configuratoin more similar to the real Tor network to Upgrade our Chutney fork and make configuration more similar to the real Tor network

#7 Updated by anonym 3 months ago

  • Status changed from Confirmed to In Progress

#8 Updated by anonym 3 months ago

  • % Done changed from 0 to 10
  • Feature Branch set to feature/16792-update-chutney+force-all-tests

The current branch updates Chutney, and I've at least verified that tor bootstraps. I saw some initial failures with htpdate (restarting the services fixed it); not sure why that is or whether it was a transient thing.

Anyway, let's see what a full run looks like.

#9 Updated by anonym 3 months ago

  • % Done changed from 10 to 20

The full test runs look like any other +force-all-tests run, so I see no apparent regressions, yay!

Next step, make the configuration more similar to the real Tor network.

#10 Updated by anonym 3 months ago

We used to get consensuses looking like this:

valid-after 2019-06-26 10:12:00
fresh-until 2019-06-26 10:15:00
valid-until 2019-06-26 10:18:00
voting-delay 4 4

i.e. really short intervals/delays. In the real Tor network it looks like this:

valid-after 2019-06-26 09:00:00
fresh-until 2019-06-26 10:00:00
valid-until 2019-06-26 12:00:00
voting-delay 300 300

I've adjusted the *V3Auth* settings so they now have the same interval lengths etc. as the real Tor network:

valid-after 2019-06-26 11:31:00
fresh-until 2019-06-26 12:31:00
valid-until 2019-06-26 14:31:00
voting-delay 300 300

That should be all we need for #16471, but there are other discrepancies too (e.g. the real Tor network uses consensus-method 28 while we get consensus-method 25 for some reason; that's just an example, though, and I don't think we care about this one).
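
For the record, here is a sketch of what that adjustment presumably boils down to in the directory authorities' torrc template. The option names are tor's standard V3Auth* settings; the exact values and layout in our Chutney fork may differ:

# Match the real Tor network's consensus cycle:
V3AuthVotingInterval 1 hour       # fresh-until minus valid-after
V3AuthVoteDelay      5 minutes    # first number of the voting-delay line (300)
V3AuthDistDelay      5 minutes    # second number of the voting-delay line (300)
# valid-until is valid-after plus three voting intervals, hence the 3-hour window.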

#11 Updated by intrigeri about 1 month ago

  • Target version set to Tails_3.16

anonym, could finishing this be your next focus point once you're done with 4.0~beta1? Or would you prefer someone else to finish this work?

In any case, I've refreshed the branch so that it builds on Jenkins again.

Getting this done would allow us to make progress on #16471 :)

Setting target version to the same as #16471, for consistency. But postponing both would be fine.

#12 Updated by intrigeri about 1 month ago

#13 Updated by anonym about 1 month ago

intrigeri wrote:

anonym, could finishing this be your next focus point once you're done with 4.0~beta1?

Yes!

In any case, I've refreshed the branch so that it builds on Jenkins again.

Thanks! Sadly we lost all results except last week's, so now we have much less data to rely on. I guess there is nothing we can do about this? Or could they still exist on some backup that isn't too awkward (say at most 15 minutes of work) to restore from?

After looking briefly at Jenkins runs 1-5 and comparing them with devel's results from the same time frame, I see mostly the same network-using branches failing, but the results are not similar enough to make it a no-brainer to tell whether things are worse, the same, or better. So I feel tempted to spam-test this branch a bit on Jenkins. I've started 3 runs now and hope to start a few more during the weekend (feel free to help me if you happen to see some Jenkins slaves idling!), so hopefully we'll have more convincing data on Monday. Then there are two weeks to get this and #16471 ready for Tails 3.16, which seems doable.

#14 Updated by intrigeri about 1 month ago

Yo,

anonym wrote:

intrigeri wrote:

anonym, could finishing this be your next focus point once you're done with 4.0~beta1?

Yes!

Excellent news :)

Thanks! Sadly we lost all results except last week's, so now we have much less data to rely on. I guess there is nothing we can do about this? Or could they still exist on some backup that isn't too awkward (say at most 15 minutes of work) to restore from?

Nope: we don't back up the output of Jenkins jobs (that would take huge amounts of disk, bandwidth, and CPU, and degrade the performance of all our services).

After looking briefly at Jenkins runs 1-5 and comparing them with devel's results from the same time frame, I see mostly the same network-using branches failing, but the results are not similar enough to make it a no-brainer to tell whether things are worse, the same, or better. So I feel tempted to spam-test this branch a bit on Jenkins. I've started 3 runs now and hope to start a few more during the weekend (feel free to help me if you happen to see some Jenkins slaves idling!), so hopefully we'll have more convincing data on Monday. Then there are two weeks to get this and #16471 ready for Tails 3.16, which seems doable.

Sounds like a great plan! Let's take it easy wrt. spamming Jenkins: these days it regularly has a large backlog (I've seen the feedback loop take 20+ hours a few times recently).

I'll run a few jobs on my local Jenkins too, if its workers and I both have some spare time for it.

#15 Updated by intrigeri about 1 month ago

comparing them with devel's results from the same time frame

Note that comparing with the base branch of this one (stable) instead would probably be a tiny bit more accurate :)

#16 Updated by intrigeri about 1 month ago

I'll run a few jobs on my local Jenkins too, if its workers and I both have some spare time for it.

We both had! So I should be able to report about 12+ full local runs tomorrow :)

#17 Updated by intrigeri about 1 month ago

We both had! So I should be able to report about 12+ full local runs tomorrow :)

Here we go. I got 12 full runs on my local Jenkins.

Apart from the usual suspects (mostly #11592 these days, see my comment there), I see:

  • lots of TimeSyncingError, in the time sync and bridges scenarios, including in "Clock with host's time"
  • some TorBootstrapFailure, in the time sync scenarios, including in "Clock with host's time"
  • lots of "available upgrades have been checked" failures after "Tor is ready" succeeded, in the bridges scenarios
  • one "Tails clock is less than 5 minutes incorrect" failure in "Clock with host's time": clock was off by 435 seconds; interestingly, the clock was correct before htpdate "fixed" it, so I dunno what the heck is going on here
  • lots of GnuPG key fetching failures; that might be caused by instability of my Internet connection, but I have doubts since I haven't seen this fail as much while running tests for other branches recently; or it might be that pool.sks-keyservers.net was having trouble at that point, which is entirely possible
  • some weird failures for SSH and Git tests

All in all, that's many more failures than I usually see in the same environment (which tends to expose fewer robustness issues than our shared Jenkins).

Once you've analysed the runs on our shared Jenkins, if there are important problems I'm seeing that you did not see, and if you think it would help, I could share with you a tarball of the debug logs, tor logs, and Journal dumps.

#18 Updated by anonym about 1 month ago

intrigeri wrote:

  • lots of TimeSyncingError, in the time sync and bridges scenarios, including in "Clock with host's time"

Same.

  • some TorBootstrapFailure, in the time sync scenarios, including in "Clock with host's time"

I saw this once, for "Clock with host's time".

  • lots of "available upgrades have been checked" failures after "Tor is ready" succeeded, in the bridges scenarios

None of these. I find it odd that you see "lots" of these while I see none, but I suppose it could be your internet connection? IIRC these tests are especially sensitive to that?

  • one "Tails clock is less than 5 minutes incorrect" failure in "Clock with host's time": clock was off by 435 seconds; interestingly, the clock was correct before htpdate "fixed" it, so I dunno what the heck is going on here

Huh. Will keep an eye open for htpdate strangeness.

  • lots of GnuPG key fetching failures; that might be caused by instability of my Internet connection, but I have doubts since I haven't seen this fail as much while running tests for other branches recently; or it might be that pool.sks-keyservers.net was having trouble at that point, which is entirely possible
  • some weird failures for SSH and Git tests

Same for these.

All in all, that's many more failures than I usually see in the same environment (which tends to expose fewer robustness issues than our shared Jenkins).

On Jenkins there are also noticeably more failures on the feature branch compared to stable. However, it seems that most failures are concentrated in only a few runs, with the other runs looking pretty much like stable. Hmm. One could almost suspect that Chutney generated a "buggy" Tor network for those bad runs.

Let's note that this branch added two things:
  • Chutney: use normal consensus cycle interval
  • Chutney: update to current upstream master
We should test these separately and see which one causes all the trouble, so I created these poorly named branches:
  • feature/16792-only-update-chutney+force-all-tests
  • feature/16792-normal-consensus+force-all-tests

It looks increasingly unlikely we'll have this done for Tails 3.16. Or what do you think, @intrigeri?

#19 Updated by intrigeri 29 days ago

Hi,

  • lots of "available upgrades have been checked" failures after "Tor is ready" succeeded, in the bridges scenarios

None of these. I find it odd that you see "lots" of these while I see none, but I suppose it could be your internet connection? IIRC these tests are especially sensitive to that?

I'm not used to seeing lots of those here, but it could indeed be my Internet connection. If you feel it would be useful, I could trigger a few more runs to see if this happens often again.

Let's note that this branch added two things: […]
We should test these separately and see which one causes all the trouble

Excellent idea :)

It looks increasingly unlikely we'll have this done for Tails 3.16.

I agree it looks less easy than we had been hoping, but there are still ~10 days left, which is plenty. We'll see! If it makes it into 3.16, great; if it gets merged a bit later, that's fine too.

#20 Updated by anonym 29 days ago

intrigeri wrote:

  • lots of "available upgrades have been checked" failures after "Tor is ready" succeeded, in the bridges scenarios

None of these. I find it odd that you see "lots" of these while I see none, but I suppose it could be your internet connection? IIRC these tests are especially sensitive to that?

I'm not used to seeing lots of those here, but it could indeed be my Internet connection. If you feel it would be useful, I could trigger a few more runs to see if this happens often again.

Yes, it would be very useful ...

Let's note that this branch added two things: […]
We should test these separately and see which one causes all the trouble

Excellent idea :)

... if you could do a few runs of the two new branches!

#21 Updated by intrigeri 29 days ago

Yes, it would be very useful ...
[…]
... if you could do a few runs of the two new branches!

OK, I shall do this once I'm done with #12092, which currently keeps my local Jenkins busy.

#22 Updated by intrigeri 20 days ago

  • Target version changed from Tails_3.16 to Tails_3.17

intrigeri wrote:

Yes, it would be very useful ...
[…]
... if you could do a few runs of the two new branches!

OK, I shall do this once I'm done with #12092, which currently keeps my local Jenkins busy.

Unfortunately, this has been the case almost continuously since I wrote that, so I could not run anything on the new branches yet.

#23 Updated by intrigeri 18 days ago

I've started 3 runs of each of these 2 new branches locally. Then I noticed that the 2 branches are based on stable. This of course made sense back when they were prepared: we were targeting 3.16. But since then, lots of test suite robustness improvements went to devel and not to stable, so it'll be more painful than it needs to be to analyze the results. So if you don't mind, I'll merge devel into these branches.

#24 Updated by intrigeri 14 days ago

  • Assignee deleted (anonym)

#25 Updated by intrigeri 14 days ago

  • Target version changed from Tails_3.17 to Tails_4.0

intrigeri wrote:

So if you don't mind, I'll merge devel into these branches.

I've rebased the 3 branches on top of current devel.

#26 Updated by intrigeri 12 days ago

  • Assignee set to intrigeri

I'll take care of analyzing the test results, in the hope that we can at least merge the "upgrade Chutney" part soonish :)

#27 Updated by intrigeri 12 days ago

I feel I need much more data to draw any conclusion. I'll update the info in this comment in a week or so.

#28 Updated by intrigeri 4 days ago

One week later, I've updated my analysis of test results (see below) and I now feel I have enough data to draw preliminary conclusions:

  • In the current state of things, neither of the 2 branches is ready for merging. Both have issues that need further investigation.
  • Regarding the "only update Chutney" branch: I'll come back to it in a week or so to see if the tweaks that I've just pushed help.
  • I'm starting to suspect that we start using the simulated tor network before it's fully ready: in Chutney nodes' logs I quite often see a bootstrap status below 100%. It seems we should use chutney wait_for_bootstrap in our ensure_chutney_is_running function; see the "Waiting for the network" section in Chutney's README for details.
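
A rough sketch of what that last point could look like in the test suite (the helper name and arguments below are made up; only the chutney wait_for_bootstrap command itself comes from Chutney's README):

# Hypothetical helper, to be called from ensure_chutney_is_running:
# block until Chutney reports that every node has finished bootstrapping,
# instead of assuming the simulated Tor network is ready right away.
def wait_for_chutney_bootstrap(chutney_dir, network_file)
  cmd = "#{chutney_dir}/chutney wait_for_bootstrap #{network_file}"
  # chutney exits non-zero if some node fails to reach 100% bootstrapped
  # within its timeout (see "Waiting for the network" in Chutney's README).
  system(cmd) or raise "Chutney network failed to bootstrap (#{cmd})"
end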

Test results:

  • only update Chutney
    • runs 18-30 of https://jenkins.tails.boum.org/view/Tails_ISO/job/test_Tails_ISO_feature-16792-only-update-chutney-force-all-tests/:
      • "Cloning a Git repository ǂ Cloning git repository over SSH" failed 8/13 times, compared to 0/21 times on devel. git clone immediately fails with "nc: connection failed, SOCKSv5 error: General SOCKS server failure".
      • "Tor stream isolation is effective ǂ SSH is using the default SocksPort" failed 10/13 times, compared to 3/21 times on devel. This looks like #17013 but the increased failure rate is concerning. Just like in the Git over SSH scenario, SSH immediately fails with "nc: connection failed, SOCKSv5 error: General SOCKS server failure".
      • Apart from these 2 concerning issues, it's business as usual; no regression spotted there.
    • runs 4-7 on my local Jenkins:
      • 3/4 fully passed
      • "Cloning a Git repository ǂ Cloning git repository over SSH" failed 1/4 times, in exactly the same way as on lizard
      • "Tor stream isolation is effective ǂ SSH is using the default SocksPort" failed 1/4 times, in exactly the same way as on lizard
    • Wrt. "nc: connection failed, SOCKSv5 error: General SOCKS server failure", since then I've applied a few tweaks that might help (unlikely) and added 79f86a310a3568a5ef2f4646b216b1be1fab7144 to get more debugging info.
  • normal consensus

Wrt. the aforementioned Tor bootstrap failures, tor got a valid consensus:

Sep 07 10:25:51 amnesia time[8106]: Waiting for a Tor consensus file to contain a valid time interval
Sep 07 10:25:51 amnesia time[8109]: A Tor consensus file now contains a valid time interval.
Sep 07 10:25:51 amnesia time[8110]: Waiting for the chosen Tor consensus file to contain a valid time interval...
Sep 07 10:25:51 amnesia time[8112]: The chosen Tor consensus now contains a valid time interval, let's use it.
Sep 07 10:25:51 amnesia time[8116]: Tor: valid-after=2019-09-07 10:13:00 | valid-until=2019-09-07 13:13:00
Sep 07 10:25:51 amnesia time[8119]: Current time is 2019-09-07 10:25:51
Sep 07 10:25:51 amnesia time[8124]: Current time is in valid Tor range
Sep 07 10:25:51 amnesia time[8125]: Waiting for Tor to be working...

… but it can't build a circuit:

Sep 07 10:25:50.000 [notice] Bootstrapped 0% (starting): Starting
Sep 07 10:25:50.000 [notice] Starting with guard context "default" 
Sep 07 10:25:50.000 [notice] Signaled readiness to systemd
Sep 07 10:25:50.000 [notice] New control connection opened from 127.0.0.1.
Sep 07 10:25:51.000 [notice] New control connection opened from 127.0.0.1.
Sep 07 10:25:51.000 [notice] Opening Control listener on /run/tor/control
Sep 07 10:25:51.000 [notice] Opened Control listener on /run/tor/control
Sep 07 10:25:51.000 [notice] Bootstrapped 5% (conn): Connecting to a relay
Sep 07 10:25:51.000 [notice] Bootstrapped 10% (conn_done): Connected to a relay
Sep 07 10:25:51.000 [notice] Bootstrapped 14% (handshake): Handshaking with a relay
Sep 07 10:25:51.000 [notice] Bootstrapped 15% (handshake_done): Handshake with a relay done
Sep 07 10:25:51.000 [notice] Bootstrapped 20% (onehop_create): Establishing an encrypted directory connection
Sep 07 10:25:51.000 [notice] Bootstrapped 25% (requesting_status): Asking for networkstatus consensus
Sep 07 10:25:51.000 [notice] New control connection opened.
Sep 07 10:25:51.000 [notice] Bootstrapped 30% (loading_status): Loading networkstatus consensus
Sep 07 10:25:51.000 [notice] I learned some more directory information, but not enough to build a circuit: We have no usable consensus.
Sep 07 10:25:51.000 [notice] Bootstrapped 40% (loading_keys): Loading authority key certs
Sep 07 10:25:51.000 [notice] The current consensus has no exit nodes. Tor can only build internal paths, such as paths to onion services.
Sep 07 10:25:51.000 [notice] Bootstrapped 45% (requesting_descriptors): Asking for relay descriptors
Sep 07 10:25:51.000 [notice] I learned some more directory information, but not enough to build a circuit: We need more microdescriptors: we have 0/2, and can only build 0% of likely paths. (We have 0% of guards bw, 0% of midpoint bw, and 0% of end bw (no exits in consensus, using mid) = 0% of path bw.)
Sep 07 10:25:51.000 [notice] Bootstrapped 50% (loading_descriptors): Loading relay descriptors
Sep 07 10:25:51.000 [notice] The current consensus contains exit nodes. Tor can build exit and internal paths.
Sep 07 10:25:51.000 [notice] Bootstrapped 75% (enough_dirinfo): Loaded enough directory info to build circuits
Sep 07 10:25:51.000 [notice] New control connection opened from 127.0.0.1.
Sep 07 10:25:51.000 [notice] New control connection opened from 127.0.0.1.
Sep 07 10:25:52.000 [notice] New control connection opened from 127.0.0.1.
Sep 07 10:25:52.000 [warn] No available nodes when trying to choose node. Failing.
Sep 07 10:25:52.000 [warn] No available nodes when trying to choose node. Failing.
Sep 07 10:25:52.000 [warn] Failed to find node for hop #2 of our path. Discarding this circuit.
Sep 07 10:25:52.000 [notice] Our circuit 0 (id: 3) died due to an invalid selected path, purpose General-purpose client. This may be a torrc configuration issue, or a bug.
Sep 07 10:25:52.000 [notice] New control connection opened.
Sep 07 10:25:53.000 [notice] New control connection opened from 127.0.0.1.
Sep 07 10:25:53.000 [warn] No available nodes when trying to choose node. Failing.
Sep 07 10:25:53.000 [warn] No available nodes when trying to choose node. Failing.
Sep 07 10:25:53.000 [warn] Failed to find node for hop #2 of our path. Discarding this circuit.
Sep 07 10:25:53.000 [notice] New control connection opened.
Sep 07 10:25:53.000 [notice] New control connection opened from 127.0.0.1.
Sep 07 10:25:54.000 [notice] New control connection opened from 127.0.0.1.
Sep 07 10:25:54.000 [warn] No available nodes when trying to choose node. Failing.
Sep 07 10:25:54.000 [warn] No available nodes when trying to choose node. Failing.
Sep 07 10:25:54.000 [warn] Failed to find node for hop #2 of our path. Discarding this circuit.

… and then the last 3 lines repeat endlessly.

Note that we currently use 20 non-exit relays.

#29 Updated by intrigeri 3 days ago

  • Blocks deleted (Bug #16471: Drop time synchronization hacks that tor 0.3.5 and 0.4.x made obsolete)

#30 Updated by intrigeri about 21 hours ago

  • Subject changed from Upgrade our Chutney fork and make configuration more similar to the real Tor network to Upgrade our Chutney fork
  • Feature Branch changed from feature/16792-update-chutney+force-all-tests to feature/16792-only-update-chutney+force-all-tests

Narrowing scope to something I feel can be completed soon.

I've renamed the branches that "make configuration more similar to the real Tor network" to wip/: wip/feature/16792-update-chutney+force-all-tests and wip/feature/16792-normal-consensus+force-all-tests. That'll be for another ticket, whenever we need it.

#31 Updated by intrigeri about 4 hours ago

  • I'm starting to suspect that we start using the simulated tor network before it's fully ready: in Chutney nodes' logs I quite often see a bootstrap status below 100%. It seems we should use chutney wait_for_bootstrap in our ensure_chutney_is_running function; see the "Waiting for the network" section in Chutney's README for details.

I've implemented this on feature/16792-only-update-chutney+force-all-tests. I have high hopes it will eliminate some kinds of test suite robustness issues, but unfortunately it did not magically solve the "nc: connection failed, SOCKSv5 error: General SOCKS server failure" problem, which is thus left to be investigated.
