Bug #10495

Bug #10288: Fix newly identified issues to make our test suite more robust and faster

The 'the time has synced' step is fragile

Added by anonym over 3 years ago. Updated almost 2 years ago.

Status: In Progress
Priority: Elevated
Assignee: -
Category: Test suite
Target version: -
Start date: 11/06/2015
Due date:
% Done: 0%
QA Check: Dev Needed
Feature Branch:
Type of work: Research
Blueprint:
Starter:
Affected tool:

Description

See #10494 which will fix this in Tails, not the test suite. This ticket is really just to acknowledge this robustness issue, even though nothing will be done in the test suite.


Related issues

Related to Tails - Bug #13472: Replace www.centos.org in htpdate pools Resolved 07/15/2017
Blocked by Tails - Bug #10494: Retry htpdate when it fails Rejected 07/17/2016
Blocked by Tails - Feature #9521: Use the chutney Tor network simulator in our test suite Resolved 04/15/2016
Blocked by Tails - Bug #11562: Monitor servers from the htpdate pools Confirmed 07/14/2016
Blocks Tails - Feature #16209: Core work: Foundations Team Confirmed 03/22/2019

Associated revisions

Revision aeb903a6 (diff)
Added by bertagaz almost 2 years ago

Htpdate: fix date header regexp (refs: #10495).

It seems that some servers (sometimes) do not send their headers with
first letter uppercased, hence a lot of failures to find the date in it.
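
To illustrate the kind of fix this commit message describes, here is a minimal Python sketch of case-insensitive Date header extraction. The actual htpdate code is not shown on this ticket, so the function name `extract_date_header`, the regular expression, and the sample headers are assumptions made for illustration only.

```python
import re
from email.utils import parsedate_to_datetime

# HTTP header names are case-insensitive, so accept "Date:", "date:", "DATE:", etc.
# (Hypothetical pattern; not the regexp used by htpdate.)
DATE_HEADER_RE = re.compile(
    r"^date:[ \t]*(.+?)[ \t\r]*$", re.IGNORECASE | re.MULTILINE
)


def extract_date_header(raw_headers):
    """Return the Date header of a raw header block as a datetime, or None."""
    match = DATE_HEADER_RE.search(raw_headers)
    if match is None:
        return None
    return parsedate_to_datetime(match.group(1))


if __name__ == "__main__":
    sample = "HTTP/1.1 200 OK\r\ndate: Fri, 06 Nov 2015 12:00:00 GMT\r\n"
    print(extract_date_header(sample))
```

A pattern that only accepts `Date:` with an uppercase first letter would return nothing for the sample above, which matches the symptom reported in the commit message.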

History

#1 Updated by anonym over 3 years ago

  • Blocked by Bug #10494: Retry htpdate when it fails added

#3 Updated by anonym over 3 years ago

  • Assignee set to kytv
  • Target version set to Tails_1.8
  • Parent task set to #10288

#4 Updated by bertagaz over 3 years ago

Since Nov 6, 2015 (test_Tails_ISO_experimental #42), this step has been one of the most common failures: it broke a test job 19 times, and some of those jobs would probably have passed without it, as it was their only failure.

This should probably be worked on ASAP to complete the test suite robustness work.

#5 Updated by intrigeri over 3 years ago

This might be a duplicate of #10440.

#6 Updated by intrigeri over 3 years ago

  • Target version changed from Tails_1.8 to Tails_2.0

(For the moment, we're going to mark as fragile all tests that depend on Tor having bootstrapped => not so urgent.)

#7 Updated by intrigeri over 3 years ago

bertagaz wrote:

Since Nov 6, 2015 (test_Tails_ISO_experimental #42), this step has been one of the most common failures: it broke a test job 19 times, and some of those jobs would probably have passed without it, as it was their only failure.

I don't see it at all in the latest summary I generated and posted on #10288. bertagaz, can you please check?

#8 Updated by intrigeri over 3 years ago

It would be fine to postpone this. In any case, please prioritize your SponsorsM4 stuff (Icedove) higher.

#9 Updated by intrigeri over 3 years ago

  • Target version changed from Tails_2.0 to Tails_2.2

#10 Updated by intrigeri over 3 years ago

  • Target version changed from Tails_2.2 to Tails_2.3

intrigeri wrote:

It would be fine to postpone this. In any case, please prioritize your SponsorsM4 stuff (Icedove) higher.

Still the case.

#11 Updated by anonym about 3 years ago

  • Assignee changed from kytv to anonym
  • Target version changed from Tails_2.3 to Tails_2.4
  • Type of work changed from Wait to Research

Hopefully Chutney (#9521) will fix the tordate parts.

#12 Updated by anonym about 3 years ago

#10238 is probably related, but Redmine forbids adding a relationship due to circularity.

#13 Updated by anonym about 3 years ago

  • Blocked by Feature #9521: Use the chutney Tor network simulator in our test suite added

#15 Updated by intrigeri about 3 years ago

Again: is it a duplicate of #10440?

#16 Updated by intrigeri almost 3 years ago

intrigeri wrote:

Again: is it a duplicate of #10440?

Actually, let's say no: #10440 is about the scenarios that are specifically testing time sync, while this one is about the 'the time has synced' step, that all online scenarios rely on (via "Tor is ready"). So even though #10440 is "fixed" (by disabling some tests) in test/10497-tor-bootstrap-is-fragile, we still have a problem here, and unsurprisingly I've seen it break tests again.

#17 Updated by anonym almost 3 years ago

  • Target version changed from Tails_2.4 to Tails_2.5

#18 Updated by bertagaz almost 3 years ago

  • QA Check set to Ready for QA

So, as noted on #10494, I'll report my test suite run results here.

I reported on my first runs in #10494#note-25. As promised, I tried the --connect-timeout option, but it didn't bring much improvement: I had 3 failures of this step in 120 runs (so a bit more than without it). Still, I think this option makes sense: waiting 2 minutes for a single request sounds like too much to me, even over Tor.

After that, I found that 3 URLs in the pools were faulty and fixed them (as stated in #10494#note-30). Since then, I've run the scenario mentioned in #10494#note-25 something like 150 times, and seen no failure!

So it seems to me that the few errors in the previous runs were probably due to these faulty URLs. 2 of them were in the HTP_POOL_PAL pool, which may explain things if htpdate tries 5 times per pool before erroring out. I still see htpdate being restarted sometimes, though it seems to happen a bit less often than before.

So in the end, I think the enhancement brought by #10494 fixes this step. Actually, it may very well have been a bug in Tails. I believe this ticket can also be considered RfQA now, so setting it accordingly.

#19 Updated by bertagaz almost 3 years ago

  • Assignee changed from anonym to bertagaz
  • QA Check changed from Ready for QA to Dev Needed

bertagaz wrote:

Since then, I've run the scenario mentioned in #10494#note-25 something like 150 times, and seen no failure!

And it seems I needed to post this note to see one. :/

So this step is not entirely fixed, and I noticed the failure too late to find out why it happened. When I inspected the htpdate logs, htpdate claimed to have succeeded, so this may be due to Tor bootstrapping problems.

I'll run more tests and raise the try_for timeout to see if I still get failures.
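
For readers unfamiliar with this kind of helper, here is a minimal Python sketch of a poll-until-timeout function in the spirit of `try_for`. The test suite's real helper is not shown on this ticket; the signature, the `delay`, `clock` and `sleep` parameters, and the return convention below are all assumptions.

```python
import time


def try_for(timeout, condition, delay=1.0, clock=time.monotonic, sleep=time.sleep):
    """Poll `condition` until it is truthy or `timeout` seconds have passed.

    Returns True if the condition was met in time, False otherwise.
    (Hypothetical sketch; not the test suite's implementation.)
    """
    deadline = clock() + timeout
    while True:
        if condition():
            return True
        if clock() >= deadline:
            return False
        sleep(delay)
```

Raising `timeout` only helps when the condition eventually becomes true; if the time never syncs, a longer wait just delays the failure.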

#20 Updated by intrigeri almost 3 years ago

Hold on, see my latest comments on #10494. IMO we should do something simpler and less risky first before we invest even more time here.

#21 Updated by intrigeri almost 3 years ago

  • Target version changed from Tails_2.5 to Tails_2.6

#23 Updated by intrigeri almost 3 years ago

  • Target version deleted (Tails_2.6)

#24 Updated by intrigeri almost 3 years ago

  • Assignee deleted (bertagaz)

#25 Updated by bertagaz almost 2 years ago

  • Priority changed from Normal to Elevated

This happened 60 times across all currently known branches in June, and 104 times in total in the 2017 logs we have.

That's a lot, so raising priority. First step would probably be to check if HTTP servers used by htpdate are OK, then tackle #10494.

#26 Updated by intrigeri almost 2 years ago

  • Blocked by Bug #11562: Monitor servers from the htpdate pools added

#27 Updated by intrigeri almost 2 years ago

This happened 60 times across all currently known branches in June, and 104 times in total in the 2017 logs we have.

Ouch.

That's a lot, so raising priority. First step would probably be to check if HTTP servers used by htpdate are OK, then tackle #10494.

IMO next step is #10495 (on your plate): there might be issues in our current HTP pool, and there's some hope that fixing them will avoid having to do #10494 at all.

#28 Updated by bertagaz almost 2 years ago

intrigeri wrote:

This happened 60 times across all currently known branches in June, and 104 times in total in the 2017 logs we have.

Ouch.

That's a lot, so raising priority. First step would probably be to check if HTTP servers used by htpdate are OK, then tackle #10494.

IMO next step is #10495 (on your plate): there might be issues in our current HTP pool, and there's some hope that fixing them will avoid having to do #10494 at all.

To fix this without waiting for monitoring to be in place, I quickly checked the URLs of the different HTP pools and found that www.centos.org always fails to reply to curl (with the same command line as the one used by htpdate).

So I'll open a ticket and prepare a branch replacing www.centos.org with something like https://getfedora.org/, which seems to be reliable.
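
A manual check like the one described above could be scripted roughly as follows. This is only a sketch: the exact curl command line used by htpdate is not given on this ticket, so the flags below (`--silent`, `--head`, `--connect-timeout`) and the 30 second timeout are assumptions, and the function names are hypothetical.

```python
import subprocess


def fetch_headers_with_curl(url, connect_timeout=30):
    """Fetch only the response headers of `url` with curl; None on failure.

    The flags are an assumption, not the command line htpdate uses.
    """
    result = subprocess.run(
        ["curl", "--silent", "--head",
         "--connect-timeout", str(connect_timeout), url],
        capture_output=True,
        text=True,
    )
    return result.stdout if result.returncode == 0 else None


def find_failing_urls(urls, fetch=fetch_headers_with_curl):
    """Return the URLs that gave no response or no Date header."""
    failing = []
    for url in urls:
        headers = fetch(url)
        has_date = headers is not None and any(
            line.lower().startswith("date:") for line in headers.splitlines()
        )
        if not has_date:
            failing.append(url)
    return failing
```

The `fetch` parameter lets the check be exercised with canned responses instead of real network requests, for example `find_failing_urls(urls, fetch=canned.get)` with a dictionary of sample header blocks.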

#29 Updated by bertagaz almost 2 years ago

  • Related to Bug #13472: Replace www.centos.org in htpdate pools added

#30 Updated by intrigeri almost 2 years ago

To fix this without waiting for monitoring to be in place, I quickly checked the URLs of the different HTP pools and found that www.centos.org always fails to reply to curl (with the same command line as the one used by htpdate).

Amazing!

#31 Updated by bertagaz almost 2 years ago

  • Status changed from Confirmed to In Progress

#32 Updated by intrigeri 3 months ago

#33 Updated by intrigeri 2 months ago

#34 Updated by intrigeri 2 months ago
