Bug #12259

reboot_job is broken since Jenkins was upgraded to Stretch, which often breaks the test suite

Added by anonym about 3 years ago. Updated almost 3 years ago.

Status: Resolved
Priority: Elevated
Assignee: -
Category: Infrastructure
Target version:
Start date: 02/24/2017
Due date:
% Done: 100%
Feature Branch:
Type of work: Sysadmin
Blueprint:
Starter:
Affected tool:

Description

See e.g.: https://jenkins.tails.boum.org/job/test_Tails_ISO_test-12019-totem-add-local-video-action/29/console

[...]
19:27:23 Command failed (returned pid 4334 exit 255): ["/var/lib/jenkins/workspace/test_Tails_ISO_test-12019-totem-add-local-video-action/submodules/chutney/chutney", "start", "/var/lib/jenkins/workspace/test_Tails_ISO_test-12019-totem-add-local-video-action/features/chutney/test-network", {:err=>[:child, :out]}]:
19:27:23 Using Python 2.7.9
19:27:23
19:27:23 Starting nodes
19:27:23
19:27:23 Couldn't launch test000auth (tor --quiet -f /tmp/TailsToaster/chutney-data/nodes/000auth/torrc): 1
19:27:23
19:27:23 Couldn't launch test001auth (tor --quiet -f /tmp/TailsToaster/chutney-data/nodes/001auth/torrc): 1
19:27:23
19:27:23 Couldn't launch test002auth (tor --quiet -f /tmp/TailsToaster/chutney-data/nodes/002auth/torrc): 1
[... same failures for all other nodes ...]

I have seen the same failure locally: if I run the test suite (and hence Chutney), press Ctrl+C (so Chutney is not cleaned up), and then remove TMPDIR (so the PID references are lost), the next test suite run finds stale tor instances still bound to the expected TCP ports, and Chutney fails to start.
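
For reference, a minimal sketch (not from this ticket) of how such leftover tor instances can be spotted and cleaned up by hand; the path is the TailsToaster Chutney data directory that appears in the torrc paths in the console output above:

    # List stale tor processes started from a previous Chutney run;
    # the pattern matches the torrc paths shown in the log above.
    pgrep -a -f '/tmp/TailsToaster/chutney-data'

    # Kill them so the TCP ports Chutney expects become free again.
    pkill -f '/tmp/TailsToaster/chutney-data'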

History

#1 Updated by anonym about 3 years ago

  • Description updated (diff)

#2 Updated by intrigeri almost 3 years ago

  • Subject changed from "Chutney sometimes fails to start on Jenkins" to "reboot_job is broken since Jenkins was upgraded to Stretch, which often breaks the test suite"
  • Status changed from Confirmed to In Progress
  • Assignee set to intrigeri
  • Priority changed from Normal to High
  • Target version set to Tails_2.11
  • % Done changed from 0 to 10

Pushed a tentative fix + made it so I'll get more debugging output.

#3 Updated by intrigeri almost 3 years ago

  • % Done changed from 10 to 50

Seems to work: https://jenkins.tails.boum.org/job/wrap_test_Tails_ISO_feature-stretch/ just failed, which is exactly what should happen. And it rebooted isotester5. Will verify it's fixed consistently later.

#4 Updated by intrigeri almost 3 years ago

So, with anonym we did a little post-mortem to find out how we could have noticed this problem earlier. Our best idea so far is to have the Jenkins test suite wrapper:

  1. exit with a non-zero error code if the flag file already exists
  2. create the flag file
  3. run the test suite

If we had had this in place, then all test suite runs would have failed and we would have had an obvious explanation of what went wrong.
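
A minimal sketch of such a wrapper, purely for illustration: the actual script, flag file location and test suite entry point used on the isotesters are not shown in this ticket, so the names below are assumptions. The sketch assumes the flag file lives somewhere the reboot between runs wipes (e.g. /tmp), so it only survives when the reboot did not happen:

    #!/bin/sh
    set -e

    # Hypothetical flag file path: /tmp is assumed to be cleared by the reboot
    # that reboot_job triggers between test suite runs.
    flag_file=/tmp/tails-test-suite-ran-without-reboot

    # 1. Exit with a non-zero error code if the flag file already exists,
    #    i.e. the previous run was not followed by a reboot of the isotester.
    if [ -e "$flag_file" ]; then
        echo "Flag file $flag_file still present: reboot_job did not run?" >&2
        exit 1
    fi

    # 2. Create the flag file.
    touch "$flag_file"

    # 3. Run the test suite (hypothetical entry point).
    exec ./run_test_suite "$@"

With something like this in place, a broken reboot_job makes every subsequent run fail fast with an explicit message instead of failing obscurely inside Chutney.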

#5 Updated by intrigeri almost 3 years ago

  • Priority changed from High to Elevated
  • Target version changed from Tails_2.11 to Tails_2.12

Now that the immediate problem is fixed, I'll deal with the "how can we avoid that in the future" later.

#6 Updated by intrigeri almost 3 years ago

intrigeri wrote:

So, with anonym we did a little post-mortem to find out how we could have noticed this problem earlier. Our best idea so far is to have the Jenkins test suite wrapper:

  1. exit with a non-zero error code if the flag file already exists
  2. create the flag file
  3. run the test suite

If we had had this in place, then all test suite runs would have failed and we would have had an obvious explanation of what went wrong.

Pushed an untested implementation to production ("what can possibly go wrong, it's 9 simple lines of shell?" -- famous last words).

#7 Updated by intrigeri almost 3 years ago

  • Status changed from In Progress to Resolved
  • Assignee deleted (intrigeri)
  • % Done changed from 50 to 100
