Project

General

Profile

Bug #10444

Bug #10288: Fix newly identified issues to make our test suite more robust and faster

Git tests are fragile

Added by kytv over 4 years ago. Updated over 3 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
-
Category:
Test suite
Target version:
Start date:
10/28/2015
Due date:
% Done:

100%

Feature Branch:
test/10444-git-tests-are-fragile
Type of work:
Code
Blueprint:
Starter:
Affected tool:

Description

When a git clone fails in the test suite, we should force a new Tor circuit and retry the clone.

00_02_40_Cloning_git_repository_over_SSH.png View - Potential error over SSH (70.8 KB) kytv, 11/07/2015 06:46 AM

00_01_09_Cloning_a_Git_repository_anonymously_over_the_Git_protocol.png View - Name resolution failure (62.2 KB) kytv, 11/07/2015 11:31 AM


Related issues

Related to Tails - Bug #11563: Git over HTTPS scenario is fragile Resolved 07/14/2016

Associated revisions

Revision 7dcca512 (diff)
Added by kytv about 4 years ago

Force a new Tor circuit and retry cloning on failure

A Git clone which fails due to transient network errors will likely
succeed after forcing a new Tor circuit.

Will-fix: #10444

Revision be6abae3 (diff)
Added by bertagaz over 3 years ago

Make the Git cloning process more robust.

  • Wrap the "has stopped running after 180 seconds" into try_for().
  • Use Git process monitoring to ensure cloning command has run in this
    amount of time, otherwise kill it.

Refs: #10444

Revision 5111b1ce (diff)
Added by bertagaz over 3 years ago

Turn time to wait into a parameter.

Refs: #10444

Revision d834073a (diff)
Added by bertagaz over 3 years ago

Type!

Refs: #10444

Revision 2dfea09f (diff)
Added by bertagaz over 3 years ago

Revert "Type!"

This reverts commit d834073ae8d381d60c6dd9bc347e0fb9c763d02a.

Refs: #10444

Revision 23094ad1 (diff)
Added by bertagaz over 3 years ago

Revert "Turn time to wait into a parameter."

This reverts commit 5111b1ce30840726eb9c2cbc43659b9d490b7d1c.

Wording didn't sound correct.

Refs: #10444

Revision 746971d5 (diff)
Added by bertagaz over 3 years ago

Remove unecessary and racy conditions.

Refs: #10444

Revision e56fcfcf (diff)
Added by bertagaz over 3 years ago

Tag the Git over HTTPS scenario as fragile.

Refs: #10444, #11563

Revision 0526579f
Added by intrigeri over 3 years ago

Merge remote-tracking branch 'origin/test/10444-git-tests-are-fragile' into stable

Fix-committed: #10444

History

#1 Updated by kytv over 4 years ago

  • Parent task set to #10288
  • Feature Branch set to kytv/test/10444-git-over-https-is-fragile

#3 Updated by intrigeri about 4 years ago

  • Assignee set to kytv

#4 Updated by kytv about 4 years ago

  • Subject changed from Cloning a git repository over HTTPS test is fragile to Git tests are fragile
  • Target version set to Tails_1.8

The whole feature should use retrying upon failure.

#5 Updated by kytv about 4 years ago

  • Feature Branch changed from kytv/test/10444-git-over-https-is-fragile to kytv/test/10444-git-tests-are-fragile

#6 Updated by kytv about 4 years ago

  • Description updated (diff)
  • Status changed from Confirmed to In Progress
  • % Done changed from 0 to 10

#7 Updated by kytv about 4 years ago

Attaching example of a failed clone via SSH.

#9 Updated by kytv about 4 years ago

  • Assignee changed from kytv to anonym
  • % Done changed from 10 to 40
  • QA Check set to Ready for QA

#10 Updated by intrigeri about 4 years ago

  • Assignee changed from anonym to intrigeri

I'll try to review this to lighten a bit anonym's plate and keep kytv moving :)

#11 Updated by bertagaz about 4 years ago

  • Assignee changed from intrigeri to bertagaz

intrigeri wrote:

I'll try to review this to lighten a bit anonym's plate and keep kytv moving :)

Nop, I am.

#12 Updated by bertagaz about 4 years ago

  • Target version changed from Tails_1.8 to Tails_2.0

Postponing

#13 Updated by kytv about 4 years ago

I merged devel in to this branch, made a simple change for jessie, and pushed.

#14 Updated by kytv about 4 years ago

  • Assignee changed from bertagaz to kytv
  • QA Check changed from Ready for QA to Dev Needed

Nevermind, still more to update for Jessie.

#15 Updated by kytv about 4 years ago

  • Assignee changed from kytv to bertagaz
  • QA Check changed from Dev Needed to Ready for QA

This is now working with Jessie.

Note that the GUI seems to be a bit slower to respond in Jessie, causing Sikuli to type before we're ready for it, such as typing 'yes' when verifying the fingerprint.

I don't have a problem after setting the TailsToaster VM to use multiple cores/CPUs.

#16 Updated by kytv about 4 years ago

kytv wrote:

Note that the GUI seems to be a bit slower to respond in Jessie, causing Sikuli to type before we're ready for it, such as typing 'yes' when verifying the fingerprint.

I don't have a problem after setting the TailsToaster VM to use multiple cores/CPUs.

Note, further, that this could not be reproduced on a system without additional load.

#17 Updated by intrigeri about 4 years ago

Note that the GUI seems to be a bit slower to respond in Jessie, causing Sikuli to type before we're ready for it, such as typing 'yes' when verifying the fingerprint.

This seems to indicate that there's a missing step in the test, no?

#18 Updated by kytv about 4 years ago

intrigeri wrote:

Note that the GUI seems to be a bit slower to respond in Jessie, causing Sikuli to type before we're ready for it, such as typing 'yes' when verifying the fingerprint.

This seems to indicate that there's a missing step in the test, no?

What I saw (and showed anonym) was that in response to the "Are you sure you want to continue connecting" prompt, sikuli is supposed to type "yes" then press enter. Instead we got an "s". After that, "please type 'yes' or 'no'" Then there was a steady stream of y:s. (I don't have the video anymore, but maybe I can reproduce it if wanted)

We're already waiting for the question to be asked before typing our response; that step is there. Otherwise, we could introduce another static sleep after the "Are you sure you want to continue connecting" prompt.

#19 Updated by intrigeri about 4 years ago

What I saw (and showed anonym) was that in response to the "Are you sure you want to
continue connecting" prompt, sikuli is supposed to type "yes" then press enter.
Instead we got an "s". After that, "please type 'yes' or 'no'" Then there was
a steady stream of y:s. (I don't have the video anymore, but maybe I can reproduce
it if wanted)

We're already waiting for the question to be asked before typing our response; that
step is there. Otherwise, we could introduce another static sleep after the "Are
you sure you want to continue connecting" prompt.

I see. Most programs are racy in this respect, I think: they print the prompt and then setup whatever is needed to read the answer. Perhaps we indeed need to wait a bit on our side to make this really robust in practice, then? I don't think it's blocking reviewing/testing/merging the current state of your work (iterations!) but please make sure the resulting state of the test suite is robust (including on Jenkins, modulo lack of robustness in dependencies such as Tor bootstrapping process) before this ticket is marked as resolved. Cheers!

#20 Updated by kytv about 4 years ago

I pushed updates which should make it incredibly robust. I'm quite pleased with the end result. :)

#21 Updated by bertagaz about 4 years ago

  • Target version changed from Tails_2.0 to Tails_2.2

#22 Updated by bertagaz almost 4 years ago

  • Target version changed from Tails_2.2 to Tails_2.3

#23 Updated by bertagaz almost 4 years ago

  • Target version changed from Tails_2.3 to Tails_2.4

#24 Updated by bertagaz over 3 years ago

  • Target version changed from Tails_2.4 to Tails_2.5

#25 Updated by bertagaz over 3 years ago

  • Feature Branch changed from kytv/test/10444-git-tests-are-fragile to test/10444-git-tests-are-fragile

Updated the branch with origin/stable and imported Kytv's patch. Let see how it behaves in Jenkins while I test it $HOME.

#26 Updated by bertagaz over 3 years ago

  • Assignee changed from bertagaz to anonym
  • % Done changed from 40 to 50

Pushed a few commits that make this tests more robust.

I've tested it a lot at home, with a uplink going up and down all day long, and the Git and SSH protocol tests went well and recovered from this network failures. The HTTPS test has proven to be a bit less reliable, but seems to be due to the way Git handles HTTP, or the way it interacts with tsocks. When network is ok, it passes.

In Jenkins they all pass without problem, so I'm not sure we can harden this scenarios more than that.

If the HTTPS scenario seems to be too fragile upon review, I'd be to open another ticket, and mark it as fragile with this one, that way we can merge this branch and still have the other scenarios back in the test suite.

Assigning to anonym for review, but I'm not sure he's the right one. Should be a quick one anyway.

#27 Updated by bertagaz over 3 years ago

Note that this ticket branch has been merged into bugfix/10494-retry-htpdate-with-more-fragile-tests, so it has also been tested in Jenkins in this other branch.

#28 Updated by intrigeri over 3 years ago

  • Assignee changed from anonym to bertagaz
  • % Done changed from 50 to 60
  • QA Check changed from Ready for QA to Dev Needed

If the HTTPS scenario seems to be too fragile upon review, I'd be to open another ticket, and mark it as fragile with this one, that way we can merge this branch and still have the other scenarios back in the test suite.

Yes, please do so. Note that we have a test/10444-git-over-https-is-fragile branch.

I'm very much unconvinced by the wording change brought by 5111b1c, since it hides the fact that we're actually running a Git clone command: instead it suggests that we're merely waiting for it to complete. This is a blocker IMO.

My next to comments are about style, so either you want them and you apply them as-is, or we try to have a useful discussion about it, or you can just ignore them.

You don't need to test if a directory exists before rm -rf'ing it. Just rm -rf it and the code will be leaner and will feel less prone to race conditions :)

Similarly, in:

    if $vm.has_process?("git")
      step 'I kill the process "git"'
    end

... I think you can skip the test (that is racy anyway, which is what confused me initially, so better drop it).

#29 Updated by bertagaz over 3 years ago

  • Assignee changed from bertagaz to intrigeri
  • QA Check changed from Dev Needed to Ready for QA

intrigeri wrote:

If the HTTPS scenario seems to be too fragile upon review, I'd be to open another ticket, and mark it as fragile with this one, that way we can merge this branch and still have the other scenarios back in the test suite.

Yes, please do so. Note that we have a test/10444-git-over-https-is-fragile branch.

#11563

I'm very much unconvinced by the wording change brought by 5111b1c, since it hides the fact that we're actually running a Git clone command: instead it suggests that we're merely waiting for it to complete. This is a blocker IMO.

Ok, I guess I could argue, but that change has not that much importance. :)

Reverted, as well as d834073

You don't need to test if a directory exists before rm -rf'ing it. Just rm -rf it and the code will be leaner and will feel less prone to race conditions :)

Similarly, in:

... I think you can skip the test (that is racy anyway, which is what confused me initially, so better drop it).

Right, racy. That's older code I didn't touch. It has been written this way in other places (as you pointed in other tickets), that's why I thought it was the way to go.

Anyway, 746971d

#30 Updated by intrigeri over 3 years ago

  • Assignee changed from intrigeri to anonym

That's older code I didn't touch.

It might be older, but I was checking the diff between current stable and the proposed topic branch. If I'm not mistaken, the code that I was commenting was introduced by the branch you've proposed for merging.

Anyway, 746971d

Thanks for all the changes! Code review now passes for me, so I'm reassigning to anonym for the last steps :)

#31 Updated by intrigeri over 3 years ago

  • Assignee changed from anonym to intrigeri

#32 Updated by intrigeri over 3 years ago

  • Related to Bug #11563: Git over HTTPS scenario is fragile added

#33 Updated by intrigeri over 3 years ago

  • Status changed from In Progress to 11
  • Assignee deleted (intrigeri)
  • % Done changed from 60 to 100
  • QA Check changed from Ready for QA to Pass

Merged, thanks!

#34 Updated by intrigeri over 3 years ago

  • Status changed from 11 to Resolved

Also available in: Atom PDF