Bug #10288: Fix newly identified issues to make our test suite more robust and faster
Step 'the "10CC5BC7" key is in the live user's public keyring' is fragile
We've seen it fail a few times in Jenkins (twice in September, three times in October).
We're not sure why it fails, so bertagaz will have to dig into the history, though it may have been erased with the 1.7 release.
Meanwhile, the retry_tor number has been bumped, which may have helped work around it.
One question that came up: currently we fetch the key, then check whether it has actually been fetched within 2 minutes.
Should we instead enforce a time limit in the fetch step itself, cancel the fetch if it takes too long, and then retry?
IIRC that was suggested for the Git tests.
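A minimal sketch of that idea, in Python for illustration only (the test suite itself is written in Ruby): each attempt gets its own time limit, a hung attempt is killed instead of waited on, and the loop moves on to the next attempt. The function name `fetch_with_retries` and its parameters are hypothetical, and `cmd` stands for whatever command actually performs the fetch.

```python
import subprocess


def fetch_with_retries(cmd, attempts=3, per_attempt_timeout=60):
    """Run `cmd` up to `attempts` times, each bounded by its own timeout.

    Hypothetical helper. Returns the 1-based number of the attempt that
    succeeded, or None if every attempt failed or timed out.
    """
    for attempt in range(1, attempts + 1):
        try:
            # On timeout, subprocess.run kills the child and waits for it
            # before raising TimeoutExpired, so a hung fetch never blocks
            # the next attempt.
            result = subprocess.run(cmd, capture_output=True,
                                    timeout=per_attempt_timeout)
        except subprocess.TimeoutExpired:
            continue
        if result.returncode == 0:
            return attempt
    return None
```

With `attempts=3` and `per_attempt_timeout=40`, the worst case stays close to the current 2-minute budget, but a single fetch that hangs can no longer use up the whole budget on its own.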
#2 Updated by bertagaz about 4 years ago
- Assignee changed from bertagaz to anonym
- Type of work changed from Research to Discuss
Here is some information about why it fails:
- in job test_Tails_ISO_testing #35, it fails because the keyserver replied with a "Bad gateway" error code.
- in job test_Tails_ISO_testing #30, it fails because the keyserver replied with a "Gateway time-out" error code.
- in several jobs, it fails because of a hostname resolution error for pool.sks-keyservers.net.
- in job test_Tails_ISO_isotester1_metrics #42, it fails with the error "Cannot connect to destination".
- in job test_Tails_ISO_feature-5926-freezable-apt-repository #7, fetching the key took longer than the 2-minute timeout.
So some retry logic could help, since these are mostly network errors, both on our side and on the SKS keyserver side.
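One possible way to express "retry only on network errors" is to match the failure message against the errors listed above. This Python sketch assumes the error text is available as a plain string; the patterns are copied from this ticket, except the hostname-resolution one, whose wording is a placeholder because the actual message isn't quoted here. The names `is_transient` and `should_retry` are hypothetical.

```python
# Error patterns taken from the failures reported in this ticket.
TRANSIENT_PATTERNS = (
    "bad gateway",
    "gateway time-out",
    "cannot connect to destination",
    # Placeholder: the exact hostname resolution error text is not quoted
    # in the ticket, so this wording is an assumption.
    "could not resolve",
)


def is_transient(message: str) -> bool:
    """Return True if the error message looks like a network hiccup."""
    lowered = message.lower()
    return any(pattern in lowered for pattern in TRANSIENT_PATTERNS)


def should_retry(message: str, attempt: int, max_attempts: int = 3) -> bool:
    """Retry transient failures only, and never beyond max_attempts."""
    return attempt < max_attempts and is_transient(message)
```

A fetch that simply hangs, like the one in job #7, produces no error message at all, so it would have to be caught by a per-attempt timeout rather than by this check.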
Note that Seahorse's error window always shows a stop icon; could that help us know when to retry?
Assigning to anonym (and adding kytv as a watcher) so that you can organize the next steps (define the fix and update the ticket metadata).
#4 Updated by kytv about 4 years ago
One of the problems I discovered is that Seahorse WILL always segfault if there's a network error. It may not segfault until the close button is clicked, but it will segfault. I decided to handle this by always killing Seahorse during the key syncing step and restarting it, since we know it will segfault 100% of the time.
For Seahorse, I'm going to assume that a close button on the screen means an error occurred. Much of the code I added earlier has been refactored and made less convoluted, partly because I'm applying what I've learned about Ruby since I wrote it, and partly because I understand Seahorse's failure modes better. :)
I'm also going to kill the gpg process if key fetching doesn't succeed within 60 seconds, force a new Tor circuit, and then retry.
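The "force a new Tor circuit" step could be done through Tor's control port with the `SIGNAL NEWNYM` command. The Python sketch below is an assumption about how that might look, not how the test suite actually does it: the host, port 9051 and password authentication are placeholders, the password is assumed not to contain double quotes, and `request_new_circuit` is a hypothetical name. It would be called after the timed-out gpg process has been killed and before the next attempt.

```python
import socket


def request_new_circuit(host="127.0.0.1", port=9051, password=""):
    """Ask Tor for new circuits via its control port (SIGNAL NEWNYM).

    Hypothetical helper assuming password authentication. Returns True
    only if Tor answers both commands with a 250 status code.
    """
    with socket.create_connection((host, port), timeout=10) as conn:
        with conn.makefile("rw", newline="") as stream:
            for command in (f'AUTHENTICATE "{password}"', "SIGNAL NEWNYM"):
                # Control-port commands are terminated by CRLF.
                stream.write(command + "\r\n")
                stream.flush()
                reply = stream.readline()
                # Replies start with a three-digit status; 250 means OK.
                if not reply.startswith("250"):
                    return False
    return True
```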