Bug #10288: Fix newly identified issues to make our test suite more robust and faster
"Fetching OpenPGP keys" scenarios are fragile: communication failure with keyserver
Test suite: tag "Feature: Keyserver interaction with GnuPG" @fragile (refs: #14770)
For background, see #12689 and its various duplicates. The short version is:
- Unfortunately, hkp://jirk5u4osbsr34t5.onion is way too unreliable.
- Most non-tech-savvy OpenPGP users don't use keyservers at all,
so this change should not affect them much.
- Tech-savvy OpenPGP users who want to use the Web-of-Trust (which
keys.openpgp.org's design essentially kills) should be able
to switch to a keyserver of their choosing, that includes
Let's use the Onion service instead of hkps://keys.openpgp.org/, so that we
don't lose end-to-end encryption and authentication of the keyserver in
Seahorse, which doesn't support hkps://. Alternatively, we could use
hkps://keys.openpgp.org/ everywhere else, but it feels simpler to use the same
At this point, the only Tails systems affected by this change are those
running without GnuPG persistence and those with a newly created persistent
GnuPG configuration. Pre-existing persistent GnuPG configuration is not updated (yet).
On the test suite front:
- This commit keeps the Chutney-based redirector setup as-is, except that it now
proxies requests to keys.openpgp.org instead of pool.sks-keyservers.net.
This should work as long as keys.openpgp.org supports cleartext
communication on port 11371.
- In theory, our long-term plan is to replace this with a local mock keyserver
Onion service. We'll see if that's still worth the effort once we redirect
requests to a more reliable upstream keyserver.
- I'm removing the @fragile tag for torified_gnupg.feature. There might
be other reasons why these scenarios are fragile; let's learn about them.
First, we do have to do something here, as long as we use an Onion service as
our default keyserver: our Chutney is not able to connect to Onion
1. dirmngr: use keys.openpgp.org's Onion service directly
=========================================================
That is, without going through a Chutney-based Onion service.
When dirmngr connects to the Onion service run by Chutney, the isotester
redirects the connection to keys.openpgp.org:11371; so far, so good. But then,
keys.openpgp.org redirects us to https://keys.openpgp.org, and the key retrieval
fails for some reason I don't fully understand:
dirmngr: connection from process 10145 (1000:1000)
dirmngr: DBG: chan_5 <- GETINFO version
dirmngr: DBG: chan_5 -> D 2.2.12
dirmngr: DBG: chan_5 -> OK
dirmngr: DBG: chan_5 <- KS_GET -- 0xC4BC2DDB38CCE96485EBE9C2F20691179038E5C6
dirmngr: DBG: gnutls:L3: ASSERT: ../../../lib/x509/common.c[_gnutls_x509_get_raw_field2]:1570
dirmngr: DBG: gnutls:L3: ASSERT: ../../../lib/x509/x509.c[gnutls_x509_crt_get_subject_unique_id]:3897
dirmngr: DBG: gnutls:L3: ASSERT: ../../../lib/x509/x509.c[gnutls_x509_crt_get_issuer_unique_id]:3947
dirmngr: DBG: gnutls:L3: ASSERT: ../../../lib/x509/dn.c[_gnutls_x509_compare_raw_dn]:990
dirmngr: number of system provided CAs: 128
dirmngr: DBG: gnutls:L5: REC[0x78432c2d1e10]: Allocating epoch #0
dirmngr: DBG: gnutls:L2: added 6 protocols, 29 ciphersuites, 18 sig algos and 9 groups into priority list
dirmngr: URL 'http://nl5vtjfpfz2llza7.onion:5858/pks/lookup?op=get&options=mr&search=0xC4BC2DDB38CCE96485EBE9C2F20691179038E5C6' redirected to 'https://keys.openpgp.org/pks/lookup?op=get&options=mr&search=0xC4BC2DDB38CCE96485EBE9C2F20691179038E5C6' (301)
dirmngr: DBG: gnutls:L5: REC[0x78432c2d1e10]: Start of epoch cleanup
dirmngr: DBG: gnutls:L5: REC[0x78432c2d1e10]: End of epoch cleanup
dirmngr: DBG: gnutls:L5: REC[0x78432c2d1e10]: Epoch #0 freed
dirmngr: command 'KS_GET' failed: Forbidden <Unspecified source>
dirmngr: DBG: chan_5 -> ERR 251 Forbidden <Unspecified source>
dirmngr: DBG: chan_5 <- BYE
dirmngr: DBG: chan_5 -> OK closing connection
dirmngr has a rather complex history wrt. TLS certificate validation vs.
HTTP redirections vs. DNS resolution, so I'm not surprised that this fails.
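To make the redirect in the log above easier to reason about, here is a minimal Python sketch that compares the two URLs dirmngr reported. It does not explain why dirmngr returns "Forbidden"; it only shows which URL components change across the 301. The helper name `redirect_changes` is hypothetical and is not part of dirmngr or of the Tails test suite.

```python
from urllib.parse import urlsplit

# Default ports, used when a URL does not spell out its port.
DEFAULT_PORTS = {"http": 80, "https": 443}


def redirect_changes(source: str, target: str) -> dict:
    """Report which URL components differ between a request and its redirect target."""
    src, dst = urlsplit(source), urlsplit(target)
    src_port = src.port or DEFAULT_PORTS.get(src.scheme)
    dst_port = dst.port or DEFAULT_PORTS.get(dst.scheme)
    return {
        "scheme": src.scheme != dst.scheme,
        "host": src.hostname != dst.hostname,
        "port": src_port != dst_port,
        "path_and_query": (src.path, src.query) != (dst.path, dst.query),
    }


# Both URLs are copied from the dirmngr log above.
LOOKUP = "/pks/lookup?op=get&options=mr&search=0xC4BC2DDB38CCE96485EBE9C2F20691179038E5C6"
source_url = "http://nl5vtjfpfz2llza7.onion:5858" + LOOKUP
target_url = "https://keys.openpgp.org" + LOOKUP

changes = redirect_changes(source_url, target_url)
print(changes)
```

The redirect changes the scheme, the host, and the port at once while keeping the lookup path and query, so the client has to start a TLS session with a different host than the one it was configured for.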
The code I'm disabling here was introduced for #12211. As I understand it, the
main goal there was to adjust our OpenPGP test cases to the fact we had switched
Tails to a .onion keyserver. It's not 100% clear to me why we added the layer of
indirection I'm removing here, instead of doing something similar to this
commit: I see no explanation on the ticket nor in the corresponding
So let's drop the extra complexity of going through a proxy Onion service on
Chutney, and instead reconfigure dirmngr to directly connect to
keys.openpgp.org. AFAICT, given previously we were not testing the dirmngr
configuration we're shipping to our users either, the only test coverage we lose
here is: testing that dirmngr can connect to an Onion service.
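As an illustration of what "reconfigure dirmngr" could amount to, here is a minimal Python sketch that rewrites the `keyserver` line of a dirmngr.conf given as text. The helper `set_keyserver`, the sample configuration, and the choice of hkps://keys.openpgp.org as the new URL are assumptions made for the example; the actual test suite code and the URL it uses may differ.

```python
def set_keyserver(conf_text: str, url: str) -> str:
    """Return conf_text with exactly one 'keyserver' line pointing at url."""
    lines = []
    replaced = False
    for line in conf_text.splitlines():
        # Match only an active 'keyserver' option, not comments or other options.
        if line.split(None, 1)[:1] == ["keyserver"]:
            if not replaced:
                lines.append(f"keyserver {url}")
                replaced = True
            # Any further 'keyserver' lines are dropped.
        else:
            lines.append(line)
    if not replaced:
        lines.append(f"keyserver {url}")
    return "\n".join(lines) + "\n"


# Illustrative input only: a sample configuration, not the file Tails ships.
sample_conf = "# dirmngr.conf (sample)\nkeyserver hkp://jirk5u4osbsr34t5.onion\n"
new_conf = set_keyserver(sample_conf, "hkps://keys.openpgp.org")
print(new_conf, end="")
```

Rewriting the text rather than appending a second `keyserver` line keeps the resulting configuration unambiguous about which keyserver is in effect.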
2. Seahorse: use a keyserver that meets our (so far implicit) requirements
==========================================================================
At least one member of pool.sks-keyservers.net does not satisfy the implicit
assumption that the code added for #12211 relied upon. This commit fixes that
by always using a backend, upstream keyserver that satisfies our assumptions.
Other than that, we stick to the previous setup here. This seems to be the only
vaguely viable option given we can't configure Seahorse to use keys.openpgp.org:
that keyserver redirects to HTTPS, which Seahorse does not support.
I'm (re-)tagging the Seahorse scenarios @fragile: in the end, it's not clear
whether this branch will improve things for those scenarios.
#6 Updated by intrigeri about 2 years ago
Wrt. the best long-term option we've selected on #12211 ("Run a local mock keyserver onion. This doesn't depend on the Internet => potentially 100% robust"), I've read that Schleuder 3 does mock the keyserver in its test suite. Likely that's in Ruby so we could maybe reuse it easily :)
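For a rough idea of what such a local mock keyserver involves, here is a minimal sketch in Python (not Ruby, and not Schleuder's code). It answers the same `/pks/lookup?op=get` requests seen elsewhere on this ticket from an in-memory table. The key ID, the placeholder key material, and all names (`KEYS`, `MockHKPHandler`, `fetch`) are hypothetical; exposing the server as an Onion service is out of scope here.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import parse_qs, urlsplit

# Hypothetical fixture: search term -> placeholder (not a real key).
KEYS = {
    "0xDEADBEEF": "-----BEGIN PGP PUBLIC KEY BLOCK-----\n"
                  "(placeholder)\n"
                  "-----END PGP PUBLIC KEY BLOCK-----\n",
}


class MockHKPHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        url = urlsplit(self.path)
        query = parse_qs(url.query)
        search = query.get("search", [""])[0]
        if url.path == "/pks/lookup" and query.get("op") == ["get"] and search in KEYS:
            status, ctype, body = 200, "application/pgp-keys", KEYS[search].encode()
        else:
            status, ctype, body = 404, "text/plain", b"Not found\n"
        self.send_response(status)
        self.send_header("Content-Type", ctype)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep test output quiet


def fetch(port: int, search: str):
    """Query the mock keyserver on localhost and return (status, body)."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request("GET", f"/pks/lookup?op=get&options=mr&search={search}")
        response = conn.getresponse()
        return response.status, response.read().decode()
    finally:
        conn.close()


# Port 0 lets the OS pick a free port, so parallel test runs do not collide.
server = ThreadingHTTPServer(("127.0.0.1", 0), MockHKPHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

found = fetch(port, "0xDEADBEEF")
missing = fetch(port, "0x00000000")

server.shutdown()
server.server_close()
print(found[0], missing[0])
```

Because nothing here touches the Internet, the result of a fetch depends only on the fixture table, which is the robustness property the long-term option is after.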
#8 Updated by intrigeri about 2 years ago
- Type of work changed from Research to Code
bertagaz said on #12290#note-8 that these failures had disappeared in September, so I'm not flagging these scenarios as fragile: instead, while doing #12292 anonym should check this and act accordingly, hence making this a Research ticket on anonym's plate for this month.
I've just seen this happen again (https://jenkins.tails.boum.org/view/RM/job/test_Tails_ISO_stable/1085/) so I've tagged these scenarios as fragile.
While working on the underlying Tails bug (#12689), I discovered one more reason why what was set up on #12211 is fragile: it implicitly relies on the assumption that every member of the actual target pool (pool.sks-keyservers.net) can be queried via hkp://$IP:11371/ with any HTTP Host header of our choosing (in this case, the .onion run by Chutney). As it happens, this assumption is invalid for at least one member of that pool: 220.127.116.11 has Apache VirtualHost:s explicitly configured for the hostnames it supports, but the fallback VirtualHost that we hit is not a keyserver.
- curl --resolve pool.sks-keyservers.net:11371:18.104.22.168 'http://pool.sks-keyservers.net:11371/pks/lookup?op=get&options=mr&search=0x7C84A74CFB12BC439E81BA78C92949B8A63BB098' works fine because it sends the correct Host header in the HTTP request
- curl 'http://22.214.171.124:11371/pks/lookup?op=get&options=mr&search=0x7C84A74CFB12BC439E81BA78C92949B8A63BB098' fails with a 404 error, because http://126.96.36.199:11371/ merely serves the default Apache homepage and is not backed by a keyserver
I can't think of a way to fix this, apart from having our test suite use as its upstream keyserver one that meets this now-explicit requirement, instead of any random member of the HKP pool.
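The failure mode can be reproduced offline. The Python sketch below starts a local HTTP server that, like the pool member described above, only acts as a keyserver for the hostnames it knows and serves a fallback page for any other Host header. The server is a stand-in, not the real Apache setup; `KEYSERVER_VHOSTS`, `VirtualHostHandler`, and `lookup_status` are hypothetical names, and the .onion hostname is the Chutney one from the dirmngr log.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hostnames the stand-in server treats as keyserver virtual hosts.
KEYSERVER_VHOSTS = {"pool.sks-keyservers.net"}


class VirtualHostHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Strip an optional ":port" suffix from the Host header.
        host = (self.headers.get("Host") or "").split(":")[0]
        if host in KEYSERVER_VHOSTS and self.path.startswith("/pks/lookup"):
            status, body = 200, b"(placeholder key material)\n"
        else:
            # Fallback virtual host: a plain homepage, not a keyserver.
            status, body = 404, b"default homepage\n"
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep output quiet


def lookup_status(port: int, host_header: str) -> int:
    """Send the same lookup to the same address, varying only the Host header."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request(
            "GET",
            "/pks/lookup?op=get&options=mr"
            "&search=0x7C84A74CFB12BC439E81BA78C92949B8A63BB098",
            headers={"Host": host_header},
        )
        response = conn.getresponse()
        response.read()
        return response.status
    finally:
        conn.close()


server = ThreadingHTTPServer(("127.0.0.1", 0), VirtualHostHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

pool_status = lookup_status(port, "pool.sks-keyservers.net:11371")
onion_status = lookup_status(port, "nl5vtjfpfz2llza7.onion:5858")

server.shutdown()
server.server_close()
print(pool_status, onion_status)
```

A check of this kind, pointed at a candidate upstream keyserver instead of the local stand-in, would be one way to verify the now-explicit requirement before relying on that keyserver in the test suite.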