Bug #14770

Bug #10288: Fix newly identified issues to make our test suite more robust and faster

"Fetching OpenPGP keys" scenarios are fragile: communication failure with keyserver

Added by intrigeri about 2 years ago. Updated 20 days ago.

Status:
Resolved
Priority:
Normal
Assignee:
-
Category:
Test suite
Target version:
Start date:
10/04/2017
Due date:
% Done:

0%

Feature Branch:
bugfix/12689-more-reliable-OpenPGP-keyserver
Type of work:
Code
Blueprint:
Starter:
Affected tool:

Description

The temporary solution implemented in #12211 is not robust enough: while working on #12291, I saw lots of failures in August. Thankfully, we've already discussed (and agreed on) better options on #12211.


Related issues

Related to Tails - Bug #12211: Adapt GnuPG automated tests after switching to an Onion keyserver Resolved 02/03/2017
Related to Tails - Bug #10378: The "Tails OpenPGP keys" scenario is fragile Resolved 10/15/2015
Related to Tails - Feature #9519: Make the test suite more deterministic through network simulation In Progress 06/02/2015
Related to Tails - Bug #15415: Unreliable key server operations Resolved 03/14/2018
Related to Tails - Bug #17169: Seahorse can't sync keys with keyservers: Request Entity Too Large Confirmed
Blocked by Tails - Bug #12689: gpg --recv-key often hangs due to unreliable keyserver Resolved 06/13/2017
Blocks Tails - Feature #16209: Core work: Foundations Team Confirmed

Associated revisions

Revision 1db04699 (diff)
Added by intrigeri almost 2 years ago

Test suite: tag "Feature: Keyserver interaction with GnuPG" @fragile (refs: #14770)

Revision dbfbfa7b (diff)
Added by intrigeri about 2 months ago

Use keys.openpgp.org's Onion service as the default keyserver (refs: #12689, #14770)

For background, see #12689 and its various duplicates. The short version is:

- Unfortunately, hkp://jirk5u4osbsr34t5.onion is way too unreliable.
- Most non-tech-savvy OpenPGP users don't use keyservers at all,
so this change should not affect them much.
- Tech-savvy OpenPGP users who want to use the Web-of-Trust (which
keys.openpgp.org's design essentially kills) should be able
to switch to a keyserver of their choosing, that includes
non-self certifications.

Let's use the Onion service instead of hkps://keys.openpgp.org/, so that we
don't lose end-to-end encryption and authentication of the keyserver in
Seahorse, which doesn't support hkps://. Alternatively, we could use
hkps://keys.openpgp.org/ everywhere else, but it feels simpler to use the same
keyserver everywhere.

At this point, the only Tails systems that are affected by this change are those
run without GnuPG persistence, and newly created persistent GnuPG configuration.
Pre-existing persistent GnuPG configuration is not updated (yet).

On the test suite front:

- This commit keeps the Chutney-based redirector setup as-is, except it will
proxy requests to keys.openpgp.org, instead of pool.sks-keyservers.net
previously. This should work as long as keys.openpgp.org supports cleartext
communication on port 11371.
- In theory, our long-term plan is to replace this with a local mock keyserver
Onion service. We'll see if that's still worth the effort once we redirect
requests to a more reliable upstream keyserver.
- I'm removing the @fragile tag for torified_gnupg.feature. There might
be other reasons why these scenarios are fragile; let's learn about them.

Revision 206d1d7e (diff)
Added by intrigeri about 2 months ago

Test suite: switch backend keyservers (refs: #12689, #14770)

First, we do have to do something here, as long as we use an Onion service as
our default keyserver: our Chutney is not able to connect to Onion
services (#12210).

1. dirmngr: use keys.openpgp.org's Onion service directly
=========================================================

That is, without going through a Chutney-based Onion service.

When dirmngr connects to the Onion service run by Chutney, the isotester
redirects the connection to keys.openpgp.org:11371; so far, so good. But then,
keys.openpgp.org redirects us to https://keys.openpgp.org, and the key retrieval
fails for some reason I don't fully understand:

dirmngr[10130]: connection from process 10145 (1000:1000)
dirmngr[10130]: DBG: chan_5 <- GETINFO version
dirmngr[10130]: DBG: chan_5 -> D 2.2.12
dirmngr[10130]: DBG: chan_5 -> OK
dirmngr[10130]: DBG: chan_5 <- KS_GET -- 0xC4BC2DDB38CCE96485EBE9C2F20691179038E5C6
dirmngr[10130]: DBG: gnutls:L3: ASSERT: ../../../lib/x509/common.c[_gnutls_x509_get_raw_field2]:1570
dirmngr[10130]: DBG: gnutls:L3: ASSERT: ../../../lib/x509/x509.c[gnutls_x509_crt_get_subject_unique_id]:3897
dirmngr[10130]: DBG: gnutls:L3: ASSERT: ../../../lib/x509/x509.c[gnutls_x509_crt_get_issuer_unique_id]:3947
dirmngr[10130]: DBG: gnutls:L3: ASSERT: ../../../lib/x509/dn.c[_gnutls_x509_compare_raw_dn]:990
dirmngr[10130]: number of system provided CAs: 128
dirmngr[10130]: DBG: gnutls:L5: REC[0x78432c2d1e10]: Allocating epoch #0
dirmngr[10130]: DBG: gnutls:L2: added 6 protocols, 29 ciphersuites, 18 sig algos and 9 groups into priority list
dirmngr[10130]: URL 'http://nl5vtjfpfz2llza7.onion:5858/pks/lookup?op=get&options=mr&search=0xC4BC2DDB38CCE96485EBE9C2F20691179038E5C6' redirected to 'https://keys.openpgp.org/pks/lookup?op=get&options=mr&search=0xC4BC2DDB38CCE96485EBE9C2F20691179038E5C6' (301)
dirmngr[10130]: DBG: gnutls:L5: REC[0x78432c2d1e10]: Start of epoch cleanup
dirmngr[10130]: DBG: gnutls:L5: REC[0x78432c2d1e10]: End of epoch cleanup
dirmngr[10130]: DBG: gnutls:L5: REC[0x78432c2d1e10]: Epoch #0 freed
dirmngr[10130]: command 'KS_GET' failed: Forbidden <Unspecified source>
dirmngr[10130]: DBG: chan_5 -> ERR 251 Forbidden <Unspecified source>
dirmngr[10130]: DBG: chan_5 <- BYE
dirmngr[10130]: DBG: chan_5 -> OK closing connection

dirmngr has a rather complex history wrt. TLS certificate validation vs.
HTTP redirections vs. DNS resolution, so I'm not surprised that this fails.
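
To spot this failure mode in dirmngr debug logs, one could look for the redirect notice shown in the excerpt above. The following is a minimal Python sketch, not part of the Tails test suite (which is written in Ruby): the function names are hypothetical, and the regular expression only assumes the message format visible in that one log line.

```python
import re
from urllib.parse import urlsplit

# Matches dirmngr's redirect notice, in the form seen in the log excerpt above.
REDIRECT_RE = re.compile(
    r"URL '(?P<source>[^']+)' redirected to '(?P<target>[^']+)' \((?P<status>\d{3})\)"
)


def find_redirect(log_line):
    """Return (source_url, target_url, status) if the line reports a redirect, else None."""
    match = REDIRECT_RE.search(log_line)
    if match is None:
        return None
    return match["source"], match["target"], int(match["status"])


def leaves_onion_for_https(log_line):
    """True if a plain-HTTP request to a .onion host was redirected to HTTPS on another host."""
    redirect = find_redirect(log_line)
    if redirect is None:
        return False
    source, target, _status = redirect
    src, dst = urlsplit(source), urlsplit(target)
    return (
        src.scheme == "http"
        and (src.hostname or "").endswith(".onion")
        and dst.scheme == "https"
        and dst.hostname != src.hostname
    )
```

Applied to the "redirected to" line above, `find_redirect` yields the Chutney-run .onion URL, the https://keys.openpgp.org URL and the 301 status, which is the hop that precedes the "Forbidden" error.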

The code I'm disabling here was introduced for #12211. As I understand it, the
main goal there was to adjust our OpenPGP test cases to the fact we had switched
Tails to a .onion keyserver. It's not 100% clear to me why we added the layer of
indirection I'm removing here, instead of doing something similar to this
commit: I see no explanation on the ticket nor in the corresponding
commit messages.

So let's drop the extra complexity of going through a proxy Onion service on
Chutney, and instead reconfigure dirmngr to directly connect to
keys.openpgp.org. AFAICT, given previously we were not testing the dirmngr
configuration we're shipping to our users either, the only test coverage we lose
here is: testing that dirmngr can connect to an Onion service.

2. Seahorse: use a keyserver that meets our (so far implicit) requirements
==========================================================================

At least one member of pool.sks-keyservers.net does not satisfy the implicit
assumption that the code added for #12211 relied upon. This commit fixes that
by always using a backend, upstream keyserver that satisfies our assumptions.

Other than that, we stick to the previous setup here. This seems to be the only
vaguely viable option given we can't configure Seahorse to use keys.openpgp.org:
that keyserver redirects to HTTPS, which Seahorse does not support.

I'm (re-)tagging the Seahorse scenarios @fragile: in the end, it's not clear
whether this branch will improve things for those scenarios.

Revision 4ca60fa0 (diff)
Added by intrigeri about 2 months ago

Test suite: update image for Buster (refs: #14770)

Last time we updated it was for Jessie. I suspect it was already outdated for
Stretch and might have been part of the reasons for #14770: the recovery_proc
would never do anything useful.

History

#1 Updated by intrigeri about 2 years ago

  • Blocked by Feature #12292: Deal with September 2017 false positive scenarios added

#2 Updated by intrigeri about 2 years ago

  • Related to Bug #12211: Adapt GnuPG automated tests after switching to an Onion keyserver added

#3 Updated by intrigeri about 2 years ago

  • Related to Bug #10378: The "Tails OpenPGP keys" scenario is fragile added

#4 Updated by anonym about 2 years ago

  • Target version changed from Tails_3.3 to Tails_3.5

#5 Updated by intrigeri about 2 years ago

  • Related to Feature #9519: Make the test suite more deterministic through network simulation added

#6 Updated by intrigeri about 2 years ago

Wrt. the best long-term option we've selected on #12211 ("Run a local mock keyserver onion. This doesn't depend on the Internet => potentially 100% robust"), I've read that Schleuder 3 does mock the keyserver in its test suite. Likely that's in Ruby so we could maybe reuse it easily :)
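
To give an idea of how little a mock keyserver needs to do for "get" lookups, here is a minimal sketch. It is in Python rather than Ruby, it is not Schleuder's code, and it only covers the HTTP side: exposing it as an Onion service would still be up to Chutney. The names `hkp_lookup` and `make_handler` are hypothetical, and the `keys` dict is assumed to map upper-case hex fingerprints (without the 0x prefix) to ASCII-armored key text.

```python
from http.server import BaseHTTPRequestHandler
from urllib.parse import parse_qs, urlsplit


def hkp_lookup(keys, request_path):
    """Answer an HKP request path such as /pks/lookup?op=get&options=mr&search=0x...

    Returns (status, body). `keys` maps upper-case hex fingerprints
    (no 0x prefix) to ASCII-armored key text.
    """
    url = urlsplit(request_path)
    query = parse_qs(url.query)
    if url.path != "/pks/lookup" or query.get("op") != ["get"]:
        return 404, b"Not Found\n"
    search = query.get("search", [""])[0]
    if search.lower().startswith("0x"):
        search = search[2:]
    armored = keys.get(search.upper())
    if armored is None:
        return 404, b"Not Found\n"
    return 200, armored.encode("utf-8")


def make_handler(keys):
    """Build a request handler class that serves `keys` via hkp_lookup()."""

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            status, body = hkp_lookup(keys, self.path)
            content_type = "application/pgp-keys" if status == 200 else "text/plain"
            self.send_response(status)
            self.send_header("Content-Type", content_type)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    return Handler
```

Such a handler could be served with `http.server.ThreadingHTTPServer(("127.0.0.1", 11371), make_handler(keys)).serve_forever()`. Because it never talks to the Internet, the scenarios would depend only on the keys we choose to load into it.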

#7 Updated by anonym about 2 years ago

  • Target version deleted (Tails_3.5)

#8 Updated by intrigeri almost 2 years ago

  • Type of work changed from Research to Code

intrigeri wrote:

bertagaz said on #12290#note-8 that these failures had disappeared in September, so I'm not flagging these scenarios as fragile: instead, while doing #12292 anonym should check this and act accordingly, hence making this a Research ticket on anonym's plate for this month.

I've just seen this happen again (https://jenkins.tails.boum.org/view/RM/job/test_Tails_ISO_stable/1085/) so I've tagged these scenarios as fragile.

#9 Updated by intrigeri almost 2 years ago

  • Description updated (diff)

#10 Updated by intrigeri almost 2 years ago

  • Blocked by deleted (Feature #12292: Deal with September 2017 false positive scenarios)

#11 Updated by intrigeri almost 2 years ago

  • Status changed from Confirmed to In Progress

#12 Updated by sajolida over 1 year ago

  • Related to Bug #15415: Unreliable key server operations added

#13 Updated by intrigeri about 1 year ago

  • Related to Bug #12689: gpg --recv-key often hangs due to unreliable keyserver added

#14 Updated by intrigeri about 1 year ago

  • Related to deleted (Bug #12689: gpg --recv-key often hangs due to unreliable keyserver)

#15 Updated by intrigeri 3 months ago

  • Assignee deleted (anonym)

Every such scenario has failed a few dozen times on Jenkins in the last 2 months. But that's an actual bug in Tails: #12689. So this ticket is not actionable until that bug is fixed.

#16 Updated by intrigeri 3 months ago

  • Blocked by Bug #12689: gpg --recv-key often hangs due to unreliable keyserver added

#17 Updated by intrigeri 3 months ago

  • Subject changed from "Fetching OpenPGP keys" scenarios are fragile to "Fetching OpenPGP keys" scenarios are fragile: communication failure with keyserver

(Let's disambiguate from other tickets that track other reasons why such scenarios are fragile.)

#18 Updated by intrigeri about 2 months ago

  • Assignee set to intrigeri
  • Feature Branch set to bugfix/12689-more-reliable-OpenPGP-keyserver

#19 Updated by intrigeri about 2 months ago

#20 Updated by intrigeri about 2 months ago

While working on the underlying Tails bug (#12689), I discovered one more reason why what was set up on #12211 is fragile: it implicitly relies on the assumption that every member of the actual target pool (pool.sks-keyservers.net) can be queried via hkp://$IP:11371/ with any HTTP Host header of our choosing (in this case, the .onion run by Chutney). As it happens, this assumption is invalid for at least one member of that pool: 193.224.163.43 has Apache VirtualHosts explicitly configured for the hostnames it supports, but the fallback VirtualHost that we hit is not a keyserver.

Demonstration:

  • curl --resolve pool.sks-keyservers.net:11371:193.224.163.43 'http://pool.sks-keyservers.net:11371/pks/lookup?op=get&options=mr&search=0x7C84A74CFB12BC439E81BA78C92949B8A63BB098' works fine because it sends the correct Host header in the HTTP request
  • curl 'http://193.224.163.43:11371/pks/lookup?op=get&options=mr&search=0x7C84A74CFB12BC439E81BA78C92949B8A63BB098' fails with a 404 error, because http://193.224.163.43:11371/ merely serves the default Apache homepage and is not backed by a keyserver

I can't think of a way to fix this, apart from having our test suite use as its upstream keyserver one that meets this now-explicit requirement, instead of any random member of the HKP pool.
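
The now-explicit requirement could be checked for a candidate keyserver along these lines. This is only a sketch in Python, not code from the test suite: the function names are hypothetical, and treating "HTTP 200 plus an ASCII-armor header in the body" as success is my own assumption about what a valid answer looks like.

```python
import http.client

ARMOR_HEADER = b"-----BEGIN PGP PUBLIC KEY BLOCK-----"


def lookup_path(fingerprint):
    """Build the same HKP lookup path as the curl commands above."""
    return f"/pks/lookup?op=get&options=mr&search=0x{fingerprint}"


def is_key_response(status, body):
    """Assumed success criterion: HTTP 200 and an ASCII-armored key in the body."""
    return status == 200 and ARMOR_HEADER in body


def answers_with_host(ip, fingerprint, host_header, port=11371, timeout=30):
    """Query the keyserver at `ip` while sending an arbitrary Host header.

    Returns True if the server still hands out the key, i.e. it does not
    depend on being addressed by one of its configured hostnames.
    """
    conn = http.client.HTTPConnection(ip, port, timeout=timeout)
    try:
        # Passing our own Host header stops http.client from adding its default one.
        conn.request("GET", lookup_path(fingerprint), headers={"Host": host_header})
        response = conn.getresponse()
        return is_key_response(response.status, response.read())
    finally:
        conn.close()
```

A keyserver is suitable as our upstream only if `answers_with_host` returns True when given the .onion hostname run by Chutney as `host_header`. For 193.224.163.43, the 404 from the fallback VirtualHost would make it return False.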

#21 Updated by intrigeri about 2 months ago

  • Related to Bug #17169: Seahorse can't sync keys with keyservers: Request Entity Too Large added

#22 Updated by intrigeri about 2 months ago

  • Status changed from In Progress to Needs Validation
  • Assignee deleted (intrigeri)
  • Target version set to Tails_4.0

(Same as #12689#note-27)

#23 Updated by intrigeri about 2 months ago

  • Target version changed from Tails_4.0 to Tails_4.1

#24 Updated by hefee 26 days ago

  • Status changed from Needs Validation to 11

applied in c73634f6f076542621c9126b3b3cdc434b4dee7e.

#25 Updated by intrigeri 20 days ago

  • Status changed from 11 to Resolved
