Bug #8125

Self-host the Tor Browser tarballs we need

Added by anonym over 5 years ago. Updated almost 5 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: Infrastructure
Target version:
Start date: 10/15/2014
Due date:
% Done: 100%
Feature Branch: feature/8125-self-hosted-tor-browser-tarballs
Type of work: Sysadmin
Blueprint:
Starter:
Affected tool: Browser

Description

When upgrading to a new TBB for an imminent Tails release, we often have to fetch the TBB tarballs from e.g. http://people.torproject.org/~mikeperry/builds or some other temporary location. Hence, our release tags will only be buildable for as long as the tarballs stay in that temporary place, which at best is a few months. If we fetch them from http://archive.torproject.org/tor-package-archive/torbrowser/ we do not have this issue, but TBB releases are only put there when publicly released, which generally is a day or three after we want to build our release image.

To solve this, we probably will have to host the Tor Browser tarballs ourselves, and point to this permanent location for anything that should be tagged.

Alternatively, if we want to piggy-back on all the good stuff from our freezable APT repo (#5926), we can adapt 10-tbb into a standalone script (and remove it from within Tails) that prepares Tor Browser .deb:s. We then use it to package Tor Browser and upload the result to our APT repo like we do for other packages. Another benefit is that we don't have to host all the TBB tarballs, which would occupy much more space.


Related issues

Related to Tails - Bug #9020: Self-hosted setup for Tor Browser tarballs is fragile when upstream tarballs change Resolved 03/06/2015
Blocks Tails - Feature #5630: Reproducible builds Resolved 09/23/2015

Associated revisions

Revision bab2d694 (diff)
Added by Tails developers almost 5 years ago

Use our own Tor Browser archive when building an ISO.

Will-fix: #8125

Revision 40788f10 (diff)
Added by Tails developers almost 5 years ago

Document the new release process for Tor Browser.

Will-fix: #8125

Revision 2fa5f844
Added by Tails developers almost 5 years ago

Merge remote-tracking branch 'origin/feature/8125-self-hosted-tor-browser-tarballs' into testing

Fix-committed: #8125

History

#1 Updated by anonym over 5 years ago

#2 Updated by intrigeri about 5 years ago

#3 Updated by intrigeri about 5 years ago

#4 Updated by intrigeri about 5 years ago

  • Assignee set to anonym
  • QA Check set to Info Needed

This actually has little to do with #5926, as the custom part of our APT repository is already freezable (see e.g. the "1.2" APT suite in there) -- #5926 is about allowing us to freeze the set of packages we fetch from other sources, and I don't think we would upload the .deb's you're suggesting we build to Debian :)

Still, indeed this does block #5630.

Anyway. Given we already have the infrastructure in place to host .deb's linked to a given branch or version of Tails, the suggested approach would indeed work. I like the fact that it piggybacks on infrastructure, tools and processes (e.g. branch merging) we already have. However, I'm not sure if this existing infrastructure is really appropriate for this need. It might be that we would actually be adding quite some complexity, with little benefit. It's not clear to me what this .deb would look like, or how we would build and maintain it. Another problem is that it introduces yet another binary blob that's hard to reproduce from source (if we don't keep the sources around forever ourselves), which partly defeats the purpose of having deterministic builds.

So, I think I'd like to see us store the pristine TB tarballs we need in some place we manage anyway. At least these can be reproduced from their Git tag. And once we store these tarballs, why not use them at ISO build time?

We could store these tarballs without any metadata, all stuffed into a single directory (though keeping the sub-directory names as found on the Tor web server would probably be nicer). Then we leave it to every Git branch to specify which ones it wants to include, and how to validate them, much like 10-tbb already does. This means we get branching and merging support for free, in a way that is less complicated for most of us (and for new contributors) to handle than an APT repository. It also means that we can rely on the Tor-provided tarballs until the actual Tails release is built (just before building it, upload the tarballs to our own server, switch the URLs to point there, and leave the hashes unmodified). That makes it easy for anyone to build a Tails ISO with e.g. the alpha branch of TB, or whatever they want, without needing access to our infrastructure: they just need to provide the right URL, and make the tarballs they want available there (possibly even locally).
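A minimal sketch of the URL switch described above, assuming a branch records a base URL (similar in spirit to the tbb-dist-url.txt file mentioned later in this thread) plus a sha256sum-style list of tarball names and hashes. The function names, example URLs and tarball file names are hypothetical; the point is that only the base URL changes when moving from Tor's temporary location to our own server, while the expected hashes stay fixed.

```python
from typing import List, Tuple


def parse_sha256sums(text: str) -> List[Tuple[str, str]]:
    """Parse sha256sum-style lines: '<64 hex digits>  <file name>'."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        digest, name = line.split(None, 1)
        if len(digest) != 64:
            raise ValueError(f"not a SHA-256 digest: {digest!r}")
        # sha256sum prefixes file names with '*' in binary mode.
        entries.append((name.lstrip("*"), digest.lower()))
    return entries


def tarball_sources(base_url: str, sums_text: str) -> List[Tuple[str, str]]:
    """Return (download URL, expected SHA-256) pairs for each listed tarball.

    Pointing base_url at our own server instead of Tor's temporary location
    changes only the URLs; the expected hashes stay exactly the same.
    """
    base = base_url.rstrip("/")
    return [(f"{base}/{name}", digest)
            for name, digest in parse_sha256sums(sums_text)]
```

Calling `tarball_sources` with a temporary-location base URL and then with a self-hosted one yields identical digests, so a release branch would only need its base URL edited.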

I'm probably missing some bits, but roughly, all we need to do for that seems to be:

  • having a directory, served on the web, and writable by Tails release managers;
  • patching the release process doc to add the new "upload tarballs there" and "adjust URLs in Git" steps.

This seems to be something we could easily do without waiting for any new piece of infrastructure, and without writing any new code (except perhaps a helper script to retrieve the tarballs from a local apt-cacher-ng cache, verify their hashes, and upload them).
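A sketch of the verification half of such a helper script, assuming the tarballs have already been copied out of the apt-cacher-ng cache into one flat directory (the cache's real on-disk layout is not covered here) and that the expected hashes are the ones recorded in Git. `verify_tarballs` is a hypothetical name, and the upload step is left out: the idea is simply to upload only when the returned list is empty.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file incrementally so large tarballs need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_tarballs(directory: Path, expected: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means every tarball checks out.

    `expected` maps tarball file names to their SHA-256 hex digests,
    e.g. the hashes a Git branch already records.
    """
    problems = []
    for name, digest in sorted(expected.items()):
        path = directory / name
        if not path.is_file():
            problems.append(f"missing: {name}")
        elif sha256_of(path) != digest.lower():
            problems.append(f"hash mismatch: {name}")
    return problems
```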

What do you think?

#5 Updated by intrigeri about 5 years ago

I've just discussed this with bertagaz, and he agrees with my proposal. Implementation-wise, on the short term this would be a directory on www.lizard, that the release managers have write access to (e.g. via git-annex, so that no shell access is required).

anonym, would it solve the problem for you in a way that you like?

#6 Updated by anonym about 5 years ago

  • Assignee changed from anonym to intrigeri

intrigeri wrote:

However, I'm not sure if this existing infrastructure is really appropriate for this need. It might be that we would actually be adding quite some complexity, with little benefit. It's not clear to me what this .deb would look like, or how we would build and maintain it. Another problem is that it introduces yet another binary blob that's hard to reproduce from source (if we don't keep the sources around forever ourselves), which partly defeats the purpose of having deterministic builds.

I do not see why it would have to be principally different from the other Debian packages we maintain, really. We treat the TBB tarball contents as the source, and everything follows from that, IMHO.

More specifically, what I had in mind was a normal (Git-tracked) Debian packaging repository. It would contain:
  • the contents of the en TBB tarball + only the language extensions from the other TBB tarballs.
  • a few helper scripts based on the 10-tbb script:
    - update.sh: runs download_and_verify_files() and the extraction and configuration parts of install_tor_browser() and install_langpacks_from_bundles().
    - install.sh: runs the installation parts of install_tor_browser() and install_langpacks_from_bundles().
    - the rest of the 10-tbb hook (like install_fake_iceweasel_pkg()) would remain in Tails. Other hooks that are candidates for inclusion directly in this package are 12-install_browser_searchplugins, 12-remove_unwanted_browser_searchplugins and 14-add_localized_browser_searchplugins, but some extra thought would have to be given to making them reproducible, e.g. since they use specific versions of the iceweasel-l10n-* packages.
  • tbb-sha256sums.txt
  • the debian/ packaging subdirectory. In debian/rules we'd specify that the install.sh helper script is used for "building" and installing.

We don't need tbb-dist-url.txt -- we'd hardcode the permanent mirror on archive.torproject.org as the place to fetch the tarballs from. The tarballs would be fetched into a gitignored ./tmp directory, but only if they do not already exist there, so at release time, when the tarballs aren't yet in the permanent mirror, we'd just manually fetch them from the temporary location where the TBB team put them.
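A minimal sketch of the "fetch only if missing" behaviour described above, assuming each tarball is saved in ./tmp under the last component of its URL. The `fetch_missing` name and its pluggable `fetch` callback are illustrative, not part of any existing script; the default downloader uses the standard library's `urllib.request.urlretrieve`. Hashes would still have to be checked afterwards, since a manually placed file is trusted here only by its name.

```python
import urllib.request
from pathlib import Path
from typing import Callable, Iterable, List


def _download(url: str, path: Path) -> None:
    """Default fetcher: plain HTTP(S) download with the standard library."""
    urllib.request.urlretrieve(url, str(path))


def fetch_missing(
    urls: Iterable[str],
    dest: Path = Path("tmp"),
    fetch: Callable[[str, Path], None] = _download,
) -> List[Path]:
    """Download each URL into `dest` unless a file of that name already exists.

    Returns the paths that were actually downloaded. Files placed in `dest`
    by hand (e.g. fetched from a temporary TBB build location) are left alone.
    """
    dest.mkdir(parents=True, exist_ok=True)
    fetched = []
    for url in urls:
        target = dest / url.rsplit("/", 1)[-1]
        if target.exists():
            continue  # already present: do not re-download
        fetch(url, target)
        fetched.append(target)
    return fetched
```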

To prepare the source for a new Tor Browser version we would:
1. Update the checksums in tbb-sha256sums.txt
2. Run the update.sh helper script
3. Create a new entry in debian/changelog
4. git commit -a -m ...
5. git tag $VERSION ...
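Step 3 can be illustrated with a small formatter for one debian/changelog stanza, following the layout Debian Policy specifies (header line, indented bullet list, trailer line with maintainer and RFC 2822 date). The package name `tor-browser`, the `unstable` distribution and the maintainer are placeholders; in practice `dch` from devscripts writes these entries.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from typing import List


def changelog_entry(package: str, version: str, changes: List[str],
                    maintainer: str, when: datetime,
                    distribution: str = "unstable") -> str:
    """Format one debian/changelog stanza.

    `when` should be timezone-aware so the trailer gets a numeric offset.
    """
    bullets = "\n".join(f"  * {change}" for change in changes)
    return (
        f"{package} ({version}) {distribution}; urgency=medium\n"
        f"\n{bullets}\n\n"
        f" -- {maintainer}  {format_datetime(when)}\n"
    )
```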

Building the package would be as simple as running git-buildpackage, pdebuild or whatever.

If someone wants to verify that the TBB parts (i.e. the "sources") for a given tag are legit, they only have to verify the checksums in tbb-sha256sums.txt, then run the download_and_verify_files helper script and make sure that no files differ, i.e. that git status reports no changes.
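The "no files differ" check above relies on git status. As a Git-free illustration of the same idea (a sketch, not part of the proposed tooling), one can hash every file in the committed tree and in a freshly regenerated tree and list the differences. The function names are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def tree_manifest(root: Path) -> Dict[str, str]:
    """Map each regular file under root (relative POSIX path) to its SHA-256.

    Anything inside a .git directory is skipped, since Git does not track it.
    Symlinks are followed rather than compared as links, and permissions are
    not checked in this sketch.
    """
    manifest = {}
    for path in sorted(root.rglob("*")):
        rel = path.relative_to(root)
        if ".git" in rel.parts or not path.is_file():
            continue
        manifest[rel.as_posix()] = hashlib.sha256(path.read_bytes()).hexdigest()
    return manifest


def tree_differences(committed: Path, regenerated: Path) -> List[str]:
    """List files removed, added or changed between the two trees."""
    a, b = tree_manifest(committed), tree_manifest(regenerated)
    diffs = [f"removed: {p}" for p in sorted(a.keys() - b.keys())]
    diffs += [f"added: {p}" for p in sorted(b.keys() - a.keys())]
    diffs += [f"changed: {p}" for p in sorted(a.keys() & b.keys()) if a[p] != b[p]]
    return diffs
```

An empty result corresponds to git status reporting no changes.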

I've just discussed this with bertagaz, and he agrees with my proposal. Implementation-wise, on the short term this would be a directory on www.lizard, that the release managers have write access to (e.g. via git-annex, so that no shell access is required).

anonym, would it solve the problem for you in a way that you like?

It would. While it's simpler than what I propose, it means we'd have to store all 15 tarballs in full, which is an additional 15*41 MB = 615 MB per Tails release, of which essentially 14/15:ths are redundant (remember, from the non-en TBB tarballs we only need the language packs, which are ~500 KB each). That total size will quickly grow as new languages are supported and the tarballs grow in size. Also, the bandwidth needed to build a given Tails release increases by the same amount. With the package approach I'm proposing we'd only need to store ~41 + 5 MB for the .deb and ~41 + 5 MB for the source package (assuming 5 MB for all non-en langpacks), so ~90 MB in total per release. As for the Git repo, even though there are some large binary files that change between versions (e.g. libxul.so at 65 MB and the omni.ja:s at ~15 MB in total), the .git directory ended up at 62 MB when I included the contents of versions 4.0, 4.0.1 and 4.0.2.
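The storage comparison above, worked through with the rough figures given in this comment (15 tarballs of ~41 MB each, ~5 MB for all non-en langpacks):

```latex
\begin{aligned}
\text{all 15 tarballs:}\quad & 15 \times 41\,\text{MB} = 615\,\text{MB per release}\\
\text{of which redundant:}\quad & \tfrac{14}{15} \times 615\,\text{MB} = 14 \times 41\,\text{MB} = 574\,\text{MB}\\
\text{package approach:}\quad & \underbrace{(41 + 5)\,\text{MB}}_{\texttt{.deb}} + \underbrace{(41 + 5)\,\text{MB}}_{\text{source package}} = 92\,\text{MB} \approx 90\,\text{MB per release}
\end{aligned}
```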

So, while I'm fine with what you propose, I'd like to hear what you and bertagaz think about my more concrete proposal first.

#7 Updated by intrigeri about 5 years ago

intrigeri wrote:

Another problem is that it introduces yet another binary blob that's hard to reproduce from source (if we don't keep the sources around forever ourselves), which partly defeats the purpose of having deterministic builds.

I do not see why it would have to be principally different from the other Debian packages we maintain, really. We treat the TBB tarball contents as the source, and everything follows from that, IMHO.

One difference is that once we get ourselves a reproducible Debian package build environment, then:

  • for other Debian packages we maintain, the source code -> executable transformation can be verified just by rebuilding the package
  • for Tor Browser, with the solution you're proposing, one first has to verify that the binaries in the TBB tarball match its source code, and then additionally verify that our source package matches what's in the TBB tarballs.

That's what I meant with "another binary blob that's hard to reproduce from source". Maybe no big deal, but there is a difference.

(Of course, in both cases, one will probably want to also verify that what's in our ISO matches what's in the binary Debian packages, but that's orthogonal to this discussion, as verifying that for one more package is negligible.)

And then, if we ever want to support verifying such things later on, we either have to rely on the Tor Project to keep a full history of TBB tarballs online (they currently do, but for Debian packages we're working on avoiding such a dependency on external providers keeping content forever, so I would find it sad to add another similar dependency here), or we have to store that historical data ourselves, in which case we may as well use it at build time.

anonym, would it solve the problem for you in a way that you like?

It would. While it's simpler than what I propose, it means we'd have to store all 15 tarballs in full, which is an additional 15*41 MB = 615 MB per Tails release, of which essentially 14/15:ths are redundant (remember, from the non-en TBB tarballs we only need the language packs, which are ~500 KB each). That total size will quickly grow as new languages are supported and the tarballs grow in size.

I think we can afford to host an additional dozen GB or so every year. My proposal would be to store historical TBB data only for releases, and during development keep relying on the various temporary URLs hosted by the Tor Project.

Also, the bandwidth needed to build a given Tails release increases with the same number.

I acknowledge that's indeed painful when building after the TBB version was updated in our Git tree.

So, while I'm fine with what you propose, I'd like to hear what you and bertagaz think about my more concrete proposal first.

I still find it adds more complications than it solves problems. I'm particularly wary of how it complicates working on our Tor Browser extraction/customization/installation scripts: one would have to build a new .deb and upload it before building an ISO, every time they want to change a line in there. E.g. your recent refactoring work in this area would have been much more painful this way, no? This also raises the bar for contributing in this area: "want to work on TBB integration in Tails? first set up pbuilder".

So, right now I think I'd like to try the simple approach I've proposed, so we get a quick solution to this problem, without anyone having to work (or procrastinate) for hours on something more complicated that isn't clearly better IMO. And we can reconsider later if the limitations of the KISS solution are too painful.

#8 Updated by intrigeri about 5 years ago

  • Assignee changed from intrigeri to anonym

#9 Updated by BitingBird about 5 years ago

  • Category changed from 176 to Infrastructure
  • Affected tool set to Browser

#10 Updated by anonym about 5 years ago

  • Assignee changed from anonym to intrigeri
  • Type of work changed from Discuss to Sysadmin

intrigeri wrote:

[...] So, right now I think I'd like to try the simple approach I've proposed, so we get a quick solution to this problem, without anyone having to work (or procrastinate) for hours on something more complicated that isn't clearly better IMO. And we can reconsider later if the limitations of the KISS solution are too painful.

Right, let's KISS. :)

Do you think you could set up the hosting space (accessible by me and other future potential RMs) so this can be used for the Tails 1.2.3 release?

#11 Updated by intrigeri about 5 years ago

  • Target version changed from Sustainability_M1 to Tails_1.2.3
  • QA Check deleted (Info Needed)

anonym wrote:

Do you think you could set up the hosting space (accessible by me and other future potential RMs) so this can be used for the Tails 1.2.3 release?

I'll try. Worst case, I'll do it in time for 1.3.

#12 Updated by intrigeri about 5 years ago

  • Target version changed from Tails_1.2.3 to Tails_1.3

#13 Updated by intrigeri about 5 years ago

  • Assignee changed from intrigeri to anonym
  • QA Check set to Info Needed

anonym, would you be fine with using git-annex to access that tarballs repository? You may want to read and try systems/ISO_history.mdwn in our internal Git repo, for an initial taste of what it would look like.

Pros:

  • I can reuse existing Puppet bits, instead of designing and implementing a new system.
  • git-annex automatically uses the rsync options that allow one to continue an interrupted download
  • we have a history of metadata changes to the tracked files
  • gets you started with git-annex => some day you'll upload new releases to the ISO history repo ;)
  • no need to worry about permissions and ownership on uploaded files
  • convenient ways for any other contributor to look up and retrieve whatever bits they want from that repo on the command-line or programmatically, without bothering with a web browser: one can simply use ls and cd to browse the content available in that repo.

Cons:

  • new tool to learn a bit about, as opposed to good old scp/rsync + chown/chmod/ACL

#14 Updated by intrigeri about 5 years ago

  • Assignee changed from anonym to intrigeri
  • QA Check deleted (Info Needed)

anonym said he's fine with git-annex.

#15 Updated by intrigeri almost 5 years ago

  • Subject changed from Host the Tor Browser tarballs we need ourselves to Self-host the Tor Browser tarballs we need

#16 Updated by intrigeri almost 5 years ago

  • Status changed from Confirmed to In Progress
  • % Done changed from 0 to 10

Everything deployed server-side, now uploading data.

#17 Updated by Tails almost 5 years ago

Applied in changeset commit:5eed35816632e3c0be431c7e9393b973b3618a48.

#18 Updated by Tails almost 5 years ago

Applied in changeset commit:07db0775f44a4dab93be673db9d07b192b2ad447.

#19 Updated by intrigeri almost 5 years ago

  • Assignee changed from intrigeri to anonym
  • % Done changed from 10 to 50
  • QA Check set to Ready for QA
  • Feature Branch set to feature/8125-self-hosted-tor-browser-tarballs

#20 Updated by Tails almost 5 years ago

  • Status changed from In Progress to 11
  • % Done changed from 50 to 100

Applied in changeset commit:a1ae8072f4e597994d1cc1a914175a5fa010ce8c.

#21 Updated by kytv almost 5 years ago

  • QA Check changed from Ready for QA to Pass

#22 Updated by anonym almost 5 years ago

  • Assignee deleted (anonym)

#23 Updated by BitingBird almost 5 years ago

  • Status changed from 11 to Resolved

#24 Updated by intrigeri almost 5 years ago

  • Related to Bug #9020: Self-hosted setup for Tor Browser tarballs is fragile when upstream tarballs change added
