Project

General

Profile

Bug #15107

Some topic branches are never built reproducibly during Jenkins daily runs => add option to specify which APT snapshot serials to use during build

Added by bertagaz over 1 year ago. Updated 3 months ago.

Status:
Resolved
Priority:
Normal
Assignee:
-
Category:
Continuous Integration
Target version:
Start date:
12/26/2017
Due date:
% Done:

100%

QA Check:
Pass
Feature Branch:
bugfix/15107-specify-APT-snapshots-serial-to-build-with
Type of work:
Code
Blueprint:
Starter:
Affected tool:

Description

Our branches based on devel use "latest" snapshots for every APT archive used at build time => their "reproducibly" (sic) job will fail if one these APT snapshots is updated between the start of the original build job and the start of the "reproducibly" job. Our branches based on stable are also affected, but to a lesser degree: they use the "latest" snapshot only for the debian-security archive. Daily Jenkins runs are scheduled deterministically, based on the name of the branch. So inevitably, the automatic daily rebuild of some branches will always fail to build reproducibly.

On #14924 we implemented a workaround for stable/testing/devel but long term, the only sane option is probably to teach our build system how to use snapshots specified on the command line (instead of "latest"), and to teach Jenkins to pass the correct parameters when trying to rebuild a given ISO image.


Related issues

Related to Tails - Bug #14924: reproducibly_build_Tails_ISO_stable Jenkins job always fails Resolved 11/06/2017
Related to Tails - Bug #14944: jenkins-data-disk is running out of diskspace Resolved 11/09/2017

Associated revisions

Revision aafdf8da (diff)
Added by bertagaz over 1 year ago

Add APT_SNAPSHOT_SERIAL build option.

It enables to specify which time-based APT snapshot repositories will be
used as 'latest' during the build, and will set it accordingly in the
resulting ISO image. Useful to reproduce another ISO image.

Refs: #15107, #14944, #14924

Revision 9c4d621a (diff)
Added by intrigeri over 1 year ago

apt-snapshots-serials: don't pretend the prepare-build action can use multiple arguments (refs: #15107).

Revision 0c6e8b0b (diff)
Added by intrigeri over 1 year ago

auto/config: use syntax compatible with "set -e" (refs: #15107).

This script has "set -e" already. I think that's currently ignored due to the
way live-build runs it, but let's not create confusion by writing code in there
that will break whenever "set -e" becomes effective. Besides, we try to use
"set -e" everywhere so let's write code that's compatible with it, which can be
useful e.g. if we copy'n'paste code around.

Revision 04150ceb (diff)
Added by intrigeri over 1 year ago

auto/config: make newly added debug message clearer (refs: #15107).

Revision 53ed9b40 (diff)
Added by intrigeri over 1 year ago

Avoid confusing semantics (refs: #15107).

Let's pass an argument to `apt-snapshots-serials prepare-build' only when we
actually have something useful to pass it: passing the empty string around as an
argument is bound to create confusion later on.

Revision ec4156bc (diff)
Added by intrigeri over 1 year ago

apt-snapshots-serials: add cat-json action (refs: #15107)

Revision 75caf780 (diff)
Added by intrigeri over 1 year ago

apt-snapshots-serials: fix regression when prepare-build is called multiple times in a row, e.g. for testing (refs: #15107)

The newly introduced 'if [ "${APT_SNAPSHOT_SERIAL}" ]' code branch always
appends to a potentially existing file, while the previously existing code
replaced it on each run.

Revision 9a05d912 (diff)
Added by intrigeri over 1 year ago

apt-snapshots-serials: load serialized serials from JSON (refs: #15107)

I.e. don't assume that all APT archives used during a build share the same,
single serial.

Revision f976a97a (diff)
Added by intrigeri over 1 year ago

Rename variable (APT_SNAPSHOT_SERIAL → APT_SNAPSHOT_SERIALS): it now includes multiple serials (refs: #15107).

Revision 6507db09 (diff)
Added by intrigeri over 1 year ago

Build doc: use a complete sentence (refs: #15107).

Revision 979ba36d (diff)
Added by intrigeri over 1 year ago

apt-snapshots-serials-load-json: forbid "latest" in APT_SNAPSHOT_SERIALS (refs: #15107)

The code in the prepare-build action assumes that the
tmp/cached_APT_snapshots_serials file contains resolved serials, so let's make
sure we never, ever write "$origin: latest" in there.

Revision d8db001f (diff)
Added by intrigeri 5 months ago

Rename APT_SNAPSHOT_SERIALS → APT_SNAPSHOTS_SERIALS (refs: #15107)

This is more consistent with the naming of the corresponding
apt-snapshots-serials* scripts.

Revision dca4e26a (diff)
Added by intrigeri 5 months ago

Allow passing an APT snapshots config directory to "apt-snapshots-serials cat-json" (refs: #15107)

We'll need to call it on tmp/APT_snapshots.d in order to save
the resolved version of the serials we're going to use,
i.e. with "latest" replaced by the current latest serial.

Previously one could do that by using apt-snapshots-serials-cat-json
directly, bypassing the apt-snapshots-serials abstraction layer,
which feels wrong: I prefer apt-snapshots-serials-cat-json to
remain an implementation detail and not a stable API.

Revision 2da49e69 (diff)
Added by intrigeri 5 months ago

When testing ISO build reproducibility, use the same APT snapshots in both builds (refs: #15107).

Problem:

- Our branches based on devel use "latest" snapshots for every APT archive used
at build time => their reproducibly_build_Tails_ISO_* job will fail if any of
these APT snapshots is updated between the start of the original build job
and the start of the reproducibly_build_Tails_ISO_* job.
- Our branches based on stable are also affected, but to a lesser degree: they
use the "latest" snapshot only for the debian-security archive.
- Any branch can be affected when the build is triggered by a Git push at an
unfortunate time. But for some branches, the automatic daily build is
always affected: daily Jenkins job runs are scheduled in a deterministic
manner, with a schedule based on the name of the branch. So inevitably, the
automatic daily rebuild of some branches will always fail to build
reproducibly, because the failure condition ("APT snapshots is updated
between the start of the original build job and the start of the
reproducibly_build_Tails_ISO_* job") will always be met. There's no such
active branch at the moment but we've seen that happen in the past.

To fix that, let's ensure we use the same APT snapshots during the second build
as the ones the first build used. Here's how.

With this commit, we save the serials an ISO build used as a build artifact that
the downstream reproducibly_build_Tails_ISO_* CI job will copy and then load
environment from (using the Jenkins EnvInject Plugin).

Therefore, in a given reproducibly_build_Tails_ISO_* CI job run, the
APT_SNAPSHOTS_SERIALS environment variable will tell what APT snapshots were
used by its upstream build_Tails_ISO_* CI job run. And finally, thanks to
aafdf8da72df5e9ffdada2eabc920325a6e13520 and follow-ups, that downstream
reproducibly_build_Tails_ISO_* CI job run will reuse the same snapshots.

Revision c369ace3
Added by intrigeri 3 months ago

Merge branch 'bugfix/15107-specify-APT-snapshots-serial-to-build-with' into devel (Fix-committed: #15107)

History

#1 Updated by bertagaz over 1 year ago

  • Related to Bug #14924: reproducibly_build_Tails_ISO_stable Jenkins job always fails added

#2 Updated by bertagaz over 1 year ago

  • Related to Bug #14944: jenkins-data-disk is running out of diskspace added

#3 Updated by bertagaz over 1 year ago

  • Status changed from Confirmed to In Progress

#4 Updated by intrigeri over 1 year ago

Applied in changeset aafdf8da72df5e9ffdada2eabc920325a6e13520.

Great to see someone picking this up! (Wanna assign it to yourself? Otherwise drop the target version: it feels weird to see a ticket for next release without any assignee.)

One comment: this seems to assume all "latest" serials are kept in sync, which is not true. You can very well have an ISO build where each "latest" pointer is resolved to different values. So I guess we need to store each such value, either serializing them all in one single variable with JSON or similar, or with one variable per repo (urgh).

#5 Updated by intrigeri over 1 year ago

  • Target version deleted (Tails_3.5)

We've triaged this during the monthly meeting "Volunteers to handle important tickets flagged for next release, but without assignee" topic. If you're taking responsibility for this, please assign this ticket to yourself and then set whatever target version you want, but the status of a ticket flagged for next release, but without assignee, is too unclear.

#6 Updated by bertagaz over 1 year ago

intrigeri wrote:

We've triaged this during the monthly meeting "Volunteers to handle important tickets flagged for next release, but without assignee" topic. If you're taking responsibility for this, please assign this ticket to yourself and then set whatever target version you want, but the status of a ticket flagged for next release, but without assignee, is too unclear.

Ack. Too bad for #14944. This ticket's priority is 'normal' though.

#7 Updated by intrigeri over 1 year ago

Too bad for #14944.

I don't understand what you mean: we certainly did not imply that you could not work on this if you fancy it or feel responsible for it.

#8 Updated by bertagaz over 1 year ago

intrigeri wrote:

I don't understand what you mean: we certainly did not imply that you could not work on this if you fancy it or feel responsible for it.

Ack, I was confused and surprised to see that change.

One comment: this seems to assume all "latest" serials are kept in sync, which is not true. You can very well have an ISO build where each "latest" pointer is resolved to different values. So I guess we need to store each such value, either serializing them all in one single variable with JSON or similar, or with one variable per repo (urgh).

If by that you mean that in the build process when apt-snapshots-serials prepare-build is run, the latest snapshots are rotated at the same time and thus at least one of the serial returned during the resolving is different from the others, I agree. I did not think about that.

Now I think this scenario is possible but is rarely happening (and hits devel and branches based on it only). The development cost to implement a solution to that is quite high IMO, at least for me. OTOH the proposed patch is fixing the situaion for #14944 for a very huge part of the builds. I have other things to do for 3.5, so I don't think I'll have the time to work on implementing more than what is proposed.

So I think we should merge the proposed patch, which would greatly help for #14944, and see later if we need this much to spend our work time on fixing the problem you're pointing.

Not sure how to proceed...

#9 Updated by intrigeri over 1 year ago

Hi!

One comment: this seems to assume all "latest" serials are kept in sync, which is not true. You can very well have an ISO build where each "latest" pointer is resolved to different values. So I guess we need to store each such value, either serializing them all in one single variable with JSON or similar, or with one variable per repo (urgh).

If by that you mean that in the build process when apt-snapshots-serials prepare-build is run, the latest snapshots are rotated at the same time

No, that's not what I meant (or I misunderstood what you think I meant). Each of the APT archives we maintain snapshots of is updated at different times: the load is spread over the entire day. So when we start an ISO build it's entirely possible that the latest snapshot for the debian one is YYYYMMDD02 while the latest snapshot for the torproject archive is YYYYMMDD01. Whenever this happens, with the proposed patch the 2nd ISO build will fail because APT will try to fetch a snapshot of torproject indices that does not exist yet. And symetrically, if the first build used different "latest" serials for 2+ APT archives, with the proposed patch the second build will necessarily produce a different ISO, while right now there's a good chance it would manage to reproduce the output of the first build.

Clearer?

Now I think this scenario is possible but is rarely happening

Do you still think that with the above clarification? If yes, I'm curious why: according to git log -p config/APT_snapshots.d/, it happened 3 times in the 5 times we froze APT snapshots for a release in 2017. Granted, that's a very small sample; if you have better data, please share.

The development cost to implement a solution to that is quite high IMO, at least for me.

Understood. Let's take this into account. If it may help, I think that I could get this part done painlessly (but I'd rather not handle the integration with Jenkins). Shall I give it a try (as a volunteer)?

OTOH the proposed patch is fixing the situaion for #14944 for a very huge part of the builds.

I have no idea if the proposed patch will fix more problems than it'll introduce regressions. Our daily builds start at a deterministic time (based on a hash of the branch name) so:

  • It will fix the problem for a subset of the branches that currently fail to build reproducibly due to APT snapshots being updated during the first build. This is good.
  • It will break the second ISO build (due to trying to use non-existing snapshots) for another subset of the branches, that currently build reproducibly just fine. That's more false positives.
  • It will successfully build a different second ISO, i.e. fail to build reproducibly for another subset of the branches, that currently build reproducibly just fine: if the first build was using different "latest" serials for 2+ APT archives, the second build will necessarily be different. That's more false positives.

Similarly, for some part of the builds triggered by Git changes it will fix the problem, for some others it will break the second build, and for some other it will make the second build fail to reproduce.

I won't do the maths right now. So either you do the maths yourself, or we simply try and remain ready to quickly revert if the proposed change makes things worse.
Or I implement the serialization thing I've proposed, that shouldn't take me more time than arguing, or doing the maths, or pretending I'm not doing them ;)

What do you think?

I have other things to do for 3.5, so I don't think I'll have the time to work on implementing more than what is proposed.

Fair enough!

#10 Updated by intrigeri over 1 year ago

  • % Done changed from 0 to 10

intrigeri wrote:

The development cost to implement a solution to that is quite high IMO, at least for me.

Understood. Let's take this into account. If it may help, I think that I could get this part done painlessly (but I'd rather not handle the integration with Jenkins). Shall I give it a try (as a volunteer)?

I pushed a draft implementation. I refrained from using Perl and instead did Ruby so that you and anonym understand what's going on and can take ownership of this tiny piece of code if needed in the future. The prepare-build part seems to work just fine. I didn't know what your plan was wrt. storing the serials of the snapshots used for a given build, so for now I've merely added a cat-json action as an example of how it would look like; I'm happy to adjust it to suit whatever interface is required by the integration technique you have in mind; who knows, perhaps calling it with tmp/APT_snapshots.d as an argument would be enough?

Hoping this helps.

#11 Updated by u over 1 year ago

@bertagaz: Is this ticket still relevant? If yes, can you assign it to yourself or the relevant person? Thanks!

#12 Updated by intrigeri about 1 year ago

  • Tracker changed from Feature to Bug
  • Subject changed from Add option to specify which APT snapshot serial to use during build to Some topic branches are never built reproducibly during Jenkins daily runs => add option to specify which APT snapshot serials to use during build

#13 Updated by intrigeri about 1 year ago

  • Description updated (diff)

#14 Updated by intrigeri about 1 year ago

  • Category changed from Build system to Continuous Integration
  • Assignee set to intrigeri
  • Parent task deleted (#5630)
  • Feature Branch set to bugfix/15107-specify-APT-snapshots-serial-to-build-with

#16 Updated by intrigeri 5 months ago

intrigeri wrote:

I pushed a draft implementation. I refrained from using Perl and instead did Ruby so that you and anonym understand what's going on and can take ownership of this tiny piece of code if needed in the future. The prepare-build part seems to work just fine. I didn't know what your plan was wrt. storing the serials of the snapshots used for a given build, so for now I've merely added a cat-json action as an example of how it would look like; I'm happy to adjust it to suit whatever interface is required by the integration technique you have in mind; who knows, perhaps calling it with tmp/APT_snapshots.d as an argument would be enough?

Here's an implementation plan. It might not be the nicest approach but I've picked it because it only requires changes in tails.git, so it should be easy to try out on a topic branch without impacting any other branch.

During the first build (build_Tails_ISO_*), after figuring out which APT snapshots shall be used (apt-snapshots-serials prepare-build), run echo "APT_SNAPSHOT_SERIALS='$(apt-snapshots-serials-cat-json tmp/APT_snapshots.d)'" >> tails-build-env.list. Later, once the build has succeeded, the collect_build_environment Jenkins builder will run puppet-tails:files/jenkins/slaves/isobuilders/collect_build_environment that will append other variables to this file.

Then the reproducibly_build_Tails_ISO job already loads tails-build-env.list so rake and auto/build should do the right thing, i.e. pass that inherited chunk of JSON to apt-snapshots-serials prepare-build.

#17 Updated by intrigeri 5 months ago

  • QA Check set to Ready for QA

(Forcing testing of build reproducibility.)

#18 Updated by intrigeri 5 months ago

To test this, we need to start builds at a time that will trigger the failure condition:

  • baseline i.e. reproduce the problem: start a build of the devel branch a few minutes before 05:15, 11:15, 17:15, or 23:15 (UTC)
  • validate the fix: start a build of this branch a few minutes before 05:15, 11:15, 17:15, or 23:15 (UTC)

The debian-security snapshot update starts at 05:15, 11:15, 17:15, 23:15 UTC. It generally lasts less than an ISO build. Starting the first builds a few minutes before one of these times should be enough to almost guarantee that the 2nd builds will use a different snapshot and fail to reproduce.

#19 Updated by intrigeri 5 months ago

  • Target version set to Tails_3.12
  • % Done changed from 10 to 40

intrigeri wrote:

To test this, we need to start builds at a time that will trigger the failure condition:

  • baseline i.e. reproduce the problem: start a build of the devel branch a few minutes before 05:15, 11:15, 17:15, or 23:15 (UTC)

As expected, https://jenkins.tails.boum.org/view/Tails_ISO/job/reproducibly_build_Tails_ISO_devel/427/artifact/build-artifacts/diffoscope.iso.html could not reproduce its upstream build.

  • validate the fix: start a build of this branch a few minutes before 05:15, 11:15, 17:15, or 23:15 (UTC)

https://jenkins.tails.boum.org/view/Tails_ISO/job/reproducibly_build_Tails_ISO_bugfix-15107-specify-apt-snapshots-serial-to-build-with/1/ did reproduce its upstream build and the build log show that Fixing 'latest' APT snapshots serials to: '{"torproject":"2018091802","debian":"2018100901","debian-security":"2018120203"}' which matches the upstream build's https://jenkins.tails.boum.org/job/build_Tails_ISO_bugfix-15107-specify-apt-snapshots-serial-to-build-with/1/artifact/build-artifacts/tails-build-env.list/*view*/. Same for builds 2 of these two jobs.

And now I've made devel the base branch for this topic branch and will do this test again before submitting for review.

#20 Updated by intrigeri 5 months ago

  • Assignee changed from intrigeri to anonym
  • % Done changed from 40 to 50

intrigeri wrote:

And now I've made devel the base branch for this topic branch and will do this test again before submitting for review.

https://jenkins.tails.boum.org/job/reproducibly_build_Tails_ISO_bugfix-15107-specify-apt-snapshots-serial-to-build-with/6/ (started after the APT snapshots update) successfully reproduced https://jenkins.tails.boum.org/job/build_Tails_ISO_bugfix-15107-specify-apt-snapshots-serial-to-build-with/6/ (that resolved the latest snapshots before the update).

Candidate for 3.12 i.e. merging into devel.

#21 Updated by intrigeri 4 months ago

  • Assignee deleted (anonym)

#22 Updated by lamby 4 months ago

  • Assignee set to lamby

#23 Updated by lamby 3 months ago

  • Checked out feature branch
  • Built with rake build without environment set (see first buildlog)
  • Built and booted fine, etc.
  • Ran:
$ python3 -c "import json, yaml; data = yaml.load(open('tails-amd64-bugfix_15107-specify-APT-snapshots-serial-to-build-with-3.12-20190106T1750Z-ece9232b9d.build-manifest').read()); print('APT_SNAPSHOTS_SERIALS=\'{}\''.format(json.dumps({x: y['reference'] for x, y in data['origin_references'].items()}).replace(' ', '')))" 

.. to get:

APT_SNAPSHOTS_SERIALS='{"debian":"2019010603","debian-security":"2019010603","torproject":"2019010603"}'
  • Then ran a build with:
$ APT_SNAPSHOTS_SERIALS='{"debian":"2019010603","debian-security":"2019010603","torproject":"2019010603"}' rake build
  • This built (see attached second buildlog)
  • I did not see anything in the build log regarding - I was expecting some message that is echoed. I did see that the serials did not change between these builds but that might be timing... Leaving as "_Ready for QA_"

Questions:

  • Am I missing something? I think I am...!
  • I did get a tails-build-env.list file the first build … should this not have the branch name in it?

#24 Updated by intrigeri 3 months ago

  • Assignee changed from intrigeri to lamby

Your test methodology looks good to me.

  • I did not see anything in the build log regarding - I was expecting some message that is echoed.

tl;dr: look at the console output (instead of the .buildlog file) and you should see that message.

Indeed, echo "Fixing 'latest' APT snapshots serials to: '${APT_SNAPSHOTS_SERIALS}'." should print something. This is the case on https://jenkins.tails.boum.org/view/Tails_ISO/job/reproducibly_build_Tails_ISO_bugfix-15107-specify-apt-snapshots-serial-to-build-with/29/consoleFull. But as you've noticed, that echo'ed line does not appear in the .buildlog file (https://jenkins.tails.boum.org/view/Tails_ISO/job/reproducibly_build_Tails_ISO_bugfix-15107-specify-apt-snapshots-serial-to-build-with/29/artifact/build-artifacts/2/tails-amd64-bugfix_15107-specify-APT-snapshots-serial-to-build-with-3.12-20190114T1015Z-ece9232b9d+devel@e0ec60c0e2.buildlog/*view*/). That's because the .buildlog file only captures the output of lb build; for details, git grep BUILD_LOG -- auto.

  • I did get a tails-build-env.list file the first build … should this not have the branch name in it?

Why do you think it should contain the branch name? (Genuine, non-rhetorical question :)

Our Jenkins machinery appends info to this file, when trying to reproduce a 1st build, as a way to forward to the 2nd build the info it needs to build from the same branch/commit/etc., e.g. https://jenkins.tails.boum.org/view/Tails_ISO/job/reproducibly_build_Tails_ISO_bugfix-15107-specify-apt-snapshots-serial-to-build-with/29/artifact/build-artifacts/1/tails-build-env.list/*view*/. Implementation: https://git-tails.immerda.ch/jenkins-jobs/tree/macros/build_Tails_ISO.yaml#n27 calls https://git.tails.boum.org/puppet-tails/tree/files/jenkins/slaves/isobuilders/collect_build_environment. But we don't do that outside of Jenkins. I think that's because so far, we did not try to actively support the "try to reproduce locally a non-release build" usecase, so all the logic and plumbing is implemented only for Jenkins. The only exception IIRC is the APT snapshots logic, which is fully managed in tails.git; that's why I've implemented my branch here in a way that makes the new feature usable both on Jenkins and outside of it. From a design and developer experience PoV, personally I'd like to move towards supporting "try to reproduce this" outside of Jenkins, with something like https://jenkins.tails.boum.org/view/Tails_ISO/job/reproducibly_build_Tails_ISO_bugfix-15107-specify-apt-snapshots-serial-to-build-with/29/artifact/build-artifacts/1/tails-build-env.list/*view*/ specifying what this is (similarly to a buildinfo file I guess). It would make it easier to work on this part of Tails locally without fiddling with our production Jenkins setup, and more generally it would lower our coupling with Jenkins (which makes it easier to use other CI systems in the future if we want). Some day, maybe :)

#25 Updated by lamby 3 months ago

  • Assignee changed from lamby to intrigeri

intrigeri wrote:

  • I did get a tails-build-env.list file the first build … should this not have the branch name in it?

Why do you think it should contain the branch name?

Simply to match the other generated files. But, as you imply, the entire point is to be across builds or otherwise separate from each specific one.

.buildlog file only captures the output of lb build; for details, git grep BUILD_LOG -- auto.

Ah, TIL!

#26 Updated by intrigeri 3 months ago

FTR when invalid JSON is passed, the build fails as expected:

+ [ -n {"torproject":"2019010603","debian":"2019010603","debian-security":"2019010603",} ]
+ echo Fixing 'latest' APT snapshots serials to: '{"torproject":"2019010603","debian":"2019010603","debian-security":"2019010603",}'.
Fixing 'latest' APT snapshots serials to: '{"torproject":"2019010603","debian":"2019010603","debian-security":"2019010603",}'.
+ apt-snapshots-serials prepare-build {"torproject":"2019010603","debian":"2019010603","debian-security":"2019010603",}
/usr/lib/ruby/2.3.0/json/common.rb:156:in `parse': 822: unexpected token at '{"torproject":"2019010603","debian":"2019010603","debian-security":"2019010603",}' (JSON::ParserError)
    from /usr/lib/ruby/2.3.0/json/common.rb:156:in `parse'
    from /usr/lib/ruby/2.3.0/json/common.rb:335:in `load'
    from /tmp/tails-build.K6g1xPRu/auto/scripts/apt-snapshots-serials-load-json:16:in `<main>'

#27 Updated by intrigeri 3 months ago

  • Status changed from In Progress to Fix committed
  • % Done changed from 50 to 100

#28 Updated by intrigeri 3 months ago

  • Assignee deleted (intrigeri)
  • QA Check changed from Ready for QA to Pass

#29 Updated by anonym 3 months ago

  • Status changed from Fix committed to Resolved

Also available in: Atom PDF