Feature #15287

Feature #15281: Stack one single SquashFS diff when upgrading

Make it possible to reproducibly generate IUKs in CI

Added by anonym over 1 year ago. Updated 5 months ago.

Status:
Confirmed
Priority:
Normal
Assignee:
-
Category:
Continuous Integration
Target version:
-
Start date:
02/05/2018
Due date:
% Done:

0%

Feature Branch:
Type of work:
Code
Blueprint:
Starter:
Affected tool:

Description

Since we'll generate a lot more IUKs each release, uploading them will be painful for RMs with slow Internet connections.

For example, the following VM would do:
  • 4 virtual CPUs (so the IUK generation won't take too long)
  • 1 GB of RAM (I used to generate IUKs in such a VM two years ago but YMMV)
  • 10 GB of /tmp for tails-iuk's needs and for storing the generated IUKs (before uploading them to rsync.lizard)
  • Access to Jenkins build artifacts
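As a hedged illustration, a preflight check along these lines could verify such a VM before starting; the thresholds mirror the list above, while the function name and warning messages are made up:

```shell
# Hypothetical preflight check for the VM described above.
# Assumes a Linux VM with GNU coreutils; all names are illustrative.
check_vm() {
    cpus=$(nproc)
    ram_mb=$(awk '/MemTotal/ {print int($2 / 1024)}' /proc/meminfo)
    tmp_gb=$(df --output=avail -BG /tmp | tail -n 1 | tr -dc '0-9')
    [ "$cpus" -ge 4 ]      || echo "warning: fewer than 4 CPUs, IUK generation will be slow"
    [ "$ram_mb" -ge 1024 ] || echo "warning: less than 1 GB of RAM"
    [ "$tmp_gb" -ge 10 ]   || echo "warning: less than 10 GB free in /tmp"
    return 0
}
```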

Related issues

Blocks Tails - Feature #16052: Document post-release reproducibility verification for IUKs Confirmed 10/15/2018
Blocks Tails - Feature #16209: Core work: Foundations Team Confirmed

History

#1 Updated by intrigeri over 1 year ago

  • Subject changed from Make it possible to reprodicibly generate IUKs on Lizard to Make it possible to reproducibly generate IUKs on lizard
  • Assignee changed from intrigeri to anonym
  • Type of work changed from Sysadmin to Code

anonym wrote:

For example, the following VM would do:

Our isobuilders should be fine then.

  • Access to Jenkins build artifacts

Our isobuilders have access to the ISO archive so the easiest way is to push the new ISO there before generating the IUKs.

There's no chance I'll do all this work myself during the 3.6 cycle. I could deploy a new Jenkins job if someone else writes the code that builds the needed IUKs and tells me how this job should behave (input, output, how it'll be triggered).

#2 Updated by anonym over 1 year ago

  • Assignee changed from anonym to intrigeri
  • QA Check set to Info Needed

intrigeri wrote:

I could deploy a new Jenkins job [...]

Is Jenkins the right tool here? As an RM, I see no benefit. To me, the ideal would be that all RMs get access to some VM on Lizard fulfilling the above criteria, and that I write some shell script in the release process doc that is simply copy-pasted into a terminal to do the IUK building in said VM over SSH. This way I can much more easily debug and find workarounds if there are problems.

#3 Updated by intrigeri over 1 year ago

  • Assignee changed from intrigeri to anonym
  • QA Check changed from Info Needed to Dev Needed

intrigeri wrote:

I could deploy a new Jenkins job [...]

Is Jenkins the right tool here? As an RM, I see no benefit. To me, the ideal would be that all RMs get access to some VM on Lizard fulfilling the above criteria, and that I write some shell script in the release process doc that is simply copy-pasted into a terminal to do the IUK building in said VM over SSH.

I'm surprised that you think adapted and copy'n'pasted manual build steps can be better than automated builds. I disagree, and feel the need to argue in favour of automation. With an automated build system, implemented e.g. as a Jenkins job:

  • Better handling of failures: a Jenkins "failed" status is more obvious than a non-zero exit code that your shell may not warn you about (and even if it does, the RM may miss it when it's 2am and they're trying to finish a part of the release process before going to bed).
  • We have to make the whole thing truly automatic, modulo some well-defined parameters. One can't say the same about shell script snippets from our release process doc, which often need adapting, and adapting requires the RM to reason correctly and without mistakes. In practice it follows that:
    • RMs tend to make mistakes, especially when they either do the task too often and thus occasionally stop thinking (you), or when they don't do the task often enough and thus misunderstand the instructions (bertagaz or myself). In this case it particularly matters because the exact same build must be done twice (once locally, once on lizard).
    • RMs tend to adapt/fix such instructions locally without improving the doc. We've seen cases where neither you nor I ever followed the release process doc to the letter (e.g. in terms of ordering) in the real world, and then when the occasional RM tries to follow the doc, guess what: we notice it was never tested and can't possibly work. Anything that leaves room for such creativity (let's be nice to ourselves :)) tends to create a gap between theory and practice. With a Jenkins job, we can be 100% certain that the build was done as designed and documented, and that it works, which increases the chances it'll work the next time the occasional RM prepares a release.
  • Build artifacts and logs are stored, tracked and published. This gives us an audit trail in case something goes wrong; that audit trail can be inspected by other Tails developers who can help fix the problem, unlike your terminal window. This also makes it easier to reproduce problems, because we know exactly what code was run when the problem happened.
  • We get something consistent with how we build and publish the released ISO (see MATCHING_JENKINS_BUILD_ID=XXX in the release process doc).
  • We're getting a little bit closer to CI. Adding manual adapt'n'copy'n'paste shell scripts does exactly the opposite.
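To illustrate the first bullet: in bash, a pipeline's default exit status hides earlier failures unless the script opts in to stricter error handling, which is exactly the kind of error a Jenkins "failed" status would make impossible to miss.

```shell
#!/bin/bash
# By default a pipeline's exit status is that of its last command,
# so the failure of `false` is silently swallowed here:
false | true
echo "default: $?"    # prints "default: 0"

# With pipefail, the same pipeline reports the failure:
set -o pipefail
false | true
echo "pipefail: $?"   # prints "pipefail: 1"
```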

More generally, most arguments in favour of automating builds & releases (i.e. CI) work here. I guess I don't need to tell you about them :)

I'm open to not block on this for the initial implementation of #15281 but I would be unhappy if it remained done manually for too long; we're too good at postponing stuff to the famous second iteration™ that never happens. So I'd like the manual solution you propose to be implemented in a way that naturally leans towards automation: e.g. a program, living in jenkins-tools.git, with a clear interface, that explicitly gets any input it cannot guess as parameters, and that exits with sensible exit codes. Even without running it on Jenkins it'll already address some of the issues with the copy'n'paste approach that I listed above.
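For instance, the interface of such a program might look roughly like this. This is a sketch only: the name build-iuks and every parameter name are assumptions for illustration, not the actual tool. The point is explicit parameters, a usage message, and distinct exit codes so a wrapper (human or Jenkins) can tell misuse apart from a build failure.

```shell
# Illustrative sketch only: "build-iuks" and its parameters are made up.
usage() {
    echo "Usage: build-iuks --source-version VER --target-version VER --output-dir DIR" >&2
}

# Parse and validate parameters; return 0 on success, 2 on misuse.
parse_args() {
    SOURCE_VERSION='' TARGET_VERSION='' OUTPUT_DIR=''
    while [ $# -gt 0 ]; do
        case "$1" in
            --source-version) SOURCE_VERSION="$2"; shift 2 ;;
            --target-version) TARGET_VERSION="$2"; shift 2 ;;
            --output-dir)     OUTPUT_DIR="$2";     shift 2 ;;
            *)                usage; return 2 ;;
        esac
    done
    if [ -z "$SOURCE_VERSION" ] || [ -z "$TARGET_VERSION" ] || [ -z "$OUTPUT_DIR" ]; then
        usage
        return 2
    fi
}
```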

This way I can much more easily debug and find workarounds if there are problems.

As long as you have write access to the code that this Jenkins job would run, and it's deployed when you push without requiring a sysadmin to do anything, I don't see a huge difference. But I see what you mean: there's one more level of indirection between you (as the RM) and the code that runs. My counter-argument is that the manual approach you're advocating makes it harder for anyone else to "debug and find workarounds if there are problems".

#4 Updated by anonym over 1 year ago

Wow, I honestly feel dumb and embarrassed about my comment above: as you seem to have caught on to, I have recently had a few episodes of "drowning in abstractions/indirection/layers/blah" while debugging fundamentally simple problems, which I think overwhelmed me and caused me to react defensively. Thanks for nicely articulating some timely reminders of why things are the way they are, for overall good reasons! :)


intrigeri wrote:

I'm open to not block on this for the initial implementation of #15281 but I would be unhappy if it remained done manually for too long; we're too good at postponing stuff to the famous second iteration™ that never happens. So I'd like the manual solution you propose to be implemented in a way that naturally leans towards automation: e.g. a program, living in jenkins-tools.git, with a clear interface, that explicitly gets any input it cannot guess as parameters, and that exits with sensible exit codes. Even without running it on Jenkins it'll already address some of the issues with the copy'n'paste approach that I listed above.

Fully agreed!

This way I can much more easily debug and find workarounds if there are problems.

As long as you have write access to the code that this Jenkins job would run, and it's deployed when you push without requiring a sysadmin to do anything, I don't see a huge difference. But I see what you mean: there's one more level of indirection between you (as the RM) and the code that runs.

Yes, this is an actual concern that affects me. It's another thing like the tagged/time-based APT snapshot system: I'm able to fix about half the issues I encounter, but for the tricky stuff I often end up urgently needing your help close to release time. That's pretty stressful, and there is enough stress at that point in time anyway. I think a good enough remedy is to have you "on call" for dealing with such problems for a few releases (incl. RCs, but less urgently) when deploying this -- under what terms is that possible, if at all?

#5 Updated by intrigeri over 1 year ago

This way I can much more easily debug and find workarounds if there are problems.

As long as you have write access to the code that this Jenkins job would run, and it's deployed when you push without requiring a sysadmin to do anything, I don't see a huge difference. But I see what you mean: there's one more level of indirection between you (as the RM) and the code that runs.

Yes, this is an actual concern that affects me. It's another thing like the tagged/time-based APT snapshot system: I'm able to fix about half the issues I encounter, but for the tricky stuff I often end up urgently needing your help close to release time.

I have no data I could check about such situations, but my feeling is that in these tricky cases, the kind of help you need is about understanding fine details of how the system works in corner cases, so that either you can work around or fix our stuff to avoid hitting those corner cases, or I can make our code handle them better. I doubt that running the code locally vs. remotely would make a big difference: without that understanding of the fine details, even if you could run/debug the code locally, you would sometimes not be in a position to decide what a suitable fix is. I think it'll be just the same for generating IUKs, unless you learn enough Modern Perl and dive deep enough into our incremental upgrades design+implementation to be fully autonomous in this area, which IMO has a rather bad cost/benefit ratio for Tails. Anyway, I don't have data to back this feeling and I suspect you don't either, so let's leave it at that given:

That's pretty stressful, and there is enough stress at that point in time anyway.

This I totally understand and I want to take it into account!

I think a good enough remedy is to have you "on call" for dealing with such problems for a few releases (incl. RCs, but less urgently) when deploying this -- under what terms is that possible, if at all?

I don't understand why this would be needed specifically for the Jenkins deployment: as long as the RM can fall back to running/debugging/fixing the script locally, we're good even if the Jenkins job does not do what the RM needs, no? Or were you asking even for the case when Jenkins is not involved and the RM runs the script locally?

#6 Updated by intrigeri over 1 year ago

  • Target version changed from Tails_3.6 to Tails_3.7

#7 Updated by intrigeri over 1 year ago

  • Target version changed from Tails_3.7 to Tails_3.8

#8 Updated by intrigeri over 1 year ago

Next step: specify the dependencies, input and output of the script. Leaving this on anonym's plate for now but I could take over this step if it's one task too many for you.

Once we have this we can:

  • find someone to implement it (I'm thinking of our new FT colleagues)
  • design the Jenkins job that will run this script (e.g. it might be that the script's input includes information that's too hard for a program to guess, in which case the job will need whoever runs it to fill in some parameters that'll be converted into input for the script)
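Such a job could then boil down to a thin wrapper that turns job parameters (which Jenkins exports as environment variables) into explicit script arguments. All names below are illustrative assumptions, not the real job's interface:

```shell
# Hypothetical Jenkins wrapper: fail fast with a clear message if a
# job parameter is missing, then hand everything to the build script
# as explicit arguments.  Names are made up for illustration.
run_from_jenkins() {
    : "${SOURCE_VERSION:?job parameter SOURCE_VERSION is required}"
    : "${TARGET_VERSION:?job parameter TARGET_VERSION is required}"
    echo "build-iuks --source-version $SOURCE_VERSION --target-version $TARGET_VERSION"
}
```

In a real job the final echo would be an exec of the actual build script; echoing the command line keeps this sketch side-effect free.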

#9 Updated by intrigeri over 1 year ago

  • Target version changed from Tails_3.8 to Tails_3.10.1

#10 Updated by intrigeri about 1 year ago

  • Target version changed from Tails_3.10.1 to Tails_3.11

#11 Updated by intrigeri about 1 year ago

#12 Updated by intrigeri about 1 year ago

  • Assignee changed from anonym to intrigeri

#13 Updated by intrigeri 11 months ago

  • Blocks Feature #16052: Document post-release reproducibility verification for IUKs added

#14 Updated by intrigeri 11 months ago

  • Target version changed from Tails_3.11 to Tails_3.12

#15 Updated by intrigeri 11 months ago

  • Target version changed from Tails_3.12 to Tails_3.13

#16 Updated by intrigeri 10 months ago

#17 Updated by intrigeri 10 months ago

#18 Updated by intrigeri 8 months ago

  • Target version changed from Tails_3.13 to 2019

#19 Updated by intrigeri 8 months ago

#20 Updated by intrigeri 8 months ago

#21 Updated by intrigeri 7 months ago

  • Target version deleted (2019)

This is not on our roadmap.

#22 Updated by intrigeri 6 months ago

  • Assignee deleted (intrigeri)
  • QA Check deleted (Dev Needed)

#23 Updated by intrigeri 5 months ago

  • Subject changed from Make it possible to reproducibly generate IUKs on lizard to Make it possible to reproducibly generate IUKs in CI

What matters is not particularly that this is done on Jenkins; it's that it is done on a machine from which lizard can quickly download a big pile of IUKs.
