Project

General

Profile

Feature #7221

Feature #8617: Regularly delete obsolete Git branches and corresponding data in our APT repository

Write a script that deletes old merged Git branches

Added by intrigeri over 5 years ago. Updated about 4 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
-
Category:
Infrastructure
Target version:
Start date:
Due date:
% Done:

100%

Feature Branch:
feature/8617-delete-obsolete-Git-branches
Type of work:
Code
Blueprint:
Starter:
Yes
Affected tool:

Description

Our main Git repo already has 400+ branches, and it's growing fast. Each branch results in a dedicated APT suite (that is, most of the time, empty). Keeping all those branches will also make the process of detecting active ones (form #6196) slower.

We should detect, and delete, at least Git branches that were merged into a released tag. We already have code in auto/scripts/tails-custom-apt-sources, in the main Tails Git repository, to detect released tags. That could be reused somehow. An initial implementation might copy'n'paste the logic or the code, but in the end we should factor this out into a library or set of small utility scripts.

Once we have a script that can do that, we can run it once to start with, and then maybe periodically (e.g. at post-release time).

Regarding the implementation language: if the one who undertakes this mission qualifies as an expert at writing robust, secure and elegant POSIX shell, and they want to do it this way, fine. Else, better go directly for (Modern) Perl or Python. Ruby lovers can certainly pick their language of choice too, but they'd better be careful about compatibility with future Ruby versions... we spend too much time already updating our existing Ruby code to 1.9.

Sources of inspiration:


Related issues

Related to Tails - Feature #7036: Move custom software to main git repo Confirmed
Blocks Tails - Feature #6196: Build all active branches Resolved 08/07/2013 07/15/2015

Associated revisions

Revision 136fe795 (diff)
Added by intrigeri over 4 years ago

Mention the need to init+update submodules after cloning the main Git repo.

Refs: #7221

Revision 05b715a8 (diff)
Added by intrigeri over 4 years ago

Add some documentation and pointers about Git submodules.

Refs: #7221

History

#1 Updated by intrigeri over 5 years ago

#2 Updated by intrigeri over 5 years ago

  • Target version set to Sustainability_M1

Flagging for 2.0, as it blocks #6196, which is on the 2.0 roadmap. Well, it does not exactly blocks it, but writing the code for #7221 will ease resolving #6196 a great deal.

#3 Updated by hyas about 5 years ago

Non-blocking ticket for 2.0

#4 Updated by intrigeri about 5 years ago

  • Subject changed from Write a script that deletes old Git branches to Write a script that deletes old merged Git branches

#5 Updated by intrigeri about 5 years ago

  • Related to Feature #6277: Rewrite the tails-live.git history added

#6 Updated by intrigeri about 5 years ago

  • Related to deleted (Feature #6277: Rewrite the tails-live.git history)

#7 Updated by intrigeri about 5 years ago

#8 Updated by intrigeri about 5 years ago

  • Description updated (diff)

#9 Updated by intrigeri about 5 years ago

  • Description updated (diff)

#10 Updated by intrigeri over 4 years ago

  • Parent task set to #8617

#11 Updated by intrigeri over 4 years ago

  • Assignee set to Dr_Whax

#12 Updated by intrigeri over 4 years ago

  • Blocks Feature #8618: Deal with the remaining branches' Git history and interdependencies added

#13 Updated by intrigeri over 4 years ago

  • Blocks Feature #8619: Do a trial rewrite of all branches and tags added

#14 Updated by intrigeri over 4 years ago

  • Target version changed from Sustainability_M1 to Tails_1.3.2

#16 Updated by intrigeri over 4 years ago

  • Blocks deleted (Feature #6277: Rewrite the tails-live.git history)

#17 Updated by intrigeri over 4 years ago

  • Blocks deleted (Feature #8619: Do a trial rewrite of all branches and tags)

#18 Updated by intrigeri over 4 years ago

  • Blocks Feature #8619: Do a trial rewrite of all branches and tags added

#19 Updated by Dr_Whax over 4 years ago

  • Status changed from Confirmed to In Progress
  • Assignee changed from Dr_Whax to intrigeri
  • QA Check set to Ready for QA
  • Feature Branch set to drwhax:feature/8617-delete-obsolete-Git-branches

#20 Updated by intrigeri over 4 years ago

  • Assignee changed from intrigeri to Dr_Whax
  • % Done changed from 0 to 20

Thanks! I trust this script will work fine for most cases.

We need it to be very robust, as it's doing destructive operations, though. I was hoping that my "Regarding the implementation language [...]" note in the ticket description would have avoided us going through that, and now we have a PoC shell script without much error handling, and that has most common issues that 99.9999% of shell scripts have. This review will be up to the expectations I had clearly set.

Oh well, I guess we'll live with it once what follows is addressed:

  • No more shell scripts without set -e and set -u, please.
  • Please quote shell variables everywhere unless it would break something. E.g. in [ $REMOVE_FROM_REMOTES -eq 1 ].
  • Please make MANAGED_REMOTES default to something that the RMs are more likely to use, that is origin.
  • I'm surprised to see head, not HEAD, in BRANCHES_TO_KEEP: I see a origin/HEAD ref in our official repo, but no head.
  • generate's documentation doesn't make it clear enough that it only handles one-space delimited strings. It's name is also much too generic.
  • The is not a Git tree message should be sent to STDERR.
  • Quite a few matching strings in sub-regexps are captured with parenthesis for reasons I don't get. I've found it confusing on first read.
  • The code seems to assume that local master branch is in the same state as the master branch in the official repo. IMO we should not rely on it. I suggest checking out (possibly in a detached state) the just-released tag instead.
  • Why are there && between calls to echo? It feels weird (and a bit scary) that it's the only place where there's error handling in this script. Anyway, it will be useless once you set -e so you can just drop it :)
  • Please replace Y/N with y/N to make it clear what the default choice is.
  • Please use extended regexp syntax consistently: the script uses it for grep (which is good) but not for sed (which is painful to read, at least for me, and confusing since we have similar regexps written in different ways in the same script).
  • Iterating over things called "$BRANCH", that are actually not branch names, but instead "something with possibly leading space and a remote name followed by a slash and the actual branch name" doesn't make the code very clear, to say the least. Please:
    • call branch names that include the remote name e.g. REMOTE_BRANCH
    • make sure you do not store these leading spaces in REMOTE_BR nor in LOCAL_BR
    • adjust the rest of the code to take into account the fact that the elements in these lists do not have leading whitespace
  • It would make the code way clearer for me if some of the sub-string extraction logic was moved to functions with proper names: I find it scary that the most important part of this code is done by git push $(echo $BRANCH | sed 's/\([^/]\+\)\/\(.\+\)/\1 :\2/'); maybe you know what you're doing, but what about the next one who has to touch this code?
  • Using / as the separator for string substitution in sed makes the regexp painful to read when the strings we're handling have slashes in them, which is precisely the case here.
  • Why do we filter out BRANCHES_TO_KEEP again when deleting local branches, while it's supposedly done already when initializing LOCAL_BR?
  • Please drop the .sh extension, for consistency with all the scripts we have at the root of the Git directory (that I plan to move to bin/ to at a later point).
  • fi is wrongly indented on line 64 (at commit:fb7a57c).
  • I see you've implemented the xargs trick we've discussed earlier for local branches only. Fine, but I had requested it for remote branches, since that's where network latency (and operations triggered by pushing) can be a problem.
  • Please make the script executable in Git.
  • Fetching from all remotes is very slow for me. Why do we need to do that?

#21 Updated by intrigeri over 4 years ago

Pushed a couple changes to feature/8617-delete-obsolete-Git-branches in the official repo => please merge from there before working on it again. Thanks!

#22 Updated by intrigeri over 4 years ago

  • QA Check changed from Ready for QA to Dev Needed

#23 Updated by kytv over 4 years ago

  • Assignee changed from Dr_Whax to intrigeri
  • % Done changed from 20 to 0
  • QA Check changed from Dev Needed to Ready for QA

intrigeri wrote:

Thanks! I trust this script will work fine for most cases.

We need it to be very robust, as it's doing destructive operations, though. I was hoping that my "Regarding the implementation language [...]" note in the ticket description would have avoided us going through that, and now we have a PoC shell script without much error handling, and that has most common issues that 99.9999% of shell scripts have. This review will be up to the expectations I had clearly set.

I went for a "quick'n'dirty" (perhaps too quick and too dirty) shell script for my personal use. I doubt that I spent more than 10-20 minutes on this thing and it likely shows. I didn't go with (modern (or ancient)) Perl, Python, or Ruby since my skills there are lacking. I just wanted to clean my repositories. :)

(I'm tempted to redo this in Ruby as a skill building exercise...and tbh I'd not be least bit offended if this was discarded and completely rewritten.)

#24 Updated by kytv over 4 years ago

  • Assignee changed from intrigeri to Dr_Whax
  • % Done changed from 0 to 20
  • QA Check changed from Ready for QA to Dev Needed

Sorry, don't know why the statuses were changed.

#25 Updated by intrigeri over 4 years ago

I went for a "quick'n'dirty" (perhaps too quick and too dirty) shell script for my personal use.

Actually, I think you wrote an initial version before I added the note about choice of implementation language in the ticket's description (without having read your script yet, so this was unrelated; and no idea how much it has been changed by DrWhax since then).

#26 Updated by Dr_Whax over 4 years ago

I didn't really change it, but that might change soon, it's a good exercise in shell script :)

#27 Updated by kytv over 4 years ago

intrigeri wrote:

I went for a "quick'n'dirty" (perhaps too quick and too dirty) shell script for my personal use.

Actually, I think you wrote an initial version before I added the note about choice of implementation language in the ticket's description (without having read your script yet, so this was unrelated; and no idea how much it has been changed by DrWhax since then).

What I wrote is old and was done before I knew this ticket existed, or that Tails wanted something like it. When I saw this, "hmm..maybe it could be reworked for it... maybe ...or maybe it could serve some sort of inspiration", so I sent it.

I can say that the majority of shell scripts to do this relatively simple task are horribly buggy. OK, maybe they're not as bad as a certain gaming software doing the equivalent of rm -rf /* but they're so poorly written!

Dr_Whax wrote:

didn't really change it, but that might change soon, it's a good exercise in shell script :)

Woot

#28 Updated by intrigeri over 4 years ago

  • Blocks deleted (Feature #8619: Do a trial rewrite of all branches and tags)

#29 Updated by intrigeri over 4 years ago

  • Blocks deleted (Feature #8618: Deal with the remaining branches' Git history and interdependencies)

#30 Updated by Dr_Whax over 4 years ago

  • Target version changed from Tails_1.3.2 to Tails_1.4

#31 Updated by intrigeri over 4 years ago

DrWhax, what ETA would seem realistic to you on this front?

#32 Updated by intrigeri over 4 years ago

  • Assignee changed from Dr_Whax to intrigeri

I'll take it. My plan is to use the Git class from the Python library we're writing in #8657, extend it to have the bits we need here, and rewrite the script in Python 3 for better robustness, code clarify, and code sharing with e.g. #8656.

So, this is currently blocked by a design decision that needs to be made and implemented, see #8657#note-26.

#33 Updated by intrigeri over 4 years ago

  • % Done changed from 20 to 30
  • QA Check deleted (Dev Needed)
  • Feature Branch changed from drwhax:feature/8617-delete-obsolete-Git-branches to feature/8617-delete-obsolete-Git-branches

#34 Updated by intrigeri over 4 years ago

  • Related to Feature #7036: Move custom software to main git repo added

#35 Updated by intrigeri over 4 years ago

  • Assignee deleted (intrigeri)
  • % Done changed from 30 to 50
  • QA Check set to Ready for QA

#36 Updated by intrigeri over 4 years ago

  • Assignee set to anonym

Explicitly assigning to the RM, since I doubt anyone will dare taking this one.

#37 Updated by anonym over 4 years ago

  • Assignee changed from anonym to intrigeri
  • QA Check changed from Ready for QA to Info Needed

intrigeri wrote:

Explicitly assigning to the RM, since I doubt anyone will dare taking this one.

All looks very nice and simple, including the library. Code review passed! Congrats! I did a test run (without pushing... to be sure I commented out that code :)), diff@ed the outputted list of branches against the full list of branches (after some massaging) and it looked very reasonable given what currently is in our @master branch. For a more familiar, up-to-date view I tried switching to the testing branch in branches_to_delete(), which also looked good.

However, now that submodules are introduced, isn't a step doing git submodule foreach update (or similar) missing for ensuring that they are up-to-date (to the commit in Tails' Git) at the correct times? Currently that's only needed at post-release time.

#38 Updated by intrigeri over 4 years ago

However, now that submodules are introduced, isn't a step doing git submodule foreach update (or similar) missing for ensuring that they are up-to-date (to the commit in Tails' Git) at the correct times?

Yes, indeed something like this should be added. I'll test stuff and propose a patch against release notes by tomorrow, when you'll need it.

Note that what we want to do at that point is making sure that the correct commit in the submodule, as referenced by the branch in the main repo, is checked out. We do not want to pull in code from the submodule, that's newer than the commit referenced in the main repo. Doing that requires testing and going through the review'n'merge, just like when we want to merge new code into e.g. Tails Installer.

#39 Updated by anonym over 4 years ago

intrigeri wrote:

However, now that submodules are introduced, isn't a step doing git submodule foreach update (or similar) missing for ensuring that they are up-to-date (to the commit in Tails' Git) at the correct times?

Yes, indeed something like this should be added. I'll test stuff and propose a patch against release notes by tomorrow, when you'll need it.

Ok. Then I omit it from the 1.4 changelog.

Note that what we want to do at that point is making sure that the correct commit in the submodule, as referenced by the branch in the main repo, is checked out. We do not want to pull in code from the submodule, that's newer than the commit referenced in the main repo. Doing that requires testing and going through the review'n'merge, just like when we want to merge new code into e.g. Tails Installer.

Of course. I thought that was what git submodule update did, and what I tried to say with "to the commit in Tails' Git".

#40 Updated by intrigeri over 4 years ago

  • Assignee changed from intrigeri to anonym
  • Target version changed from Tails_1.4 to Tails_1.4.1
  • % Done changed from 50 to 60
  • QA Check changed from Info Needed to Ready for QA

anonym wrote:

Of course. I thought that was what git submodule update did, and what I tried to say with "to the commit in Tails' Git".

Cool, we're on the same page then :)

Added some documentation. I decided to put it in contribute/git since:

  • that's not release-time specific at all (even though currently all non-RMs can safely ignore said new documentation in practice)
  • it is more generic and thus nicely paves the way for #7036.

Postponing to 1.4.1 since I'd rather not see this merged before the 1.4 release is completed (would be sad if it broke Jenkins right now). Still, I would appreciate if it could be merged into stable (for 1.4.1) and devel shortly after 1.4 is out, so that I have time to adjust whatever, e.g. Jenkins, that might be broken by this change.

#41 Updated by anonym over 4 years ago

  • Status changed from In Progress to Fix committed
  • Assignee deleted (anonym)
  • % Done changed from 60 to 100
  • QA Check changed from Ready for QA to Pass

#42 Updated by intrigeri about 4 years ago

  • Status changed from Fix committed to Resolved

Also available in: Atom PDF