Project

General

Profile

Bug #15402

Feature #10034: Translation web platform

Feature #15079: Integrate the platform with our Git and ikiwiki infrastructure

Feature #15082: Have the Weblate Git communicate with our main Git repository

Pre-receive hook for Main git when handling pushs from weblate

Added by u about 1 year ago. Updated 1 day ago.

Status:
In Progress
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
Start date:
Due date:
% Done:

0%

QA Check:
Pass
Feature Branch:
https://salsa.debian.org/hefee/puppet-tails/tree/bugfix/15402-update-hook-for-weblate-pushes
Type of work:
Sysadmin
Blueprint:
Starter:
Affected tool:

Description

This scripts is the security barrier into maingit. It is triggerd by the bare git hook mechanism.
It is placed inside puppet-tails: files/gitolite/hooks/tails-weblate-update.hook

Envrironment:
  • the script is triggered for every push to tails.git
  • a malicious users try to trick this script
  • the translation-server may be compromised and the weblate user may pushes malicious commits.
  • GL_USER is a environment variable, that is set by gitolite and is safe to rely on and indicates the users that it pushing.
  • If the script returns with a status code 0 the push is allowed and not 0 if not allowed.
  • Any output to stdout/stderr is allowed but only displayed to the user.
Expected outcome:
  • Weblate need to use "weblate <>" as committer "name <email>" in any case
  • Weblate is ONLY to push po files nothing else in any case.
  • Everything else is not allowed for weblate.
  • For all other users the script should not do anything.

test-tails-weblate-update - Sample split test suite (3.49 KB) enrico, 04/22/2019 10:00 AM


Subtasks

Bug #16712: Run tests for tails-weblate-update.hookConfirmedintrigeri

Bug #16761: Move non wiki related files out of wiki/src subdirConfirmed


Related issues

Related to Tails - Bug #15185: Automatically update Weblate components on mdwn file add/rename/delete in tails.bo git Resolved 02/27/2019
Related to Tails - Bug #15401: Create hooks to be implemented on the Weblate instance Rejected 03/13/2018
Related to Tails - Bug #16760: remove executable bit for po files Confirmed

History

#1 Updated by u about 1 year ago

  • Related to Bug #15185: Automatically update Weblate components on mdwn file add/rename/delete in tails.bo git added

#2 Updated by u about 1 year ago

  • Description updated (diff)

#3 Updated by u about 1 year ago

  • Subject changed from Post merge hook for Weblate to Pre-receive hook for Main git when handling pushs from weblate
#!/usr/bin/env python
# Pre-receive or update hook for main git
# Weblate is only allowed to push po files.

from subprocess import call

expected = "tails-l10n@boum.org" 
regex = "/\*\.po\$/" 

def check_commit_author_weblate():
    rev_list = subprocess.run("git rev-list #{$oldrev}..#{$newrev}", shell=True, check=True)
    rev_list = rev_list.split("\n")
    for rev in rev_list:
        # get committer email
        committer = subprocess.run("git cat-file commit #{rev} | grep 'committer' | grep -o -P '(?<=\<).*(?=\>)'")
        files = subprocess.run("git diff-tree --no-commit-id --name-only -r #{rev}")
        if expected == committer and not regex.search(files)
            print "[POLICY] Weblate is only allowed to modify po files." 
            return False
check_commit_author_weblate()

For now, the criteria to check against a commit author is very weak. This should be completed by some environment variable when pushing over SSH -> this would ensure we do this check only when weblate pushes, not somebody else.

#4 Updated by bertagaz about 1 year ago

  • Target version changed from Tails_3.7 to Tails_3.8

#5 Updated by u 12 months ago

  • Assignee changed from u to groente
  • QA Check set to Info Needed

Hi groente, as our preferred rubber duck for this project, here is a question for you:

How can we ensure to check from the production machine that a push over SSH is indeed done by Weblate and not anybody else? We want to ensure that Weblate only pushed po modifications and nothing else.

I wrote the above script with the criteria to check against a commit author, but I'm totally aware that this check is very weak, if not useless. The only other way we could think of is to check for an environment variable. But maybe rubber ducks have better ideas here?

#6 Updated by intrigeri 11 months ago

  • Target version changed from Tails_3.8 to Tails_3.9

#7 Updated by groente 11 months ago

  • Assignee changed from groente to u
  • QA Check changed from Info Needed to Dev Needed

Heya, so after testing with printenv as a git-hook, i found out that gitolite-shell seems to store the username associated with the key that was used to authenticate in an environment variable called GL_USER.
Checking whether this variable matches the desired username would be a lot safer than checking the commit author name.

#8 Updated by u 9 months ago

Won't be able to do this in the next 48 hours (release of 3.9)

#9 Updated by u 9 months ago

  • Target version changed from Tails_3.9 to Tails_3.10.1

#10 Updated by intrigeri 7 months ago

  • Target version changed from Tails_3.10.1 to Tails_3.11

#11 Updated by u 7 months ago

  • Assignee changed from u to hefee

groente wrote:

Heya, so after testing with printenv as a git-hook, i found out that gitolite-shell seems to store the username associated with the key that was used to authenticate in an environment variable called GL_USER.
Checking whether this variable matches the desired username would be a lot safer than checking the commit author name.

@groente, thanks! We'll look into it.

#12 Updated by hefee 7 months ago

  • Related to Bug #15401: Create hooks to be implemented on the Weblate instance added

#13 Updated by hefee 6 months ago

  • Status changed from Confirmed to In Progress
  • Assignee changed from hefee to groente
  • QA Check changed from Dev Needed to Ready for QA
  • Feature Branch set to https://salsa.debian.org/hefee/puppet-tails/tree/bugfix/15402-weblate-hook
I now checked the script and add it puppet-tails. Locally I checked following scenarios:
  • tries to push commits as other committer
  • tries to update non po file
  • tries to delete a po file
  • tries to create a new branch

I install this hook as update hook, because pre-receive hook had not the needed information oldrev and newrev. It may be different, if installed on server.

#14 Updated by groente 6 months ago

  • Assignee changed from groente to bertagaz
  • % Done changed from 0 to 50
  • Feature Branch changed from https://salsa.debian.org/hefee/puppet-tails/tree/bugfix/15402-weblate-hook to puppet-tails:bugfix/15402-update-hook-for-weblate-pushes

hey bertagaz,

i've reviewed hefee's script, which looks good to me, can you check if i forgot about anything in the deploy? thank you!

#15 Updated by intrigeri 6 months ago

I took a very quick first look and would like to do a proper security-focused review once bertagaz is happy with the code: this is a security critical check (so far we never allowed a random webapp to push changes to our website, guarded by custom-written code based on regexps) so the more eyes, the better.

reModifyFile = re.compile("[^\t]* M\t.*\.po")
[…]
+            if not reModifyFile.match(line):
<pre>

Is it only me or will this allow pushing changes to any file whose name _contains_ @.po@? We might want @"[^\t]* M\t.*\.po$"@ instead.

#16 Updated by groente 6 months ago

  • Assignee changed from bertagaz to hefee
  • QA Check changed from Ready for QA to Dev Needed

intri does raise a good point, that regex atleast needs a trailing $ to prevent weblate from modifying any file which happens to have .po in it (.pools, .policy, etc).

#17 Updated by hefee 6 months ago

  • Assignee changed from hefee to bertagaz
  • QA Check changed from Dev Needed to Ready for QA
  • Feature Branch changed from puppet-tails:bugfix/15402-update-hook-for-weblate-pushes to https://salsa.debian.org/hefee/puppet-tails/tree/bugfix/15402-update-hook-for-weblate-pushes

groente wrote:

intri does raise a good point, that regex atleast needs a trailing $ to prevent weblate from modifying any file which happens to have .po in it (.pools, .policy, etc).

ACK - fixed. I thoiught that re.match does check, that the complete string matches to content, in contrast to re.search. But the only difference is, that it searches from the beginning.

Another thing I found is:

if newrev == "0000000000000000000000000000000000000000": 
sys.exit("[POLICY] Weblate is not allowed to delete branches.")

Maybe also replace this long 0 with a regex ^0*$, don't know if the length is fixed.

Do we also want to close the allowed refname updates, as it is done in templates/gitolite/hooks/tails-post-receive.erb

#18 Updated by intrigeri 6 months ago

Do we also want to close the allowed refname updates, as it is done in templates/gitolite/hooks/tails-post-receive.erb

We can declaratively configure Gitolite to only allow the weblate user to push to the master branch ⇒ no need to write new code for this.

#19 Updated by CyrilBrulebois 5 months ago

  • Target version changed from Tails_3.11 to Tails_3.12

#20 Updated by intrigeri 5 months ago

  • Parent task changed from #15079 to #15082

#21 Updated by u 4 months ago

  • Assignee deleted (bertagaz)

#22 Updated by intrigeri 4 months ago

  • Target version changed from Tails_3.12 to Tails_3.14

#23 Updated by intrigeri 4 months ago

  • Target version changed from Tails_3.14 to Tails_3.13

(Actually, it would be nice if a first security-oriented review was done sufficiently in advance before our sprint.)

#24 Updated by u 4 months ago

  • Assignee set to enrico

I propose that this can have a first round of review by enrico for security issues. When done, it can be reassigned to intrigeri.

#25 Updated by hefee 4 months ago

  • Description updated (diff)

I have upded the description of this ticket so encrio should now have enough context to do a security review of this script.

#26 Updated by enrico 3 months ago

  • Assignee changed from enrico to hefee

Hello,

here are my comments on the tails-weblate-update.hook script. Thanks for the
detailed introduction, it had all I needed to get this going!

I have some structural questions, before looking at the code in detail.

The script currently uses git cat-file and git diff-tree to get committer
name and files changed, and does quite a bit of parsing on their output. I
would suggest to ask git for a machine readable version of what we need,
with something like git show --format="%ce" --name-status <shasum>, to
minimize the amount of parsing on our side that might go wrong.

Alternatively, could gitpython be used instead of running git with subprocess?
Then we wouldn't need to bother with writing parsing code.

The script currently checks commits only if the committer is the weblate
user. What happens if I merge commits from weblate into some branch of mine,
then push them? Should those weblate changes be checked by the script, or
should they be considered as reviewed by the actual committer?

Looking at the code in detail, I have mostly nitpicks.

English tweaks:

-# Update git hook to ensure, that a user is only allowed to push modifications
-# to existiong files. Only works with gitolite ( as this sets the env variable GL_USER
+# Update git hook to ensure that a user is only allowed to push modifications
+# to existing files. Only works with gitolite, as this sets the env variable GL_USER
 # that is connected to the ssh key used for pushing.

-# It is used to ensure, that Weblate is only allowed to push po files.
+# It is used to ensure that Weblate is only allowed to push po files.

I would use variable names that are more representative of how they are used:

 # Gitolite sets env variable for the user for the SSH user
 GL_USER = "weblate" 
+# EXPECTED_GL_USER

 # Only allowed commiter mail
 COMMITTER = "weblate <tails-l10n@boum.org>" 
+# EXPECTED_COMMITTER

This regexp missed the r-string marker:

 # Regex to match for modified po files in raw output format of git diff-tree
 # in-place edit  :100644 100644 bcd1234... 0123456... M file0
-reModifyFile = re.compile("^[^\t]* M\t.*\.po$")
+reModifyFile = re.compile(r"^[^\t]* M\t.*\.po$")

I like regexps to be anchored if they need to match at the beginning of the
line, even though they are run with .match(). This way if someone
accidentally changes match with a search, the intent of the regexp doesn't
change. I'm also afraid of using .+ except when really needed, and I would
opt for being super-strict instead:

 # Regex to match the commiter line of git cat-file.
 # E.g "committer Sandro Knauß <hefee@debian.org> 1542020415 +0100" 
-reCommitter = re.compile("committer (?P<committer>(?P<name>.*) (<(?P<mail>.*)>)) .*")
+reCommitter = re.compile(r"^committer [^<]+<(?P<email>[^>]+)> \d+ [-+][0-9]+$")

In the committer function, I notice that only the committer string is ever
used, and it's only ever checked against COMMITTER. At this point, one could
just switch to matching the whole line with a regexp like this one:

re_committer = re.compile(r"^committer weblate <tails-l10n@boum.org> \d+ [-+][0-9]+$")

Note that in the committer function you also want to stop parsing at the
first empty line, or you might match a phony committer line inside the commit
message. Alternatively, just let git do the work and use
git show --format=….

Note that I managed to create commit objects with pretty much any string I
wanted in them (git hash-object -t commit -w <filename> gives you a commit
that you can push, and it contains whatever is in that file), so depending on
how untrustful you want to be of weblate pushes, you can decide not to have git
do any parsing and strictly validate any line of the commit file that comes
before the empty lines that marks the begin of the commit message.

Random python tidbits:

Where you do: _ = subprocess.check_output, normally the _ variable in
python is used to mean "this value is ignored", and I would use a different
variable name.

If you always decode the result of subprocess.check_output, you may want to
pass universal_newlines=True to check_output to get unicode strings
instead.

As a general practice in python scripts, I like to put my main code inside a
function, and then call it with:

def main():
    ref = sys.argv[1]
    oldrev = sys.argv[2]
    newrev = sys.argv[3]
    …

if __name__ == "__main__":
    main()

This makes the variables used by main scoped at a function level instead of
module level (I spent some unpleasant time debugging issued caused by this in
the past).

Also, calling the function inside if __name__ == "__main__": makes the script
importable as a module, and therefore unit-testable. Would it be worth having
unit tests for this?

In the main if os.environ.get("GL_USER")…:, I would explicitly have
an else: branch even if it does nothing: in security sensitive code, I'd like
to make any code that does nothing explicit, like this:

if os.environ.get("GL_USER") == GL_USER:
    …
else:
    # We do not interfere with commits from other users
    pass

#27 Updated by enrico 2 months ago

[treppenwitz] Another design question: how much should we not trust weblate pushes?

In particular, should we check things like these?

  • the .po files have the right format
  • the .po files are in the correct paths
  • the .po file name matches a language code
  • the .po files have the right permissions

#28 Updated by u 2 months ago

  • QA Check changed from Ready for QA to Dev Needed

#29 Updated by hefee 2 months ago

  • Assignee changed from hefee to enrico
  • QA Check changed from Dev Needed to Info Needed

Thanks a lot for your review. I have no further comments on them atm and fill fix them soon after the further input, also started to write unit-tests for the script.

enrico wrote:

Alternatively, could gitpython be used instead of running git with subprocess?

Then we wouldn't need to bother with writing parsing code.

@intrigeri : I would highly like to use gitpython for this code. Is it possible to install python3-git on the gitolite server? What python3 version do we have on gitolite server?

enrico wrote:

[treppenwitz] Another design question: how much should we not trust weblate pushes?

@enrico: What scenarios are you thinking of?

In particular, should we check things like these?
  • the .po files have the right format

this stuff should be checker later on by Jenkins and unify_po (#15403).

  • the .po files are in the correct paths
  • the .po file name matches a language code
  • the .po files have the right permissions

that makes sense it easy to check.

#30 Updated by intrigeri 2 months ago

intrigeri : I would highly like to use gitpython for this code. Is it possible to install python3-git on the gitolite server?

@hefee, sure thing. We'll do that when we deploy this hook (currently we have no suitable place in our Puppet code to encode a dependency of a script we don't deploy yet; no change we forget it since the hook won't work without its dependencies, so no need to do additional paperwork at this point).

What python3 version do we have on gitolite server?

We're running Stretch there ⇒ 3.5.3.

enrico wrote:

[treppenwitz] Another design question: how much should we not trust weblate pushes?

@enrico: What scenarios are you thinking of?

In particular, should we check things like these?
  • the .po files have the right format

this stuff should be checker later on by Jenkins and unify_po (#15403).

  • the .po files are in the correct paths
  • the .po file name matches a language code
  • the .po files have the right permissions

that makes sense it easy to check.

ACK, modulo perhaps the permissions bits: I don't see what we can do about that (except perhaps checking the executable bit?) in a pre-receive hook.

#31 Updated by enrico 2 months ago

  • Assignee changed from enrico to hefee

hefee wrote:

enrico wrote:

[treppenwitz] Another design question: how much should we not trust weblate pushes?

@enrico: What scenarios are you thinking of?

The description of this issue says: "the translation-server may be compromised and the weblate user may pushes malicious commits", and turning my level of paranoia waay up, I was wondering about what a malicious commit could do that is limited to files with .po extension, and I thought:

  • add malicious data in potfiles to trigger potential buffer overflows in tools that process them
  • put .po files in directories where they would be read as something else, like, say, configuration files that are read regardless of their extensions

Also, I don't know what kind of damage one could do down the toolchain by putting garbage in the git commit object. I can't come up with a reasonable attack scenario here, and unless someone can, feel free to disregard this potential attack vector.

#32 Updated by intrigeri 2 months ago

  • add malicious data in potfiles to trigger potential buffer overflows in tools that process them

Interesting!

FWIW, back when I wrote the ikiwiki PO plugin, it was meant to be exposed to untrusted data (iki*wiki*) so I fuzzed it and its most risky dependencies… which resulted in one CVE being filed somewhere down the dependency tree (my first one, happy memories :) — but of course we can't take for granted that I found all such potential issues and anyway, that was 10 years ago, so new issues can have been introduced ⇒ your idea is definitely worth considering IMO.

Now, validating the contents of PO/POT files here would move some of the attack surface from one place to the other; I don't know how the respective risk×impact of "PO/POT file validation on our Git server has security issues" vs. "PO/POT file handling by ikiwiki on our web server has security issues" compare. Given our Git server controls the contents of our website and the rest of our infra (Puppet), the impact of exploitation seems to be higher (at first glance) in the first case. But the risk may very well be lower.

#33 Updated by CyrilBrulebois 2 months ago

  • Target version changed from Tails_3.13 to Tails_3.14

#34 Updated by hefee about 2 months ago

  • Assignee changed from hefee to enrico
  • QA Check changed from Info Needed to Ready for QA

Thaks for you detailed review. Now the second round of review.

enrico wrote:

Alternatively, could gitpython be used instead of running git with subprocess?
Then we wouldn't need to bother with writing parsing code.

good point use now gitpython.

The script currently checks commits only if the committer is the weblate
user. What happens if I merge commits from weblate into some branch of mine,
then push them? Should those weblate changes be checked by the script, or
should they be considered as reviewed by the actual committer?

yes if a core dev pushes changes from weblate user and the checks should not take place, in such a scenario. A core dev is responsible for all their commits they push.

Looking at the code in detail, I have mostly nitpicks.

done.

I like regexps to be anchored if they need to match at the beginning of the
line, even though they are run with .match(). This way if someone
accidentally changes match with a search, the intent of the regexp doesn't
change. I'm also afraid of using .+ except when really needed, and I would
opt for being super-strict instead:

done.

In the committer function, I notice that only the committer string is ever
used, and it's only ever checked against COMMITTER. At this point, one could
just switch to matching the whole line with a regexp like this one:

not needed anymore, as we use gitpython.

Note that in the committer function you also want to stop parsing at the
first empty line, or you might match a phony committer line inside the commit
message. Alternatively, just let git do the work and use
git show --format=….

not needed anymore, as we use gitpython.

Note that I managed to create commit objects with pretty much any string I
wanted in them (git hash-object -t commit -w <filename> gives you a commit
that you can push, and it contains whatever is in that file), so depending on
how untrustful you want to be of weblate pushes, you can decide not to have git
do any parsing and strictly validate any line of the commit file that comes
before the empty lines that marks the begin of the commit message.

should be fixed with using gitpython.

Random python tidbits:

Where you do: _ = subprocess.check_output, normally the _ variable in
python is used to mean "this value is ignored", and I would use a different
variable name.

ACK.

If you always decode the result of subprocess.check_output, you may want to
pass universal_newlines=True to check_output to get unicode strings
instead.

ACK..

As a general practice in python scripts, I like to put my main code inside a
function, and then call it with:

On the other side, it makes it harder to run interactive python shell, or do you know a way, how I can make the python interactive shell stay in a functions scope?

This makes the variables used by main scoped at a function level instead of
module level (I spent some unpleasant time debugging issued caused by this in
the past).

Me, too ;)

Also, calling the function inside if __name__ == "__main__": makes the script
importable as a module, and therefore unit-testable. Would it be worth having
unit tests for this?

Created tests for main function. But not for the check_commit_author_weblate, as I don't have a good idea, how a good test environment should look like. As I think the tests should use a real git repository. But downloading every time the tails.git would make the tests very slow and network intensive. Any ideas?

@intrigeri: it would make sense to run the tests on CI, but therefore the file needs to have a .py suffix instead of .hook. But than the files on the server are named differently than in puppet. Aand shipping the tests inside the file, makes the file hard to read, that's why better create a sub directory with those tests.

In the main if os.environ.get("GL_USER")…:, I would explicitly have
an else: branch even if it does nothing: in security sensitive code, I'd like
to make any code that does nothing explicit, like this:

ACK

additional checks:

the .po files are in the correct paths

done.

the .po files have the right permissions

done.

the .po file name matches a language code

As source of allowed language codes I use the langs.json, that is created out of templates/weblate/langs.json.erb. This limits the allowed languages for those, that are handled by Weblate at all. Do you have any ideas, how to make ADDITIONAL_LANGUAGES lazy loading, because currently the json file is parsed for every commit also if we do not need those languages at all.

#35 Updated by intrigeri about 1 month ago

intrigeri: it would make sense to run the tests on CI, but therefore the file needs to have a .py suffix instead of .hook. But than the files on the server are named differently than in puppet. Aand shipping the tests inside the file, makes the file hard to read, that's why better create a sub directory with those tests.

@hefee: if you need input from me on this topic before our June sprint, please sum up the context and clarify what the question is :)

#36 Updated by enrico about 1 month ago

yes if a core dev pushes changes from weblate user and the checks should not
take place, in such a scenario. A core dev is responsible for all their
commits they push.

Ack.

On the other side, it makes it harder to run interactive python shell, or do
you know a way, how I can make the python interactive shell stay in a
functions scope?

I think I don't understand the question: can you give me an example of what you would do with the shell?

Created tests for main function. But not for the check_commit_author_weblate,
as I don't have a good idea, how a good test environment should look like. As
I think the tests should use a real git repository. But downloading every
time the tails.git would make the tests very slow and network intensive. Any
ideas?

Given that check_commit_author_weblate takes a repo and a revision range, I'd create mock git repos in a tempdir using gitpython, rather than work with the real ones. It would also make it possible to hand-craft all sorts of malicious one-off repos for testing.

As source of allowed language codes I use the langs.json, that is created out of templates/weblate/langs.json.erb. This limits the allowed languages for those, that are handled by Weblate at all. Do you have any ideas, how to make ADDITIONAL_LANGUAGES lazy loading, because currently the json file is parsed for every commit also if we do not need those languages at all.

If parsing langs.json is not slowing things down significantly, I'd just leave it as it is, as I expect that there aren't going to be so many pushes that it would create trouble.

Otherwise, this could be a lazy loading system:

class AllowedLanguages:
    def __init__(self, pathname):
        self.pathname = pathname
        self.result = None

    def load(self):
        # Current getAllowedLanguages implementation, assigning to self.result

    def __iter__(self):
        if self.result is None:
            self.load()
        return self.result

ALLOWED_LANGUAGES = AllowedLanguages(pathlib.Path(os.path.realpath(__file__)))

With regards to a review, I don't see any potential security issues in the code, besides the possibility that there are checks we have not thought of or that are not part of the current requirements, like validating the contents of .po files, or making sure .po files are only committed to a limited set of directories.

For example, an attack scenario could be adding a .po file that contains malicious commands into a config directory read by something that does something like for file in myconfig.rc/*; do source $file; done

Some nitpicks:

getAllowedLanguages and reModifyFile should probably be get_allowed_languages and re_modify_file (see
https://www.python.org/dev/peps/pep-0008/#function-and-variable-names )

In check_commit_author_weblate, I'd keep iterating all commits appending errors to a list, and return the list of error messages: that way one can see all the issues in a push and not just the first one. In main(), if the result is the empty string we are fine, oherwise one can print the list to stderr and exit with an error code.

The check_commit_author_weblate name does not seem to match its function anymore: I propose something like validate_commit_range instead.

I like the idea of moving the test suite to a different file and location in the git repo. I'm attaching an example of the split test suite, which can load the hook from an arbitrary path.

#37 Updated by u about 1 month ago

The script currently checks commits only if the committer is the weblate
user. What happens if I merge commits from weblate into some branch of mine,
then push them? Should those weblate changes be checked by the script, or
should they be considered as reviewed by the actual committer?

yes if a core dev pushes changes from weblate user and the checks should not take place, in such a scenario. A core dev is responsible for all their commits they push.

That's the theory. We have more and more people who will be allowed push rights, and I believe at some point there might be human failure. That's perfectly natural. Once there'll be a lot of translations coming from Weblate, i.e. when it's in production, we cannot expect everyone to read every po file change in my opinion. So: How much work would it be to implement a check that would safeguard us against this and automate this task instead of relying on people?

#38 Updated by hefee 17 days ago

  • Assignee changed from hefee to enrico

updated my branch on salsa.

enrico wrote:

On the other side, it makes it harder to run interactive python shell, or do
you know a way, how I can make the python interactive shell stay in a
functions scope?

I think I don't understand the question: can you give me an example of what you would do with the shell?

This is a more gerneral question about Python. If I debugging a function/class how do I manage to be in the method/class context, so I can look at all local variables interactivly. In meanwhile I found out, that I can use ipython and breakpoints to do such stuff. But this topic is out of scope.

Created tests for main function. But not for the check_commit_author_weblate,
as I don't have a good idea, how a good test environment should look like. As
I think the tests should use a real git repository. But downloading every
time the tails.git would make the tests very slow and network intensive. Any
ideas?

Given that check_commit_author_weblate takes a repo and a revision range, I'd create mock git repos in a tempdir using gitpython, rather than work with the real ones. It would also make it possible to hand-craft all sorts of malicious one-off repos for testing.

Done.

As source of allowed language codes I use the langs.json, that is created out of templates/weblate/langs.json.erb. This limits the allowed languages for those, that are handled by Weblate at all. Do you have any ideas, how to make ADDITIONAL_LANGUAGES lazy loading, because currently the json file is parsed for every commit also if we do not need those languages at all.

If parsing langs.json is not slowing things down significantly, I'd just leave it as it is, as I expect that there aren't going to be so many pushes that it would create trouble.

Also for testing purpose I need to disable the langs.json loading, as this file is not available at expected location.

With regards to a review, I don't see any potential security issues in the code, besides the possibility that there are checks we have not thought of or that are not part of the current requirements, like validating the contents of .po files, or making sure .po files are only committed to a limited set of directories.

We have the limitation of directories (see ALLOWED_DIRECTORIES). Or are you referring to something different. We desided that the content of po files will be checked by Jenkins.

For example, an attack scenario could be adding a .po file that contains malicious commands into a config directory read by something that does something like for file in myconfig.rc/*; do source $file; done

@intrigeri - I see, maybe we need to move all non wiki files out of wiki/src, so that this folder only contains data and no configs.

Some nitpicks:

getAllowedLanguages and reModifyFile should probably be get_allowed_languages and re_modify_file (see
https://www.python.org/dev/peps/pep-0008/#function-and-variable-names )

Me personally do not like the "_" variable names.

In check_commit_author_weblate, I'd keep iterating all commits appending errors to a list, and return the list of error messages: that way one can see all the issues in a push and not just the first one. In main(), if the result is the empty string we are fine, oherwise one can print the list to stderr and exit with an error code.

done

The check_commit_author_weblate name does not seem to match its function anymore: I propose something like validate_commit_range instead.

done

I like the idea of moving the test suite to a different file and location in the git repo. I'm attaching an example of the split test suite, which can load the hook from an arbitrary path.

done

#39 Updated by hefee 17 days ago

u wrote:

The script currently checks commits only if the committer is the weblate
user. What happens if I merge commits from weblate into some branch of mine,
then push them? Should those weblate changes be checked by the script, or
should they be considered as reviewed by the actual committer?

yes if a core dev pushes changes from weblate user and the checks should not take place, in such a scenario. A core dev is responsible for all their commits they push.

That's the theory. We have more and more people who will be allowed push rights, and I believe at some point there might be human failure. That's perfectly natural. Once there'll be a lot of translations coming from Weblate, i.e. when it's in production, we cannot expect everyone to read every po file change in my opinion.

@u I do not fully get your question. This subtopic is about, when a core dev merges weblate git repo into main git by hand and pushes this to main git. The only usecase, why a core dev would do this, is because there are some broken commits on the weblate git and they want to overrule this hook. And this overrule would be visible, as they add a merge commit with the core dev as author.

The hook is about to perform on the automatic pushes from Weblate. So those commits that are accepted by this hook will end up in the main git. So nobody needs to "accept" them by hand.

So: How much work would it be to implement a check that would safeguard us against this and automate this task instead of relying on people?

What do you want to safeguard?

#40 Updated by intrigeri 17 days ago

intrigeri - I see, maybe we need to move all non wiki files out of wiki/src, so that this folder only contains data and no configs.

What files do you have in mind?

#41 Updated by hefee 15 days ago

intrigeri wrote:

intrigeri - I see, maybe we need to move all non wiki files out of wiki/src, so that this folder only contains data and no configs.

What files do you have in mind?

@intrigeri: this is what I have in mind:

find wiki -perm -u+rwx -type f | grep -v ".po$" 
wiki/src/lib/js/jquery.min.js
wiki/src/contribute/l10n_tricks/doc-whatchanged
wiki/src/contribute/l10n_tricks/transifex_translators.sh
wiki/src/contribute/l10n_tricks/language_statistics.sh
wiki/src/contribute/l10n_tricks/git-clean-po
wiki/src/contribute/l10n_tricks/unifyPo.py
wiki/src/contribute/how/documentation/compress-image.sh
wiki/src/contribute/how/documentation/qrcode-decode.sh
wiki/src/contribute/how/documentation/qrcode-encode.sh
wiki/src/blueprint/greeter_revamp_UI/mockups/mockup.py
wiki/src/blueprint/mumble-server

I think it makes sense to move all those scripts outside wiki/src. IMO it makes it harder for the weblate user to commit stuff, that will may be executed.

Btw. there are some po files with executable flag, we should update this:

find wiki -perm -u+rwx -type f | grep "\.po$"             
wiki/src/doc/first_steps/upgrade/release_notes.es.po
wiki/src/doc/first_steps/startup_options/mac_spoofing.es.po
wiki/src/doc/first_steps/startup_options/bridge_mode.inline.es.po
wiki/src/security/Numerous_security_holes_in_3.2.es.po
wiki/src/security/Numerous_security_holes_in_3.4.es.po
wiki/src/security/Numerous_security_holes_in_3.0.1.es.po
wiki/src/security/Numerous_security_holes_in_3.1.es.po
wiki/src/security/Numerous_security_holes_in_3.3.es.po
wiki/src/security/Numerous_security_holes_in_3.0.es.po
wiki/src/news/many_hands_make_tails.es.po
wiki/src/news/what_we_accomplished_in_2017.es.po
wiki/src/news/our_plans_for_2018.es.po
wiki/src/news/additional_software_ux_design.es.po
wiki/src/news/veracrypt_ux_design.es.po
wiki/src/news/rescue_3.0.1.es.po
wiki/src/news/reproducible_Tails.es.po
wiki/src/install/clone.es.po
wiki/src/install/inc/steps/restart_first_time.inline.es.po
wiki/src/install/win/usb-download.es.po
wiki/src/install/win/usb-overview.es.po
wiki/src/install/win/usb.es.po
wiki/src/install/mac/clone.es.po
wiki/src/install/mac/dvd-overview.es.po
wiki/src/install/mac/usb.es.po
wiki/src/install/mac/dvd.es.po
wiki/src/install/dvd-download.es.po
wiki/src/install/vm-download.es.po
wiki/src/install/linux/clone-overview.es.po
wiki/src/install/linux/usb.es.po
wiki/src/install/debian/clone-overview.es.po
wiki/src/upgrade/tails-overview.es.po
wiki/src/upgrade/tails.es.po
wiki/src/upgrade/tails-download.es.po
wiki/src/donate.es.po

#42 Updated by intrigeri 11 days ago

intrigeri: this is what I have in mind:

> find wiki -perm -u+rwx -type f | grep -v ".po$" 
> wiki/src/lib/js/jquery.min.js
> wiki/src/contribute/l10n_tricks/doc-whatchanged
> wiki/src/contribute/l10n_tricks/transifex_translators.sh
> wiki/src/contribute/l10n_tricks/language_statistics.sh
> wiki/src/contribute/l10n_tricks/git-clean-po
> wiki/src/contribute/l10n_tricks/unifyPo.py
> wiki/src/contribute/how/documentation/compress-image.sh
> wiki/src/contribute/how/documentation/qrcode-decode.sh
> wiki/src/contribute/how/documentation/qrcode-encode.sh
> wiki/src/blueprint/greeter_revamp_UI/mockups/mockup.py
> wiki/src/blueprint/mumble-server
> 

Ah, OK! These files are there as "attachments" to website pages. We can certainly move them elsewhere (except JQuery, which is used by our website and has to stay there). We'll need to update the links so they point to Git with [[!tails_gitweb path/to/file/in/git]].

I think it makes sense to move all those scripts outside wiki/src. IMO it makes it harder for the weblate user to commit stuff, that will may be executed.

I thought the Weblate user was supposed to be allowed to change only .po files?

If it's allowed to modify arbitrary files, then moving these files out of wiki/src won't change the security big picture much: modifying .mwdn files is basically as problematic, from a security standpoint, as modifying a random .sh attached file.

Btw. there are some po files with executable flag, we should update this:

Merge request? :)

#43 Updated by enrico 11 days ago

intrigeri wrote:

I thought the Weblate user was supposed to be allowed to change only .po files?

If it's allowed to modify arbitrary files, then moving these files out of wiki/src won't change the security big picture much: modifying .mwdn files is basically as problematic, from a security standpoint, as modifying a random .sh attached file.

It is only allowed to change .po files, but it does not check in which location they are (see https://salsa.debian.org/hefee/puppet-tails/blob/bugfix/15402-update-hook-for-weblate-pushes/files/gitolite/hooks/tails-weblate-update.hook#L54 ), so it can potentially push a .po file anywhere in the repo.

This could be an issue if at some point in the repo there is a directory, of, say, shellscripts or configuration files that are sourced by some code regardless of their extension.

This kind of check was not technically part of the initial set of features, though.

#44 Updated by u 11 days ago

hefee wrote:

u wrote:

The script currently checks commits only if the committer is the weblate
user. What happens if I merge commits from weblate into some branch of mine,
then push them? Should those weblate changes be checked by the script, or
should they be considered as reviewed by the actual committer?

yes if a core dev pushes changes from weblate user and the checks should not take place, in such a scenario. A core dev is responsible for all their commits they push.

That's the theory. We have more and more people who will be allowed push rights, and I believe at some point there might be human failure. That's perfectly natural. Once there'll be a lot of translations coming from Weblate, i.e. when it's in production, we cannot expect everyone to read every po file change in my opinion.

@u I do not fully get your question. This subtopic is about, when a core dev merges weblate git repo into main git by hand and pushes this to main git. The only usecase, why a core dev would do this, is because there are some broken commits on the weblate git and they want to overrule this hook. And this overrule would be visible, as they add a merge commit with the core dev as author.

Exactly. And then, do we expect the core dev to go though every PO file (there might be many) or can we extend the check so that it would also cover such a merge commit?

The hook is about to perform on the automatic pushes from Weblate. So those commits that are accepted by this hook will end up in the main git. So nobody needs to "accept" them by hand.

Except for the above case, right?

#45 Updated by enrico 5 days ago

  • Assignee changed from enrico to hefee

hefee wrote:

For example, an attack scenario could be adding a .po file that contains malicious commands into a config directory read by something that does something like for file in myconfig.rc/*; do source $file; done

@intrigeri - I see, maybe we need to move all non wiki files out of wiki/src, so that this folder only contains data and no configs.

I'm leaving this discussion to the people who are in charge of the design; I see no reason to block in a review given the current specification, and I'd leave this discussion to the specification of a following release, so at least something useful can get deployed in the meantime.

getAllowedLanguages and reModifyFile should probably be get_allowed_languages and re_modify_file (see
https://www.python.org/dev/peps/pep-0008/#function-and-variable-names )

Me personally do not like the "_" variable names.

I respect your preference, but if other people are going to have to maintain your Python code, they will be used to, and expect, the coding conventions of the Python ecosystem, and will find it harder to read your code if you do not follow them.

Unless you are writing code that is intended to be forever maintained only by you and you alone, which I assume is not the case here, I consider not following established naming conventions an issue :)

Everything else looks good to me!

#46 Updated by CyrilBrulebois 2 days ago

  • Target version changed from Tails_3.14 to Tails_3.15

#47 Updated by hefee 1 day ago

  • Assignee deleted (hefee)
  • Target version deleted (Tails_3.15)
  • QA Check changed from Ready for QA to Pass

#48 Updated by hefee 1 day ago

enrico wrote:

intrigeri wrote:

I thought the Weblate user was supposed to be allowed to change only .po files?

If it's allowed to modify arbitrary files, then moving these files out of wiki/src won't change the security big picture much: modifying .mwdn files is basically as problematic, from a security standpoint, as modifying a random .sh attached file.

It is only allowed to change .po files, but it does not check in which location they are (see https://salsa.debian.org/hefee/puppet-tails/blob/bugfix/15402-update-hook-for-weblate-pushes/files/gitolite/hooks/tails-weblate-update.hook#L54 ), so it can potentially push a .po file anywhere in the repo.

@enrico: No weblate can only push to wiki/src/:
https://salsa.debian.org/hefee/puppet-tails/blob/bugfix/15402-update-hook-for-weblate-pushes/files/gitolite/hooks/tails-weblate-update.hook#L74

#49 Updated by hefee 1 day ago

  • Related to Bug #16760: remove executable bit for po files added

#50 Updated by hefee 1 day ago

u wrote:

I do not fully get your question. This subtopic is about, when a core dev merges weblate git repo into main git by hand and pushes this to main git. The only usecase, why a core dev would do this, is because there are some broken commits on the weblate git and they want to overrule this hook. And this overrule would be visible, as they add a merge commit with the core dev as author.

Exactly. And then, do we expect the core dev to go though every PO file (there might be many) or can we extend the check so that it would also cover such a merge commit?

@u: Well yes core devs need to go though every file they push like for every other push they do. As core devs can push everything aka non po changes, scripts etc., can fake author/committer. And we need a way to overrule the automatic system, otherwise a bug in this system would have the possibility to block Tails completely.

The hook is about to perform on the automatic pushes from Weblate. So those commits that are accepted by this hook will end up in the main git. So nobody needs to "accept" them by hand.

Except for the above case, right?

That described above case is no automatic push triggered by Weblate. It is not daytoday case it is an "oh the automatic system doesn't worked and we need to fix it really fast". The proper fix would fix the automatic system, so the changes can flow into main git automatically again. But we may need to spin a new release with new translations...

Also available in: Atom PDF