Project

General

Profile

Feature #16326

Feature #10034: Translation web platform

Review the Weblate-based infrastructure that updates and pushes PO files

Added by intrigeri 8 months ago. Updated about 2 months ago.

Status:
Resolved
Priority:
High
Assignee:
Category:
Infrastructure
Target version:
Start date:
01/08/2019
Due date:
% Done:

0%

Feature Branch:
Type of work:
Security Audit
Blueprint:
Starter:
Affected tool:
Translation Platform

Description

… so we are confident that PO changes pushed and checked via #15402 have gone through the expected process.


Related issues

Blocked by Tails - Bug #16855: Enforce reviewer policy on Weblate Resolved

History

#1 Updated by intrigeri 6 months ago

  • Target version set to Tails_3.15

#2 Updated by intrigeri 3 months ago

  • Parent task changed from #15082 to #15079

#3 Updated by intrigeri 3 months ago

  • Parent task changed from #15079 to #10034

#4 Updated by intrigeri 3 months ago

So this is about:

  • integrity of tails.git apart of website PO files, i.e. ensuring that the security checks performed on Git pushes by Weblate are good enough and effectively enforced
    • Enrico already did a security-oriented code review of the hook so I won't spend much time on it
    • I'll focus on the big picture and integration bits, where potential design/implementation flaws could allow bypassing the aforementioned security checks
  • availability of the infrastructure outside of the translation platform, that is impacted by the translation platform, that is in practice: can the translation platform DoS anything except itself? Attack vectors: validating arbitrary translated strings, compromising the Weblate webapp, compromising the system of a Tails sysadmin, compromising the system of an administrator of the translation platform (that can run any command as the weblate user).
  • integrity of our production website, which in our permissions model boils down to ensuring that translated strings pushed by Weblate have gone through the expected review process (it is the only mitigation we have in place against this risk)

Out of scope:

  • An adversary acting as several translators can push arbitrary changes to translated versions of our web pages. Impact can be, for example: instructing our users to take dangerous actions; hiding important security warnings; breaking page display or links in a way that makes the doc hard or impossible to follow. This risk comes directly from trade-offs we consciously made while designing our translation platform so we have to live with it.

#5 Updated by intrigeri 3 months ago

  • Status changed from Confirmed to In Progress

Hi,

@u and @hefee, there's a question for you at the end.
@hefee, see "recommendations" below wrt. actions I suggest we take as a result of this review.

intrigeri wrote:

  • integrity of tails.git apart of website PO files, i.e. ensuring that the security checks performed on Git pushes by Weblate are good enough and effectively enforced
    • Enrico already did a security-oriented code review of the hook so I won't spend much time on it

Done, looks OK. I've pushed some polishing commits (mostly typo fixing and rephrasing) + added comments that make it clear why we don't perform some checks in the hook. hefee, please review :)

  • I'll focus on the big picture and integration bits, where potential design/implementation flaws could allow bypassing the aforementioned security checks

LGTM as long as Gitolite implements correctly the access restriction we've configured.

  • availability of the infrastructure outside of the translation platform, that is impacted by the translation platform, that is in practice: can the translation platform DoS anything except itself? Attack vectors: validating arbitrary translated strings, compromising the Weblate webapp, compromising the system of a Tails sysadmin, compromising the system of an administrator of the translation platform (that can run any command as the weblate user).

In general, our sysadmin team does not try to protect against VMs we host DoS'ing each other, i.e. we accept the risk that the translation platform DoS'es any other VM hosted on lizard, just like we accept the risk that folks with access to our Jenkins can DoS lizard. Therefore, DoS threat scenarios that would be specific to someone able to run arbitrary code on our translation platform are out of scope here.

However, whether an attacker acting with translator credentials can perform such attacks, "thanks" to how our translation platform is integrated with the rest of our infrastructure, is relevant. I believe the attack surface is limited to the commits pushed from Weblate to the canonical tails.git repo, that trigger some Git hooks:

  • the aforementioned update hook: given the checks we perform, presumably a very large amount of commits, or a commit that modifies many files, can trigger vast amounts of CPU and memory consumption on our Git server, possibly exhausting memory, therefore resulting in a DoS
    • recommendations:
      • optimize the reModifyFile regexp to limit the need for backtracking; it is probably acceptable, when doing so, to make the regexp a tiny bit less strict, e.g. currently it requires page names to be at least 3 chars long (ABC.LL.po) and to not end with a dot, which is not needed AFAICT
      • if feasible, use iterators instead of loading rev_list and oldHead.diff(newHead) in memory all at once
  • hooks that push to various Git mirrors that we don't control (and are thus out of scope) except our Redmine, which will import the newly pushed Git history, which can itself consume large amounts of resources, in particular if said history contains keywords recognized by Redmine (e.g. refs: #NNNN); this attack vector is already available to anyone editing blueprints via the web interface of our website so the translation platform does not make it any worse.
  • a hook that refreshes our production website with the newly pushed content; here as well, I think we accept the risk that an attacker posing as a translator can feed arbitrary strings and commit messages to ikiwiki, and the resulting risks.
  • integrity of our production website, which in our permissions model boils down to ensuring that translated strings pushed by Weblate have gone through the expected review process (it is the only mitigation we have in place against this risk)

https://tails.boum.org/blueprint/translation_platform/ does not specify anything apart of "provide user roles (admin, reviewer, translator)", so @u, @hefee:

  • Where is the expected review process documented and implemented?
  • What's the (technical, social) process to gain elevated credentials not available to anyone who registers an account?

Once I am pointed in the right direction, I'll check what it currently takes an adversary (possibly posing as several individual translators) to suggest arbitrary translations and validate them so they land on our website.

#6 Updated by intrigeri 3 months ago

intrigeri wrote:

  • Where is the expected review process documented and implemented?
  • What's the (technical, social) process to gain elevated credentials not available to anyone who registers an account?

Once I am pointed in the right direction, I'll check what it currently takes an adversary (possibly posing as several individual translators) to suggest arbitrary translations and validate them so they land on our website.

The thread that starts in https://lists.autistici.org/message/20190627.121709.5f198b23.en.html is about basically the same questions.

#7 Updated by intrigeri 3 months ago

  • Blocked by Bug #16855: Enforce reviewer policy on Weblate added

#8 Updated by u 3 months ago

#9 Updated by intrigeri 3 months ago

intrigeri wrote:

  • recommendations:
    • optimize the reModifyFile regexp to limit the need for backtracking; it is probably acceptable, when doing so, to make the regexp a tiny bit less strict, e.g. currently it requires page names to be at least 3 chars long (ABC.LL.po) and to not end with a dot, which is not needed AFAICT

hefee suggested I help with this regexp so I've tried to understand what criteria it expresses. I see it was modified from (essentially) .*\.po$ to what we now have in f8a455c0c543c43113fb5023dd6432fda1a2b4ba, which is not exactly atomic and whose message reads "Integrate enricos review for tails-weblate-update.hook". This lack of explanations (or pointers to them) makes it hard for me to tell whether it's safe to remove some of the restrictions added in that commit while optimizing this regexp: I have no idea what the reasoning behind these restrictions was. So instead I've looked closer and it turns out that this regexp should not expose us to the most pathological cases of backtracking, so nevermind, feel free to forget this recommendation.

Still, I'd feel more comfortable if the reasons for these restrictions (e.g. "path does not start with a dot" and "need a non-dot char between .LL.po") were documented as comments in the code, so that whoever works on it again in the future is in a better position to make sensible decisions. I'm mostly expecting ways to distinguish between "can't hurt, better safe than sorry, but not strictly needed at the moment" and "truly critical right now". See what I mean?

Finally, I would like to add a comment about what kind of path the regexp is used on but I need a confirmation here: fname = diff.b_path, that we call the regexp on, is the full path of a modified file, right? That is, in normal cases it will start with wiki/src/, correct? (If all this is correct, then I was wrong earlier: thanks to this there's actually no "page names to be at least 3 chars long (ABC.LL.po)" restriction currently. I got confused initially, which is why I'd like to document this, at least for the sake of some future version of /me :)

#10 Updated by intrigeri 2 months ago

  • Priority changed from Normal to High
  • Target version changed from Tails_3.15 to Tails_3.16

(I doubt #16855 will be done, and my above requests addressed, in time for me to complete this by July 9. Bumping priority so this is on top of my list whenever it's time for me to resume this work.)

#11 Updated by hefee 2 months ago

  • optimize the reModifyFile regexp to limit the need for backtracking; it is probably acceptable, when doing so, to make the regexp a tiny bit less strict, e.g. currently it requires page names to be at least 3 chars long (ABC.LL.po) and to not end with a dot, which is not needed AFAICT

What do you think about gettting rid of the regex completely and make it more easy to understand:

if not fname.endswith(".po") or fname.split["/"][-1].startswith("."):
   issues.append("[POLICY] Weblate is only allowed to modify po files.")
if not any(fname.endswith(".{}.po".format(i) for i in ALLOWED_LANGUAGES):
   issues.append("[POLICY] Language not allowed to push.")

(any creates a generator, so if one matches, it stops right away, and we do not loose CPU cycles)

  • if feasible, use iterators instead of loading rev_list and oldHead.diff(newHead) in memory all at once

- rev_list is already a generator (that is the python term for iterators) - nothing to do here.

- oldHead.diff, there is no generator version available - PythonGit needs to ask git binary in background, I don't see what we can improve here. Keep in mind, that by default the diff objects only holds the path and metadata of the diff, not the complete diff itself. Only if you do oldHead.diff(newHead, create_path=True), than this object holds the concrete diff. That's why I think also for several thousands of files, it should be not become an issue that fast in terms of memory.

#12 Updated by intrigeri 2 months ago

What do you think about gettting rid of the regex completely and make it more easy to understand:

Well, similarly to what I've discussed above, I'm not comfortable removing checks without understanding why they were added in the first place, so I'm not in a position to comment in an informed manner here.

This being said, it's probably better to you're proposing and document in the commit message why the checks were added and why we don't actually need them than to do what I've requested (documenting the reason for the checks in a comment): in both cases, one needs to look into the review / MR history and describe exactly the same facts. So yeah, go ahead :)

  • if feasible, use iterators instead of loading rev_list and oldHead.diff(newHead) in memory all at once

- rev_list is already a generator (that is the python term for iterators) - nothing to do here.
- oldHead.diff, there is no generator version available

OK!

#13 Updated by intrigeri 2 months ago

This being said, it's probably better to you're proposing and document in the commit message why the checks were added and why we don't actually need them

I meant "it's probably better to do".

#14 Updated by hefee 2 months ago

  • Status changed from In Progress to Needs Validation
  • Feature Branch set to https://salsa.debian.org/hefee/puppet-tails/tree/hefee/16326-replace-regex

I replaced the reModifyFile with a version that holds our expectations and made the tests run on Python 3.5.

#15 Updated by intrigeri 2 months ago

I replaced the reModifyFile with a version that holds our expectations and made the tests run on Python 3.5.

Merged.

(I'm not closing this ticket as we still need the reviewer policy.)

#16 Updated by intrigeri 2 months ago

  • Feature Branch deleted (https://salsa.debian.org/hefee/puppet-tails/tree/hefee/16326-replace-regex)

#17 Updated by u 2 months ago

#18 Updated by intrigeri about 2 months ago

  • Status changed from Needs Validation to Resolved

Also available in: Atom PDF