Bug #16774

Transifex translations: we should not update from the _completed branches

Added by emmapeel 4 months ago. Updated 7 days ago.

Status: In Progress
Priority: Normal
Assignee:
Category: Internationalization
Target version:
Start date:
Due date:
% Done: 0%
Feature Branch:
Type of work: Code
Blueprint:
Starter: No
Affected tool:

Description

ey there!

I think we should change the branches from which we pick up our translations from Transifex.

The Tails resources being translated on Transifex usually have two branches each on https://gitweb.torproject.org/translation.git/
For example, the resource translating WhisperBack has:

These translations are imported with ./import_translations.sh

  1. THE PROBLEM

As reported in https://trac.torproject.org/projects/tor/ticket/26878, the *_completed branches are only updated when a resource is completed, so a problem arises when:

  • A translation is completed - the _completed branch is updated
  • There are new strings on the file - the _completed branch is NOT updated
  • Some of the translations are done, but not all - the _completed branch is NOT updated

While I welcome help to solve the Tor ticket above, my suggestion for Tails is to pull from the normal branch, and use the _completed branch as a way of seeing the state of the translation (how many times was this file completed? etc).

By not using the _completed branch, we will have fewer outdated translations, especially in the 'long tail' of languages.
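
The update rule described above can be sketched in Python. This is only a toy model under stated assumptions: the `Branches` class, the `sync` function and the sample strings are hypothetical and are not taken from the actual Tor automation.

```python
from dataclasses import dataclass


@dataclass
class Branches:
    """Hypothetical model of the two translation.git branches of one resource."""
    resource: dict   # msgid -> msgstr, as last pulled from Transifex
    completed: dict  # snapshot copied only when the file was fully translated


def sync(branches: Branches, transifex_state: dict) -> None:
    """Apply the update rule from the list above to both branches."""
    # The normal branch always follows Transifex.
    branches.resource = dict(transifex_state)
    # The _completed branch is refreshed only when every string is translated.
    if all(transifex_state.values()):
        branches.completed = dict(transifex_state)


b = Branches(resource={}, completed={})
sync(b, {"Send": "Envoyer"})                    # complete: both branches updated
sync(b, {"Send": "Envoyer", "Cancel": ""})      # new string: _completed not updated
sync(b, {"Send": "Transmettre", "Cancel": ""})  # correction on an incomplete file
print(b.resource["Send"], "/", b.completed["Send"])  # Transmettre / Envoyer
```

In this model the correction only reaches the normal branch, which is the staleness this ticket is about.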


Related issues

Blocks Tails - Feature #16095: Curate the list of languages in Tails Greeter Confirmed 11/04/2018

Associated revisions

Revision 0db48f8a (diff)
Added by intrigeri 17 days ago

import-translations: use tails-misc_release for tails.git's PO files (refs: #16774)

History

#1 Updated by emmapeel 4 months ago

  • Description updated (diff)

#2 Updated by intrigeri 3 months ago

@emmapeel, is there any difference between the branches you recommend we use and the _completed ones, wrt. inclusion of non-reviewed translations?

Rationale for my question: I'd rather not regress on the (already rather poor) quality of translations we pull from Transifex.

#3 Updated by emmapeel 3 months ago

Regarding the reviews, as well as the updates when a resource gets small changes: I recommend following the non-_completed branches, because I think they handle file updates better:

If someone makes a correction on an incomplete file, it will be updated in the resource branch, but it will not be updated on the _completed branch until the whole file is translated.

Unfortunately, I cannot tell the Transifex client to download according to review percentage. What I can tell you is that reviewing a resource makes really small changes, and it usually happens after a resource is completed, so this problem arises for languages with fewer updates.

#4 Updated by intrigeri 3 months ago

OK, I understand. I'm still interested in the answer to my question, so we can consciously factor it into the pros/cons balancing act.

#5 Updated by intrigeri about 1 month ago

intrigeri wrote:

@emmapeel, is there any difference between the branches you recommend we use and the _completed ones, wrt. inclusion of non-reviewed translations?

Ping? In other words, do the branches without _completed in their name include translations that have not been reviewed yet?

#6 Updated by intrigeri about 1 month ago

  • Subject changed from transifex translations: we should not update from the _completed branches to Transifex translations: we should not update from the _completed branches
  • Priority changed from Low to Normal

(This seems to be an important problem.)

#7 Updated by emmapeel about 1 month ago

As said previously, the difference is:

If someone makes a correction on a resource in which the translation is incomplete, it will be updated in the resource branch, but not on the _completed branch until the whole file is translated.

#8 Updated by intrigeri about 1 month ago

As said previously […]

I understood this part. It is a clear advantage of your proposal.

But AFAICT you still did not answer my question so I still can't balance this advantage vs. potential drawbacks :/

#9 Updated by emmapeel about 1 month ago

ey, good news!
I went looking for the docs to show you that it was not possible, and it seems they finally got around to this!

We can test switching the Tails translations on Transifex to this mode:

https://docs.transifex.com/client/pull/#getting-different-file-variants

tx pull -a --mode reviewed --minimum-perc 100

#10 Updated by emmapeel about 1 month ago

I tested that command, and only the French translation was ready. But

tx pull -a --mode reviewed

produced

https://gitweb.torproject.org/translation.git/commit/?h=tails-misc&id=5e8bf218a91af9dd4319b0dc1a3e637a01ef405c

that looks alright i think.... shall i leave it like that on the update script?

and in all Tails branches?

#11 Updated by intrigeri about 1 month ago

  • Status changed from Confirmed to In Progress
  • Assignee set to emmapeel
  • Target version set to Tails_3.16

I tested that command, and only the French translation was ready. But

tx pull -a --mode reviewed

Ooh yeah, --mode reviewed sounds very nice to my ear :))) Thanks a lot for investigating!

shall i leave it like that on the update script? and in all Tails branches?

I think I'd like to only include translations above some minimal level; I agree that 100% is too high a bar but 0 is too low IMO. When in doubt, I would use 25% to start with, in order to be consistent with https://tails.boum.org/contribute/how/translate/team/new/ (if that's good enough for the core pages of our website, that should be good enough for our custom programs as well). I assume this translates into --minimum-perc 25. Makes sense?
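
To make the threshold concrete, here is a minimal Python sketch of the check that --minimum-perc 25 is assumed to perform. The function names and the counts are hypothetical, and whether the real client treats the bound as inclusive is an assumption too.

```python
def reviewed_percentage(reviewed: int, total: int) -> float:
    """Share of source strings that are translated and reviewed, in percent."""
    if total == 0:
        return 0.0
    return 100.0 * reviewed / total


def is_included(reviewed: int, total: int, minimum_perc: float = 25.0) -> bool:
    """True when a language reaches the minimum percentage (assumed inclusive)."""
    return reviewed_percentage(reviewed, total) >= minimum_perc


# Hypothetical counts for a resource with 200 source strings.
print(is_included(60, 200))                  # True (30%)
print(is_included(10, 200))                  # False (5%)
print(is_included(0, 200, minimum_perc=0))   # True: a 0 threshold keeps empty files
```

The last call shows why a threshold of 0 is different from "at least something is translated": a file with no reviewed string at all still passes.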

Once we agree on some number we can use for now, yeah, please do this on all Tails non-completed branches, then reassign to me: I'll take a last look and if happy (which I assume I will be), I will update our own scripts to fetch from non-completed branches as you suggested, and we can call this ticket done!

I suspect we'll want to fine-tune this later: @sajolida is working on stuff that will probably transitively depend on these settings (by reusing the list of PO files in tails.git:po/ for other stuff). But this can happen on another ticket.

#12 Updated by sajolida about 1 month ago

  • Blocks Feature #16095: Curate the list of languages in Tails Greeter added

#13 Updated by intrigeri about 1 month ago

emma & I agreed on having _release branches, that contain only reviewed strings and only languages that have 25% of the strings translated+reviewed. Once this is implemented on Tor's side, we'll adjust the Tails code to fetch from these new branches (and while we're at it, we should think about what we'll do wrt. PO files being removed from these branches: AFAICT our current code will simply leave the old version in place).
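
As a starting point for the removed-PO-files question, a minimal Python sketch of how stale files could be detected. It assumes that the local po/ directory and a checkout of the upstream branch use the same file names (for example fr.po); `stale_po_files` and the directory layout are hypothetical and do not describe the current import script.

```python
import tempfile
from pathlib import Path


def stale_po_files(local_po_dir: Path, upstream_dir: Path) -> list[Path]:
    """Return local PO files that no longer exist in the upstream checkout."""
    upstream_names = {p.name for p in upstream_dir.glob("*.po")}
    return sorted(
        p for p in local_po_dir.glob("*.po") if p.name not in upstream_names
    )


# Demonstration on throwaway directories.
with tempfile.TemporaryDirectory() as tmp:
    local, upstream = Path(tmp, "po"), Path(tmp, "upstream")
    local.mkdir()
    upstream.mkdir()
    for name in ("fr.po", "vi.po"):
        (local / name).touch()
    (upstream / "fr.po").touch()  # vi.po was dropped upstream
    stale_names = [p.name for p in stale_po_files(local, upstream)]

print(stale_names)  # ['vi.po']
```

Whether such files should then be deleted or merely reported is a separate decision; the sketch only lists them.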

#14 Updated by intrigeri about 1 month ago

In current tails-misc (that only has reviewed strings now):

  • 20 languages at 25% or more: ca, cs, de, el, es_AR, es, fi, fr, ga, he, hu, it, km, lt, pt_BR, pt_PT, ro, sv, tr, zh_CN
  • 1 language between 0% and 25%: ar
  • Quite a few languages got dropped because the strings were never reviewed; there's a general lack of reviewers on Transifex. For example, vi.po has 58 strings translated in current tails.git, but was never updated since 2017 in Transifex, and never reviewed — who knows if these translations are good.

#15 Updated by emmapeel about 1 month ago

  • Description updated (diff)

#16 Updated by sajolida 28 days ago

Why not use a 0% threshold?

The 25% threshold on our website makes more sense because having languages enabled on our website has some cost: at least in build time and in work when unfuzzing stuff manually. But translated software wouldn't have such a cost.

Seeing the list of languages between 0% and 25% (only 'ar') it also sounds like not much extra work in case RMs have to fiddle with these.

#17 Updated by intrigeri 27 days ago

Hi!

Why not use a 0% threshold?

I think I've been somewhat confused.

On #16095 you initially wrote:

"Having such a long list makes it harder to know which languages are actually well translated and for the user to know what her best option is, without trial and error.

I think we should filter this list to only display the languages that are reasonably well translated.

If we base ourselves on the PO files for our internal tools, we might be able to automatically generate a list of languages during the build. Making sure that our internal tools are well translated in a given language before listing it sounds like a good criterion too."

But later on that ticket, you switched from "our internal tools are well translated" to "have at least 1 string of custom software translated", and I'm afraid I failed to adjust my thinking accordingly here.

I'm totally fine with letting you decide, from a UX perspective, which threshold we should use: I don't think it makes much of a difference from an implementation perspective.

emmapeel and I have plans to discuss such matters today on XMPP, it would be nice if you could join us :)

#18 Updated by sajolida 23 days ago

I agree that my position is not super clear and is confusing. For me the main goal of #16095 is to bring down this very long list of 284 languages to something easier to parse. To avoid having to curate the list manually and run into political debates on whether we should keep or remove Luxembourgish or Ligurian from the list, I'm happy that we found an automated criterion that brings it down to around 50.

And, from the analysis that we did on actual translation files on #16095, being stricter than 0% when applying the criterion of "our internal tools are well translated" is not really helpful.

#19 Updated by intrigeri 23 days ago

we found an automated criterion that brings it down to around 50.

And, from the analysis that we did on actual translation files on #16095, being stricter than 0% when applying the criterion of "our internal tools are well translated" is not really helpful.

Note: said analysis did not include the reviewed criterion, which is part of the current proposal here. So the total number may be closer to 20 than to 50. Below I'll assume that you're fine with that.

So, as said earlier, I'm fine with letting sajolida pick the threshold, so here's an updated proposal:

  • Have _release branches, that contain only reviewed strings and only languages that have at least 1 string translated+reviewed.
    • Wrt. implementation details, if this is not something we can easily ask tx pull to do, I guess that using a 1% threshold would be acceptable.
  • Once this is implemented on Tor's side, we'll adjust the Tails code to fetch PO files from these new branches.
  • We can deal with the PO files being removed from these branches either here pro-actively, or, worst case, on #16095.

@emmapeel, would this work for you? If not, please have a chat about it with sajolida in place/time of your liking. Feel free to invite me if you think I can add something to the discussion :)

#20 Updated by emmapeel 22 days ago

intrigeri wrote:

So, as said earlier, I'm fine with letting sajolida pick the threshold, so here's an updated proposal:

  • Have _release branches, that contain only reviewed strings and only languages that have at least 1 string translated+reviewed.

What about doing this on the already existing _completed branches? green computing! I think I want to apply said threshold to all _completed branches so I fix https://trac.torproject.org/projects/tor/ticket/26878 as well.

  • Wrt. implementation details, if this is not something we can easily ask tx pull to do, I guess that using a 1% threshold would be acceptable.
  • Once this is implemented on Tor's side, we'll adjust the Tails code to fetch PO files from these new branches.
  • We can deal with the PO files being removed from these branches either here pro-actively, or, worst case, on #16095.

We can always find the files on the git history, and their strings are still part of the transifex translation memory.

#21 Updated by intrigeri 22 days ago

What about doing this on the already existing _completed branches? green computing! I think I want to apply said threshold to all _completed branches so I fix https://trac.torproject.org/projects/tor/ticket/26878 as well.

On the one hand, I would find this definition of "completed" slightly confusing: having 1 translated+reviewed string does not match what I understand by "completed". I'm slightly worried that in N months or years, someone will see this as a bug ("completed branches have incomplete translations, where can I find really complete translations?") and if this gets "fixed", we may lose the branches we need.

On the other hand, how the branches we use are called makes little difference for Tails in practice (very few people get exposed to these names): we'll be fine as long as 1. there are branches with the content we want; 2. these branches are here to stay; 3. we're in the loop if someone wants to change the criteria for inclusion in these branches.

So yeah, if applying this criterion on the _completed branches has advantages for you compared to creating new ones, it's fine by me :)

#22 Updated by intrigeri 17 days ago

Have _release branches, that contain only reviewed strings and only languages that have at least 1 string translated+reviewed.

For 4.0~beta2 I've switched to importing from tails-misc_release, as discussed earlier on XMPP. I trust your automation to have imported only reviewed strings in there, which is good. But 60 of the PO files in there do not have a single string translated, which violates the aforementioned criterion (and FWIW, 30 have at least one string translated).

This is not a big problem on this ticket: having PO files with no translation whatsoever won't cause trouble. Still, I've removed them as sajolida is looking at the content of tails.git:po/ and drawing conclusions from the number of files in there.

But #16095 is expecting us to ensure here that all PO files in tails.git:po/ respect the aforementioned criterion.

So, shall I filter out, at import time, PO files that have no string translated? Or will you do this on your side?
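
For reference, filtering at import time could look roughly like the Python sketch below. It is a simplified stand-in, not the actual import script: `count_translated` and `keep_po_file` are hypothetical names, the parser only handles the basic PO syntax (msgid, msgstr, plural forms, continuation lines), and it assumes that fuzzy entries should not count as translated.

```python
import re

# A keyword line such as: msgid "text"  or  msgstr[0] "text"
_KEYWORD = re.compile(r'^(msgctxt|msgid_plural|msgid|msgstr(?:\[\d+\])?)\s+"(.*)"\s*$')
# A continuation line such as: "more text"
_CONTINUATION = re.compile(r'^"(.*)"\s*$')


def count_translated(po_text: str) -> int:
    """Count non-fuzzy entries with a non-empty msgid and a non-empty msgstr."""
    translated = 0
    for block in re.split(r"\n\s*\n", po_text.strip()):
        fields: dict[str, str] = {}
        current = None
        fuzzy = False
        for line in block.splitlines():
            line = line.strip()
            if line.startswith("#"):
                # Comments and obsolete (#~) entries are skipped; flags are checked.
                if line.startswith("#,") and "fuzzy" in line:
                    fuzzy = True
                continue
            keyword = _KEYWORD.match(line)
            if keyword:
                current = keyword.group(1)
                fields[current] = keyword.group(2)
            elif current and (cont := _CONTINUATION.match(line)):
                fields[current] += cont.group(1)
        # The header entry has an empty msgid, so it is never counted.
        has_msgstr = any(v for k, v in fields.items() if k.startswith("msgstr"))
        if fields.get("msgid") and has_msgstr and not fuzzy:
            translated += 1
    return translated


def keep_po_file(po_text: str) -> bool:
    """Criterion discussed above: at least one string is translated."""
    return count_translated(po_text) > 0


# Made-up PO content for illustration only.
SAMPLE_EMPTY = r'''msgid ""
msgstr ""
"Language: xx\n"

msgid "Send"
msgstr ""
'''

SAMPLE_PARTIAL = SAMPLE_EMPTY + r'''
msgid "Cancel"
msgstr "Annuler"
'''

print(keep_po_file(SAMPLE_EMPTY), keep_po_file(SAMPLE_PARTIAL))  # False True
```

A real implementation would more likely rely on gettext tooling than on a hand-written parser; the point here is only the criterion itself.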

#23 Updated by CyrilBrulebois 14 days ago

  • Target version changed from Tails_3.16 to Tails_3.17

#24 Updated by intrigeri 7 days ago

  • Target version changed from Tails_3.17 to Tails_4.0
