Feature #16095

Curate the list of languages in Tails Greeter

Added by sajolida 11 months ago. Updated 21 days ago.

Status: Confirmed
Priority: Normal
Assignee: -
Category: Internationalization
Target version: -
Start date: 11/04/2018
Due date:
% Done: 0%
Feature Branch:
Type of work: Code
Blueprint:
Starter:
Affected tool: Greeter

Description

We currently have many languages or language variants that are poorly translated, or not translated at all (either in Debian and GNOME in general, or in our internal tools). This is the case, for example, for "Chinese, mandarin", and probably for many others.

Having such a long list makes it harder for users to know which languages are actually well translated, and which option is best for them, without trial and error.

I think we should filter this list to only display the languages that are reasonably well translated.

Implementation-wise, the Greeter can do either:

  • A. Only propose languages that have a PO file in tails.git + tier-1 languages, period. This embeds a removal mechanism, by definition.
  • B. Propose languages that have a PO file in tails.git + tier-1 languages + any language that was proposed in our last release. And then, manually clean up "any language that was proposed in our last release" from time to time.
  • C. Same as B except we automate the cleaning process as intrigeri suggested in #16095#note-24

Option A is the cheapest and is probably good enough. If we notice trouble later (e.g. churn/flapping, i.e. languages appearing and disappearing every second release), we can come back to it and implement option B or C.
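A minimal Python sketch of option A. It assumes the Greeter's language list is generated at build time from the names of the PO files in tails.git's po/ directory. The function names and the `TIER1` set are hypothetical. The tier-1 codes follow the list agreed on #15807, with `hi` and `id` filled in as the ISO 639-1 codes for Hindi and Indonesian.

```python
from pathlib import Path

# Hypothetical tier-1 set, following the list agreed on #15807.
# "hi" (Hindi) and "id" (Indonesian) are ISO 639-1 codes added here.
TIER1 = {"ar", "de", "en", "es", "fa", "fr", "hi", "id",
         "it", "pt_BR", "ru", "tr", "zh_CN"}


def po_languages(po_dir):
    """Language codes of the PO files in po_dir, e.g. {"fr", "pt_BR"}."""
    return {path.stem for path in Path(po_dir).glob("*.po")}


def greeter_languages_option_a(po_dir, tier1=TIER1):
    """Option A: languages with a PO file in po_dir, plus tier-1 languages.

    A language whose PO file disappears from po_dir is dropped from the
    Greeter at the next build, unless it is tier-1: removal is built in.
    """
    return sorted(po_languages(po_dir) | set(tier1))
```

For example, `greeter_languages_option_a("po")`, run from the root of tails.git, would return the sorted list of codes to offer.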

Then, we'll need to remove all language packs (e.g. Tor Browser's) that can't possibly be used (because they're not listed in the Greeter).
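One way to find the language packs that can't be used is a set difference between the shipped packs and the Greeter's list. This sketch rests on two assumptions. First, pack names are locale codes with a hyphen (such as `pt-BR`). Second, a pack counts as usable if its exact locale or its bare language is offered in the Greeter. The function name is hypothetical.

```python
def unusable_language_packs(pack_locales, greeter_languages):
    """Return the language packs that no Greeter language could select.

    pack_locales: locale codes of shipped packs, e.g. "pt-BR" or "es-ES"
    (the hyphenated form is an assumption about how packs are named).
    greeter_languages: codes offered in the Greeter, e.g. "pt_BR", "es".
    """
    offered = set(greeter_languages)
    unusable = set()
    for pack in pack_locales:
        code = pack.replace("-", "_")
        language = code.split("_", 1)[0]
        # Keep a pack if its exact locale or its bare language is offered.
        if code not in offered and language not in offered:
            unusable.add(pack)
    return sorted(unusable)
```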

languages.ods (12.8 KB) sajolida, 08/28/2019 06:48 PM


Related issues

Related to Tails - Bug #16093: Remove untranslated Chinese languages from Tails Greeter Confirmed 11/04/2018
Related to Tails - Feature #14544: Spend software developer time on smallish UX improvements In Progress 08/31/2018
Related to Tails - Feature #15807: Define & apply clear criteria for including dictionaries, fonts and language packs Resolved 08/18/2018
Related to Tails - Feature #9956: Consider replacing the additional fonts we ship with Noto Duplicate 08/09/2015
Related to Tails - Feature #7036: Move custom software to main git repo Confirmed
Related to Tails - Feature #17002: Do stats on the languages in which people start Tails Resolved
Blocks Tails - Feature #16209: Core work: Foundations Team Confirmed
Blocks Tails - Feature #16688: Core work 2019Q3 → 2019Q4: User experience Confirmed
Blocked by Tails - Bug #16774: Transifex translations: we should not update from the _completed branches In Progress
Blocks Tails - Feature #16094: Have simplified and traditional Chinese in the list of languages in Tails Greeter Confirmed 11/04/2018

History

#1 Updated by sajolida 11 months ago

  • Related to Bug #16093: Remove untranslated Chinese languages from Tails Greeter added

#2 Updated by sajolida 11 months ago

  • Related to Feature #14544: Spend software developer time on smallish UX improvements added

#3 Updated by mercedes508 10 months ago

  • Status changed from New to Confirmed

#4 Updated by sajolida 6 months ago

  • Related to Feature #15807: Define & apply clear criteria for including dictionaries, fonts and language packs added

#5 Updated by intrigeri 6 months ago

> We currently have many languages or languages variants that are poorly or not translated at all (either in Debian and GNOME in general or for our internal tools). It's the case for example of "Chinese, mandarin" and probably many others.

I'm fine with hiding options that would result in most of the desktop being in English, at least when there's a close enough alternative that's better translated (e.g. a close variant of the same language). When there's no such alternative, I'm worried that removing the option means the user will have no choice but to use English, and I'm not sure whether that's better or worse than, say, 90% English and 10% in their preferred language. I guess it's a matter of "at least some bits are translated and thus easier to understand, i.e. better than nothing" vs. "context/language switching has a cost and is evil".

If we base ourselves on the PO files for our internal tools, we might be able to automatically generate a list of languages during the build. Making sure that our internal tools are well translated in a given language before listing it sounds like a good criterion too.

I'd be curious to see what this heuristic yields, and whether the resulting list would indeed help users pick an option that works better for them.

#6 Updated by intrigeri 6 months ago

  • Description updated (diff)

#7 Updated by sajolida 6 months ago

Ok, so what's the next step here?

Is it to build a table with:

  • Language name, all the ones we have in Tails Greeter
  • Percentage of translation of: core GNOME? all packages included in Tails?
  • Percentage of translation of Tails software

So we can have concrete examples?

I know how to compute the percentage of translation of Tails software, but not the rest.

#8 Updated by intrigeri 6 months ago

#9 Updated by intrigeri 6 months ago

  • Type of work changed from Discuss to Research

> Ok, so what's the next step here?
> Is it to build a table with: […]

I would start with Tails software only: if this gives us good enough results, it's by far the cheapest option to implement initially and to maintain in the long run.

#10 Updated by intrigeri 6 months ago

  • Related to Feature #9956: Consider replacing the additional fonts we ship with Noto added

#11 Updated by sajolida 6 months ago

  • Assignee set to sajolida
  • Target version set to Tails_3.15

Ok, next step is to build this list.

#12 Updated by sajolida 4 months ago

  • Blocks Feature #16688: Core work 2019Q3 → 2019Q4: User experience added

#13 Updated by sajolida 4 months ago

  • Target version changed from Tails_3.15 to Tails_3.16

#14 Updated by sajolida about 1 month ago

Here is the list. I used the following PO files:

  • greeter.git:po/
  • liveusb-creator.git:po/
  • tails.git:po/
  • persistence-setup.git:po/
  • whisperback.git:po/

The script is in ux.git:greeter/percentage-translated.rb.

The result is (locale code, percentage of strings translated):

de           98.6
he           98.4
es           98.4
ca           98.4
tr           98.4
ka           98.4
ru           98.4
it           98.4
da           98.4
ro           98.4
zh_CN        98.4
is           98.3
pt_PT        98.3
fr           98.3
es_AR        98.2
en_GB        98.0
pl           98.0
ga           98.0
sv           97.8
el           97.4
zh_TW        93.4
fa           85.4
pt_BR        85.4
bn           85.2
bn_BD        84.1
pt           75.4
hr_HR        75.0
id           71.5
cs           71.1
nb           70.8
ko           70.5
bg           70.4
fr_CA        70.2
lv           70.2
lt           69.4
vi           68.3
uk           64.3
nl           61.6
hu           61.3
fi           56.5
ar           54.1
sq           49.8
zh_HK        49.0
sk_SK        49.0
ms_MY        45.2
sk           44.0
sl_SI        43.7
az           42.9
km           42.4
hr           37.5
ja           36.6
nl_BE        31.7
eu           31.5
nn           30.7
si_LK        30.2
gl           28.1
th           24.1
et           23.9
sr           21.8
lb           16.7
en           11.4
pl_PL         9.0
ta            8.1
hy            8.1
cy            8.0
ru@petr1708   7.2
nds           4.3
sr@latin      4.1
bs            3.3
ast           3.3
pa            3.3
bn_IN         2.1
ms            0.1
zh            0.0
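The percentages above come from the Ruby script in ux.git:greeter/percentage-translated.rb. As an illustration only, here is a Python sketch of a comparable computation. It counts translated messages per language and divides by the number of messages in each repository's template. It assumes each po/ directory contains one .pot template and that PO files are merged against it. It skips the header and obsolete (`#~`) entries and treats fuzzy entries as untranslated. It handles only the common PO syntax, so its numbers may differ slightly from the table above.

```python
from pathlib import Path


def _unquote(token):
    """'"Hello"' -> 'Hello'; escapes are kept as-is, which is enough here."""
    token = token.strip()
    return token[1:-1] if len(token) >= 2 and token.startswith('"') else ""


def po_stats(path):
    """Return (translated, total) message counts for one PO or POT file."""
    entries, entry, field, fuzzy = [], None, None, False
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#~"):
            continue  # blank line or obsolete entry
        if line.startswith("#,"):
            fuzzy = fuzzy or "fuzzy" in line
        elif line.startswith("#"):
            continue  # translator, extracted or reference comment
        elif line.startswith("msgctxt"):
            field = "msgctxt"
        elif line.startswith("msgid_plural"):
            field = "msgid_plural"
        elif line.startswith("msgid "):
            entry = {"msgid": _unquote(line[6:]), "msgstr": [], "fuzzy": fuzzy}
            entries.append(entry)
            field, fuzzy = "msgid", False
        elif line.startswith("msgstr") and entry is not None:
            # Handles both 'msgstr "..."' and 'msgstr[N] "..."'.
            entry["msgstr"].append(_unquote(line.split(" ", 1)[-1]))
            field = "msgstr"
        elif line.startswith('"') and entry is not None:
            # Continuation line of a multi-line string.
            if field == "msgid":
                entry["msgid"] += _unquote(line)
            elif field == "msgstr":
                entry["msgstr"][-1] += _unquote(line)
    messages = [e for e in entries if e["msgid"]]  # drop the header entry
    translated = sum(1 for e in messages
                     if not e["fuzzy"] and e["msgstr"] and all(e["msgstr"]))
    return translated, len(messages)


def percentage_translated(language, po_dirs):
    """Translated messages of `language` over all template messages, in %."""
    translated = total = 0
    for po_dir in map(Path, po_dirs):
        template = next(po_dir.glob("*.pot"), None)
        if template is None:
            continue  # assumption: every po/ directory has one template
        total += po_stats(template)[1]
        po_file = po_dir / f"{language}.po"
        if po_file.exists():
            translated += po_stats(po_file)[0]
    return round(100 * translated / total, 1) if total else 0.0
```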

I'd be interested in seeing the list of all locales currently listed in Tails Greeter, and in studying the difference a bit. I briefly tried to understand how the list is built in the code of Tails Greeter, but failed.

Is it listing all locales in /usr/share/locale/?

#15 Updated by sajolida about 1 month ago

  • Related to Feature #7036: Move custom software to main git repo added

#16 Updated by sajolida about 1 month ago

We currently have 74 languages with at least some translation of our custom programs.

The full list of locales in /usr/share/locale/, which I suspect is the base for the listing in Tails Greeter, has 268 locales.

By only listing in Tails Greeter the locales for which we have at least 1 string of custom software translated, we would reduce the list of options by roughly a factor of 4 (268 / 74 ≈ 3.6). Pretty good already!

Once all our custom software is in the main git repo (#7036), I guess it will become even easier to compute this list at build time. For the time being, we could limit ourselves to the list of languages that are already in tails.git:po/, if that's easier.

I compared this list with the list of languages that have at least 1 string of custom software translated, this time ignoring liveusb-creator.git, because some of its translations are probably inherited from upstream and might not even be relevant in the context of Tails. The missing languages are the following (i.e. the languages that are in greeter.git, persistence-setup.git, or whisperback.git, but not in tails.git):

ar
en
eu
gl
hr
hu
hy
ja
nl
nl_BE
pl_PL
ru@petr1708
si_LK
sk
ta
th

From these:

  • hr, pl_PL, and sk have another locale from the same language but another territory in tails.git.
  • Only ja, hu, and ar are translated at more than 25% across all repositories.
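The comparison above boils down to set operations on PO file names. Here is a hypothetical Python sketch. It uses the presence of a PO file as a stand-in for "at least 1 string translated", which is an assumption: an existing but empty PO file would be counted too. It also includes a helper for the first observation, grouping locales by their language part.

```python
from pathlib import Path


def po_languages(po_dir):
    """Language codes of the PO files in po_dir."""
    return {p.stem for p in Path(po_dir).glob("*.po")}


def missing_from_tails(tails_po_dir, other_po_dirs):
    """Languages with a PO file in another repository but none in tails.git."""
    others = set().union(*(po_languages(d) for d in other_po_dirs))
    return sorted(others - po_languages(tails_po_dir))


def base_language(code):
    """'pl_PL' -> 'pl', 'sr@latin' -> 'sr'."""
    return code.split("@")[0].split("_")[0]


def same_language_alternatives(code, available):
    """Other locales of the same language in `available`, e.g. 'hr' -> ['hr_HR']."""
    return sorted(c for c in available
                  if c != code and base_language(c) == base_language(code))
```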

So, if my simplification is worth it in terms of integration code, I propose to only list in Tails Greeter the languages that have a PO file in tails.git:po/. To add a new language to Tails Greeter, people should start translating into this language in tails.git.

@intrigeri: What do you think?

#17 Updated by intrigeri about 1 month ago

> I'd be interested in seeing a list of all current locale listed in Tails Greeter and study a bit the difference. I briefly tried to understand how the list was built in the code of Tails Greeter but failed. Is it listing all locales in /usr/share/locale/?

It's generated from /usr/share/i18n/SUPPORTED, which I guess is the list of locales supported by Debian.

#18 Updated by intrigeri about 1 month ago

> So, if my simplification is worth it in terms of integration code,

It would be easier to code, indeed. In any case, I would not want us to spend time now on writing code that will be obsoleted by #7036.

> I propose to only list in Tails Greeter, the languages that have a PO file in tails.git:po/. To add a new language in Tails Greeter, people should start translating for this language in tails.git.

Currently tails.git:po/ only has languages for which 100% of these strings are translated (but emma peel proposed to change this on #16774). Also, the RM often has to remove invalid PO files from there during the release process, e.g. in May I removed: ar, fa, hu, ja, kk, nl, sk. The risk of such removal happening and its problematic impact will only increase with #7036. That's surely another problem, which we should probably think about somewhere else (and possibly as part of #7036), but if we're talking about "for the time being" we must consider this problem as very much alive.

So in the current state of things, your proposal is equivalent to: "in order to offer a language in Tails Greeter, people must translate 100% of the tails-misc resource on Transifex without making any mistake that would make the resulting PO file invalid". This seems a very high bar to me. For example, with this heuristic, the Japanese language option would have been missing from the Greeter since 3.13.2, even though the GNOME desktop/apps and Tor Browser are almost entirely translated, which feels like a bad move to me.
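On the validity problem, GNU gettext's `msgfmt --check` reports the kind of errors that make a PO file unusable. Here is a hedged Python sketch that lists the PO files msgfmt rejects, so they can be inspected or rolled back rather than dropped wholesale. It assumes GNU gettext's msgfmt is installed, and the function name is hypothetical. `--check` also enables header checks, so it may be stricter than what the build actually requires.

```python
import os
import subprocess
from pathlib import Path


def invalid_po_files(po_dir):
    """Map each PO file that `msgfmt --check` rejects to msgfmt's error output."""
    invalid = {}
    for po_file in sorted(Path(po_dir).glob("*.po")):
        # Compile to the null device: we only care about the exit status.
        result = subprocess.run(
            ["msgfmt", "--check", "--output-file=" + os.devnull, str(po_file)],
            capture_output=True,
            text=True,
        )
        if result.returncode != 0:
            invalid[po_file.name] = result.stderr.strip()
    return invalid
```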

Sorry for being a killjoy here, but I prefer giving you this info ASAP instead of waiting until I make time to reply with more constructive ideas.

#19 Updated by sajolida about 1 month ago

> It's generated from /usr/share/i18n/SUPPORTED, which I guess is the list of locales supported by Debian.

Thanks!

So that would be a list of 284 locales (instead of 268 as I thought
initially):

sed -r 's/^([^\.@ ]+).+/\1/' /usr/share/i18n/SUPPORTED | sort -u | wc -l
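For reference, here is the same count in Python. The sketch mirrors the sed expression by keeping each entry's name up to the first `.`, `@` or space, so `sr_RS@latin UTF-8` and `sr_RS UTF-8` both count as `sr_RS`. These names include the territory, so `en_US` and `en_GB` are counted separately. The function name is hypothetical.

```python
import re

# Locale name = everything before the first ".", "@" or whitespace,
# mirroring the sed expression above.
LOCALE_NAME = re.compile(r"[^.@\s]+")


def supported_locale_names(path="/usr/share/i18n/SUPPORTED"):
    """Distinct locale names (e.g. "sr_RS") listed in a SUPPORTED file."""
    names = set()
    with open(path, encoding="utf-8") as supported:
        for line in supported:
            if line.startswith("#"):
                continue  # defensive: skip comment lines, if any
            match = LOCALE_NAME.match(line)
            if match:
                names.add(match.group(0))
    return sorted(names)
```

On the same system, `len(supported_locale_names())` should give the same number as the pipeline above.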

#20 Updated by sajolida about 1 month ago

> Currently tails.git:po/ only has languages for which 100% of these strings are translated (but emma peel proposed to change this on #16774).

I see something different in tails.git. For example:

  • po/zh.po is translated at 0% and after a quick scan I don't think
    that it has ever seen a translation.
  • po/cy.po is translated at 22%, pt at 38%, es_AR at 99%, etc.

But it sounds like a good idea anyway to block this ticket on #16774, and
there seems to be a solution for #16774 already :)

> Also, the RM often has to remove invalid PO files from there during the release process, e.g. in May I removed: ar, fa, hu, ja, kk, nl, sk. The risk of such removal happening and its problematic impact will only increase with #7036. That's surely another problem, which we should probably think about somewhere else (and possibly as part of #7036), but if we're talking about "for the time being" we must consider this problem as very much alive.

Do you mean that if a glitch is introduced in fr.po between 3.15 and
3.16, we won't have any French translation of everything in po/tails.pot
in 3.16 and later until the glitch is fixed?

If so, could we consider asking RMs to instead either:

  • Remove the msgstr with the glitch but keep the rest of fr.po
  • Roll back fr.po to its version prior to the glitch

#21 Updated by sajolida about 1 month ago

  • Blocked by Bug #16774: Transifex translations: we should not update from the _completed branches added

#22 Updated by intrigeri about 1 month ago

>> Currently tails.git:po/ only has languages for which 100% of these strings are translated (but emma peel proposed to change this on #16774).

> I see something different in tails.git. For example:

>   • po/zh.po is translated at 0% and after a quick scan I don't think that it has ever seen a translation.

That one does not come from Transifex. It's a leftover from some old stuff of ours ⇒ deleted!

>   • po/cy.po is translated at 22%, pt at 38%, es_AR at 99%, etc.

I suspect these ones are all caused by a bug in the script Tor runs to import translations from Transifex into Git. On top of that, our own import-translations script is not deleting PO files that were removed from the Tor Git repo we copy them from. So there are two bugs that make the current behaviour not match what the intent is/was.

>> Also, the RM often has to remove invalid PO files from there during the release process, e.g. in May I removed: ar, fa, hu, ja, kk, nl, sk. The risk of such removal happening and its problematic impact will only increase with #7036. That's surely another problem, which we should probably think about somewhere else (and possibly as part of #7036), but if we're talking about "for the time being" we must consider this problem as very much alive.

> Do you mean that if a glitch is introduced in fr.po between 3.15 and 3.16, we won't have any French translation of everything in po/tails.pot in 3.16 and later until the glitch is fixed?

Exactly.

> If so, could we consider asking RMs to instead either:

>   • Remove the msgstr with the glitch but keep the rest of fr.po

I'd rather not.

>   • Roll back fr.po to its version prior to the glitch

Sure, that's cheap ⇒ done with 256a340a6627c9d1d3009bd96300bc88aa7c129b.

#23 Updated by sajolida about 1 month ago

>>   • Roll back fr.po to its version prior to the glitch

> Sure, that's cheap ⇒ done with 256a340a6627c9d1d3009bd96300bc88aa7c129b.

Cool!

So the next steps are:

  • Work on #16774, which includes agreeing on a translation threshold
    level for inclusion of languages.
  • Finish agreeing on the mechanism I'm proposing here: include in Tails
    Greeter only the locales that have a translation file in tails.git:po.

#24 Updated by intrigeri about 1 month ago

> So the next steps are:

>   • Work on #16774, which includes agreeing on a translation threshold level for inclusion of languages.

Done, left to be implemented (emmapeel for the Tor bits + myself for the Tails ones).

>   • Finish agreeing on the mechanism I'm proposing here: include in Tails Greeter only the locale that have a translation file in tails.git:po.

I agree with the general idea but I think some refining is needed.

As data points, with the threshold we picked, in the current state of the tails-misc resource:

  • we would have: ca, cs, de, el, es_AR, es, fi, fr, ga, he, hu, it, km, lt, pt_BR, pt_PT, ro, sv, tr, zh_CN
  • we would lose: az, bg, bn_BD, bn, cy, da, en_GB, fa, fr_CA, hr_HR, id, is, ka, ko, lv, ms_MY, nb, nn, pl, pt, ru, sk_SK, sl_SI, sq, sr, uk, vi, zh_HK, zh_TW

This will change over time, as we merge custom software into tails.git.

This is related to the discussion we had on #15807, where we agreed about this list of tier-1 supported languages:

  • Arabic - AR
  • German - DE
  • English - EN
  • Spanish - ES
  • Farsi - FA
  • French - FR
  • Hindi
  • Indonesian
  • Italian - IT
  • Portuguese - PT-BR
  • Russian - RU
  • Turkish - TR
  • Simplified Chinese - zh-CN

It would not make sense to me to hide, from the Greeter, languages that are on this list, and for which we ship language packs etc. So even if we don't have (enough reviewed) translations for these languages, IMO we should always propose them in the Greeter. In the current state of our translations, it makes a difference for Arabic, Hindi, Indonesian, and Russian.

Finally, I'm worried about the UX churn aspect of "include in Tails Greeter only the locale that have a translation file in tails.git:po". The extreme (admittedly theoretical) example is: if my language is translated around 25% on Transifex, it could be proposed in Tails 4.0, not proposed in 4.1, proposed again in 4.2, etc. More generally, training users to choose their preferred language in the Greeter, and then dropping it, may feel like a UX regression to them: even if our custom software is not translated well enough anymore for us to ship the corresponding translations, GNOME and Tor Browser may well be translated into that language.

So what about this: we use the mechanism you propose (+ tier-1 supported languages) to add languages to the Greeter, but we don't remove them as soon as they drop below the threshold. Instead, we could remove non-tier-1 languages from the Greeter only once we haven't had PO files in tails.git:po for them for 6-12 months. The mechanism we have to update these PO files could update whatever file will be used to generate the list of languages visible in the Greeter, and maintain a "last seen" timestamp in there. Then, when we generate the list of languages visible in the Greeter, we ignore those whose "last seen" timestamp is too old, unless they're tier-1 supported languages.
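A minimal sketch of this "last seen" mechanism. It assumes a JSON state file that maps language codes to ISO dates and is updated whenever PO files are imported. The file format, the function names and the 270-day cut-off (picked from within the 6-12 month range above) are all hypothetical.

```python
import json
from datetime import date, timedelta
from pathlib import Path

# Hypothetical cut-off, picked from within the 6-12 month range discussed.
MAX_AGE = timedelta(days=270)


def update_last_seen(state_file, po_languages, today):
    """Set today's date for every language that currently has a PO file.

    Languages without a PO file keep their previous "last seen" date.
    """
    path = Path(state_file)
    last_seen = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    for code in po_languages:
        last_seen[code] = today.isoformat()
    path.write_text(json.dumps(last_seen, indent=2, sort_keys=True), encoding="utf-8")
    return last_seen


def greeter_languages_option_c(last_seen, tier1, today, max_age=MAX_AGE):
    """Tier-1 languages, plus languages seen in tails.git/po within max_age."""
    recent = {code for code, seen in last_seen.items()
              if today - date.fromisoformat(seen) <= max_age}
    return sorted(recent | set(tier1))
```

On each import, `update_last_seen()` would be called with the languages that currently have a PO file. At build time, `greeter_languages_option_c()` would then produce the list shown in the Greeter.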

What do you think?

#25 Updated by sajolida 27 days ago

> IMO we should always propose [tier-1 languages] in the Greeter.

+1

> we use the mechanism you propose (+ tier-1 supported languages) to add languages to the Greeter

+1

> Instead, we could remove non-tier-1 languages from the Greeter only once we've not had PO files in tails.git:po for them since 6-12 months.

Why not. I believe that my proposal to use a 0% threshold on
#16774#note-16 would achieve the same in practice.

#26 Updated by intrigeri 27 days ago

>> Instead, we could remove non-tier-1 languages from the Greeter only once we've not had PO files in tails.git:po for them since 6-12 months.

> Why not. I believe that my proposal to use a 0% threshold on #16774#note-16 would achieve the same in practice.

I understand this reasoning relies on the implicit assumption that with:

  • X = the probability that a given language's percentage of reviewed strings varies between zero and a strictly positive number
  • Y = the probability that a given language's percentage of reviewed strings varies around 25% (sometimes strictly less, sometimes more)

… X << Y, and X is sufficiently small that we can live with the risk of problematic UX consequences.

Intuitively, this sounds reasonable to me: I guess there's little chance that translators translate & review precisely the strings that are going to be obsoleted soon after, and only those, so that the language drops to 0% and gets removed, and then comes back once they translate & review a couple of strings again.

To sum up, wrt. removing languages from the Greeter, we have three options:

  • A. Only propose languages that have a PO file in tails.git + tier-1 languages, period. This embeds a removal mechanism, by definition.
  • B. Propose languages that have a PO file in tails.git + tier-1 languages + any language that was proposed in our last release. And then, manually clean up "any language that was proposed in our last release" from time to time.
  • C. Same as B except we automate the cleaning process as I've suggested.

I understand you're in favour of option A, skipping the added implementation complexity of options B and C, on the grounds that we can assume option A won't cause so much flapping (languages added/removed/added/etc.) that it would be detrimental to users. I'm fine with it. If we notice it causes trouble later, we can come back to the drawing board. Deal?
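For comparison with option C, option B can be sketched as carrying over the previous release's list and subtracting whatever a human decided to clean up. All names here are hypothetical. In this sketch the manual clean-up only applies to carried-over languages, so languages that currently have a PO file, and tier-1 languages, are always kept.

```python
def greeter_languages_option_b(po_languages, tier1, previous_release, removed=()):
    """Option B: PO languages + tier-1 + the previous release's list.

    `removed` holds languages manually cleaned up; it only affects languages
    carried over from the previous release.
    """
    carried_over = set(previous_release) - set(removed)
    return sorted(set(po_languages) | set(tier1) | carried_over)
```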

#27 Updated by sajolida 22 days ago

> I understand you're in favour of option A

I'm in favor of any of A, B, or C and will let my beloved coders choose
the one they prefer implementing, especially since "if we notice it
causes trouble later, we can come back to the drawing board" :)

#28 Updated by sajolida 22 days ago

  • Assignee deleted (sajolida)

Since we have a deal, this task is ready to be implemented (as soon as #16774 is over) and I can unassign myself.

#29 Updated by intrigeri 22 days ago

  • Assignee set to intrigeri

> Since we have a deal, this task is ready to be implemented (as soon as #16774 is over) and I can design it from me.

OK. I'll first update the ticket description; otherwise, whoever tries to implement it will have a hard time understanding what they're supposed to do.

#30 Updated by sajolida 21 days ago

  • Related to Feature #17002: Do stats on the languages in which people start Tails added

#31 Updated by sajolida 21 days ago

Today I computed some stats on the languages in which people start Tails. See #17002.

I checked the coverage that our current proposal would have on actual sessions. And it's pretty good :)

Only counting non-English sessions (see the attached spreadsheet):

(I'm assuming here that the 2.2% of non-English sessions in Low German are errors and would use German in a curated list, see #17002#note-3.)

  • Our tier-1 languages would cover 90.91% of non-English sessions.
  • The languages that currently have a PO file in tails.git/po would cover 97.02% of non-English sessions.
  • The languages that currently have a PO file in any of tails.git, greeter.git, persistence-setup.git, or whisperback.git would cover 99.86% of non-English sessions.
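Each coverage figure above is the share of non-English sessions whose language is in a given list. Here is a sketch of that computation. It assumes per-language session counts (the real ones are in #17002 and the attached spreadsheet) that use the same codes as the list being checked. It also takes an alias map for sessions that are assumed to be meant as another language, such as Low German. The function name is hypothetical.

```python
def coverage(sessions_by_language, offered, aliases=None):
    """Percentage of non-English sessions whose language is in `offered`.

    aliases maps a session language to the one assumed to be meant,
    e.g. {"nds": "de"} for Low German sessions counted as German.
    """
    aliases = aliases or {}
    covered = total = 0
    for language, count in sessions_by_language.items():
        language = aliases.get(language, language)
        if language.split("_")[0] == "en":
            continue  # only non-English sessions are counted
        total += count
        if language in offered:
            covered += count
    return round(100 * covered / total, 2) if total else 0.0
```

With made-up counts, `coverage({"en": 900, "de": 50, "nds": 5, "fr": 40, "et": 5}, {"de", "fr"}, {"nds": "de"})` returns 95.0: 95 of the 100 non-English sessions are covered.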

Of the 0.14% left (over 84 days of logs), the top 3 are Estonian, Bosnian, and Belarusian. For each of these, only looking at the main menu, GNOME is partly translated but neither Tor Browser nor Thunderbird are translated at all.

I'll stop going down the list here and consider that this little analysis validates our criteria with hard data.

#32 Updated by intrigeri 21 days ago

  • Description updated (diff)
  • Assignee deleted (intrigeri)
  • Target version deleted (Tails_3.16)
  • Type of work changed from Research to Code

#33 Updated by intrigeri 3 days ago

  • Blocks Feature #16094: Have simplified and traditional Chinese in the list of languages in Tails Greeter added
