Feature #16095

Curate the list of languages in Tails Greeter

Added by sajolida about 1 year ago. Updated about 1 month ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Internationalization
Target version:
Start date: 11/04/2018
Due date:
% Done: 100%
Feature Branch: https://salsa.debian.org/tails-team/tails/merge_requests/39/commits
Type of work: Code
Blueprint:
Starter:
Affected tool: Greeter

Description

We currently have many languages or language variants that are poorly translated or not translated at all (either in Debian and GNOME in general or in our internal tools). This is the case, for example, for "Chinese, mandarin" and probably many others.

Having such a long list makes it harder to know which languages are actually well translated, and harder for the user to know what her best option is without trial and error.

I think we should filter this list to only display the languages that are reasonably well translated.

Implementation-wise, the Greeter can do either:

  • A. Only propose languages that have a PO file in tails.git + tier-1 languages, period. This embeds a removal mechanism, by definition.
  • B. Propose languages that have a PO file in tails.git + tier-1 languages + any language that was proposed in our last release. And then, manually clean up "any language that was proposed in our last release" from time to time.
  • C. Same as B except we automate the cleaning process as intrigeri suggested in #16095#note-24

Option A is the cheapest and is probably good enough. If we notice trouble later (e.g. churn/flapping, i.e. languages appearing and disappearing every second release), we can come back to it and implement option B or C.
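As a rough illustration of option A, the following Python sketch derives the list from the PO file names plus a fixed tier-1 set. The function name, the `TIER1` constant, and the spelling of the codes (for example `hi` and `id` for Hindi and Indonesian) are assumptions made for this example; this is not the Greeter's actual code.

```python
from pathlib import Path

# Tier-1 languages from #15807, written as PO-style codes (assumed spelling).
TIER1 = {"ar", "de", "en", "es", "fa", "fr", "hi", "id",
         "it", "pt_BR", "ru", "tr", "zh_CN"}

def greeter_languages(po_dir):
    """Option A: languages with a PO file in po_dir, plus tier-1 languages."""
    with_po_file = {path.stem for path in Path(po_dir).glob("*.po")}
    return sorted(with_po_file | TIER1)
```

Because the list is recomputed from the directory contents each time, a language whose PO file disappears drops out automatically, which is the built-in removal mechanism mentioned above.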

Then, we'll need to remove all language packs (e.g. Tor Browser's) that can't possibly be used (because they're not listed in the Greeter).

languages.ods (12.8 KB) sajolida, 08/28/2019 06:48 PM


Related issues

Related to Tails - Bug #16093: Remove untranslated Chinese languages from Tails Greeter Resolved 11/04/2018
Related to Tails - Feature #14544: Spend software developer time on smallish UX improvements In Progress 08/31/2018
Related to Tails - Feature #15807: Define & apply clear criteria for including dictionaries, fonts and language packs Resolved 08/18/2018
Related to Tails - Feature #9956: Consider replacing the additional fonts we ship with Noto Duplicate 08/09/2015
Related to Tails - Feature #7036: Move custom software to main git repo In Progress
Related to Tails - Feature #17002: Do stats on the languages in which people start Tails Resolved
Related to Tails - Feature #17089: Remove obscure keyboard layouts from Greeter Resolved
Related to Tails - Bug #16774: Transifex translations: we should not update from the _completed branches In Progress
Related to Tails - Bug #17139: Only ship locale definitions that the user can select in the Greeter Confirmed
Blocks Tails - Feature #16209: Core work: Foundations Team Confirmed
Blocks Tails - Feature #16688: Core work 2019Q3 → 2019Q4: User experience Confirmed
Blocks Tails - Feature #16094: Have simplified and traditional Chinese in the list of languages in Tails Greeter Resolved 11/04/2018
Blocks Tails - Bug #16806: Formats from Greeter not respected Resolved
Blocks Tails - Bug #17058: Translatable unused sentences in Greeter translation Resolved
Blocks Tails - Bug #17087: Greeter applies setting when user clicks "Cancel" or "Back" Resolved
Blocks Tails - Bug #13447: Greeter: Password check doesn't work when changing first entry Resolved 07/09/2017
Blocks Tails - Feature #17089: Remove obscure keyboard layouts from Greeter Resolved
Blocked by Tails - Bug #17106: Don't import PO files with no translated string from Transifex Resolved

Associated revisions

Revision 54ba104b (diff)
Added by segfault about 2 months ago

Curate list of languages in Tails Greeter (refs: #16095)

Only include languages which either:

  • Have a PO file in tails.git
  • Are on our list of tier 1 supported languages:

    Arabic - AR
    German - DE
    English - EN
    Spanish - ES
    Farsi - FA
    French - FR
    Hindi
    Indonesian
    Italian - IT
    Portuguese - PT-BR
    Russian - RU
    Turkish - TR
    Simplified Chinese - zh-CN

Revision 1f549ffc (diff)
Added by segfault about 2 months ago

Don't add languages from /usr/share/i18n/SUPPORTED to the Greeter (refs: #16095)

We only want to support a curated list of languages in the Greeter.

Revision abda6af3 (diff)
Added by segfault about 1 month ago

Generate the list of supported languages during build (refs: #16095)

Revision 17ab34aa (diff)
Added by segfault about 1 month ago

Bring back the list of default locales (refs: #16095)

Revision 049650b5 (diff)
Added by segfault about 1 month ago

Generate the list of supported locales during build (refs: #16095)

Revision c72390b6 (diff)
Added by segfault about 1 month ago

Use the generated list of supported locales in the Greeter (refs: #16095)

Revision cf95475b (diff)
Added by segfault about 1 month ago

Generate the list of supported languages during build (refs: #16095)

Revision 71e80e40 (diff)
Added by segfault about 1 month ago

Bring back the list of default locales (refs: #16095)

Revision 493357b4 (diff)
Added by segfault about 1 month ago

Generate the list of supported locales during build (refs: #16095)

Revision a62279b6 (diff)
Added by segfault about 1 month ago

Use the generated list of supported locales in the Greeter (refs: #16095)

Revision ad550f07 (diff)
Added by segfault about 1 month ago

Generate the list of supported languages during build (refs: #16095)

Revision 1dee6062 (diff)
Added by segfault about 1 month ago

Bring back the list of default locales (refs: #16095)

Revision fb5e6029 (diff)
Added by segfault about 1 month ago

Generate the list of supported locales during build (refs: #16095)

Revision b96a3aaa (diff)
Added by segfault about 1 month ago

Use the generated list of supported locales in the Greeter (refs: #16095)

Revision 257bd1f0 (diff)
Added by segfault about 1 month ago

Generate the list of supported languages during build (refs: #16095)

Revision 8874ddcf (diff)
Added by segfault about 1 month ago

Bring back the list of default locales (refs: #16095)

Revision 1bd1795e (diff)
Added by segfault about 1 month ago

Generate the list of supported locales during build (refs: #16095)

Revision 4ddc528c (diff)
Added by segfault about 1 month ago

Use the generated list of supported locales in the Greeter (refs: #16095)

Revision 53c6c19a
Added by intrigeri about 1 month ago

Merge branch 'feature/17098-refactor-greeter' into devel (Closes: #17098, #17089, #16806, #13447, #17087, #17058, #17101, #6525, #16095, #16094, #16093)

History

#1 Updated by sajolida about 1 year ago

  • Related to Bug #16093: Remove untranslated Chinese languages from Tails Greeter added

#2 Updated by sajolida about 1 year ago

  • Related to Feature #14544: Spend software developer time on smallish UX improvements added

#3 Updated by mercedes508 about 1 year ago

  • Status changed from New to Confirmed

#4 Updated by sajolida 8 months ago

  • Related to Feature #15807: Define & apply clear criteria for including dictionaries, fonts and language packs added

#5 Updated by intrigeri 8 months ago

We currently have many languages or language variants that are poorly translated or not translated at all (either in Debian and GNOME in general or in our internal tools). This is the case, for example, for "Chinese, mandarin" and probably many others.

I'm fine with hiding options that would result in most of the desktop being in English, at least when there's a close enough alternative that's better translated (e.g. a close variant of the same language). When there's no such alternate option, I'm worried that removing the option implies the user will have no choice but using English, and I'm not sure if it's better or worse than, say, 90% English and 10% in their preferred language; I guess it's a matter of "at least some bits are translated and thus easier to understand i.e. better than nothing" vs. "context/languages switching has a cost and is evil".

If we base ourselves on the PO files for our internal tools, we might be able to automatically generate a list of languages during the build. Making sure that our internal tools are well translated in a given language before listing it sounds like a good criterion too.

I'd be curious to see what this heuristic yields and whether the resulting list would indeed be likely to help users pick an option that will work better for them.

#6 Updated by intrigeri 8 months ago

  • Description updated (diff)

#7 Updated by sajolida 8 months ago

Ok, so what's the next step here?

Is it to build a table with:

  • Language name, all the ones we have in Tails Greeter
  • Percentage of translation of: core GNOME? all packages included in Tails?
  • Percentage of translation of Tails software

So we can have concrete examples?

I would know how to compute the percentage of translations of Tails software but not the rest.

#8 Updated by intrigeri 8 months ago

#9 Updated by intrigeri 8 months ago

  • Type of work changed from Discuss to Research

Ok, so what's the next step here?
Is it to build a table with: […]

I would start with Tails software only: if this gives us good enough results, it's by far the cheapest to implement initially and to maintain on the long run.

#10 Updated by intrigeri 8 months ago

  • Related to Feature #9956: Consider replacing the additional fonts we ship with Noto added

#11 Updated by sajolida 8 months ago

  • Assignee set to sajolida
  • Target version set to Tails_3.15

Ok, next step is to build this list.

#12 Updated by sajolida 6 months ago

  • Blocks Feature #16688: Core work 2019Q3 → 2019Q4: User experience added

#13 Updated by sajolida 6 months ago

  • Target version changed from Tails_3.15 to Tails_3.16

#14 Updated by sajolida 3 months ago

Here is the list. I used as PO files:

  • greeter.git:po/
  • liveusb-creator.git:po/
  • tails.git:po/
  • persistence-setup.git:po/
  • whisperback.git:po/

The script is in ux.git:greeter/percentage-translated.rb.

The result is:

de           98.6
he           98.4
es           98.4
ca           98.4
tr           98.4
ka           98.4
ru           98.4
it           98.4
da           98.4
ro           98.4
zh_CN        98.4
is           98.3
pt_PT        98.3
fr           98.3
es_AR        98.2
en_GB        98.0
pl           98.0
ga           98.0
sv           97.8
el           97.4
zh_TW        93.4
fa           85.4
pt_BR        85.4
bn           85.2
bn_BD        84.1
pt           75.4
hr_HR        75.0
id           71.5
cs           71.1
nb           70.8
ko           70.5
bg           70.4
fr_CA        70.2
lv           70.2
lt           69.4
vi           68.3
uk           64.3
nl           61.6
hu           61.3
fi           56.5
ar           54.1
sq           49.8
zh_HK        49.0
sk_SK        49.0
ms_MY        45.2
sk           44.0
sl_SI        43.7
az           42.9
km           42.4
hr           37.5
ja           36.6
nl_BE        31.7
eu           31.5
nn           30.7
si_LK        30.2
gl           28.1
th           24.1
et           23.9
sr           21.8
lb           16.7
en           11.4
pl_PL         9.0
ta            8.1
hy            8.1
cy            8.0
ru@petr1708   7.2
nds           4.3
sr@latin      4.1
bs            3.3
ast           3.3
pa            3.3
bn_IN         2.1
ms            0.1
zh            0.0
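The Ruby script itself is not reproduced in this ticket. As a hedged illustration of how such percentages could be computed, here is a minimal Python sketch that counts entries with a non-empty msgstr. It is an assumption that the original aggregates by summing counts across repositories; the sketch also counts fuzzy entries as translated, and the names `count_entries` and `percentage` are made up for this example.

```python
import re

_FIELD = re.compile(r'^(msgid|msgstr)(?:\[\d+\])?\s+"(.*)"$')
_CONTINUATION = re.compile(r'^"(.*)"$')

def count_entries(po_text):
    """Return (translated, total) for one PO file, ignoring the header."""
    entries = []                 # one dict per msgid/msgstr pair
    current, field = None, None  # entry and field currently being read
    for raw_line in po_text.splitlines():
        line = raw_line.strip()
        start = _FIELD.match(line)
        more = _CONTINUATION.match(line)
        if start:
            field = start.group(1)
            if field == "msgid":
                current = {"msgid": "", "msgstr": ""}
                entries.append(current)
            if current is not None:
                current[field] += start.group(2)
        elif more and current is not None and field:
            current[field] += more.group(1)   # multi-line string
        elif not line or line.startswith("#"):
            field = None         # blank lines and comments end a field
    real = [e for e in entries if e["msgid"]]  # the header has an empty msgid
    translated = sum(1 for e in real if e["msgstr"])
    return translated, len(real)

def percentage(counts):
    """Aggregate (translated, total) pairs from several PO files."""
    translated = sum(t for t, _ in counts)
    total = sum(n for _, n in counts)
    return round(100.0 * translated / total, 1) if total else 0.0
```

Calling `count_entries` on the same language's PO file in each repository and passing the resulting pairs to `percentage` would give one figure per language, comparable in spirit to the table above.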

I'd be interested in seeing a list of all locales currently listed in Tails Greeter and studying the difference a bit. I briefly tried to understand how the list was built in the code of Tails Greeter but failed.

Is it listing all locales in /usr/share/locale/?

#15 Updated by sajolida 3 months ago

  • Related to Feature #7036: Move custom software to main git repo added

#16 Updated by sajolida 3 months ago

We currently have 74 languages with at least some translation of our custom programs.

The full list of locales in /usr/share/locale/, which I suspect is the base for the listing in Tails Greeter, has 268 locales.

By only listing in Tails Greeter the locales for which we have at least 1 string of custom software translated, we would reduce the list of options by a factor of 4. Pretty good already!

Once our custom software is all in the main git repo (#7036), I guess that it will become even easier to compute this list at build time. For the time being, we could limit ourselves to the list of languages that are already in tails.git:po/, if that's easier.

I compared the list of languages in tails.git:po/ with the list of languages that have at least 1 string of custom software translated, this time ignoring liveusb-creator.git because some translations are probably inherited from upstream and might not even be relevant in the context of Tails. The languages missing from tails.git:po/ are the following (i.e. the languages that are only in greeter.git, persistence-setup.git, or whisperback.git):

ar
en
eu
gl
hr
hu
hy
ja
nl
nl_BE
pl_PL
ru@petr1708
si_LK
sk
ta
th

From these:

  • hr, pl_PL, and sk have another locale from the same language but another territory in tails.git.
  • Only ja, hu, and ar are translated at more than 25% across all repositories.

So, if my simplification is worth it in terms of integration code, I propose to only list in Tails Greeter, the languages that have a PO file in tails.git:po/. To add a new language in Tails Greeter, people should start translating for this language in tails.git.

@intrigeri: What do you think?

#17 Updated by intrigeri 3 months ago

I'd be interested in seeing a list of all locales currently listed in Tails Greeter and studying the difference a bit. I briefly tried to understand how the list was built in the code of Tails Greeter but failed. Is it listing all locales in /usr/share/locale/?

It's generated from /usr/share/i18n/SUPPORTED, which I guess is the list of locales supported by Debian.

#18 Updated by intrigeri 3 months ago

So, if my simplification is worth it in terms of integration code,

It would be easier to code, indeed. In any case, I would not want us to spend time now on writing code that will be obsoleted by #7036.

I propose to only list in Tails Greeter, the languages that have a PO file in tails.git:po/. To add a new language in Tails Greeter, people should start translating for this language in tails.git.

Currently tails.git:po/ only has languages for which 100% of these strings are translated (but emma peel proposed to change this on #16774). Also, the RM often has to remove invalid PO files from there during the release process, e.g. in May I removed: ar, fa, hu, ja, kk, nl, sk. The risk of such removal happening and its problematic impact will only increase with #7036. That's surely another problem, which we should probably think about somewhere else (and possibly as part of #7036), but if we're talking about "for the time being" we must consider this problem as very much alive.

So in the current state of things, your proposal is equivalent to: "in order to offer a language in Tails Greeter, people must translate 100% of the tails-misc resource on Transifex without making any mistake that would make the resulting PO file invalid". This seems a very high bar to me. For example, with this heuristic, we would have removed the Japanese language option from the Greeter since 3.13.2, even though the GNOME desktop/apps and Tor Browser are almost entirely translated, which feels like a bad move to me.

Sorry for being a killjoy here, but I prefer giving you this info ASAP instead of waiting until I make time to reply with more constructive ideas.

#19 Updated by sajolida 3 months ago

It's generated from /usr/share/i18n/SUPPORTED, which I guess is the list of locales supported by Debian.

Thanks!

So that would be a list of 284 locales (instead of 268 as I thought
initially):

sed -r 's/^([^\.@ ]+).+/\1/' /usr/share/i18n/SUPPORTED | sort -u | wc -l
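For readers less familiar with sed, here is a Python sketch of the same idea: keep only the part of each line before the first `.`, `@`, or space, then deduplicate. The function name is made up for this example, and unlike the pipeline above it skips blank lines.

```python
import re

def unique_locale_names(supported_text):
    """Strip codeset, modifier and charset column, then deduplicate."""
    names = set()
    for line in supported_text.splitlines():
        if not line.strip():
            continue  # unlike the sed pipeline, ignore blank lines
        match = re.match(r"^([^.@ ]+).+", line)
        # Lines that do not match are kept unchanged, as sed would do.
        names.add(match.group(1) if match else line)
    return names
```

Applying `len()` to the result for the contents of /usr/share/i18n/SUPPORTED should give a count comparable to the one quoted above.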

#20 Updated by sajolida 3 months ago

Currently tails.git:po/ only has languages for which 100% of these strings are translated (but emma peel proposed to change this on #16774).

I see something different in tails.git. For example:

  • po/zh.po is translated at 0% and after a quick scan I don't think
    that it has ever seen a translation.
  • po/cy.po is translated at 22%, pt at 38%, es_AR at 99%, etc.

But it sounds like a good idea anyway to block this ticket on #16774 and
there seems to be a solution for #16774 already :)

Also, the RM often has to remove invalid PO files from there during the release process, e.g. in May I removed: ar, fa, hu, ja, kk, nl, sk. The risk of such removal happening and its problematic impact will only increase with #7036. That's surely another problem, which we should probably think about somewhere else (and possibly as part of #7036), but if we're talking about "for the time being" we must consider this problem as very much alive.

Do you mean that if a glitch is introduced in fr.po between 3.15 and
3.16, we won't have any French translation of everything in po/tails.pot
in 3.16 and later until the glitch is fixed?

If so, could we consider asking RMs to instead either:

  • Remove the msgstr with the glitch but keep the rest of fr.po
  • Roll back fr.po to its version prior to the glitch

#21 Updated by sajolida 3 months ago

  • Blocked by Bug #16774: Transifex translations: we should not update from the _completed branches added

#22 Updated by intrigeri 3 months ago

Currently tails.git:po/ only has languages for which 100% of these strings are translated (but emma peel proposed to change this on #16774).

I see something different in tails.git. For example:

  • po/zh.po is translated at 0% and after a quick scan I don't think that it has ever seen a translation.

That one does not come from Transifex. It's a leftover from some old stuff of ours ⇒ deleted!

  • po/cy.po is translated at 22%, pt at 38%, es_AR at 99%, etc.

I suspect these ones are all caused by a bug in the script Tor runs to import translations from Transifex into Git. On top of that, our own import-translations script is not deleting PO files that were removed from the Tor Git repo we copy them from. So there are two bugs that make the current behaviour not match what the intent is/was.

Also, the RM often has to remove invalid PO files from there during the release process, e.g. in May I removed: ar, fa, hu, ja, kk, nl, sk. The risk of such removal happening and its problematic impact will only increase with #7036. That's surely another problem, which we should probably think about somewhere else (and possibly as part of #7036), but if we're talking about "for the time being" we must consider this problem as very much alive.

Do you mean that if a glitch is introduced in fr.po between 3.15 and 3.16, we won't have any French translation of everything in po/tails.pot in 3.16 and later until the glitch is fixed?

Exactly.

If so, could we consider asking RMs to instead either:

  • Remove the msgstr with the glitch but keep the rest of fr.po

I'd rather not.

  • Roll back fr.po to its version prior to the glitch

Sure, that's cheap ⇒ done with 256a340a6627c9d1d3009bd96300bc88aa7c129b.

#23 Updated by sajolida 3 months ago

  • Roll back fr.po to its version prior to the glitch

Sure, that's cheap ⇒ done with 256a340a6627c9d1d3009bd96300bc88aa7c129b.

Cool!

So the next steps are:

  • Work on #16774, which includes agreeing on a translation threshold
    level for inclusion of languages.
  • Finish agreeing on the mechanism I'm proposing here: include in Tails
    Greeter only the locales that have a translation file in tails.git:po.

#24 Updated by intrigeri 3 months ago

So the next steps are:

  • Work on #16774, which includes agreeing on a translation threshold level for inclusion of languages.

Done, left to be implemented (emmapeel for the Tor bits + myself for the Tails ones).

  • Finish agreeing on the mechanism I'm proposing here: include in Tails Greeter only the locales that have a translation file in tails.git:po.

I agree with the general idea but I think some refining is needed.

As data points, with the threshold we picked, in the current state of the tails-misc resource:

  • we would have: ca, cs, de, el, es_AR, es, fi, fr, ga, he, hu, it, km, lt, pt_BR, pt_PT, ro, sv, tr, zh_CN
  • we would lose: az, bg, bn_BD, bn, cy, da, en_GB, fa, fr_CA, hr_HR, id, is, ka, ko, lv, ms_MY, nb, nn, pl, pt, ru, sk_SK, sl_SI, sq, sr, uk, vi, zh_HK, zh_TW

This will change over time, as we merge custom software into tails.git.

This is related to the discussion we had on #15807, where we agreed about this list of tier-1 supported languages:

  • Arabic - AR
  • German - DE
  • English - EN
  • Spanish - ES
  • Farsi - FA
  • French - FR
  • Hindi
  • Indonesian
  • Italian - IT
  • Portuguese - PT-BR
  • Russian - RU
  • Turkish - TR
  • Simplified Chinese - zh-CN

It would not make sense to me to hide, from the Greeter, languages that are on this list, and for which we ship language packs etc. So even if we don't have (enough reviewed) translations for these languages, IMO we should always propose them in the Greeter. In the current state of our translations, it makes a difference for Arabic, Hindi, Indonesian, and Russian.

Finally, I'm worried about the UX churn aspect of "include in Tails Greeter only the locales that have a translation file in tails.git:po". The extreme (admittedly theoretical) example is: if my language is translated around 25% on Transifex, it could be proposed in Tails 4.0, not proposed in 4.1, proposed again in 4.2, etc. More generally, training users to choose their preferred language in the Greeter, and then dropping it, may feel like a UX regression to them: even if our custom software is not translated well enough anymore for us to ship the corresponding translations, it may be that GNOME and Tor Browser are well translated into that language.

So what about this: we use the mechanism you propose (+ tier-1 supported languages) to add languages to the Greeter, but not to remove them immediately as soon as they drop below the threshold. Instead, we could remove non-tier-1 languages from the Greeter only once we've not had PO files in tails.git:po for them for 6-12 months.

The mechanism we have to update these PO files could also update whatever file will be used to generate the list of languages visible in the Greeter, and maintain a "last seen" timestamp in there. Then, when we generate the list of languages visible in the Greeter, we ignore those for which the "last seen" timestamp is too old, unless it's a tier-1 supported language.

What do you think?
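The "last seen" idea above could look roughly like the following Python sketch. Everything here is hypothetical: the function names, the in-memory dict standing in for "whatever file will be used", and the 365-day default, which is just one value within the 6-12 month range mentioned.

```python
from datetime import timedelta

def update_last_seen(last_seen, po_languages, today):
    """Record today's date for every language that currently has a PO file."""
    updated = dict(last_seen)
    for lang in po_languages:
        updated[lang] = today
    return updated

def visible_languages(last_seen, tier1, today, max_age_days=365):
    """Keep tier-1 languages plus those seen within max_age_days."""
    cutoff = today - timedelta(days=max_age_days)
    recent = {lang for lang, seen in last_seen.items() if seen >= cutoff}
    return sorted(recent | set(tier1))
```

In this sketch `update_last_seen` would run whenever PO files are imported, and `visible_languages` would run when the list for the Greeter is generated, so a language only disappears after it has been absent for the whole grace period.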

#25 Updated by sajolida 3 months ago

IMO we should always propose [tier-1 languages] in the Greeter.

+1

we use the mechanism you propose (+ tier-1 supported languages) to add languages to the Greeter

+1

Instead, we could remove non-tier-1 languages from the Greeter only once we've not had PO files in tails.git:po for them for 6-12 months.

Why not. I believe that my proposal to use a 0% threshold on
#16774#note-16 would achieve the same in practice.

#26 Updated by intrigeri 3 months ago

Instead, we could remove non-tier-1 languages from the Greeter only once we've not had PO files in tails.git:po for them for 6-12 months.

Why not. I believe that my proposal to use a 0% threshold on #16774#note-16 would achieve the same in practice.

I understand this reasoning relies on the implicit assumption that with:

  • X = the probability that a given language's percentage of reviewed strings varies between zero and a strictly positive number
  • Y = the probability that a given language's percentage of reviewed strings varies around 25% (sometimes strictly less, sometimes more)

… X << Y, and X is sufficiently small that we can live with the risk of problematic UX consequences.

Intuitively, this sounds reasonable to me: I guess there's little chance that translators translate & review precisely the strings that are going to be obsoleted soon after, and only these ones, hence dropping to 0%, getting removed, then they translate & review a couple of strings again and their language comes back.

To sum up, wrt. removing languages from the Greeter, we have three options:

  • A. Only propose languages that have a PO file in tails.git + tier-1 languages, period. This embeds a removal mechanism, by definition.
  • B. Propose languages that have a PO file in tails.git + tier-1 languages + any language that was proposed in our last release. And then, manually clean up "any language that was proposed in our last release" from time to time.
  • C. Same as B except we automate the cleaning process as I've suggested.

I understand you're in favour of option A, and skip the added implementation complexity of options B and C, on the grounds that we can assume option A won't cause too much flapping (languages added/removed/added/etc.) that would be detrimental to users. I'm fine with it. If we notice it causes trouble later, we can come back to the drawing board. Deal?
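To notice whether option A causes the flapping mentioned above, one could compare the language lists of successive releases. The following Python sketch is only an illustration: the function name and the input format (one set of language codes per release, oldest first) are assumptions, and nothing like it is known to exist in the Tails code base.

```python
def flapping_languages(releases):
    """Return languages that were offered, dropped, then offered again.

    `releases` is a list of sets of language codes, oldest release first.
    """
    flapping = set()
    seen_before = set()   # offered in some earlier release
    dropped = set()       # offered earlier, missing from a later release
    for offered in releases:
        flapping |= dropped & offered        # came back after being dropped
        dropped |= seen_before - offered     # newly missing languages
        seen_before |= offered
    return sorted(flapping)
```

A non-empty result would be a signal to revisit options B or C.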

#27 Updated by sajolida 3 months ago

I understand you're in favour of option A

I'm in favor of any of A, B, or C and will let my beloved coders choose
the one they prefer implementing, especially since "if we notice it
causes trouble later, we can come back to the drawing board" :)

#28 Updated by sajolida 3 months ago

  • Assignee deleted (sajolida)

Since we have a deal, this task is ready to be implemented (as soon as #16774 is over) and I can unassign myself.

#29 Updated by intrigeri 3 months ago

  • Assignee set to intrigeri

Since we have a deal, this task is ready to be implemented (as soon as #16774 is over) and I can unassign myself.

OK. I'll first update the ticket description, otherwise whoever tries to implement it will have a hard time understanding what they're supposed to do.

#30 Updated by sajolida 3 months ago

  • Related to Feature #17002: Do stats on the languages in which people start Tails added

#31 Updated by sajolida 3 months ago

Today I computed some stats on the languages in which people start Tails. See #17002.

I checked the coverage that our current proposal would have on actual sessions. And it's pretty good :)

Only counting non-English sessions, see spreadsheet in attachment:

(I'm assuming here that the 2.2% of non-English sessions in Low German are errors and would use German in a curated list, see #17002#note-3.)

  • Our tier-1 languages would cover 90.91% of non-English sessions.
  • The languages that currently have a PO file in tails.git:po/ would cover 97.02% of non-English sessions.
  • The languages that currently have a PO file in either tails.git, greeter.git, persistence-setup.git, or whisperback.git would cover 99.86% of non-English sessions.

Of the 0.14% left (over 84 days of logs), the top 3 are Estonian, Bosnian, and Belarusian. For each of these, only looking at the main menu, GNOME is partly translated but neither Tor Browser nor Thunderbird are translated at all.

I'll stop going down the list here and consider that this little analysis validates our criteria with hard data.
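The coverage figures above come from the attached spreadsheet. As an illustration of the arithmetic only, here is a Python sketch; the function name is made up, and it assumes sessions are keyed by bare language codes with English recorded as `en`.

```python
def coverage(session_counts, offered):
    """Percentage of non-English sessions whose language is in `offered`."""
    non_english = {lang: n for lang, n in session_counts.items()
                   if lang != "en"}
    total = sum(non_english.values())
    if total == 0:
        return 0.0
    covered = sum(n for lang, n in non_english.items() if lang in offered)
    return round(100.0 * covered / total, 2)
```

Running it once per candidate list (tier-1 only, tails.git only, all repositories) against the same session counts would reproduce the kind of comparison made in the bullet points.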

#32 Updated by intrigeri 3 months ago

  • Description updated (diff)
  • Assignee deleted (intrigeri)
  • Target version deleted (Tails_3.16)
  • Type of work changed from Research to Code

#33 Updated by intrigeri 2 months ago

  • Blocks Feature #16094: Have simplified and traditional Chinese in the list of languages in Tails Greeter added

#34 Updated by segfault about 2 months ago

  • Status changed from Confirmed to In Progress

#35 Updated by segfault about 2 months ago

  • Status changed from In Progress to Needs Validation
  • Feature Branch set to feature/16095-curate-languages-in-greeter

I curated config/chroot_local-includes/usr/share/tails-greeter/default_langcodes according to Option A in the description.

I would also like to curate the list of keyboard layouts, which is huge and, since based on Buster, contains things like "Czech Slovak and German" - which is the first result when you filter the layouts for "german" (there is also "Czech" and "German", so I really don't see the point of this layout).

#36 Updated by segfault about 2 months ago

I would also like to curate the list of keyboard layouts, which is huge and, since based on Buster, contains things like "Czech Slovak and German" - which is the first result when you filter the layouts for "german" (there is also "Czech" and "German", so I really don't see the point of this layout).

See #17089.

#37 Updated by segfault about 2 months ago

  • Related to Feature #17089: Remove obscure keyboard layouts from Greeter added

#38 Updated by segfault about 2 months ago

  • Status changed from Needs Validation to In Progress

#39 Updated by segfault about 2 months ago

  • Status changed from In Progress to Needs Validation

#40 Updated by segfault about 2 months ago

  • Target version set to Tails_4.0

#41 Updated by intrigeri about 2 months ago

  • Assignee set to intrigeri

#42 Updated by intrigeri about 2 months ago

  • Status changed from Needs Validation to In Progress
  • Assignee changed from intrigeri to segfault

I curated config/chroot_local-includes/usr/share/tails-greeter/default_langcodes according to Option A in the description.

Awesome!

I realize I've been unclear: the idea was to propose tier-1 languages + languages that have a PO file in tails.git at the time the Tails image is built. Hard-coding this list based on today's state of translations has two problems:

  • We're in a time of flux: we just changed the criteria for inclusion of PO files in tails.git and I expect this will affect the list of languages that make it there quite a bit in the next few months, as translators realize what the updated expectations are (tl;dr: reviewed translations).
  • Sooner or later, some of the languages you've hard-coded won't meet the criteria that led us to choose "PO file in tails.git" as a classifier anymore. And vice versa. So we would need a process to regularly update the list of languages. I'd rather automate it :)

Does this make sense to you?
Do you want to implement the automation I'm proposing?

Apart from that:

  • I've pushed a few minor fixes to the topic branch.
  • I've merged current devel (that has #17082 merged in).
  • I'll comment on the relevant tickets (that share this topic branch) as I make progress in my code review and manual testing.

#43 Updated by intrigeri about 2 months ago

  • Blocks Bug #16806: Formats from Greeter not respected added

#44 Updated by intrigeri about 2 months ago

  • Blocks Bug #17058: Translatable unused sentences in Greeter translation added

#45 Updated by intrigeri about 2 months ago

  • Blocks Bug #17087: Greeter applies setting when user clicks "Cancel" or "Back" added

#46 Updated by intrigeri about 2 months ago

  • Blocks Bug #13447: Greeter: Password check doesn't work when changing first entry added

#47 Updated by intrigeri about 2 months ago

  • Blocks Feature #17089: Remove obscure keyboard layouts from Greeter added

#48 Updated by intrigeri about 2 months ago

  • Blocked by Bug #17106: Don't import PO files with no translated string from Transifex added

#49 Updated by intrigeri about 2 months ago

  • Blocked by deleted (Bug #16774: Transifex translations: we should not update from the _completed branches)

#50 Updated by intrigeri about 2 months ago

  • Related to Bug #16774: Transifex translations: we should not update from the _completed branches added

#51 Updated by segfault about 2 months ago

intrigeri wrote:

I curated config/chroot_local-includes/usr/share/tails-greeter/default_langcodes according to Option A in the description.

Awesome!

I realize I've been unclear: the idea was to propose tier-1 languages + languages that have a PO file in tails.git at the time the Tails image is built. Hard-coding this list based on today's state of translations has two problems:

  • We're in a time of flux: we just changed the criteria for inclusion of PO files in tails.git and I expect this will affect the list of languages that make it there quite a bit in the next few months, as translators realize what the updated expectations are (tl;dr: reviewed translations).
  • Sooner or later, some of the languages you've hard-coded won't meet the criteria that lead us to choose "PO file in tails.git" as a classifier anymore. And vice versa. So we would need a process to regularly update the list of languages. I'd rather automate it :)

Does this make sense to you?
Do you want to implement the automation I'm proposing?

Sure, makes sense and I would like to implement it. But the translations in po/ are mostly only for a language, not a full locale - and I don't know how to automatically get a locale from a language code without implementing a manual mapping from language code to country. I searched quite a lot but couldn't find anything. Neither the Python locale module nor GNOME seems to support that.

If we can't do this automatically and have to maintain a mapping from language to locale, I don't think that would be much better than the manually maintained list of locales I added.

#52 Updated by intrigeri about 2 months ago

Hi!

Sure, makes sense and I would like to implement it.

:)))

But the translations in po/ are mostly only for a language, not a full locale - and I don't know how to automatically get a locale from a language code without implementing a manual mapping from language code to country.

The options I can think of are:

  • If we have a LL.po file, then we display all LL_* locales in the Greeter.
    • Pros: probably straightforward to implement with basic regexp skills, no ongoing maintenance cost.
    • Cons: we don't curate the list of languages as much as we would like to.
  • Decide ourselves manually which is the primary country for language "LL", that is, which region should be picked when one chooses "LL". This is the "manual mapping from language code to country".
    • Pros:
      • We FTBFS when we have no mapping for a given newly introduced language, which avoids the risk that one forgets to add it to the list of locales.
      • Given this is a political decision, it's good that we have control over it (for example, currently for Portuguese we default to Portugal, while there are way more people speaking Portuguese in Brazil than in Portugal; defaulting to Portugal reinforces colonialist heritage, which IMO is not ideal ethically speaking).
    • Cons: whenever a new LL.po appears, the RM has to make this decision and add a line to a config file. This happens in the critical path to a release, so it's stressful, which is not the ideal context to make good decisions.
  • Have some code decide which is the primary country for language "LL", that is, which region should be picked when one chooses "LL". I understand this is what you researched and did not find a solution for. But the current Greeter apparently is able to make this decision already: for example, if I choose "Deutsch (German)" without specifying a region, the Greeter somehow decides that I want "Deutsch - Deutschland (German - Germany)". Is there any reason why we can't reuse the same code here?
    • Pros: we already have most of the needed code; no ongoing maintenance cost
    • Cons: we don't get (for free) manual control over such political decisions

If we can't do this automatically and have to maintain a mapping from language to locale, I don't see how that would be any better than the manually maintained list of locales I added.

That's the 2nd option I've described above. IMO it's still better than a manually maintained list of locales, because:

  • If a new LL.po appears, it's obvious that an action is needed: we FTBFS. While with the manually maintained list of locales, we rely on the RM to not skip a manual step of the release process. Experience shows that RMs are prone to skipping random steps, be it because of a mistaken judgement call ("probably not needed this time") or of a mere oversight.
  • If a LL.po is removed, then the corresponding language gets automatically removed from the Greeter, which is the idea here. Of course, we could ask the RM to do it, but again, that's unreliable.
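
A minimal sketch of the build-time check behind the 2nd option, assuming PO files live in one directory and are named LL.po. The `LANG_TO_COUNTRY` mapping, its entries, and both function names are hypothetical and not taken from tails.git:

```python
import sys
from pathlib import Path

# Hypothetical mapping from language code to primary country; the entries
# are illustrative only, not a decision taken on this ticket.
LANG_TO_COUNTRY = {"de": "DE", "fr": "FR", "pt": "BR"}


def unmapped_languages(po_dir, mapping):
    """Return the bare language codes of LL.po files that have no mapping entry."""
    stems = sorted(p.stem for p in Path(po_dir).glob("*.po"))
    # Files such as pt_BR.po already name a country, so they need no mapping.
    return [s for s in stems if "_" not in s and s not in mapping]


def check_or_fail(po_dir, mapping=LANG_TO_COUNTRY):
    """Abort the build with a non-zero exit status if a mapping entry is missing."""
    missing = unmapped_languages(po_dir, mapping)
    if missing:
        sys.exit("No country mapping for: " + ", ".join(missing))
```

With a check like this, removing a LL.po needs no action at all, while adding one fails the build until someone adds the corresponding mapping line.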

#53 Updated by segfault about 2 months ago

intrigeri wrote:

But the translations in po/ are mostly only for a language, not a full locale - and I don't know how to automatically get a locale from a language code without implementing a manual mapping from language code to country.

The options I can think of are:

  • If we have a LL.po file, then we display all LL_* locales in the Greeter.
    • Pros: probably straightforward to implement with basic regexp skills, no ongoing maintenance cost.
    • Cons: we don't curate the list of languages as much as we would like to.

This could actually be a good solution. According to the ticket's description, the motivation for curating the list is to remove languages which are not well translated. This proposal wouldn't add languages which are not well translated, because all the LL_* locales should be translated by the LL.po file.
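
A rough sketch of what this 1st option could look like, assuming we already have the list of all available locale codes and the list of language codes (tier-1 languages plus those with a LL.po file). The function name and the regexp are illustrative assumptions, and locales with modifiers such as `@latin` are ignored for simplicity:

```python
import re

# Matches locale codes such as "de_DE" or "pt_BR" and captures the language part.
LOCALE_RE = re.compile(r"^([a-z]{2,3})_[A-Z]{2}$")


def locales_for_languages(all_locales, languages):
    """Keep every LL_* locale whose language code LL is in `languages`."""
    wanted = set(languages)
    kept = []
    for locale in all_locales:
        match = LOCALE_RE.match(locale)
        if match and match.group(1) in wanted:
            kept.append(locale)
    return kept


print(locales_for_languages(["de_DE", "de_AT", "xx_YY", "pt_BR"], ["de", "pt"]))
# prints ['de_DE', 'de_AT', 'pt_BR']
```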

  • Decide ourselves manually which is the primary country for language "LL", that is, which region should be picked when one chooses "LL". This is the "manual mapping from language code to country".
    • Pros:
      • We FTBFS when we have no mapping for a given newly introduced language, which avoids the risk that one forgets to add it to the list of locales.
      • Given this is a political decision, it's good that we have control over it (for example, currently for Portuguese we default to Portugal, while there are way more people speaking Portuguese in Brazil than in Portugal; defaulting to Portugal reinforces colonialist heritage, which IMO is not ideal ethically speaking).
    • Cons: whenever a new LL.po appears, the RM has to make this decision and add a line to a config file. This happens in the critical path to a release, so it's stressful, which is not the ideal context to make good decisions.
  • Have some code decide which is the primary country for language "LL", that is, which region should be picked when one chooses "LL". I understand this is what you researched and did not find a solution for. But the current Greeter apparently is able to make this decision already: for example, if I choose "Deutsch (German)" without specifying a region, the Greeter somehow decides that I want "Deutsch - Deutschland (German - Germany)". Is there any reason why we can't reuse the same code here?
    • Pros: we already have most of the needed code; no ongoing maintenance cost
    • Cons: we don't get (for free) manual control over such political decisions

The Greeter's algorithm for this is not very sophisticated. It just tests whether there is a locale with the same country code as the language code, for example pt_PT, and then uses that. Otherwise, it uses the first pt_* locale in its list of locales - and this list begins with a hardcoded list of default locales (config/chroot_local-includes/usr/share/tails-greeter/default_langcodes).

So this algorithm has the issue that (1.) it doesn't allow good manual control over political decisions, and (2.) it also requires a manually maintained list of locales to give useful results (without this list of defaults, for example the locale chosen for en would be en_AG).

If you want to take a look, the function which implements this is get_default_locale in config/chroot_local-includes/usr/lib/python3/dist-packages/tailsgreeter/language.py.
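
For readers who don't want to open the file, here is a simplified sketch of the behavior described above. It is not the actual get_default_locale code; the function name and signature are assumptions made for illustration:

```python
def default_locale(lang_code, locales):
    """Pick a default locale for `lang_code` from the ordered list `locales`."""
    # 1. Prefer the locale whose country code equals the language code (pt -> pt_PT).
    same_code = f"{lang_code}_{lang_code.upper()}"
    if same_code in locales:
        return same_code
    # 2. Otherwise fall back to the first LL_* entry, so the list order decides.
    for locale in locales:
        if locale.startswith(lang_code + "_"):
            return locale
    return None


print(default_locale("pt", ["pt_BR", "pt_PT"]))  # prints pt_PT
print(default_locale("en", ["en_AG", "en_US"]))  # prints en_AG
```

The second call shows why the order of the list matters: there is no en_EN locale, so whatever en_* entry comes first wins.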

If we can't do this automatically and have to maintain a mapping from language to locale, I don't see how that would be any better than the manually maintained list of locales I added.

That's the 2nd option I've described above. IMO it's still better than a manually maintained list of locales, because:

  • If a new LL.po appears, it's obvious that an action is needed: we FTBFS. While with the manually maintained list of locales, we rely on the RM to not skip a manual step of the release process. Experience shows that RMs are prone to skipping random steps, be it because of a mistaken judgement call ("probably not needed this time") or of a mere oversight.
  • If a LL.po is removed, then the corresponding language gets automatically removed from the Greeter, which is the idea here. Of course, we could ask the RM to do it, but again, that's unreliable.

I agree. That's why I edited the quoted sentence after posting the comment :)

My favorite is your 1st option. But if we decide against that, I propose that we do the following:
  • Manually maintain a mapping from language code to country, and use the already existing list in config/chroot_local-includes/usr/share/tails-greeter/default_langcodes as the basis.
  • During build time, generate a list from tier-1 languages and languages in po/
  • In the greeter, compile the list of locales from the list above. For each entry in that list which does not already contain a country code, use the mapping to obtain the country code.
  • We can still decide whether we want to FTBFS if the language list contains a language which has no mapping, or whether the Greeter should choose a country instead.
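
To make the proposal more concrete, a minimal sketch of the Greeter-side step, assuming the build produces a list whose entries are either bare language codes ("de") or full locale codes ("pt_BR"). The function name, the `strict` flag, and the mapping contents are hypothetical:

```python
def resolve_locales(lang_list, lang_to_country, strict=True):
    """Turn entries like "de" or "pt_BR" into full locale codes."""
    locales = []
    for entry in lang_list:
        if "_" in entry:
            # Already contains a country code: keep as-is.
            locales.append(entry)
        elif entry in lang_to_country:
            locales.append(f"{entry}_{lang_to_country[entry]}")
        elif strict:
            # Build-failure variant: refuse languages without a mapping.
            raise KeyError(f"no country mapping for language {entry!r}")
        else:
            # Lenient variant: keep the bare code so the Greeter picks a country later.
            locales.append(entry)
    return locales


print(resolve_locales(["de", "pt_BR"], {"de": "DE"}))  # prints ['de_DE', 'pt_BR']
```

The `strict` flag corresponds to the last point above: either fail when a language has no mapping, or let the Greeter choose a country.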

#54 Updated by intrigeri about 2 months ago

My favorite is your 1st option.

I say go for it: it should be cheap and is a reasonable step in the spirit of the goals set on this ticket. Also, it has a good chance of landing in 4.0.
Then we can ask sajolida to try it out and tell us if he would like us to curate the list further, possibly in some future release, and then we can consider the marginal cost/benefit of this next iteration. Unless, of course, he states his opinion here before you code the 1st option.

#55 Updated by segfault about 1 month ago

I implemented it, but IMO we still have the need for specifying a default country per language:

In the Greeter, the locales of a language are grouped into one entry. If the user clicks on that entry instead of expanding it, the Greeter uses the algorithm I described above to choose the default locale. But since we no longer add config/chroot_local-includes/usr/share/tails-greeter/default_langcodes to the beginning of the supported locales list, it only tries the country with the same code as the language, and if there is none, it chooses the first country alphabetically.

The effect is that when clicking English, en_AG (Antigua and Barbuda) is used, and when clicking Arabic, it's ar_AE (United Arab Emirates). Both seem like bad choices.

To fix this, we could use the old default_langcodes file as the basis for our new locales list. I will implement something.
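
A sketch of the idea, assuming the fix amounts to putting the default locales first in the list that the fallback walks through. The function name and the sample defaults (en_US, ar_EG) are illustrative assumptions, not the contents of default_langcodes:

```python
def ordered_locales(defaults, supported):
    """Put default locales first, then the remaining supported ones, without duplicates."""
    supported_set = set(supported)
    # Only keep defaults that are actually supported in this build.
    head = [loc for loc in defaults if loc in supported_set]
    head_set = set(head)
    tail = [loc for loc in sorted(supported_set) if loc not in head_set]
    return head + tail


locales = ordered_locales(["en_US", "ar_EG"], ["ar_AE", "ar_EG", "en_AG", "en_US"])
# A "first LL_* entry" fallback now finds the default before the alphabetical first.
print(next(loc for loc in locales if loc.startswith("en_")))  # prints en_US
```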

#56 Updated by intrigeri about 1 month ago

To fix this, we could use the old default_langcodes file as the basis for our new locales list (locales). I will implement something.

Yeah, when in doubt, I think it's OK to lean towards preserving the current behavior, which does not seem problematic in itself.

#57 Updated by segfault about 1 month ago

  • Status changed from In Progress to Needs Validation
  • Feature Branch changed from feature/16095-curate-languages-in-greeter to feature/17098-refactor-greeter

Done on top of the #17098 branch, to prevent merge conflicts.

#58 Updated by segfault about 1 month ago

  • Assignee deleted (segfault)

#59 Updated by segfault about 1 month ago

  • Status changed from Needs Validation to In Progress

#60 Updated by segfault about 1 month ago

  • Status changed from In Progress to Needs Validation

#61 Updated by segfault about 1 month ago

  • Status changed from Needs Validation to In Progress

#62 Updated by intrigeri about 1 month ago

Status changed from Needs Validation to In Progress

Was this intentional?

#63 Updated by segfault about 1 month ago

  • Status changed from In Progress to Needs Validation

intrigeri wrote:

Status changed from Needs Validation to In Progress

Was this intentional?

No, it was not.

#64 Updated by segfault about 1 month ago

  • Status changed from Needs Validation to In Progress

#65 Updated by segfault about 1 month ago

  • Status changed from In Progress to Needs Validation
  • Feature Branch changed from feature/17098-refactor-greeter to https://salsa.debian.org/tails-team/tails/merge_requests/39/commits

#66 Updated by intrigeri about 1 month ago

  • Assignee set to intrigeri

#67 Updated by intrigeri about 1 month ago

I'll run the full test suite locally, as some of our automated tests that exercise the Greeter are tagged fragile, and thus not run for this branch on Jenkins.

#68 Updated by intrigeri about 1 month ago

  • Status changed from Needs Validation to Resolved
  • % Done changed from 0 to 100

#69 Updated by sajolida about 1 month ago

Yay! You rock!

#70 Updated by intrigeri about 1 month ago

  • Related to Bug #17139: Only ship locale definitions that the user can select in the Greeter added
