Project

General

Profile

Bug #15403

Feature #10034: Translation web platform

Feature #15079: Integrate the platform with our Git and ikiwiki infrastructure

Unify po headers

Added by u over 1 year ago. Updated 5 months ago.

Status:
Resolved
Priority:
Normal
Assignee:
-
Category:
-
Target version:
Start date:
08/19/2018
Due date:
% Done:

0%

Feature Branch:
https://salsa.debian.org/tails-team/tails/merge_requests/14
Type of work:
Code
Blueprint:
Starter:
Affected tool:

Description

The script is placed here (currently):
wiki/src/contribute/l10n_tricks/unify_po-headers.py

old script for reference: wiki/src/contribute/l10n_tricks/check_po.sh

Environment

  • at developers computer for checking/unifing their po files.
  • at Jenkins to check all po files. wiki/src po files have other rules than other po files.
  • a malicious users try to trick this script
  • as core devs need to review random review requests against tails repository, the scrip itself can't live in the same repository nor submodule, as otherwise a malicious users can get into control of a core devs' computer.

Expectations

  • never screws up the host system
  • status code is reflecting if any error was detected in po files
  • it does not get triggerd by freshly created/updated po files by ikiwiki (Author: IkiWiki <ikiwiki.info>)
  • only modify any file, when --modify is given.
  • output should be clear enough for humans to understand, where an issue was detected.

flake8.diff View (7.21 KB) enrico, 03/21/2019 04:32 PM


Related issues

Related to Tails - Feature #15364: Create .gitattributes with merge strategy for po files Rejected 03/02/2018
Related to Tails - Feature #16102: List of potential checks we might want to do on PO files Resolved 11/05/2018
Related to Tails - Bug #15819: Make sure ikiwiki generated PO files satisfy the requirements we want check_po to set Resolved 08/19/2018
Blocked by Tails - Bug #15605: Make check_po optionally accept a list of files Resolved 05/17/2018

Associated revisions

Revision 3c93054f (diff)
Added by u about 1 year ago

Let's use this check only on files that are part of wiki/, ie. our documentation. will-fix: #15403.

Revision 52afa659
Added by intrigeri 5 months ago

Merge remote-tracking branch 'origin/hefee/feature/15403-unify-po-headers'

refs: #15403, #15605

History

#1 Updated by hefee over 1 year ago

I see several headers we should unify:

"Project-Id-Version: \n"
"Language: ${LANG}\n"
"Last-Translator: Tails translators\n"
"Language-Team: Tails translators <>\n"

#2 Updated by hefee over 1 year ago

# unfortunately the syntax of po headers can expand to different lines,
# that's why we need to treat sed to parse correctly multilines.
# the way we use it is to merge all lines together (1h;1!H) and then search&replace within the whole string
# another option would be to call python/perl to get a regex that is easier to read.
# python: '^.*<Key>:(.*\n)*.*\\n.*$'

for filename in *.po; do
        # unify Language header
        EXTENSION="${filename#*.}" 
        LANG="${EXTENSION%.*}" 
        sed -i -e "s/^\"Language: .*$/\"Language: ${LANG}\\\n\"/" $filename

        #unify Project-Id-Version
        sed -i -n '1h;1!H;${;g;s/[^\n]*Project-Id-Version: [^\\]*\\n[^\n]*/"Project-Id-Version: \\n"/g;p;}' $filename

        #unify Language-Team
        sed -i -n '1h;1!H;${;g;s/[^\n]*Language-Team: [^\\]*\\n[^\n]*/"Language-Team: Tails translators <tails-l10n@boum.org>\\n"/g;p;}' $filename

        #unify Last-Translator
        sed -i -n '1h;1!H;${;g;s/[^\n]*Last-Translator: [^\\]*\\n[^\n]*/"Last-Translator: Tails translators\\n"/g;p;}' $filename
done

#3 Updated by u over 1 year ago

Today I managed to look a little at the merge conflicts on the weblate repo. And here is what I found - and we should try to fix this:

Weblate added this header:

+"X-Generator: Weblate 2.10.1\n" 

Apparently Poedit also add such a header, so maybe we can ignore this. But it might be that conflicts arise here, and I wonder if we should rather try to delete this header automatically?

Weblate writes plural headers differently (bottom line is Weblate line)

-"Plural-Forms: nplurals=2; plural=(n != 1);\n" 
+"Plural-Forms: nplurals=2; plural=n != 1;\n" 

I'm unsure what to do about this. Eventually we can configure this somewhere in Weblate and then we'll just use the version with the parentheses?

A merge conflict happened on this header:

"PO-Revision-Date: 2018-04-02 10:20+0000\n" 

It should be doable to verify which date is more recent and use this one, what do you think?

#4 Updated by bertagaz over 1 year ago

  • Target version changed from Tails_3.7 to Tails_3.8

#5 Updated by u over 1 year ago

  • Related to Feature #15364: Create .gitattributes with merge strategy for po files added

#6 Updated by u over 1 year ago

  • Status changed from Confirmed to In Progress

#7 Updated by u over 1 year ago

  • Feature Branch set to tails.git:translation_platform_hooks

#8 Updated by u over 1 year ago

  • Related to Bug #15605: Make check_po optionally accept a list of files added

#9 Updated by hefee over 1 year ago

  • % Done changed from 0 to 60

Now available under wiki/src/contribute/l10n_tricks/unify_po-headers.sh
see my fork at:
https://0xacab.org/Hefee/tails/commit/cf132794b22763cc6fc82dae3934f70a3ace2bc8

#10 Updated by u over 1 year ago

  • QA Check set to Pass

This file has been added to tails.git:translation_platform_hooks and improved upon (added 79 char limit).

# Rewrap po file to 79 chars
find -wholename ./tmp -prune -o \( -iname "$FILE_GLOB" -print0 \) \
        | xargs -0 --max-procs="$CPUS" --max-args=64 -I {} \
        msgcat -w 79 "{}" -o "{}" 

I've tested it and it works well.

#11 Updated by u over 1 year ago

I'd like to merge this into master to give everybody the tools they need to comply with our po rules asap.
However, I've received no report back at all concerning the testing of our Git hook. I think we need to wait just a bit longer, I'll resend an email and will set a deadlne in 10 days.

#12 Updated by u over 1 year ago

  • Assignee changed from hefee to u

Let's wait until Friday 29th 2018, 23:59h Berlin time.

#13 Updated by intrigeri over 1 year ago

I'd like to merge this into master to give everybody the tools they need to comply with our po rules asap.
However, I've received no report back at all concerning the testing of our Git hook.

I didn't test it but I offered to do a code review.

#14 Updated by intrigeri over 1 year ago

I offered to do a code review.

(Or at least I thought I did. Well, if I didn't, now I'm offering this.)

#15 Updated by u over 1 year ago

  • Assignee changed from u to intrigeri
  • QA Check changed from Pass to Ready for QA

Hi intrigeri, another code review is welcome. Please reassign to me when done.

#16 Updated by intrigeri over 1 year ago

  • Target version changed from Tails_3.8 to Tails_3.9

#17 Updated by intrigeri over 1 year ago

  • Target version changed from Tails_3.9 to Tails_3.8

I suppose it would smooth your work if this gets merged in time for 3.8. So I'll try to review and hopefully merge this despite the tight timing.

#18 Updated by intrigeri over 1 year ago

  • Related to deleted (Bug #15605: Make check_po optionally accept a list of files)

#19 Updated by intrigeri over 1 year ago

  • Blocked by Bug #15605: Make check_po optionally accept a list of files added

#20 Updated by intrigeri over 1 year ago

  • Assignee changed from intrigeri to u
  • QA Check changed from Ready for QA to Dev Needed

If I merge this as-is, then the new checks added to check_po.sh will break various CI jobs because the PO files on this branch don't satisfy the new requirements introduced by this branch. So I can't merge this myself before merging into master you'll need to:

  • fix the PO files on the topic branch
  • merge the topic branch of jenkins-tools once I'm done with my review on #15605
  • update the jenkins-tools submodule to its latest version (#15605#note-14) on the topic branch; beware, I've pushed a trivial typo fix on top of your branch there

One related question: does ikiwiki itself generate PO files that satisfy the new requirements introduced by this branch? I hope so but if not, then we have a problem: every time ikiwiki on our live website updates/creates a PO file we will get test suite failures.

Code review:

  • I don't get why you pass --exit-code to git diff: the exit code seems to be ignored (as long as you don't use set -e, see below).
  • POLIST is a bit too generic a name. What about STAGED_PO_FILES?
  • Please quote ${path}/./submodules/jenkins-tools/slaves/check_po: otherwise, if one has spaces in $path things will break. For the same reason, perhaps the {} placeholder in the sed commands should be quoted. Thankfully it's easy to test if it causes trouble in its current state with a tails.git working copy in a directory that has spaces in its path.
  • I'm not sure about the whitespace check: it's run with exec so I don't get how it can block the commit. Am I missing something? Anyway, it's rather off-topic here so an option would be to skip this discussion now, remove this check for now so it does not block this ticket, and move that check to a new, dedicated ticket.
  • It would be nice to use set -e and set -u consistently, as we do in basically every shell script of ours.
  • DRY ("don't repeat yourself") in unify_po-headers.sh: there are 5 instances of the find -wholename … code
  • I don't speak sed, in particular when not used with its default regexp language. Generally we use --regexp-extended everywhere, which matches closely the regexp language supported by most programming languages. This is not a blocker but here we're introducing code that will be hard to maintain. On the mid-term I'd rather see this script rewritten in Python/Perl/Ruby.
  • Multiple typos in ">nother option would be to call python/perl to get a reex".

#21 Updated by intrigeri over 1 year ago

Also, why ./ in ${path}/./submodules?

#22 Updated by intrigeri over 1 year ago

  • Target version changed from Tails_3.8 to Tails_3.9

#23 Updated by u over 1 year ago

  • Assignee changed from u to hefee

Hi! can you please look at the review above and then reassign to me? thanks!

#24 Updated by u about 1 year ago

intrigeri wrote:

  • I don't get why you pass --exit-code to git diff: the exit code seems to be ignored (as long as you don't use set -e, see below).

Totally. Mistake. Deleted.

  • POLIST is a bit too generic a name. What about STAGED_PO_FILES?

Ack. Done.

  • Please quote ${path}/./submodules/jenkins-tools/slaves/check_po: otherwise, if one has spaces in $path things will break.

Ack.

  • I'm not sure about the whitespace check: it's run with exec so I don't get how it can block the commit. Am I missing something? Anyway, it's rather off-topic here so an option would be to skip this discussion now, remove this check for now so it does not block this ticket, and move that check to a new, dedicated ticket.

I think I initially added that because I thought if we give translators a pre-commit hook, we can as well tell them to check for whitespace errors. However, it would not block the commit and it actually has nothing to do here, so I deleted that now and will think about how to provide this best practise in a better way.

  • It would be nice to use set -e and set -u consistently, as we do in basically every shell script of ours.

I'll look that up and apply it accordingly.

#25 Updated by u about 1 year ago

  • It would be nice to use set -e and set -u consistently, as we do in basically every shell script of ours.

I'll look that up and apply it accordingly.

Done.

#26 Updated by hefee about 1 year ago

  • Assignee changed from hefee to u
  • QA Check changed from Dev Needed to Ready for QA
  • Feature Branch changed from tails.git:translation_platform_hooks to https://0xacab.org/Hefee/tails/tree/translation_platform_hooks

intrigeri wrote:

  • Please quote ${path}/./submodules/jenkins-tools/slaves/check_po: otherwise, if one has spaces in $path things will break. For the same reason, perhaps the {} placeholder in the sed commands should be quoted.

quoted {} placeholder in check_po

  • I don't speak sed, in particular when not used with its default regexp language. Generally we use --regexp-extended everywhere, which matches closely the regexp language supported by most programming languages. This is not a blocker but here we're introducing code that will be hard to maintain. On the mid-term I'd rather see this script rewritten in Python/Perl/Ruby.

rewrote unify_po in Python - I can simply rewrite the check_po also in Python, as I needed already the checking anyways for unify script.

  • Multiple typos in ">nother option would be to call python/perl to get a reex".

@u: look for typos / improve text in python script and afterwards assign to intrigeri, if you are happy.

#27 Updated by u about 1 year ago

  • Target version changed from Tails_3.9 to Tails_3.10.1

Won't be able to review this in the next 48 hours (release of 3.9)

#28 Updated by intrigeri about 1 year ago

  • Target version changed from Tails_3.10.1 to Tails_3.11

#29 Updated by u about 1 year ago

  • Multiple typos in ">nother option would be to call python/perl to get a reex".

@u: look for typos / improve text in python script and afterwards assign to intrigeri, if you are happy.

done in tails.git:translation_platform_hooks 838cb9ecb8

#30 Updated by u about 1 year ago

  • Assignee changed from u to hefee
  • QA Check changed from Ready for QA to Dev Needed

#31 Updated by hefee about 1 year ago

  • Related to Feature #16102: List of potential checks we might want to do on PO files added

#32 Updated by hefee about 1 year ago

  • Assignee changed from hefee to u
  • QA Check changed from Dev Needed to Ready for QA
  • Type of work changed from Sysadmin to Code

@u: Can you do another string check in wiki/src/contribute/l10n_tricks/unify_po-headers.py?
I added the check_po into same file, as we share a lot of code anyways.
So it make sense to keep this together in one file.

We also to find a solution:
  • where to place the script(jenkins-tools or tails).
  • how the script should be named/ if we want make it accessible with other names.
if this is decided:
  • a QA in terms of program logic.

#33 Updated by u about 1 year ago

  • Assignee changed from u to hefee
  • QA Check changed from Ready for QA to Pass

hefee wrote:

@u: Can you do another string check in wiki/src/contribute/l10n_tricks/unify_po-headers.py?
I added the check_po into same file, as we share a lot of code anyways.
So it make sense to keep this together in one file.

Cool!

I checked and did some small improvements in tails.git:translation_platform_hooks.

We also to find a solution:
  • where to place the script(jenkins-tools or tails).

I suggest to ask intrigeri for advice on this.

  • how the script should be named/ if we want make it accessible with other names.

Idem.

if this is decided:
  • a QA in terms of program logic.

groente should officially be the reviewer of this, but intrigeri might also want to take a look.

#34 Updated by hefee about 1 year ago

  • Assignee changed from hefee to groente
  • QA Check changed from Pass to Info Needed

(the HEAD is at https://0xacab.org/Hefee/tails/tree/translation_platform_hooks)

infomation I need are:

  • where to place the script(jenkins-tools or tails).
  • is the script good enough to replace check_po.sh?
  • where is check_po is used and what workflows I have to test?

Feel free to assign it to others, I you don't know the anwsers.

#35 Updated by groente about 1 year ago

  • Assignee changed from groente to hefee
  • QA Check changed from Info Needed to Dev Needed

hey hefee,

if you want to replace the existing check_po, that should be done in jenkins-tools, which is where the isobuilders/testers get it from. see puppet-tails:manifests/tester/check_po.pp

as for replacing, it doesn't look suitable as drop-in replacement: it takes different arguments and gives different output compared to check_po. did you plan on making some wrapper for backward compatibility?

in terms of code quality, the regexKey makes my head hurt, perhaps you're better off with a review by someone who's more fluent in python.

as for where check_po is actually used, i have no idea...

#36 Updated by intrigeri about 1 year ago

as for where check_po is actually used, i have no idea...

Off the top of my head:

I'm not aware of any other users but it's kinda become public API so there might be other consumers.

#37 Updated by emmapeel about 1 year ago

Headers
-------

I have been testing this script. As i see it, the default headers should try to comply a little more with the goal of the headers themselves, i.e. give good metadata about the file. So I propose:

  • "Project-Id-Version: Tails\n" instead of "Project-Id-Version: \n"
  • Add the "Report-Msgid-Bugs-To: \n" line to all the files
  • Ignore the headers that change each time, as the Generator one. Maybe it could be stripped automatically, but for example POedit will also create it.
  • Always keep the oldest 'POT-Creation-Date'
  • Check for "Content-Type: text/plain; charset=UTF-8\n" because it can give trouble otherwise

Wrapping
--------

Regarding the wrapping, I have had many changes after trying this script. But they were not only rewrapping long lines, but the short ones were differently rewrapped, as for example:

-msgstr "" 
-"Se você quer operar um espelho do Tails, escreva para <tails-" 
-"mirrors@boum.org>.\n" 
+msgstr "Se você quer operar um espelho do Tails, escreva para <tails-mirrors@boum.org>.\n" 

Also, the script honored the no-wrap flag some strings have (I wonder why they have that flag?) and so did also things like:

-"Se você é jornalista e quer escrever sobre Tails, ou se quer nos enviar " 
-"links para [[artigos publicados|press]] sobre Tails, escreva para\n" 
+"Se você é jornalista e quer escrever sobre Tails, ou se quer nos enviar links para [[artigos publicados|press]] sobre Tails, escreva para\n" 
 "<tails-press@boum.org>.\n" 

Is this intended? Will it be consistent along computers, or it will change depending on the local builder setup?

#38 Updated by hefee about 1 year ago

groente wrote:

if you want to replace the existing check_po, that should be done in jenkins-tools, which is where the isobuilders/testers get it from. see puppet-tails:manifests/tester/check_po.pp

okay I see.

as for replacing, it doesn't look suitable as drop-in replacement: it takes different arguments and gives different output compared to check_po. did you plan on making some wrapper for backward compatibility?

For me it is unclear what do you mean by backward compatibility. All exampels intrigeri and you mentioned used check_po without any arguments. This triggers check_po against *.po.
The output of check_po is not checked at all. Only the return value is important.

Please describe in more detail, what you have in mind by "backward compatibility".

in terms of code quality, the regexKey makes my head hurt, perhaps you're better off with a review by someone who's more fluent in python.

as for where check_po is actually used, i have no idea...

okay we have to find an other person for the review, maybe @kibi or @anonym, who are familiar with python., but not part of this team.

intrigeri wrote:

okay those also just use check_po without any arguments and no further processing of the output.

I'm not aware of any other users but it's kinda become public API so there might be other consumers.

Well @u had already informed the translators, that we will modify the check_po script and we got no response. And the current check_po just outputs the i18nspector stdout, we do the same but also output the additional checks.

#39 Updated by hefee about 1 year ago

emmapeel wrote:

Headers
-------

I have been testing this script. As i see it, the default headers should try to comply a little more with the goal of the headers themselves, i.e. give good metadata about the file. So I propose:

  • "Project-Id-Version: Tails\n" instead of "Project-Id-Version: \n"

make sense.

make sense, but this must be added by ikiwiki push too.

  • Ignore the headers that change each time, as the Generator one. Maybe it could be stripped automatically, but for example POedit will also create it.

what you mean be ignoring? stripping this out?

  • Always keep the oldest 'POT-Creation-Date'

nope the POT-Creation-Date is important to see, that a po file needs to be updated.

  • Check for "Content-Type: text/plain; charset=UTF-8\n" because it can give trouble otherwise

this is done already by i18nspector.

Regarding the wrapping, I have had many changes after trying this script. But they were not only rewrapping long lines, but the short ones were differently rewrapped, as for example:

[...]

Also, the script honored the no-wrap flag some strings have (I wonder why they have that flag?) and so did also things like:

[...]

Is this intended? Will it be consistent along computers, or it will change depending on the local builder setup?

It is intended in the first step, as we initally we need to make the line length consitent. In summery it boils down to, that the output of gettext (msgcat) is consitent. I made several tests with dockers with different version and couldn't find an issue so far.

#40 Updated by emmapeel about 1 year ago

hefee wrote:

  • Always keep the oldest 'POT-Creation-Date'

nope the POT-Creation-Date is important to see, that a po file needs to be updated.

There are two headers:

"POT-Creation-Date: 2018-11-12 11:37+CET\n"
"PO-Revision-Date: 2018-10-02 22:41+0000\n"

When it is updated it changes the second line, but not the first.

#41 Updated by u about 1 year ago

@emmapeel: what the script tries to do is to actually get rid or to unify these headers because that's where the merge conflicts occur. In particular the revision date is problematic as far as I remember. I also think that this header is not necessary as this information is stored in Git anyway. And Git would also handle updating of these files, without the need to read this header. What do you think? Or does Weblate/manual translation somehow rely on this?

#42 Updated by hefee about 1 year ago

u wrote:

@emmapeel: what the script tries to do is to actually get rid or to unify these headers because that's where the merge conflicts occur. In particular the revision date is problematic as far as I remember. I also think that this header is not necessary as this information is stored in Git anyway. And Git would also handle updating of these files, without the need to read this header. What do you think? Or does Weblate/manual translation somehow rely on this?

It makes no sense to unify these two headers, as they are meaningful! Keep in mind, that a lot of tools (ikiwiki, weblate,...) use gettext tools in the end. The gettext tools don't know something about git, they rely on those headers. Anyways a simple merge stratagy offered by git does not work in the context of PO files, as PO files are not line based. That's why I started to write a script to take care about a propper merge for PO files...

#43 Updated by hefee about 1 year ago

  • Assignee changed from hefee to u
  • QA Check changed from Dev Needed to Info Needed

groente wrote:

if you want to replace the existing check_po, that should be done in jenkins-tools, which is where the isobuilders/testers get it from. see puppet-tails:manifests/tester/check_po.pp

as for replacing, it doesn't look suitable as drop-in replacement: it takes different arguments and gives different output compared to check_po. did you plan on making some wrapper for backward compatibility?

in terms of code quality, the regexKey makes my head hurt, perhaps you're better off with a review by someone who's more fluent in python.

as for where check_po is actually used, i have no idea...

Current issues:
  • I'm missing information about what is the "backward compatibility", that intrigeri and groente is talking about. Is the behaviour run i18nspector for all *.po files, when no argument given to the new script enough to forfill this requirment?
  • Who can do the review? as groente feels himself not suitable to do the review...

@u can you help solving these issues?

#44 Updated by u 11 months ago

hefee wrote:

groente wrote:

if you want to replace the existing check_po, that should be done in jenkins-tools, which is where the isobuilders/testers get it from. see puppet-tails:manifests/tester/check_po.pp

as for replacing, it doesn't look suitable as drop-in replacement: it takes different arguments and gives different output compared to check_po. did you plan on making some wrapper for backward compatibility?

in terms of code quality, the regexKey makes my head hurt, perhaps you're better off with a review by someone who's more fluent in python.

as for where check_po is actually used, i have no idea...

Current issues:
  • I'm missing information about what is the "backward compatibility", that intrigeri and groente is talking about. Is the behaviour run i18nspector for all *.po files, when no argument given to the new script enough to forfill this requirment?

This needs to be checked with intrigeri, sorry I cannot help here.

  • Who can do the review? as groente feels himself not suitable to do the review...

Ok, I'm asking segfault, kibi, enrico. None of these people are part of our current budget plan, but as soon as it is approved, it should be possible to onboard them for a review.

@u can you help solving these issues?

#45 Updated by CyrilBrulebois 11 months ago

  • Target version changed from Tails_3.11 to Tails_3.12

#46 Updated by intrigeri 10 months ago

  • Blocks Feature #15082: Have the Weblate Git communicate with our main Git repository added

#47 Updated by intrigeri 10 months ago

  • Blocks deleted (Feature #15082: Have the Weblate Git communicate with our main Git repository)

#48 Updated by u 10 months ago

  • Assignee changed from u to hefee
  • QA Check deleted (Info Needed)

@hefee: I think lamby will be able to do this review (still waiting for his confirmation though).

Concerning backward compatibility, I guess you've had a chance to talk with intrigeri?

#49 Updated by hefee 10 months ago

Concerning backward compatibility, I guess you've had a chance to talk with intrigeri?

the CI need to be able to run check_po for all files in the tails repository. This is the compatibility we need to keep.

#50 Updated by anonym 10 months ago

  • Target version changed from Tails_3.12 to Tails_3.13

#51 Updated by u 10 months ago

@hefee, you can assign this for review to @enrico. Please make sure he has all the necessary information to do the review: git+branch, goals of the script and other information that might be useful. Thanks!

#52 Updated by hefee 9 months ago

  • Description updated (diff)
  • Assignee changed from hefee to enrico
  • QA Check set to Ready for QA

I described the environment in where we want to run the sccript so a review should now be possible. Please ask if anything is not clear.

#53 Updated by intrigeri 8 months ago

  • Description updated (diff)

#54 Updated by intrigeri 8 months ago

  • Related to Bug #15819: Make sure ikiwiki generated PO files satisfy the requirements we want check_po to set added

#55 Updated by CyrilBrulebois 8 months ago

  • Target version changed from Tails_3.13 to Tails_3.14

#56 Updated by enrico 8 months ago

Hello, here is my review:


I suggest a round of flake8:

flake8 unify_po-headers.py

Besides cosmetic issues, you get things like:

unify_po-headers.py:68:11: W605 invalid escape sequence '\s'

which hint at a missing 'r' on regexp strings.

Since some of the pep8 habits make it easier for me to read the code, I've had
a go at fixing them. I'm attaching a patch.


I suggest to wrap the main code in a function, and just call the function at
module-level, like:

def main():
    ...

if __name__ == '__main__':
    main()

This way, variables used by that code don't leak to module-level variables.

For example, this code prints 6 instead of refusing to compile because val
is mistyped inside foo:

def foo(value):
    return val + 1

if __name__ == '__main__':
    val = 5
    print(foo(val * 2))

        toplevel = subprocess.check_output(["git", "rev-parse", "--show-toplevel"])
        toplevel = toplevel.decode()[:-1]  # get rid of tailing \n

I'd use universal_newlines instead of decode, and rstrip() for cleanup:

        toplevel = subprocess.check_output(["git", "rev-parse", "--show-toplevel"], universal_newlines=True).rstrip()

Similarly for the following subprocess invocation.


    if not args.files:
        print("WARNING: no file to process :(" 
              " You may want to add files to operate on. See --help for further information.")

You probably want sys.exit or parser.error instead of print, so it goes to
stderr and the program terminates with an error status (and with
parser.error, you also get command line help).


    e = None

This doesn't seem to be used, I guess it can be deleted.


                    print("Issues with {fname}:\n\t{issues}".format(fname=fname, issues="\n\t".join(issues)))

It looks like an error message, which makes me think it could go to stderr, but
I don't know what are the required expectations on the output of the script.


    except FileNotFoundError as err:
        if err.filename in ("i18nspector", "msgcat"):
            sys.exit("{fname}: command not found\nYou need to install {fname} first. See /contribute/l10n_tricks." 
                     .format(fname=err.filename))
        else:
            raise

I'd rather use shutil.which before starting processing, and error out if it
returns None. That way the check happens in a predictable place, and before
potential side effects the main code of the script can have.

Then that whole try/catch can be removed.


    def regexKey(self, key: str) -> Pattern:
        """returns a regex to match a key: value pair in po header""" 
        return re.compile(r'^\s*"{key}\s*:\s*(?P<value>(.*\n)*?.*)\\\\n"\s*?$'.format(key=key), flags=re.M)

Please use key=re.escape(key). After adding 'r' to the regexp string, I
think there's now one backslash too much when matching the newline.

I find it really hard to review that regular expression, which worries me from
a security standpoint. Instead of a hand-crafted parser, I would suggest using
an existing library to parse the files.

Polib seems good enough: https://polib.readthedocs.io/en/latest/api.html#polib.POFile
and there is also babel: http://babel.pocoo.org/en/latest/api/messages/pofile.html
both seem solid, widely used, and packaged in Debian since various releases.
Also, i18nspector already depends on python3-polib :)

For example, polib would give you all the metadata as a convenient dict:

from polib import pofile
pf = pofile("file.po")
print(repr(pf.metadata))

The rest of the code looks like it could be greatly simplified by something
like that, and I'd rather stop at suggesting polib than continue double
checking the use of the regexps.


class PoFile:
    ...

    def __init__(self, fname: str) -> None:
        self.fname = fname

    def __enter__(self) -> 'PoFile':
        """magic method for with statement. @returns self so it can be used with "as" """ 
        self.open()
        return self

    def __exit__(self, type, value, traceback) -> None:
        """magic method for with statement""" 
        if type is None and value is None and traceback is None:
            self.write()

I don't like that __exit__ always calls write, even if write only acts when self.__changed is True. I would prefer something like this:

@contextlib.contextmanager
def pofile_readonly(fname):
   pf = PoFile(fname)
   pf.open()
   yield pf

@contextlib.contextmanager
def pofile_writable(fname):
   pf = PoFile(fname)
   pf.open()
   try:
      yield pf
   else:
      pf.write()

so that if one wants to open a file read only, there is no chance that write
gets called.

Also, contextmanager makes it very straightforward to have context managers,
allowing to remove __enter__ / __exit__ methods from PoFile.


    def lang(self) -> str:
        """@returns: language of filename""" 
        exts = self.fname.split(".")
        if len(exts) < 3:
            raise Exception("po file should have a language in his name.", self.fname)
        return exts[-2]

This would fail if any of the parent directories of the potfile contained a dot
in its name.

I suggest exts = os.path.basename(self.fname).split("."), and maybe a regexp
to be run on the basename instead of a split, to match exactly *.[lang].po.

I've seen potfiles named just with the language code, like de.po: if they are
supposed to be supported, that function would raise an exception on them.


    def write(self) -> None:
        """write file, if content was changed""" 
        if self.__changed:
            with open(self.fname, 'w') as f:
                f.write(self.content)

Given that it's writing over the source file, I'd be rather a fan of atomic
writes, like this:

def write(filename, content):
    with tempfile.NamedTemporaryFile(prefix=filename, delete=False) as fd:
        try:
            fd.write(content)
            fd.flush()
            os.fdatasync(fd.fileno())
        except Exception:
            os.unlink(fd.name)
            raise
        else:
            os.rename(fd.name, filename)

so that if anything goes wrong during the write, at least we don't leave the
user with a truncated source file.


    def msgcat(self, modify=False) -> bool:
        ...

If this is only needed for word-wrapping, polib does it automatically.


    def i18nspector(self) -> List[str]:
        """@returns a list of issues raised by i18nspector removes allowed issues from @I18NINSPECTOR_ALLOWED_ISSUES.
        """ 

s/I18NINSPECTOR_ALLOWED_ISSUES/I18NSPECTOR_ACCEPT/ ?


This is all I can think of right now. I think a custom parser for .po files is a really hard thing to make secure. I'm happy to do another round of review if you manage to port it to polib, and I expect the code will be much easier to
check :)

#57 Updated by intrigeri 8 months ago

  • QA Check changed from Ready for QA to Dev Needed

#58 Updated by hefee 7 months ago

  • Assignee changed from hefee to enrico
  • QA Check changed from Dev Needed to Ready for QA
  • Feature Branch changed from https://0xacab.org/Hefee/tails/tree/translation_platform_hooks to https://salsa.debian.org/tails-team/tails/merge_requests/14

Thanks for your review. I included all things you suggested. And I added some tests to test the base functionality.
Meanwhile the Foundation Team started to use salsa to do reviews:

https://salsa.debian.org/tails-team/tails/merge_requests/14

I think it makes it a lot easier for you to comment directly on a line base. If it does not work for you you can do a review like before.

@intrigeri, @u: the check for wrap the file to 79 chars is not stable. Different tools have a different idea what it means to wrap to 79 chars ;D IMO remove this check completely.

#59 Updated by intrigeri 7 months ago

intrigeri, @u: the check for wrap the file to 79 chars is not stable. Different tools have a different idea what it means to wrap to 79 chars ;D IMO remove this check completely.

@hefee: sure, this will make the code consistent with the conclusion you reached on #15408; and I'm all for dropping checks we don't really need, as we decided in January (#16328 :)

#60 Updated by enrico 7 months ago

  • Assignee changed from enrico to hefee

Cool! I made a few annotations there, mostly nitpicks.

#61 Updated by hefee 6 months ago

  • Assignee changed from hefee to enrico

Okay I think this is the final round of review.

#62 Updated by CyrilBrulebois 6 months ago

  • Target version changed from Tails_3.14 to Tails_3.15

#63 Updated by intrigeri 6 months ago

  • Status changed from In Progress to Needs Validation

#64 Updated by enrico 6 months ago

  • Assignee changed from enrico to hefee

I'm having some difficulty following the flow of progress between redmine and gitlab. Also, it looks like I made comments on the merge request but forgot to notify them here, sorry about that.

Today I am having problem manipulating notes on salsa: I get 404 on the AJAX posts my browser does when adding or updating a note/discussion. I'll revert to commenting here.

The discussion that ends with me saying "Ah, shame, fair enoguh" can be resolved.

I'm giving a quick look at all the files involved.

In PoFile.msgcat():
- f.flush() does not seem needed, since the file is opened read-only. Should it be seek(0), to rewind the file before passing it to subprocess?
- instead of passing stdin=f to subprocess, pass input=content, or pass the file name to msgcat's command line if msgcat would use it to print nicer error messages
- PoFile.msgcat() does not change self.pf: looking at unify_po_headers, it seems that it is called to do one last round of postprocessing of the po file with msgcat, but the msgcat output does not seem to be stored
- if my comments above are valid, I would try to check for msgcat effects in unit tests

The rest looks good to me.

#65 Updated by hefee 6 months ago

  • Assignee changed from hefee to enrico

enrico wrote:

I'm having some difficulty following the flow of progress between redmine and gitlab. Also, it looks like I made comments on the merge request but forgot to notify them here, sorry about that.

that is totally fine. As we want the review process to be moved to Gitlab. And this should not end up in copying all comments between Redmine and Gitlab.

Today I am having problem manipulating notes on salsa: I get 404 on the AJAX posts my browser does when adding or updating a note/discussion. I'll revert to commenting here.

The discussion that ends with me saying "Ah, shame, fair enoguh" can be resolved.

marked as resolved.

I'm giving a quick look at all the files involved.

In PoFile.msgcat():
- f.flush() does not seem needed, since the file is opened read-only. Should it be seek(0), to rewind the file before passing it to subprocess?
- instead of passing stdin=f to subprocess, pass input=content, or pass the file name to msgcat's command line if msgcat would use it to print nicer error messages

=> Good ideas, but I've rewrote the msgcat method to see the new needs_rewrap method to use polib only for the rewrap test.

- PoFile.msgcat() does not change self.pf: looking at unify_po_headers, it seems that it is called to do one last round of postprocessing of the po file with msgcat, but the msgcat output does not seem to be stored

It is not obvious indeed, but the needs_rewrap method updates the changed flag, if file is not wrapped properly and this triggers PoFile.save(), when existing the contextmanager.

- if my comments above are valid, I would try to check for msgcat effects in unit tests

I added new tests for the needs_rewrap method directly. But we also had the testfile length in checkPo/unifyPo folder, this file is the testcase, for only rewrap is needed.

The rest looks good to me.

cool ;D

#66 Updated by intrigeri 5 months ago

  • Status changed from Needs Validation to In Progress

#67 Updated by intrigeri 5 months ago

  • Status changed from In Progress to Resolved

#68 Updated by intrigeri 5 months ago

  • Assignee deleted (enrico)

Also available in: Atom PDF