Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CI] Avoid losing existing translations when the i18n key was changed #7435

Open
stokito opened this issue Nov 30, 2024 · 17 comments
Open

[CI] Avoid losing existing translations when the i18n key was changed #7435

stokito opened this issue Nov 30, 2024 · 17 comments

Comments

@stokito
Copy link
Contributor

stokito commented Nov 30, 2024

How can we otherwise help?

Translations are good even when they bad. But we have periodically a situation when some i18n key i.e. the English text was changed and the old translation doesn't work anymore.
For example if the _('ACLs') was changed to _('ACL').
We may keep the old translations which now will be not exactly accurate but it's better than nothing. The Weblate will show such new but translated strings as "Needs for a review" and translators can easily fix working.

Usually after running of the i18n-sync.sh such orphaned keys are commented out and moved to the bottom of po file. I trying to restore such translations but if the key was re-translated in the Weblate we end up with a conflict. So it was spent a lot of time of translators and devs.

We need some solution to avoid such i18n keys breaking. This may be a simple CI job that will run the i18n-sync.sh and check for the orphaned keys. We may require that all the orphaned keys were renamed or removed before merging.

@dannil
Copy link
Contributor

dannil commented Nov 30, 2024

I think this is one part of a bigger issue on the "integrity" of the source texts that was brought up in #7175.

@systemcrash systemcrash changed the title [CI] Avoid loosing of existing translations when the i18n key was changed [CI] Avoid losing existing translations when the i18n key was changed Dec 1, 2024
@systemcrash
Copy link
Contributor

This doesn't happen often enough, nor is it a problem that is not resolvable. I don't really like CI adding its own commits to repair something. But running the i18n scripts and flagging a failure if a PR touches po files and its results don't match is possible. It's a bit of work... can you make it work @stokito ?

  • a new PR touching .po(t) files triggers CI process
  • pull the luci repo
  • run the i18n sync scripts
  • if its results differ, and there are outstanding 'changes' - warn

This will require turning off line numbers and filenames in the po files to reduce churn or false-positives due to code only changes.

@stokito
Copy link
Contributor Author

stokito commented Dec 1, 2024

Interesting how other projects handle the situation. I'll try to back on this later, please don't close the issue for now.

@systemcrash
Copy link
Contributor

I've come up with some changes for the build scripts to omit line number comments like:
#: applications/luci-app-ddns/htdocs/luci-static/resources/view/ddns/overview.js:432 by default. This avoids churn for code-only changes.

One advantage is that we see an average reduction in po file size by about 25%.

Just means a big commit with lots of reduction. All ~3500 files.

Next problem: CI. In my tests, running e.g. msgmerge on a github workflow, it randomly decides to amend juuuust a few messages which only change the line length, wrap, but git will mark this as modified. The CI runner runs identical commands to here. So the only thing I can think of is disparity in gettext tools versions. And... they differ 🤦 ( mine: 0.22.5, CI: 0.21 ).

I don't want a CI runner making commits and starting merge wars with weblate. 🫤

@hnyman
Copy link
Contributor

hnyman commented Dec 1, 2024

I've come up with some changes for the build scripts to omit line number comments like:
#: applications/luci-app-ddns/htdocs/luci-static/resources/view/ddns/overview.js:432 by default. This avoids churn for code-only changes.

Last time when the idea of dropping line numbers from .po files was discussed, @jow- wanted to keep the line numbers there.

#5655
#7089

@systemcrash
Copy link
Contributor

Thanks for the reminder. I was sure we discussed it.

The changes to scripts allow adding line numbers back in, which should be done in the local repo. Translators anyway check out the repo to do i18n work.

So line numbers in the main repo can go.

@stokito
Copy link
Contributor Author

stokito commented Dec 1, 2024

I expected we can simply update po file and see if some keys were orphaned i.e. they not exists in the pot and now commented. So this can be a simple grep for a comment.
The generation of a po file can be made to temp folder without affecting of the sources.

But having a pot regenerated and commited by the CI would be probably nice. We have a small PRs where there wasn't i18n sync done. By regeneating the pot we can keep the line numbers synced but also any new strings will be added and translated in the Weblate.
The line numbers itself is a nice feature but probably we may have them only in the pot. Similarly we may not add the unttanslated keys to the po files. The Weblate probably will be fine with this. But this is slightly out of the context of the issue

@systemcrash
Copy link
Contributor

Next problem: CI. In my tests, running e.g. msgmerge on a github workflow, it randomly decides to amend juuuust a few messages which only change the line length, wrap, but git will mark this as modified. The CI runner runs identical commands to here. So the only thing I can think of is disparity in gettext tools versions. And... they differ 🤦 ( mine: 0.22.5, CI: 0.21 ).

OK. Nailed it. Apparently gettext utilities take some liberties if you're outputting to a new file, but update mode seems to discard minor differences which concern only line-wrap. So we can now trigger i18n-sync on every PR. Provided everything else in the repo is refreshed, there won't be any differences. But it outputs which po files differ, so it's easy to identify whether that PR is to blame.

@hnyman
Copy link
Contributor

hnyman commented Dec 2, 2024

Similarly we may not add the unttanslated keys to the po files. The Weblate probably will be fine with this. But this is slightly out of the context of the issue

Actually, weblate needs that we add untranslated strings to each .po file of each language. Otherwise weblate does not find the new string in a language.

@systemcrash
Copy link
Contributor

Similarly we may not add the unttanslated keys to the po files. The Weblate probably will be fine with this. But this is slightly out of the context of the issue

Actually, weblate needs that we add untranslated strings to each .po file of each language. Otherwise weblate does not find the new string in a language.

Yeah, otherwise how else do you signal "hey, this text is untranslated" to a translator to translate it? :)

@systemcrash
Copy link
Contributor

OK - I'll push those script changes together with a "line-numbers removed" commit, and a github workflow. Let's try that out for a while and see how it goes. If it proves problematic, default flags in two sync scripts (sh+pl) can be changed (documented in scripts and commit message) and the github workflow disabled by either renaming the yaml file or running a gh command.

The i18n_check runs in about 30 seconds, outputs a summary, with success like:

Run git status --porcelain
  git status --porcelain
  git diff
  git status --porcelain | grep -q '^ M ' && echo "::error::Unsynchronized changes detected!" && exit 1 || echo "i18n is up-to-date." && exit 0
  shell: /usr/bin/bash -e {0}
  
i18n is up-to-date.

Or if something is not synched up, it flags a failure:

 M applications/luci-app-adblock-fast/po/lt/adblock-fast.po
 M applications/luci-app-adblock/po/fr/adblock.po
 M applications/luci-app-advanced-reboot/po/fr/advanced-reboot.po
 M applications/luci-app-aria2/po/fr/aria2.po
 M applications/luci-app-aria2/po/lt/aria2.po
 M applications/luci-app-hd-idle/po/pt/hd-idle.po
 M applications/luci-app-hd-idle/po/pt_BR/hd-idle.po
 M applications/luci-app-olsr/po/pt/olsr.po
 M applications/luci-app-radicale/po/lt/radicale.po
 M applications/luci-app-radicale/po/tr/radicale.po
 M applications/luci-app-sqm/po/lt/sqm.po
 M applications/luci-app-unbound/po/lt/unbound.po
 M modules/luci-base/po/fr/base.po
 M modules/luci-base/po/hu/base.po
diff --git a/applications/luci-app-adblock-fast/po/lt/adblock-fast.po b/applications/luci-app-adblock-fast/po/lt/adblock-fast.po
index be26176..e4d2635 100644
--- a/applications/luci-app-adblock-fast/po/lt/adblock-fast.po
+++ b/applications/luci-app-adblock-fast/po/lt/adblock-fast.po
@@ -374,8 +374,8 @@ msgstr ""
 
 msgid "Pick the LED not already used in %sSystem LED Configuration%s."
 msgstr ""
-"Pasirinkti „LED“ (lemputės/-ę), kuri nėra naudojama %sSistemos „LED“ "
-"(Lemputės) konfigūracijoje%s."
+"Pasirinkti „LED“ (lemputės/-ę), kuri nėra naudojama %sSistemos "
+"„LED“ (Lemputės) konfigūracijoje%s."
 
 msgid "Pick the SmartDNS instance(s) for AdBlocking"
 msgstr ""
+"„tcMTU“ (baitas); turi būti >= sąsaja ir/arba sietuvas – „MTU“ + „overhead“"
...

Error: Unsynchronized changes detected!
Error: Process completed with exit code 1.

@hnyman
Copy link
Contributor

hnyman commented Dec 2, 2024

What is the actual goal here?
To get more people to sync translations along PR?
That will likely just cause more weblate conflicts. :-(

@systemcrash
Copy link
Contributor

It's a CI check - to flag PRs that introduce (or are victim to) unsynchronised i18n changes.

I don't think it'll cause weblate conflicts - it's no different to today if contributors don't always sync their changes. Weblate just continues doing its thing based on what's in master branch. What did you have in mind?

i18n string modification in general does need to be handled correctly to avoid a merge conflict. So the feedback loop from weblate might 'invalidate' PRs that sit open for a while, but this reinforces the need to sync i18n strings.

In any case, I'm open to try the workflow out and see how things go.

@hnyman
Copy link
Contributor

hnyman commented Dec 2, 2024

Just that.
In regard to avoiding conflicts, it would be better to let old strings get stale and get invalid in weblate, so that new translations are needed.

Weblate is pretty picky if PRs with .po changes are merged when the same .po files with edits from weblate are already in weblate PRs. Like the upnp debacle this week. Two conflicts inside 2 days.

(Safest is to always merge the open weblate PR before any other PRs with .po changes are merged.)

@systemcrash
Copy link
Contributor

(Safest is to always merge the open weblate PR before any other PRs with .po changes are merged.)

Exactly that. And maintainers here need to remember to merge weblate first 😅

In regard to avoiding conflicts, it would be better to let old strings get stale and get invalid in weblate, so that new translations are needed.

Agreed. (barring a few exceptions for paragraphs of text)

Two things currently outstanding.

  • Teaching the CI about only the content of the PR and not the entire repo (to limit the scope of i18n-sync).
  • How it works for backports (when @hnyman cherry-picks i18n to e.g. 23)

@hnyman
Copy link
Contributor

hnyman commented Dec 3, 2024

Backports to old releases is actually pretty easy , as the script only backports strings that are exactly same in that branch as in master.

I have also noticed during the years that it is best to first sync, then backport updates, and then sync again. That was necessary a few years ago. Not sure if it is really needed now, but what I still do in LuCI root is:

Switched to branch 'openwrt-23.05'
...

./build/i18n-sync.sh ; ./build/i18n-merge-master.pl ; ./build/i18n-sync.sh 

So, it is definitely not git cherry-pick which would leave to errors (like we have suffered earlier with .po cherry-picks by somebody)

@jow-
Copy link
Contributor

jow- commented Dec 4, 2024

Keep the line numbers, they're required for tools like poedit to show the proper source context while translating files.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants