Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't ignore localized strings that are the same as original #307

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

felipeborges
Copy link

Avoiding storing identical strings is a clever measure to keep the
appstream file small and clean, but it ignores translation differences
between languages and their specific locales.

Applications tend to "fallback" from missing locales by picking a
translation of the same language but from a different locale.
For instance, when missing a "pt_BR" translation, some apps will
pick the "pt" translation instead. That usually works but there
are some cases when it doesn't, such as for international words:
Brazilian Portuguese (pt_BR) tends to use them, while European
Portuguese (pt) has a translation for everything.

This way, "GNOME Boxes" gets translated to "Caixas GNOME" in
European Portuguese (pt) but the same "GNOME Boxes" name is
expected in Brazilian Portuguese (pt_BR).

This was initially reported as a Flatpak issue in
https://lists.freedesktop.org/archives/flatpak/2019-May/001578.html
Because OP was seeing the wrong translations in Flathub and
GNOME Software.

Avoiding storing identical strings is a clever measure to keep the
appstream file small and clean, but it ignores translation differences
between languages and their specific locales.

Applications tend to "fallback" from missing locales by picking a
translation of the same language but from a different locale.
For instance, when missing a "pt_BR" translation, some apps will
pick the "pt" translation instead. That usually works but there
are some cases when it doesn't, such as for international words:
Brazilian Portuguese (pt_BR) tends to use them, while European
Portuguese (pt) has a translation for everything.

This way, "GNOME Boxes" gets translated to "Caixas GNOME" in
European Portuguese (pt) but the same "GNOME Boxes" name is
expected in Brazilian Portuguese (pt_BR).

This was initially reported as a Flatpak issue in
https://lists.freedesktop.org/archives/flatpak/2019-May/001578.html
Because OP was seeing the wrong translations in Flathub and
GNOME Software.
@hughsie
Copy link
Owner

hughsie commented May 23, 2019

How much does this affect the size of something like the Fedora metadata? Certainly en_GB is 99.9% the same as C although I do concede that gzip should dedupe this effectively. @kalev what do you think?

@kalev
Copy link
Collaborator

kalev commented May 23, 2019

I'd say it's probably best to go for correctness, even if it increases the metadata size. Let's see how much bigger it gets and back this change out if it's unacceptably large?

@rffontenelle
Copy link
Contributor

How about keeping localized strings equal original only when the fallback language ('pt', in Felipe's example) translation is presented? If absent, there will be no fallback and no problem for other languages (e.g. pt_BR). This way, lines wouldn't be added without being required and file size may not grow too much.

@felipeborges
Copy link
Author

How about keeping localized strings equal original only when the fallback language ('pt', in Felipe's example) translation is presented? If absent, there will be no fallback and no problem for other languages (e.g. pt_BR). This way, lines wouldn't be added without being required and file size may not grow too much.

My understanding is that the fallback is done in the application's side. All consumers of appstream.xml will read from it (GNOME Software, Flathub, other-software-store-thingies) and they are the ones using this heuristic to fallback missing translations.

@hughsie
Copy link
Owner

hughsie commented May 24, 2019

and they are the ones using this heuristic to fallback missing translations

Old versions of gnome-software use appstream-glib to get the "best" translation, so it should be enough to fix it here for those older versions. For newer versions of gnome-software we parse it directly with libxmlb and so any fix would need to be copied there. I'm going to do some benchmarks compiling the entire fedora repo today with the dedepe functionality and without. It does take some time...

@hughsie
Copy link
Owner

hughsie commented May 24, 2019

I'm guessing this wasn't tested :) The XML file is the same size, even post decompression. I think this patch needs to drop AS_NODE_INSERT_FLAG_DEDUPE_LANG -- I'm just running the generator again with that dropped to see what it does to the XML file.

@rffontenelle
Copy link
Contributor

Hey there, is there anything blocking this PR? Now and then Brazilians ask about this issue.

@hughsie
Copy link
Owner

hughsie commented Dec 12, 2019

Hey there, is there anything blocking this PR?

Well, there's CI failure and the fact that the patch doesn't work, but other than that...

@@ -1747,10 +1747,6 @@ as_node_get_localized (const AsNode *node, const gchar *key)
if (g_strcmp0 (xml_lang, "x-test") == 0)
continue;

/* avoid storing identical strings */
data_localized = data->cdata;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can remove the data_localized variable.

/* avoid storing identical strings */
data_localized = data->cdata;
if (xml_lang != NULL && g_strcmp0 (data_unlocalized, data_localized) == 0)
continue;
g_hash_table_insert (hash,
as_ref_string_ref (xml_lang != NULL ? xml_lang : xml_lang_c),
(gpointer) data_localized);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of data_localized, you need to use data->cdata

@igaldino
Copy link
Contributor

Hi @felipeborges, I tried to update your branch but I don't have rights, so I just commented it.
Changes should be enough to pass test.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants