Preparation to move hxltmcli, hxltmdexml, ontologia/cor.hxltm.yml and the documentation at https://hdp.etica.ai/hxltm to a dedicated repository #25

Open
fititnt opened this issue Nov 12, 2021 · 1 comment
Labels
HXLTM https://hdp.etica.ai/hxltm

fititnt commented Nov 12, 2021


The EticaAI/HXL-Data-Science-file-formats repository is already a sort of monorepo (see https://en.wikipedia.org/wiki/Monorepo). Even if the recent simplifications to require fewer dependencies had not already made splitting it worthwhile, trying to apply HXLTM to more real test cases, like the Translation Initiative for COVID-19 (TICO-19), convinced me that HXLTM, even with some improvements to make bilingual files friendlier to deal with, should at least be much better documented. And TICO-19 is actually one of the best-case scenarios: any other initiative would likely have far fewer people with an information technology background.

Note that, in general, bilingual files are supposed to be one of the easier cases (HXLTM focuses on multilingual by default). But the way people submitted translations to TICO-19 (as translation pairs) makes this type of optimization necessary.

Beyond just "software documentation"

One of the early challenges in the TICO-19 conversion is actually not even file conversion. Since there are SO MANY LANGUAGES, merging the results back, as described in fititnt/hxltm-action#5 (comment), quickly becomes very repetitive.
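To make the repetitive merge step more concrete, here is a minimal Python sketch of folding many bilingual pair submissions back into one multilingual table keyed by concept. It assumes the pair files were already parsed into `(concept_id, source_lang, source_text, target_lang, target_text)` rows; the `merge_pairs` function, the row layout, and the sample terms are hypothetical and are not part of hxltmcli or the TICO-19 data.

```python
from collections import defaultdict

# Hypothetical row layout for one line of a parsed bilingual pair file:
# (concept_id, source_lang, source_text, target_lang, target_text)
PairRow = tuple[str, str, str, str, str]


def merge_pairs(rows: list[PairRow]) -> dict[str, dict[str, str]]:
    """Merge many bilingual pair rows into one multilingual table.

    Returns {concept_id: {lang: text}}. The same source text is expected
    to repeat across pairs; a conflicting value raises ValueError so a
    human can review it instead of silently overwriting it.
    """
    table: dict[str, dict[str, str]] = defaultdict(dict)
    for concept_id, src_lang, src_text, tgt_lang, tgt_text in rows:
        for lang, text in ((src_lang, src_text), (tgt_lang, tgt_text)):
            existing = table[concept_id].get(lang)
            if existing is not None and existing != text:
                raise ValueError(
                    f"conflict for {concept_id!r} in {lang!r}: "
                    f"{existing!r} vs {text!r}"
                )
            table[concept_id][lang] = text
    return dict(table)


if __name__ == "__main__":
    # Illustrative rows only, as if submitted by three separate translators
    rows = [
        ("C1", "en", "Wash your hands", "pt", "Lave as mãos"),
        ("C1", "en", "Wash your hands", "es", "Lávate las manos"),
        ("C2", "en", "Fever", "pt", "Febre"),
    ]
    for concept, terms in merge_pairs(rows).items():
        print(concept, terms)
```

Raising on a conflicting source text, instead of letting the last file win, is a deliberate choice in this sketch: with dozens of language pairs, a silent overwrite would be hard to notice later.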

Maybe we should even document how users could drop files into some folder (perhaps even with drivers to fetch from Google Drive or whatever storage the average user prefers), so they would not need to know git or similar tools.

The language codes problem

Different providers are not consistent in how they label which language their terms are in, and this badly breaks any automation. Assuming that the average big provider follows IETF BCP 47 language tags as specified is too optimistic, so if people read how to use hxltmcli / hxltmdexml and the ontologia, it is reasonable to assume we will also have to give a crash course on other standards.
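Part of the language-code problem is purely cosmetic (`pt_br` vs `pt-BR`) and can be automated. The Python sketch below normalizes separators and applies the BCP 47 letter-case conventions for language, script, and region subtags. The `normalize_tag` name is hypothetical, and the function neither validates against the IANA Language Subtag Registry nor fixes the harder problem described above, where a provider's code simply does not mean what the standard says.

```python
import re


def normalize_tag(raw: str) -> str:
    """Normalize separators and letter case of a BCP 47-like tag.

    Handles the common language[-script][-region] shape; everything after
    a singleton (extensions, private use "x-...") is just lowercased.
    This is a cosmetic fix only: it does not check that subtags exist.
    """
    parts = re.split(r"[-_]", raw.strip())
    out = [parts[0].lower()]          # primary language subtag: lowercase
    after_singleton = False           # once set, no script/region casing
    for part in parts[1:]:
        if after_singleton or len(part) == 1:
            after_singleton = True
            out.append(part.lower())
        elif len(part) == 4 and part.isalpha():
            out.append(part.title())  # script subtag, e.g. "Latn"
        elif (len(part) == 2 and part.isalpha()) or (
            len(part) == 3 and part.isdigit()
        ):
            out.append(part.upper())  # region subtag, e.g. "BR" or "419"
        else:
            out.append(part.lower())  # extlang / variant subtags
    return "-".join(out)


if __name__ == "__main__":
    for tag in ["pt_br", "ZH-hant-tw", "es-419", "en-x-TICO"]:
        print(tag, "->", normalize_tag(tag))
```

For example, `normalize_tag("pt_br")` returns `"pt-BR"` and `normalize_tag("ZH-hant-tw")` returns `"zh-Hant-TW"`; whether `pt` from a given provider really means European or Brazilian Portuguese still needs a human decision.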

About minimum standards on how to collect terminology

I will not go into much detail on this in this issue, but even more critical than choosing language codes that really mean what someone would submit to a more global initiative, one of the main challenges is still how the translations are collected. So, if we create a dedicated place that explains how to use the data convention and (even without creating dedicated "best practices") gives intentional nudges on how to cope with anti-patterns in terminology translation, this could make it clear that the quality of translations depends heavily on how well documented the bootstrapping material is.

Potential example approach

Maybe we could even intentionally create a specialized tagging subtag for "the source translation is not good enough as a source term", to be used as the source term when exporting formats intended to receive translations back, like XLIFF. This fixes two points:

  • First, anyone can hotfix translations before generating a new XLIFF, without publicly saying that the source term was bad, and without hurting existing translations.
    • This could also be used when the source-language term is copyrighted.
  • Second, it tolerates translations of terms that became a sort of standard and cannot be changed because changing them would break software.

Please note that we already have ways to add more description to terms, but if users don't use them, we could still document this trick.
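A minimal Python sketch of the proposed approach, assuming a hypothetical `<lang>+override` key that holds the hotfixed source term: the export uses the override when present and otherwise falls back to the original term, while the original stays untouched in the data. The key name, `pick_source`, and `to_xliff` are illustrative only (not an existing HXLTM tag or hxltmcli option); the output is a bare XLIFF 1.2 skeleton with `<source>` elements and no `<target>`, since the file is meant to receive translations back.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"


def pick_source(terms: dict[str, str], src_lang: str) -> str:
    """Return the hotfixed source term if present, else the original one.

    Hypothetical convention: "<lang>+override" holds the replacement term.
    """
    override = terms.get(f"{src_lang}+override")
    return override if override else terms[src_lang]


def to_xliff(concepts: dict[str, dict[str, str]],
             src_lang: str, tgt_lang: str) -> str:
    """Build a minimal XLIFF 1.2 document with one trans-unit per concept."""
    ET.register_namespace("", XLIFF_NS)  # serialize as default namespace
    root = ET.Element(f"{{{XLIFF_NS}}}xliff", version="1.2")
    file_el = ET.SubElement(root, f"{{{XLIFF_NS}}}file", {
        "source-language": src_lang,
        "target-language": tgt_lang,
        "datatype": "plaintext",
        "original": "terminology",
    })
    body = ET.SubElement(file_el, f"{{{XLIFF_NS}}}body")
    for concept_id, terms in concepts.items():
        unit = ET.SubElement(body, f"{{{XLIFF_NS}}}trans-unit", id=concept_id)
        source = ET.SubElement(unit, f"{{{XLIFF_NS}}}source")
        source.text = pick_source(terms, src_lang)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    concepts = {
        # Original term kept as-is; only the export uses the override
        "C1": {"en": "Wash hands",
               "en+override": "Wash your hands with soap and water"},
        "C2": {"en": "Fever"},
    }
    print(to_xliff(concepts, "en", "pt"))
```

Keeping the override as a separate key, rather than editing the original term, is what lets the hotfix stay private: older translations still point at the untouched source.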

fititnt added the HXLTM https://hdp.etica.ai/hxltm label Nov 12, 2021

fititnt commented Nov 13, 2021

With the exception of this one, all issues labeled HXLTM were moved to https://github.com/EticaAI/hxltm.


[Screenshot 2021-11-13 20-44-15]

[Screenshot 2021-11-13 20-49-00]

[Screenshot 2021-11-13 20-59-27]
