Preparation to move hxltmcli, hxltmdexml, ontologia/cor.hxltm.yml and documentation at https://hdp.etica.ai/hxltm to exclusive repository #25
Labels: HXLTM, https://hdp.etica.ai/hxltm
The EticaAI/HXL-Data-Science-file-formats repository is already some sort of monorepo (see https://en.wikipedia.org/wiki/Monorepo). Even if the recent simplifications to require fewer dependencies were not reason enough to split it, trying to apply the tools to more real test cases, like the Translation Initiative for COVID-19 (TICO-19), convinced me that HXLTM should at least be much better documented, in addition to some improvements to make it friendlier for bilingual files. (And assuming any other initiative would have far fewer people with an information technology background, TICO-19 is actually one of the best-case scenarios.)
Note that, in general, bilingual is supposed to be one of the easier cases (HXLTM focuses on multilingual by default). But the way people submitted translations to TICO-19 (as translation pairs) makes this kind of optimization necessary.
Beyond just "software documentation"
One of the early challenges of the TICO-19 conversion is actually not even file conversion. Obviously, since there are SO MANY LANGUAGES, the merge back, as described in fititnt/hxltm-action#5 (comment), starts to get very repetitive.
Maybe we should even document how users could drop files into some folder (perhaps with drivers to fetch from Google Drive or whatever the average user prefers, so they would not need to know git or similar tools).
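As a minimal sketch of what that merge-back automation could look like (not the project's actual tooling): the script below assumes a hypothetical layout where each contributor drops a two-column TSV named after its target language (e.g. inbox/pt.tsv with "source" and "target" columns, possibly synced from a shared drive), and merges everything into a single multilingual CSV keyed by the source term. The folder name, column names and output format are illustrative assumptions, not the HXLTM ontologia hashtags.

```python
# Hypothetical sketch: merge per-language bilingual TSV pairs back into one
# multilingual table. File names, columns, and paths are illustrative only.
import csv
from pathlib import Path

INBOX = Path("inbox")        # assumed drop folder (could be synced from Google Drive)
OUTPUT = Path("merged.csv")  # assumed multilingual output file

merged = {}      # source term -> {language code: translation}
languages = []   # language codes seen, taken from the file names

for tsv_file in sorted(INBOX.glob("*.tsv")):
    lang = tsv_file.stem     # e.g. "pt" from "pt.tsv"
    languages.append(lang)
    with tsv_file.open(encoding="utf-8", newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            source = row["source"].strip()
            merged.setdefault(source, {})[lang] = row["target"].strip()

with OUTPUT.open("w", encoding="utf-8", newline="") as handle:
    writer = csv.writer(handle)
    writer.writerow(["source"] + languages)
    for source, translations in merged.items():
        writer.writerow([source] + [translations.get(lang, "") for lang in languages])
```

Even a small loop like this would remove most of the repetition of merging dozens of language pairs by hand; the real version would map the columns to proper HXLTM hashtags instead.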
The language codes problem
The way different providers explain which language their terms are in is not consistent, and this breaks any automation hard. Assuming that the average big provider follows the IETF BCP 47 language tag specification is too optimistic, so if people read how to use hxltmcli/hxltmdexml and the ontologia, it is reasonable to assume we will have to give a crash course on other standards as well.
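To illustrate the inconsistency, the sketch below uses the third-party langcodes package (an assumption for this example; it is not a dependency of hxltmcli or hxltmdexml) to push whatever codes providers send toward a best-effort BCP 47 tag, and to flag anything it cannot parse for human review. The exact normalized outputs are what I would expect from that library, not guaranteed behavior.

```python
# Hedged example: normalize provider-supplied language codes toward BCP 47.
# Uses the third-party "langcodes" package (pip install langcodes); this is an
# illustrative assumption, not something the HXLTM tools actually require.
import langcodes

# Codes as they might arrive from different providers: deprecated subtags,
# underscores instead of hyphens, inconsistent casing, or plain garbage.
provider_codes = ["iw", "pt_br", "ZH-tw", "???"]

for raw in provider_codes:
    try:
        # standardize_tag() replaces deprecated subtags and fixes casing,
        # e.g. the deprecated "iw" becomes "he".
        print(f"{raw!r:10} -> {langcodes.standardize_tag(raw)}")
    except Exception:
        # Malformed tags raise a parsing error; flag them for human review.
        print(f"{raw!r:10} -> needs human review")
```

Whatever tooling we end up documenting, the point is the same: the codes cannot be trusted as-is, and the documentation should say so explicitly.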
About minimum standards on how to collect terminology
I will not talk a lot about this in this issue, but even more critical than making sure the language codes really mean something that could be submitted to a more global initiative, one of the main challenges is still how the translations are collected. So, if we create a dedicated place that explains how to use the data conventions and (even without creating dedicated "best practices") gives intentional nudges on how to cope with anti-patterns in terminology translations, this would hint that the quality of translations depends heavily on how well documented the bootstrapping material is.
Potential example approach
Maybe we even intentionally create some specialized tagging subtag for "the case where the source translation is not good enough as a source term", to be used as the source term when exporting to formats intended to receive translations back, like XLIFF. This would fix two points:
Please note that we already have ways to add more descriptions to terms, but if users do not use them, we could still document this trick.
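To make the idea concrete, here is a hedged sketch (not an existing hxltmcli feature) of exporting a minimal XLIFF 1.2 body in which concepts flagged as "source translation not good enough as source term" carry a note for the translators. The flag name, the input structure and the languages shown are hypothetical placeholders.

```python
# Hypothetical sketch: export a minimal XLIFF 1.2 <body> where flagged concepts
# carry a <note> warning that the source term itself needs review.
# The "source_needs_review" flag and the input structure are illustrative only.
import xml.etree.ElementTree as ET

concepts = [
    {"id": "c1", "source": "community transmission", "source_needs_review": False},
    {"id": "c2", "source": "contact tracing (??)",    "source_needs_review": True},
]

xliff = ET.Element("xliff", version="1.2")
file_el = ET.SubElement(xliff, "file", {
    "source-language": "en", "target-language": "pt",
    "datatype": "plaintext", "original": "hxltm-export",
})
body = ET.SubElement(file_el, "body")

for concept in concepts:
    unit = ET.SubElement(body, "trans-unit", id=concept["id"])
    ET.SubElement(unit, "source").text = concept["source"]
    ET.SubElement(unit, "target")  # left empty for the translator to fill in
    if concept["source_needs_review"]:
        ET.SubElement(unit, "note").text = (
            "Source term flagged as low quality; please report issues "
            "instead of translating it literally."
        )

print(ET.tostring(xliff, encoding="unicode"))
```

The same flag could also be exposed through whatever description fields the ontologia already provides; the subtag approach would only be a fallback for users who never fill those in.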