Preparation to move hxltmcli, hxltmdexml, ontologia/cor.hxltm.yml and documentation at https://hdp.etica.ai/hxltm to exclusive repository #25


The EticaAI/HXL-Data-Science-file-formats repository is already something of a monorepo (see https://en.wikipedia.org/wiki/Monorepo). The recent simplifications to require fewer dependencies did not remove the case for splitting it up. After trying to apply it to more real test cases, like the Translation Initiative for COVID-19 (TICO-19), I believe that HXLTM, even with improvements to make it friendlier for bilingual files, should at least be much better documented. Note that TICO-19 is actually close to a best-case scenario: any other initiative would likely have far fewer people with an information technology background.

Note that, in general, bilingual is supposed to be one of the easier cases (HXLTM focuses on multilingual by default). But the way people submitted translations to TICO-19 (as translation pairs) makes this kind of optimization necessary.
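To make the bilingual case concrete, here is a minimal sketch in Python of what such a translation-pair dataset could look like in HXLTM style. The hashtag attributes follow the general HXLTM pattern (ISO 639-3 code, BCP 47 tag, ISO 15924 script), but treat the exact attributes here as illustrative, not normative; ontologia/cor.hxltm.yml is the authoritative reference.

```python
import csv

# Minimal bilingual HXLTM-style sketch (translation pairs).
# The exact hashtag attributes are illustrative, not normative;
# see ontologia/cor.hxltm.yml for the real ones.
rows = [
    ["#item+conceptum+codicem",          # concept identifier
     "#item+rem+i_eng+i_en+is_latn",     # source: English, Latin script
     "#item+rem+i_por+i_pt+is_latn"],    # target: Portuguese, Latin script
    ["term_001", "Vaccine", "Vacina"],
    ["term_002", "Mask", "Máscara"],
]

with open("exemplum.tm.hxl.csv", "w", newline="", encoding="utf-8") as fp:
    csv.writer(fp).writerows(rows)
```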

Beyond just "software documentation"

One of the early challenges in the TICO-19 conversion is actually not even file conversion. Simply because there are SO MANY LANGUAGES, the merge back, as described in fititnt/hxltm-action#5 (comment), starts to get very repetitive.
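As a sketch of how that repetition could be scripted, the loop below generates one XLIFF per target language. The hxltmcli flags shown (--fontem-linguam, --objectivum-linguam, --objectivum-XLIFF) are my recollection of the CLI and should be verified against `hxltmcli --help`; the file names and language list are placeholders.

```python
import subprocess

# ASSUMPTION: the flag names below match the installed hxltmcli;
# verify with `hxltmcli --help` before relying on this.
TARGETS = ["por-Latn@pt", "spa-Latn@es", "fra-Latn@fr"]  # placeholders

for linguam in TARGETS:
    bcp47 = linguam.split("@", 1)[1]
    subprocess.run(
        [
            "hxltmcli",
            "fontem.tm.hxl.csv",         # source translation memory
            f"objectivum.{bcp47}.xlf",   # one XLIFF per target language
            "--fontem-linguam", "eng-Latn@en",
            "--objectivum-linguam", linguam,
            "--objectivum-XLIFF",
        ],
        check=True,
    )
```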

Maybe we should even document how users could drop files into some folder (possibly with drivers to fetch from Google Drive or whatever the average user prefers, so they would not need to know git or anything like it).

The language codes problem

The way different providers explain which language their terms are in is not consistent, and this badly breaks any automation. Assuming that the average big provider follows the IETF BCP 47 language tag specification to the letter is too optimistic, so when people read how to use hxltmcli/hxltmdexml and the ontologia, it is reasonable to assume we will have to give them a crash course on the other standards as well.
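One possible mitigation (an assumption on my part, not something the hxltm toolchain ships with) is to normalize whatever codes a provider supplies before they reach the ontologia. The sketch below uses the third-party langcodes package:

```python
# pip install langcodes
import langcodes

# Codes as they might arrive from different providers:
# underscores instead of hyphens, deprecated tags, etc.
messy = ["pt_BR", "iw", "zh-CN", "fil"]

for code in messy:
    tag = langcodes.standardize_tag(code)             # canonical BCP 47 tag
    alpha3 = langcodes.Language.get(tag).to_alpha3()  # ISO 639-2/3 code
    print(f"{code} -> {tag} ({alpha3})")
```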

About minimum standards on how to collect terminology

I will not say much about this in this issue. But even more critical than choosing language codes that really mean what someone could submit to some more global initiative, one of the main challenges is still how the translations are collected in the first place. So if we create a dedicated place that explains how to use the data convention, and (even without creating dedicated "best practices") give intentional nudges on how to cope with anti-patterns in terminology translation, this would acknowledge that the quality of the translations depends heavily on how well documented the bootstrapping material is.

Potential example approach

Maybe we should even intentionally create a specialized tagging subtag for the case where the source translation is not good enough to serve as the source term, to be used as the source term when exporting to formats intended to receive translations back, like XLIFF. This fixes two points:

  • The first: anyone can hotfix translations before generating a new XLIFF, without publicly saying that the source term was bad, yet without hurting existing translations.
    • This could also be used when the source language term is under copyright.
  • The second: it tolerates translations of terms that became some sort of standard and cannot be changed, because changing them would break software.

Please note that we already have ways to add more description to terms, but if users don't use that, we could still document this trick. A sketch of what such a convention could look like follows.
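Purely hypothetical illustration: the +alt_fontem attribute below is invented for this sketch and is not in ontologia/cor.hxltm.yml. It only shows how a substitute source column could sit next to the real source term, with an exporter preferring it when non-empty.

```python
# HYPOTHETICAL convention: +alt_fontem is invented for this sketch
# (it is NOT in ontologia/cor.hxltm.yml). When the substitute column
# is non-empty, an exporter would emit it as the XLIFF source text
# instead of the original term.
rows = [
    ["#item+conceptum+codicem",
     "#item+rem+i_eng+i_en+is_latn",              # real source term
     "#item+rem+alt_fontem+i_eng+i_en+is_latn"],  # substitute source
    ["term_001", "Covid", "COVID-19"],  # weak source, fixed quietly
    ["term_002", "Mask", ""],           # empty: keep the original
]

def fontem_pro_xliff(row):
    """Prefer the substitute source term when one is given."""
    return row[2] or row[1]

for row in rows[1:]:
    print(row[0], "->", fontem_pro_xliff(row))
```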
