Adding a new language

Yorwba edited this page Oct 23, 2022 · 11 revisions

This page describes the steps for adding a new language to Tatoeba from a development point of view.

1. Update LanguagesLib.php

  • Open the file src/Lib/LanguagesLib.php
  • Add the language to the $languages array in the function languagesInTatoeba().
  • If the language has an ISO 639-1 code, add it to the $map array in the function get_Iso639_3_To_Iso639_1_Map().
  • If the language has only one writing system and is written right to left, add the language code to the $rightToLeftLangs array in the function getLanguageDirection($lang).
  • If the language has more than one writing system and can be written either right to left or left to right, add the language code to the $autoLangs array in the function getLanguageDirection($lang).
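
The two direction rules above can be summarised as a small decision helper. This is only a sketch in Python for illustration, not code from LanguagesLib.php: the function name and the input format (one "ltr" or "rtl" entry per writing system) are assumptions, and cases the rules above do not mention return None.

```python
def direction_array_for(directions_by_script):
    """Return the name of the PHP array the language code belongs in.

    directions_by_script is a hypothetical input: a list with one entry
    per writing system, each either "ltr" or "rtl".
    """
    directions = set(directions_by_script)
    if not directions <= {"ltr", "rtl"}:
        raise ValueError("each direction must be 'ltr' or 'rtl'")

    # One writing system, written right to left.
    if len(directions_by_script) == 1 and directions == {"rtl"}:
        return "$rightToLeftLangs"

    # Several writing systems with mixed directions.
    if len(directions_by_script) > 1 and directions == {"ltr", "rtl"}:
        return "$autoLangs"

    # Not covered by the rules on this page (assumed: no entry needed).
    return None
```

For example, direction_array_for(["rtl"]) gives "$rightToLeftLangs", and direction_array_for(["ltr", "rtl"]) gives "$autoLangs".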

2. Add the language icon

  • Language icons are located in webroot/img/flags.
  • The icon needs to be an SVG file, named with the ISO 639-3 language code (e.g. ita.svg).

If an SVG file is provided in the language request:

  • Download the file, open it with SVG Cleaner and download the cleaned result.
  • Rename the file according to the ISO 639-3 code and put it in the webroot/img/flags folder.
  • Optionally, if you are familiar enough with SVG, feel free to see if you can optimize the code to make the file size as small as possible.

If only a PNG file is provided in the language request, you will need to create an SVG from the PNG as follows:

  • Compress the PNG icon (https://compresspng.com/).
  • Convert it into a data URI (https://ezgif.com/image-to-datauri).
  • Download the SVG template.
  • Open the template in a text editor (such as Notepad++).
  • In the template, replace {dataURI} with the string that you got from converting the PNG to a data URI.
  • Save the changes, name the file accordingly and put it in the webroot/img/flags folder.
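
If you prefer to script the conversion instead of using the online tools, the steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: ASSUMED_TEMPLATE is a hypothetical stand-in for the real SVG template (only the {dataURI} placeholder comes from the steps above, the markup and the 30x20 size are guesses), and the function names are made up for illustration. It does not compress the PNG.

```python
import base64
from pathlib import Path

# Hypothetical stand-in for the real SVG template. Only the {dataURI}
# placeholder is taken from this page; the markup and size are assumptions.
ASSUMED_TEMPLATE = (
    '<svg xmlns="http://www.w3.org/2000/svg" width="30" height="20">'
    '<image width="30" height="20" href="{dataURI}"/>'
    "</svg>"
)


def png_to_data_uri(png_bytes: bytes) -> str:
    """Encode raw PNG bytes as a base64 data URI."""
    encoded = base64.b64encode(png_bytes).decode("ascii")
    return "data:image/png;base64," + encoded


def fill_template(template: str, data_uri: str) -> str:
    """Replace the {dataURI} placeholder in the template text."""
    if "{dataURI}" not in template:
        raise ValueError("template has no {dataURI} placeholder")
    return template.replace("{dataURI}", data_uri)


def write_flag(png_path: str, template_path: str, iso_code: str, flags_dir: str) -> Path:
    """Write <flags_dir>/<iso_code>.svg built from the PNG and the template."""
    template = Path(template_path).read_text(encoding="utf-8")
    data_uri = png_to_data_uri(Path(png_path).read_bytes())
    target = Path(flags_dir) / f"{iso_code}.svg"
    target.write_text(fill_template(template, data_uri), encoding="utf-8")
    return target
```

A call such as write_flag("ita.png", "template.svg", "ita", "webroot/img/flags") would then produce webroot/img/flags/ita.svg, assuming the downloaded template was saved as template.svg.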

3. Update SphinxConfShell.php

In some cases, the new language uses a script that the search engine does not handle yet. The characters of that script need to be defined in the SphinxConfShell.php file. Otherwise, sentences in the new language cannot be indexed or searched.

  • Open the /src/Shell/SphinxConfShell.php file.
  • Add the Unicode block of the new script
    • to the $charsetTable table if the language separates words with spaces (as in English)
    • to the $scriptsWithoutWordBoundaries table if the language does not separate words with spaces (as in Chinese)

Example with Dhivehi: https://github.com/Tatoeba/tatoeba2/pull/1855#issuecomment-480477645
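
A Unicode block is a contiguous range of code points, and Sphinx's charset_table option writes such ranges as U+XXX..U+YYY. The Python helper below is a small sketch that formats a block in that notation. The function name is made up, and whether the entries in SphinxConfShell.php use exactly this form is an assumption, so compare with the existing entries in the file before copying the result.

```python
def charset_range(first: int, last: int) -> str:
    """Format a code point range in Sphinx charset_table notation."""
    if not 0 <= first <= last <= 0x10FFFF:
        raise ValueError("expected 0 <= first <= last <= 0x10FFFF")
    return f"U+{first:X}..U+{last:X}"


# The Thaana block, used to write Dhivehi, spans U+0780 to U+07BF.
print(charset_range(0x0780, 0x07BF))  # U+780..U+7BF
```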

4. Update the database and Manticore

This step is only useful if you have Tatoeba installed locally and want to test the changes you made in the previous steps.

To do so, follow the instructions in the Adding a new language section of the Deployment page. That page describes how to add the language in production, but the steps are the same in a local environment if you have installed TatoVM.