Adding a new language

Trang edited this page Apr 17, 2022 · 11 revisions

Here we describe the steps to add a new language to Tatoeba from a development point of view.

1. Add the language in LanguagesLib

This is needed to make the language available in the various dropdown lists.

  • Open the file src/Lib/LanguagesLib.php
  • Add the language in languagesInTatoeba().
  • If the language has an ISO 639-1 code, also add it in get_Iso639_3_To_Iso639_1_Map().
  • If the language is written right to left, also add it in getLanguageDirection($lang).

2. Add the icon

  • Language icons are located in webroot/img/flags.
  • The icon needs to be an SVG file, named with the ISO 639-3 language code (e.g. ita.svg).

If no SVG is available for the icon, you can create an SVG from a PNG as follows:

  • Compress the PNG icon (https://compresspng.com/).
  • Convert it into data URI (https://ezgif.com/image-to-datauri).
  • Download the SVG template.
  • Open the template in a text editor (such as Notepad++).
  • In the template, replace {dataURI} with the string that you got from converting the PNG to a data URI.
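
The conversion and substitution steps above can also be scripted. The following is a minimal sketch, not the project's own tooling: SVG_TEMPLATE is a hypothetical stand-in for the real template (its dimensions and attributes are assumptions), and the function names are made up for illustration. It does not compress the PNG, so do that beforehand.

```python
import base64

# Hypothetical stand-in for the SVG template mentioned above; the real
# template's dimensions and attributes may differ.
SVG_TEMPLATE = (
    '<svg xmlns="http://www.w3.org/2000/svg" '
    'xmlns:xlink="http://www.w3.org/1999/xlink" '
    'width="30" height="20">'
    '<image width="30" height="20" xlink:href="{dataURI}"/>'
    '</svg>\n'
)


def png_to_data_uri(png_bytes: bytes) -> str:
    """Encode raw PNG bytes as a base64 data URI."""
    encoded = base64.b64encode(png_bytes).decode("ascii")
    return "data:image/png;base64," + encoded


def fill_template(template: str, data_uri: str) -> str:
    """Replace the {dataURI} placeholder with the encoded image."""
    if "{dataURI}" not in template:
        raise ValueError("template has no {dataURI} placeholder")
    return template.replace("{dataURI}", data_uri)


def png_file_to_svg_file(png_path: str, svg_path: str,
                         template: str = SVG_TEMPLATE) -> None:
    """Read a PNG icon and write the SVG file that embeds it."""
    with open(png_path, "rb") as png_file:
        data_uri = png_to_data_uri(png_file.read())
    with open(svg_path, "w", encoding="utf-8") as svg_file:
        svg_file.write(fill_template(template, data_uri))
```

For example, png_file_to_svg_file("ita.png", "webroot/img/flags/ita.svg") would write the icon for Italian, assuming ita.png exists in the current directory. To use the downloaded template instead of the stand-in, read it from disk and pass its contents as the template argument.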

3. Run the MySQL add_new_language() procedure

This is needed for the statistics page that lists all the languages and the number of sentences in each language. It's also needed to display the sentence count on some other pages.

  • The script that creates the procedure add_new_language() can be found in docs/database/procedures/add_new_language.sql. If your database does not have this procedure, then run this script first.
  • Connect to the database and execute the procedure: CALL add_new_language('<lang>', <listId>); where <lang> is the ISO 639-3 code and <listId> is the id of the list that contains the sentences in that language.

4. Update SphinxConfShell.php

In some cases, the new language will use a script that is not yet handled by the search engine. The new characters need to be defined in SphinxConfShell.php.

  • Open the file src/Shell/SphinxConfShell.php
  • Add the Unicode block of the script:
    • in the $charsetTable table if the language has spaces to separate words (like in English)
    • in the $scriptsWithoutWordBoundaries table if the language has no spaces (like in Chinese)

Example with Dhivehi: https://github.com/Tatoeba/tatoeba2/pull/1855#issuecomment-480477645
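
To work out what to add, it can help to format the block's code point range and to check that sample sentences stay inside it. The sketch below does this for Thaana, the script used to write Dhivehi, whose Unicode block is U+0780 to U+07BF. The helper names are hypothetical, and the U+XXXX..U+XXXX notation is the range syntax of Sphinx's charset_table setting; whether the entries in SphinxConfShell.php are written exactly this way is an assumption, so compare with the existing entries before copying the output.

```python
def charset_range(first: int, last: int) -> str:
    """Format an inclusive code point range as 'U+XXXX..U+XXXX'."""
    if not 0 <= first <= last <= 0x10FFFF:
        raise ValueError("invalid code point range")
    return f"U+{first:04X}..U+{last:04X}"


def outside_range(text: str, first: int, last: int) -> list:
    """List the non-space characters of text that fall outside the range."""
    return sorted({ch for ch in text
                   if not ch.isspace() and not first <= ord(ch) <= last})


# Thaana block (used to write Dhivehi): U+0780 to U+07BF.
THAANA = (0x0780, 0x07BF)
print(charset_range(*THAANA))  # prints U+0780..U+07BF
```

Calling outside_range() on a few sentences of the new language shows which characters, such as punctuation or digits from another block, the chosen range does not cover.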

Deployment on production

You can ignore this step for local development if you don't have Sphinx installed.

Follow the instructions in the "Adding a new language" section of the Deployment page.