I18n Proposition

Hon Yi Hao edited this page May 14, 2025 · 5 revisions

Translator

The i18n directory contains the code for the translator.

Workflow

For each language, there will be a dictionary file containing translations of the terms used in the textbook, stored under the directory ai_files/{locale_code}. The translator iterates through ai_files to determine which languages to translate into. These files, together with the English XMLs, are fed into the translator, which outputs translated XMLs that are stored in the translated_xmls branch of this repo.
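As a rough illustration of the language discovery step, the sketch below lists the locale codes by reading the subdirectories of an ai_files-style folder. The helper name listLocales and the demo directory are hypothetical and not taken from the translator's source; the real implementation may differ.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical helper: treats every subdirectory of `aiFilesDir`
// as one locale code (e.g. ai_files/zh_CN -> "zh_CN").
function listLocales(aiFilesDir: string): string[] {
  return fs
    .readdirSync(aiFilesDir, { withFileTypes: true })
    .filter((entry) => entry.isDirectory()) // ignore stray files
    .map((entry) => entry.name)
    .sort();
}

// Demo against a throwaway directory so the sketch runs anywhere.
const demoRoot = fs.mkdtempSync(path.join(os.tmpdir(), "ai_files-"));
fs.mkdirSync(path.join(demoRoot, "zh_CN"));
fs.mkdirSync(path.join(demoRoot, "fr_FR"));
fs.writeFileSync(path.join(demoRoot, "README.txt"), "not a locale");

console.log(listLocales(demoRoot));
```

Deriving the language list from the directory layout means that adding a language only requires adding a new ai_files/{locale_code} directory, with no separate configuration to update.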

When the repo is deployed, docs_out is generated for the English XMLs as usual. In addition, the translated XMLs are cloned from translated_xmls and run through the same generating scripts, so that docs_out also contains the translated versions.

.github/workflows/translate-changed.yml and .github/workflows/translate-everything.yml are responsible for calling the translator, while .github/workflows/deploy-pages.yml is responsible for the deployment.

The complete workflow is as follows:

  1. Check whether any English XMLs have changed (using tj-actions/changed-files).
  2. If so, the translator is triggered to translate the changed English XMLs into the other languages and store the output in i18n/translation_output, which is then pushed to translated_xmls using peaceiris/actions-gh-pages@v4.
  3. Run the generating scripts on the English XMLs.
  4. Clone translated_xmls, and run the generating scripts for each language.
  5. Deploy docs_out to https://sicp.sourceacademy.org
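The first two steps amount to filtering the list of changed files down to the English XMLs before handing them to the translator. The sketch below shows one way this filter could look. It assumes the English sources live under xml/en/ and end in .xml; the function name selectEnglishXmls and the sample paths are hypothetical.

```typescript
// Assumption: English sources live under xml/en/ and end in .xml.
const ENGLISH_XML_PREFIX = "xml/en/";

// Hypothetical helper: keep only the changed files that are English XMLs.
function selectEnglishXmls(changedFiles: string[]): string[] {
  return changedFiles.filter(
    (file) => file.startsWith(ENGLISH_XML_PREFIX) && file.endsWith(".xml"),
  );
}

// Sample input in the shape a changed-files step might report (made-up paths).
const changed = [
  "xml/en/chapter1/section1/subsection2.xml",
  "i18n/index.ts",
  "README.md",
];

const toTranslate = selectEnglishXmls(changed);
console.log(toTranslate.length > 0 ? toTranslate : "nothing to translate");
```

If the filtered list is empty, the translation step can be skipped and the workflow can go straight to generating docs_out.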

The Front End

In the frontend repo, a selection menu for the language of the textbook will be implemented.

Translator CLI

Use npx tsx index.ts or yarn trans to invoke the translator. Below, <trans> refers to either of the two.

  1. <trans> test <section> <lang>: translate <section> (e.g. 1.1, 1.1.2) into <lang> (e.g. zh_CN); used for debugging.
  2. <trans> abs: "abs" stands for absent. Translate every section into every language for which the translation does not exist or is older (by modification time) than the English XML; used for local testing.
  3. <trans>, <trans> all: translate every section to every language
  4. <trans> <path1> <path2> ...: translate the English XMLs at the paths <path1>,<path2>,... to every language.
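The four invocation forms above could be dispatched roughly as in the sketch below, which also shows one way to express the "abs" rule (translate when the output is missing or older than its English source). The names Command, parseCommand and needsTranslation are hypothetical, and the translator's actual argument handling may differ.

```typescript
import * as fs from "node:fs";

// The four invocation forms, modelled as a tagged union.
type Command =
  | { kind: "test"; section: string; lang: string }
  | { kind: "abs" }
  | { kind: "all" }
  | { kind: "paths"; paths: string[] };

// Hypothetical dispatcher; `args` is what follows <trans> on the command line.
function parseCommand(args: string[]): Command {
  if (args.length === 0 || (args.length === 1 && args[0] === "all")) {
    return { kind: "all" };
  }
  if (args[0] === "abs") return { kind: "abs" };
  if (args[0] === "test") {
    if (args.length !== 3) throw new Error("usage: test <section> <lang>");
    return { kind: "test", section: args[1], lang: args[2] };
  }
  // Anything else is treated as a list of English XML paths.
  return { kind: "paths", paths: args };
}

// "abs" rule: translate when the translated file is missing
// or was modified earlier than the English XML.
function needsTranslation(englishPath: string, translatedPath: string): boolean {
  if (!fs.existsSync(translatedPath)) return true;
  return fs.statSync(translatedPath).mtimeMs < fs.statSync(englishPath).mtimeMs;
}

console.log(parseCommand(["test", "1.1", "zh_CN"]));
```

Comparing modification times is cheap but only approximate: a fresh clone or checkout can reset timestamps, which is one reason "abs" is better suited to local testing than to CI.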

Future Directions

  1. Move all translation instructions, system prompts and dictionaries into individual files for greater customisability and easier updating. This could allow end users to create pull requests that edit the system prompts and improve the resulting translations. These files will likely live in a folder whose structure and file names mirror those of the xml/en folder.
  2. Modify and improve the parsing logic in javascript/index.js:
    • properly handle the search index generation. Currently, one single search index is generated for all languages.
    • implement parsing for other file types aside from json. Only json parsing is implemented for i18n for now.
    • check extensively for unintended consequences of the changes made to index.js that allow parsing of different languages.
  3. Modify the XML-to-JSON parsing logic to allow for partial translations (i.e. only selected files for a given language have to be present; currently, the parsing logic requires all files to be present).
  4. Create an API for the frontend to read the available languages for each section. Update the frontend to handle gracefully the case where the selected language is not available for the next/previous section.
  5. Add the ability in the frontend to request specific languages.
  6. Improve Terminal UI to track translation progress of individual files.
  7. In production, XML parsers are created in non-strict mode (strict: false). Consider doing development in strict mode to better spot possible issues in chunking logic and AI responses.