HOWTO Add a New Locale

Mickaël Schoentgen edited this page Nov 8, 2023 · 3 revisions

Overall Process

  1. Copy an existing lang file from the lang folder, then remove all data specific to the old language.
  2. Copy an existing test file from the tests folder, then remove all data specific to the old language.
  3. Update lang/__init__.py accordingly.
  4. Add the locale code to scripts/all-namespaces.py, then run: python -m scripts
  5. Test it:
    python -m pytest tests/tests_$LOCALE.py
  6. When you think you are ready, fetch and convert all words:
    # Run the command that will fetch the data and convert it into dicthtml-$LOCALE.zip
    python -m wikidict $LOCALE

That's it! Thanks a lot for your contribution ❤️

When done, a maintainer will:

  • Create a new release with the tag $LOCALE. This is where the dictionary will be uploaded.
  • Update the README to include the new locale in the Dictionaries section. Please keep the list alphabetically sorted, and write the language name in the language itself, not in English.

Process in Detail

Finding Sections

You first need to find the right head_sections and section_level.

Then:

python -m wikidict $LOCALE --find-templates

The file sections.txt is created.

Finding Templates

Once the sections are set, you can find the templates:

python -m wikidict $LOCALE --find-templates

The file templates.txt is created.

Adding Tests

Have a look at sections.txt first, and then at templates.txt. When you find a new section or template, add a test (you can use the existing tests as a reference).

You can also quickly get the definition of a word and see how it is formatted:

python -m wikidict $LOCALE --get-word "word" [--raw]

Finalizing

Run this script:

# File: parenthesis.py
# Print each distinct parenthesized expression starting with an uppercase
# letter, along with the first word in which it was found.
import json
import re
import sys


with open(sys.argv[1], encoding="utf-8") as fh:
    words = json.load(fh)

seen = set()
pattern = re.compile(r"(\([A-Z]+[^\)]+\))")
for word, definitions in words.items():
    # The last item of a word's data holds its definitions
    for definition in definitions[-1]:
        # A definition is either a string or a sequence of sub-definitions
        subdefs = [definition] if isinstance(definition, str) else definition
        for subdef in subdefs:
            for m in pattern.findall(subdef):
                if m not in seen:
                    print(m, repr(word))
                    seen.add(m)

Use it like this:

python parenthesis.py data/$LOCALE/data.json

It will output every expression found in parentheses (most of them come from templates). Check that nothing looks weird: if something does, it means you have another template to handle :)
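To get a feel for what the script reports, here is a minimal sketch that applies the same regular expression to a few sample strings. The sample definitions and the find_parenthesized helper are made up for illustration and do not come from the project. The pattern only keeps groups whose first character after the opening parenthesis is an uppercase ASCII letter, and it needs at least two characters inside the parentheses.

```python
import re

# Same pattern as in parenthesis.py
pattern = re.compile(r"(\([A-Z]+[^\)]+\))")


def find_parenthesized(text):
    """Hypothetical helper: return the parenthesized groups the pattern matches."""
    return pattern.findall(text)


# Hypothetical definition strings, for illustration only
samples = [
    "(Botany) A kind of plant.",             # matched: starts with an uppercase letter
    "A small boat (also called a dinghy).",  # ignored: starts with a lowercase letter
    "(UK, informal) A friend.",              # matched as a whole group
    "(A) First letter.",                     # ignored: only one character inside
]

for sample in samples:
    print(find_parenthesized(sample), repr(sample))
```

Since lowercase groups such as "(also called a dinghy)" are skipped, the output tends to be dominated by labels that templates render, which is why it is a handy final check.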