From 9272ca3d4e701692046d0a6d1f99d130f2226f30 Mon Sep 17 00:00:00 2001
From: Anders Lorentsen
Date: Fri, 15 Nov 2024 13:36:01 +0100
Subject: [PATCH] Rewrite guide on updating dictionaries and FSTs

---
 dicts/nds/NDSUpdatingDictionaries.md | 148 ++++++++++++---------------
 1 file changed, 66 insertions(+), 82 deletions(-)

diff --git a/dicts/nds/NDSUpdatingDictionaries.md b/dicts/nds/NDSUpdatingDictionaries.md
index ff9d39c..f1dc9c1 100644
--- a/dicts/nds/NDSUpdatingDictionaries.md
+++ b/dicts/nds/NDSUpdatingDictionaries.md
@@ -1,18 +1,15 @@
 # Updating dictionaries
 
-This page documents both lexicon and fst updating, and restarting of the server.
-One may update either lexica or fst or both, but in both cases configuring and resetting of the server must be run.
+This page documents updating lexicon and fsts on the server.
 
-All dictionaries are on gtdict, and require logging in as the _neahtta_ user. The compile process is restricted, so that only the lexicon will be able to be compiled, but _not_ the FST files. FSTs must be compiled manually (see below in _Updating the FSTs_).
 
-## Updating the lexica on gtdict
+## Updating the lexica (dictionaries) on gtdict
 
 For the impatient: **The short version:**
 
 ```
     ssh gtdict.uit.no
     sudo su neahtta
-    cd neahtta
     nds update DICT
     nds compile DICT
     nds restart DICT
@@ -20,128 +17,115 @@ For the impatient: **The short version:**
 
 **The longer explanation:**
 
-1.) _Log in to the server via SSH_
+1.) _Log in to the server, and become the neahtta user_
 
-Log in to gtdict, and thereafter do `sudo su neahtta`
+```
+    ssh gtdict.uit.no
+    sudo su neahtta
+```
 
-Note that when logged in as the NDS user, the python virtualenv should be activated automatically, and you will see this before the command prompt:
+The python virtual environment should be activated automatically; this is
+indicated by the starting `(venv)` in your prompt. You will also be placed
+in the `~/neahttadigisanit/neahtta` folder. So, your prompt should look like:
 
 ```
-    (venv)[neahtta@gtdict ~]$
+(venv) neahtta@gtdict-02:~/neahttadigisanit/neahtta$
 ```
 
-(If you do not see this, do the following commands from the home directory of neahtta: _cd ~ && source venv/bin/activate_.)
+If this does not happen, then run
+
+```
+cd /home/neahtta/neahttadigisanit/neahtta
+. venv/bin/activate
+```
 
 When you see (venv) in the command prompt, continue.
 
-2.) _Go to the neahtta catalogue and run the nds_commands process_
+2.) _Run the update command_
 
 ```
-    cd ~/neahtta/
     nds update DICT
 ```
 
-Replace DICT below with sanit, baakoeh, etc. (to _nds compile sanit_ etc.)
-
-If you have problems here, make sure that the environment variables for _GTHOME_, and _GTCORE_ are set, however the _neahtta_ user should automatically be configured properly. Either you will see errors, or you can check with `echo $GTHOME`. The _neahtta_ user has these set automatically in its bash profile.
-
-3.) _Check that there were no errors_
-
-This check is now a part of the `compile` command. You may also do `wc -l dicts/*.xml` to make sure there is content in the files.
-
-If there is an error in an XML file used in compilation, the compile script will give an error. Before compilation, a backup file will be stored, so if the compilation process overwrites this with a blank file, you may revert to a previous version. Backup files are named \*.bak, and include a timestamp.
+Replace `DICT` with `sanit`, `baakoeh`, etc.
 
-This process compiles all dictionaries to _dicts/_, which is the place that most instances of NDS rely on, following the relevant configuration file in _configs/DICT.config.yaml_. This will usually be enough, but if updates do not seem to be visible on the web, it is a good idea to check that the dictionaries are in the locations that the config expects, and alternatively restarting the server process.
+The output will tell you what happened, and which dictionaries were updated.
 
-**NB:** The files checked in to Git are different from those actually used in production on the server, this is to prevent accidental overwritings via _git push_. Thus, you will need to edit and check in _configs/DICT.config.yaml.in_, which is fine for use in development work, but the servers instances will be running from _confgis/DICT.config.yaml_.
+**READ THIS (TEMPORARY ERRORS)**: As of November 2024, the `nds update DICT`
+command *DOES NOT WORK PROPERLY*! You are going to have to go to each
+individual `~/gut/giellalt/dict-xxx-yyy` folder, and run `git pull` manually.
+To make sure everything is correct, also run `git status` in the dictionaries
+after pulling, to see that the git status is in order. If anything is out of
+place, for example, it may say that a _rebase_ is in progress, then steps
+will have to be taken to get the git status back to normal.
 
-4.) _Testing the configuration files_
+Hint: Use `nds ls -d DICT` to see which `dict-xxx-yyy` are a part of that
+dictionary instance.
 
-Simply (re)start the instance. If it fails to start, it will print out
-information about what went wrong. The instance will not start unless everything
-in the configuration file is in order. If an instance is meant to run without
-an fst, for example, then simply comment-out the line specifying the fst in
-the config file, and run again.
-
-There is a command that can also run these checks, and print out information.
+3.) _Run the compile command_
 
 ```
-    nds test-configuration DICT
+    nds compile DICT
 ```
 
-Running it will evaluate the config, test dictionaries, and then print FST
-paths and last updated date. If an FST is missing from its expected path,
-it will be listed as MISSING. If you see any errors at the end of the process,
-or worse, Python errors, something is wrong and you should avoid restarting
-until this is corrected.
+The output will show you which dictionaries got compiled, and how many entries
+(`` nodes) it found in total. You may also look at the compiled
+`dicts/xxx-yyy.xml` files directly to see that they look as expected. For
+example, that they contain roughly as many lines as all the corresponding
+`dict-xxx-yyy/src/*.xml` files combined.
 
+If there is an error in an XML file used in compilation, the compile script
+will give an error, and the existing compiled `dicts/xxx-yyy.xml` file will
+NOT be written to.
 
-5.) _Restart the server process_
+Notice that if you run the compile command again, then no dictionaries will
+be compiled, because the `nds` script detects that the already compiled
+dictionary is newer than the sources. You can use `nds compile DICT -f` to force
+recompilation, for example to reset `sme-nob` to not have stem information.
 
-When everything is working, run the following:
+4.) _Restart the instance_
 
 ```
     nds restart DICT
 ```
 
-## Updating the FSTs
-
-There are two ways to update the FSTs. For both of these options, you must know first where the FSTs for each dictionary and language should lie. FST locations are defined in the relevant config file in _configs/DICT.config.yaml_, in the _Morphology_ section near the top. (Note the difference mentioned above between _.yaml.in_ and _.yaml_.
+If it fails to start, it will print out information about what went wrong.
+The instance will not start unless everything in the configuration file is in
+order. If an instance is meant to run without an fst, for example, then simply
+comment-out the line specifying the fst in the config file, and run again.
 
-As above, you can use the test command to see if the files were updated.
+There is a command that can also run these checks, and print out information.
 
 ```
-    NDS test-configuration DICT
+    nds test-configuration DICT
 ```
 
-If you see any errors, be sure to correct them.
+Running it will evaluate the config, test dictionaries, and then print FST
+paths and last updated date. If an FST is missing from its expected path,
+it will be listed as MISSING. If you see any errors at the end of the process,
+or worse, Python errors, something is wrong and you should avoid restarting
+until this is corrected.
 
-### Updating on your own
 
-The only current way to update FSTs is to do so on your own, using whichevermethod you are comfortable with, typically following the usual procedure for _$GTLANGS_, and then copying them manually to the specified locations.
+## Updating the FSTs
 
-To find the FST locations:
+As of November 2024, all FSTs running on the server are the ones from
+apertium nightly. They are updated through the operating system's usual update
+mechanics, namely:
 
 ```
-    nds test-configuration DICT
+    sudo apt-get update && sudo apt-get upgrade
 ```
 
-This will output the following:
+You can see the paths to all FSTs used in all dictionaries, by running:
 
 ```
-    [...snip...]
-
-    SoMe:
-      FOUND: /opt/smi/sme/bin/analyser-dict-gt-desc-mobile.xfst
-      UPDATED: Tue Nov 4 15:47:31 2014
-
-      FOUND: /opt/smi/sme/bin/generator-dict-gt-norm.xfst
-      UPDATED: Tue Nov 4 15:47:31 2014
-
-    sme:
-      FOUND: /opt/smi/sme/bin/analyser-dict-gt-desc.xfst
-      UPDATED: Tue Nov 4 15:47:31 2014
-
-
-      FOUND: /opt/smi/sme/bin/generator-dict-gt-norm.xfst
-      UPDATED: Tue Nov 4 15:47:31 2014
-
-    [... snip ...]
+    rg "file:" neahtta/configs/*.config.yaml
 ```
 
-When you compile the analyzers on your own, copy them to these paths, and test that their permissions allow them to be accessible to the neahtta user.
-
-### Updating via script
-
-Updating via script has not been implemented in the newest nds_commands script, as this was not used in recent years. An automatic system for updating FSTs is on the wish list.
+You will see all FST files are located in the apertium nightly folder, namely
+`/usr/share/giella/LANG/FILE`.
 
-## Resetting the server
-
-Either use the nds process, or relevant system commands.
-
-```
-    cd ~/neahtta/
-    nds restart DICT
-```
+If, in the future, some dictionary uses an FST that is not from apertium
+nightly, then of course that will have to be updated manually.
 
-**NB:** you may be prompted for the neahtta sudo password, and if this doesn't work, something is broken and developers must fix it.