This repository contains code for running TartuNLP's modular multilingual machine translation models. The application is based on a custom version of FairSeq. The models can be run locally to translate plain text files, or as workers that process requests from RabbitMQ. The workers are compatible with our translation API.
The project is developed by the NLP research group at the University of Tartu. Neural machine translation can also be tested in our web demo.
The models can be downloaded from our HuggingFace page; compatible models are marked with the modularNMT tag. Older models and compatible code versions can be found in the releases section.
The default config for running these models is specified in the included config/config.yaml. For example, the first model configuration corresponds to the following models/ folder structure:
models/
└── septilang
    ├── modular_model.pt
    ├── dict.de.txt
    ├── dict.en.txt
    ├── dict.et.txt
    ├── dict.fi.txt
    ├── dict.lt.txt
    ├── dict.lv.txt
    ├── dict.ru.txt
    ├── sp-model.de.model
    ├── sp-model.en.model
    ├── sp-model.et.model
    ├── sp-model.fi.model
    ├── sp-model.lt.model
    ├── sp-model.lv.model
    └── sp-model.ru.model
Docker images for the workers are published alongside this repository. Each image version corresponds to a specific release. The required model file(s) are excluded from the image to reduce the image size and should be attached at /app/models. Logging configuration is loaded from /app/config/logging.prod.ini and service configuration from the /app/config/config.yaml file. The included config is commented to illustrate how new model configurations can be added.
The following environment variables should be configured when running the container:
- MQ_USERNAME - RabbitMQ username
- MQ_PASSWORD - RabbitMQ user password
- MQ_HOST - RabbitMQ host
- MQ_PORT (optional) - RabbitMQ port (5672 by default)
- MQ_EXCHANGE (optional) - RabbitMQ exchange name (translation by default)
- MQ_HEARTBEAT (optional) - heartbeat interval (60 seconds by default)
- MQ_CONNECTION_NAME (optional) - friendly connection name (Translation worker by default)
- MKL_NUM_THREADS (optional) - the number of threads used for intra-op parallelism by PyTorch (16 by default). If set to a blank value, it defaults to the number of CPU cores, which may cause computational overhead when deployed on larger nodes. Alternatively, the docker run flag --cpuset-cpus can be used to control this. For more details, refer to the performance and hardware requirements section below.
By default, the container entrypoint is main.py without additional arguments, but arguments should be defined with the COMMAND option. The only required flag is --model-name to select which model is loaded by the worker. The full list of supported flags can be seen by running python main.py -h:
usage: main.py [-h] --model-name MODEL_NAME [--model-config MODEL_CONFIG] [--log-config LOG_CONFIG] [--input-file INPUT_FILE] [--output-file OUTPUT_FILE] [--input-lang INPUT_LANG] [--output-lang OUTPUT_LANG]
A neural machine translation engine. This application supports two modes of operation:
a) a worker that processes incoming translation requests via RabbitMQ;
b) translation of a text file into a new file;
optional arguments:
-h, --help show this help message and exit
--model-name MODEL_NAME
The model to load. Refers to the model name in the config file. (default: None)
--model-config MODEL_CONFIG
The model config YAML file to load. (default: config/config.yaml)
--log-config LOG_CONFIG
Path to log config file. (default: config/logging.prod.ini)
Local file translation arguments, if the following arguments exist, local file translation is started. Otherwise a RabbitMQ worker is started:
--input-file INPUT_FILE
Path to the input text file. (default: None)
--output-file OUTPUT_FILE
Path to the output text file. (default: None)
--input-lang INPUT_LANG
Input language code (default: None)
--output-lang OUTPUT_LANG
Output language code. (default: None)
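As an illustrative sketch (not taken from the repository docs), a single worker container could be started directly with docker run by combining the environment variables, the model volume mount and the command flags described above. The host name, credentials and local paths below are placeholders for your own values:

docker run \
  -e MQ_HOST=rabbitmq.example.com \
  -e MQ_USERNAME=user \
  -e MQ_PASSWORD=secret \
  -v $(pwd)/models:/app/models \
  ghcr.io/tartunlp/translation-worker:latest \
  --model-name septilang

Any arguments placed after the image name override the image's default command and are passed to the main.py entrypoint.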
The setup can be tested with the following sample docker-compose.yml configuration:
version: '3'
services:
  rabbitmq:
    image: 'rabbitmq'
    environment:
      - RABBITMQ_DEFAULT_USER=${RABBITMQ_USER}
      - RABBITMQ_DEFAULT_PASS=${RABBITMQ_PASS}
  nmt_api:
    image: ghcr.io/tartunlp/translation-api:latest
    environment:
      - MQ_HOST=rabbitmq
      - MQ_PORT=5672
      - MQ_USERNAME=${RABBITMQ_USER}
      - MQ_PASSWORD=${RABBITMQ_PASS}
    ports:
      - '80:8000'
    depends_on:
      - rabbitmq
  nmt_worker:
    image: ghcr.io/tartunlp/translation-worker:latest
    environment:
      - MQ_HOST=rabbitmq
      - MQ_PORT=5672
      - MQ_USERNAME=${RABBITMQ_USER}
      - MQ_PASSWORD=${RABBITMQ_PASS}
      - MKL_NUM_THREADS=8
    volumes:
      - ./models:/app/models
    command: ["--model-name", "septilang"]
    depends_on:
      - rabbitmq
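Assuming the models directory sits next to the compose file, the stack can then be started by exporting the credentials referenced in the compose file (the values below are arbitrary examples) and running docker compose:

export RABBITMQ_USER=rabbit
export RABBITMQ_PASS=secret
docker compose up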
The following steps have been tested on Ubuntu. The code is compatible with both CPU and GPU (CUDA is required for GPU usage).
- Clone this repository.
- Install prerequisites:
  - GNU Compiler Collection (sudo apt install build-essential)
  - For a CPU installation we recommend using the included requirements.txt file in a clean environment (tested with Python 3.10):
    pip install -r requirements.txt
  - For a GPU installation, use the environment.yml file instead. Make sure you have Conda installed, then create and activate a Conda environment with all dependencies:
    conda env create -f environment.yml -n nmt
    conda activate nmt
    python -c "import nltk; nltk.download(\"punkt\")"
- Download the models from HuggingFace and place them inside the models/ directory (a download sketch follows this list).
- Check the configuration files and change any defaults as needed. Make sure that the paths in config/config.yaml point to the model files you just downloaded.
- Specify RabbitMQ connection parameters with environment variables or in a config/.env file, as illustrated in config/sample.env (a minimal example follows this list).
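As a rough sketch of the model download step, assuming the huggingface_hub CLI is installed, a model repository could be fetched into the expected folder roughly like this. The repository name is a placeholder (pick a model marked with the modularNMT tag on our HuggingFace page), and the target folder may need adjusting so the layout matches the structure shown earlier and the paths in config/config.yaml:

pip install huggingface_hub
huggingface-cli download <organization>/<model-repo> --local-dir models/septilang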
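A minimal config/.env sketch, assuming it uses the same MQ_* variable names as the container configuration (the values are placeholders; check config/sample.env for the exact expected keys):

MQ_USERNAME=user
MQ_PASSWORD=secret
MQ_HOST=localhost
MQ_PORT=5672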
Run the worker with the following command, where $MODEL_NAME matches the model name in your config file:
python main.py --model-name $MODEL_NAME [--log-config config/logging.ini --model-config config/config.yaml]
Or run local file translation with the following command:
python main.py --model-name $MODEL_NAME --input-file input.txt --output-file output.txt --input-lang est --output-lang eng [--log-config config/logging.ini --model-config config/config.yaml]
When running the model on a GPU, the exact RAM usage depends on the model and should always be tested, but a conservative estimate is to have 8 GB of memory available.
CPU performance depends on the available CPU resources; however, it should be fine-tuned for the deployment infrastructure. By default, PyTorch will try to utilize all CPU cores at 100% and run as many threads as there are cores. This can cause major computational overhead if the worker is deployed on large nodes. The number of threads used should be limited with the MKL_NUM_THREADS environment variable or the docker run flag --cpuset-cpus.
Limiting CPU usage via Docker configuration that only limits CPU shares is not sufficient (e.g. the docker run flag --cpus or the CPU limit in Kubernetes, unless the non-default static CPU Manager policy is used). For example, on a node with 128 cores, setting the CPU limit to 16.0 results in 128 parallel threads running, each utilizing only 1/8 of a core's computational potential. This amplifies the effect of multithreading overhead and can result in inference speeds up to 20x slower than expected.
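For instance, a sketch of pinning the worker to 16 specific cores with a matching thread count (the core range and model name are illustrative, and the RabbitMQ variables are omitted for brevity):

docker run \
  --cpuset-cpus="0-15" \
  -e MKL_NUM_THREADS=16 \
  -v $(pwd)/models:/app/models \
  ghcr.io/tartunlp/translation-worker:latest \
  --model-name septilang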
Although the optimal number of threads depends on the exact model and infrastructure used, a good starting point is around 16 (the default in the included Docker image). With an optimal configuration and modern hardware, the worker should be able to process ~7 sentences per second. For more information, please refer to the PyTorch documentation.