
API reference: Main V0.5a

Jetic Gu edited this page Jun 25, 2017 · 3 revisions

Introduction

This is the API reference for the Aligner main programme (src/aligner.py).

It describes main version 0.5a.

Changes (compared to 0.4a)

  • Added support for models v0.4a.

  • Added options to load and save trained models.

  • Added support for the new Dataset data format; the old bitext and tritext formats have been removed.

Options

To see all options, run:

> python aligner.py -h

Config file

A sample config file is provided in src/sample_config_file.ini.

A config file holds the information about a specific set of training and testing data, so that the options do not all have to be typed on the console.

The config file is divided into 3 sections: General, TrainData, and TestData.

[General]
DataDirectory = ~/Data/
TargetLanguageSuffix = cn
SourceLanguageSuffix = en

[TrainData]
TextFilePrefix = train
TagFilePrefix = train.tags
AlignmentFileSuffix = wa

[TestData]
TextFilePrefix = test
TagFilePrefix = test.tags
Reference = FULLPATHTOFILE.WA

The aligner searches the DataDirectory for files that match the prefixes and suffixes given above. Please note that Reference currently has to be a full path.
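To make the prefix and suffix matching concrete, here is a minimal sketch that reads a config like the one above with Python's standard configparser and builds candidate file names. The function name expected_files and the "<prefix>.<suffix>" naming pattern are assumptions for illustration only; the aligner's actual matching logic may differ.

```python
import configparser
import os

SAMPLE = """
[General]
DataDirectory = ~/Data/
TargetLanguageSuffix = cn
SourceLanguageSuffix = en

[TrainData]
TextFilePrefix = train
TagFilePrefix = train.tags
AlignmentFileSuffix = wa

[TestData]
TextFilePrefix = test
TagFilePrefix = test.tags
Reference = FULLPATHTOFILE.WA
"""


def expected_files(config_text):
    """Return candidate text file paths per section (hypothetical helper)."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)

    general = parser["General"]
    data_dir = os.path.expanduser(general["DataDirectory"])
    src = general["SourceLanguageSuffix"]
    tgt = general["TargetLanguageSuffix"]

    files = {}
    for section in ("TrainData", "TestData"):
        prefix = parser[section]["TextFilePrefix"]
        # Assumption: files are named "<prefix>.<language suffix>".
        files[section] = {
            "source": os.path.join(data_dir, prefix + "." + src),
            "target": os.path.join(data_dir, prefix + "." + tgt),
        }
    return files


if __name__ == "__main__":
    for section, paths in expected_files(SAMPLE).items():
        print(section, paths)
```

Under these assumptions, the TrainData section would point at train.en and train.cn inside the expanded DataDirectory.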

Dataset formats

The descriptions of file formats supported by this version are here.

Saved model files

Saved model files use the .pkl or .pklz format. The latter is a compressed version of the former: it is smaller in size, but usually takes longer to save and load.
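The trade-off between the two formats can be sketched as follows. This assumes that a .pklz file is a gzip-compressed pickle, which is not confirmed here; save_model and load_model are hypothetical helper names, not the aligner's API.

```python
import gzip
import pickle


def _opener(path):
    # Assumption: ".pklz" means a gzip-compressed pickle stream.
    return gzip.open if path.endswith(".pklz") else open


def save_model(obj, path):
    """Pickle obj to path, compressing when the extension is .pklz."""
    with _opener(path)(path, "wb") as f:
        pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_model(path):
    """Load a pickled object from a .pkl or .pklz file."""
    with _opener(path)(path, "rb") as f:
        return pickle.load(f)
```

Compression adds work on every save and load, which matches the note above that the compressed format is usually slower. As with any pickle file, only load files from sources you trust.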

Please note that when loading a saved file, the model checks the file's modelName (and version if applicable; see the API reference for Alignment Models for more detail) to prevent accidentally loading a file saved by a different model (or by an unsupported version of the current model).
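A check of this kind might look like the sketch below. The payload layout (a dict with "modelName" and "version" keys) and the function check_saved_model are assumptions made for illustration; the real structure is described in the API reference for Alignment Models.

```python
def check_saved_model(payload, expected_name, supported_versions=None):
    """Reject a loaded payload that belongs to another model or version.

    payload is assumed to be a dict carrying "modelName" and, optionally,
    "version" entries (hypothetical layout).
    """
    name = payload.get("modelName")
    if name != expected_name:
        raise ValueError(
            "file is for model %r, expected %r" % (name, expected_name))

    if supported_versions is not None:
        version = payload.get("version")
        if version not in supported_versions:
            raise ValueError(
                "unsupported version %r for model %r" % (version, name))
    return payload
```

Failing early with a clear error is safer than silently loading parameters that the current model cannot interpret.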

Individual modules