
API Reference

Jetic Gu edited this page Jun 25, 2017 · 27 revisions

Introduction

This page contains the API reference for the latest version of the master branch.

API reference (master) V0.4a

API references for all versions of the master branch are also available below:

API references for the latest versions of our other branches, including the develop branch, are listed below. Because this project is still in development, the APIs of these versions may change, and the API references may take some time to be updated accordingly.

  • V0.5a feat/dataset

  • V0.5d feat/export-model: The API reference for this branch will not be released separately, as it is included in V0.5a. It differs from the master branch in that it adds loading and saving of trained models; run > python aligner.py -h to see these options. The models in this branch use model API version 0.4a, which is available here.

Explanations of version numbers

Each individual module of the HMM aligner has its own API version, which relates only to its interfaces, not to its internal code. It is normal for modules that work together to have different API versions. For supported versions and module version dependencies, please refer to the individual API reference wiki pages.

API versions that are early working stages of another branch will not be documented separately; that branch is documented under its own version ending in a as development goes on. These early-stage versions have the format N.Nd, where each N is a number.

API versions of purely experimental branches that are unlikely to be merged directly into the master branch have the format N.Nc, where each N is a number.

Starting from v0.5a, saving and loading of trained models will be supported, and different models may have their own individual "Exporting version". A model can only load saved files whose version it supports. To make these versions easy to tell apart, the version is stored as model.version and has the format N.Nb, where each N is a number.
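The suffix letter of a version string tells you what kind of version it is. As an illustration only, the sketch below splits a string such as V0.4a into its numbers and its kind. The names parse_version and VERSION_KINDS, and the regular expression (optional leading v or V, N.N, then one suffix letter), are hypothetical and not part of the aligner's code.

```python
import re

# Meaning of each suffix letter, as described on this page.
# Hypothetical helper table, not part of the aligner.
VERSION_KINDS = {
    "a": "documented API version",
    "b": "model exporting version (model.version)",
    "c": "experimental branch",
    "d": "undocumented early working stage",
}

# Assumed format: optional "v"/"V", then N.N, then a single suffix letter.
_VERSION_RE = re.compile(r"^[vV]?(\d+)\.(\d+)([abcd])$")


def parse_version(text):
    """Split a version string such as "V0.4a" into (major, minor, kind)."""
    match = _VERSION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a recognised version string: {text!r}")
    major, minor, suffix = match.groups()
    return int(major), int(minor), VERSION_KINDS[suffix]
```

For example, parse_version("V0.4a") returns (0, 4, "documented API version"), and parse_version("0.5d") identifies the feat/export-model branch as an undocumented early working stage.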

Current Version (V0.4a)

Changes (compared to 0.3a)

  • support for models v0.3a

Options

Run

> python aligner.py -h

to see all options.

Config file

A sample config file is provided in src\sample_config_file.ini.

A config file provides the details of specific training and test data, so that you do not have to type all the options on the command line.

The config file is divided into 3 sections: General, TrainData, and TestData.

[General]
DataDirectory = ~/Data/
TargetLanguageSuffix = cn
SourceLanguageSuffix = en

[TrainData]
TextFilePrefix = train
TagFilePrefix = train.tags
AlignmentFileSuffix = wa

[TestData]
TextFilePrefix = test
TagFilePrefix = test.tags
Reference = FULLPATHTOFILE.WA

The aligner searches DataDirectory for files that match the prefixes and suffixes given above. Please note that Reference currently has to be a full path.
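To illustrate how the three sections fit together, here is a minimal sketch that reads a config laid out like the sample above with Python's standard configparser and builds candidate file paths. It assumes files are named "<prefix>.<suffix>" (for example train.en, train.tags.cn, train.wa); this naming rule and the function name resolve_data_files are hypothetical, and the aligner's actual file matching may differ.

```python
import configparser
import os


def resolve_data_files(config_path):
    """Build candidate data file paths from a config like the sample above.

    Assumption (hypothetical): files are named "<prefix>.<suffix>".
    """
    config = configparser.ConfigParser()
    config.optionxform = str  # keep option names case-sensitive, as in the sample
    if not config.read(config_path):
        raise FileNotFoundError(config_path)

    general = config["General"]
    data_dir = os.path.expanduser(general["DataDirectory"])
    source = general["SourceLanguageSuffix"]
    target = general["TargetLanguageSuffix"]

    def in_data_dir(prefix, suffix):
        # Every data file is looked up inside DataDirectory.
        return os.path.join(data_dir, f"{prefix}.{suffix}")

    train, test = config["TrainData"], config["TestData"]
    return {
        "train_source": in_data_dir(train["TextFilePrefix"], source),
        "train_target": in_data_dir(train["TextFilePrefix"], target),
        "train_source_tags": in_data_dir(train["TagFilePrefix"], source),
        "train_target_tags": in_data_dir(train["TagFilePrefix"], target),
        "train_alignment": in_data_dir(train["TextFilePrefix"],
                                       train["AlignmentFileSuffix"]),
        "test_source": in_data_dir(test["TextFilePrefix"], source),
        "test_target": in_data_dir(test["TextFilePrefix"], target),
        "test_source_tags": in_data_dir(test["TagFilePrefix"], source),
        "test_target_tags": in_data_dir(test["TagFilePrefix"], target),
        # Reference is used exactly as given: it must already be a full path.
        "test_reference": test["Reference"],
    }
```

Note that Reference is passed through unchanged rather than joined with DataDirectory, which mirrors the requirement above that it be a full path.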

Dataset formats

The descriptions of file formats supported by this version are here.

Individual modules