English to Japanese Transliteration Model

This is the project repository for an English-to-Japanese transliteration model, built as a final project for Professor Robert Berwick's 6.863 course in 2009.

Dependencies

  • Python
  • NLTK 0.9.8

Note: NLTK 0.9.9, which was the current version when this project was written, contains a bug that prevents this code from working.

If you want to regenerate the dictionary, you'll also need Xcode and a Mac running OS X.

Running the Code

A prebuilt dictionary file is included at data/dictionary.txt. It contains 4-tuples of English, Katakana, Japanese phonemes, and English phonemes.
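As a rough illustration of the 4-tuple layout, here is a minimal sketch of reading one dictionary entry into named fields. It assumes one entry per line with tab-separated fields, which this README does not confirm; check data/dictionary.txt and adjust `sep` if the file uses a different delimiter. The names `Entry` and `parse_entry` are hypothetical and are not part of the project code.

```python
# -*- coding: utf-8 -*-
from collections import namedtuple

# Field order follows the README: English, Katakana,
# Japanese phonemes, English phonemes.
Entry = namedtuple(
    "Entry",
    ["english", "katakana", "japanese_phonemes", "english_phonemes"],
)


def parse_entry(line, sep="\t"):
    """Split one dictionary line into its four fields.

    ASSUMPTION: fields are separated by `sep` (tab by default).
    The real file format may differ.
    """
    fields = line.rstrip("\n").split(sep)
    if len(fields) != 4:
        raise ValueError("expected 4 fields, got %d" % len(fields))
    return Entry(*fields)


# Made-up sample line for illustration only; it is not taken
# from data/dictionary.txt.
sample = "test\tテスト\tt e s u t o\tT EH S T\n"
entry = parse_entry(sample)
```

Raising on a wrong field count makes a delimiter mismatch show up immediately instead of silently producing misaligned tuples.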

The main file to run is aligner/hmm.py. Inspect the main() function in hmm.py to see the different ways you can run the script.

Running the provided version as-is performs a cross-validation test using 2-grams. (Note that the ngramorder variable in hmm.py is one less than the N of the N-grams you are running with, so 2-grams correspond to ngramorder = 1.)
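To make the off-by-one relationship concrete, the following minimal sketch extracts N-grams from a sequence given an order value that is one less than N. The helper `ngrams_for_order` is hypothetical and written only for illustration; it is not the code in hmm.py, and only the variable name ngramorder comes from the project.

```python
def ngrams_for_order(sequence, ngramorder):
    """Return all N-grams of `sequence`, where N = ngramorder + 1.

    Illustrative helper only; it mirrors the convention described
    above (order is one less than N), not the project's own code.
    """
    n = ngramorder + 1
    return [
        tuple(sequence[i:i + n])
        for i in range(len(sequence) - n + 1)
    ]


# ngramorder = 1 yields 2-grams, matching the default run.
bigrams = ngrams_for_order(["k", "a", "t", "a"], 1)

# ngramorder = 2 yields 3-grams.
trigrams = ngrams_for_order(["k", "a", "t", "a"], 2)
```

Under this convention the order can be read as the number of preceding symbols each symbol is conditioned on.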