Convert to Maven for easier builds, and make ARPA reader robust #28
http://www1.icsi.berkeley.edu/Speech/docs/HTKBook3.2/node213_mn.html
specifies the ARPA LM format. In particular, it doesn't seem to require tabs, and
in fact the CMU Sphinx LM files at http://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/ don't use tabs to separate the logprob and backoff weight from the n-gram.
This change 1) converts the project's build from Ant to Maven
and 2) rewrites nlp.lm.io/ArpaLmReader.java so that it's a bit shorter and reads all the files it used to, along with Sphinx (and, I think, any other compliant ARPA) LM files.
The second, more substantial change could probably be merged without the first, build-related one.
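
To illustrate the tab-independence point, here is a minimal sketch of parsing one n-gram line by splitting on any run of whitespace. It is not the code in this pull request: the class `ArpaLineSketch`, the `Entry` holder, and `parseNgramLine` are hypothetical names, and the sketch assumes the n-gram order is already known from the enclosing `\N-grams:` section header and that a missing backoff weight can be treated as 0.0.

```java
import java.util.Arrays;

public class ArpaLineSketch {

    /** One parsed n-gram entry (hypothetical holder type for this sketch). */
    public static final class Entry {
        public final float logProb;
        public final String[] words;
        public final float backoff;      // 0.0 when the line has no backoff field (assumption)
        public final boolean hasBackoff;

        Entry(float logProb, String[] words, float backoff, boolean hasBackoff) {
            this.logProb = logProb;
            this.words = words;
            this.backoff = backoff;
            this.hasBackoff = hasBackoff;
        }
    }

    /**
     * Parses "logprob w1 ... wN [backoff]" where the fields are separated by
     * any mix of spaces and tabs. The order N must be supplied by the caller.
     */
    public static Entry parseNgramLine(String line, int order) {
        String[] fields = line.trim().split("\\s+");
        boolean hasBackoff = fields.length == order + 2;
        if (fields.length != order + 1 && !hasBackoff) {
            throw new IllegalArgumentException(
                "Expected " + (order + 1) + " or " + (order + 2)
                + " fields but found " + fields.length + ": " + line);
        }
        float logProb = Float.parseFloat(fields[0]);
        // Words occupy fields 1..order; the optional backoff follows them.
        String[] words = Arrays.copyOfRange(fields, 1, order + 1);
        float backoff = hasBackoff ? Float.parseFloat(fields[order + 1]) : 0.0f;
        return new Entry(logProb, words, backoff, hasBackoff);
    }

    public static void main(String[] args) {
        // The same bigram written with tabs and with spaces only.
        Entry tabbed = parseNgramLine("-1.2041\tthe cat\t-0.3010", 2);
        Entry spaced = parseNgramLine("-1.2041 the cat -0.3010", 2);
        System.out.println(Arrays.toString(tabbed.words) + " " + tabbed.backoff);
        System.out.println(Arrays.toString(spaced.words) + " " + spaced.backoff);
    }
}
```

Because the field count is checked against the known order, a line without a backoff weight is still distinguished from a malformed one, with no reliance on where the tabs fall.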