Homophone Corrector

Approach:

The classifier determines the context of a possibly incorrect word from its surrounding words and their POS tags. The training features are the two words to the left and the two words to the right of the target word, together with their POS tags. The model was trained with the NLTK Maximum Entropy classifier. Of the various Maxent algorithms, MEGAM gave the best results in terms of performance and accuracy. The training data consisted of the Brown Corpus, the ABC Corpus and 100 MB of Wikipedia dumps.
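The feature set described above can be sketched roughly as follows. This is a minimal illustration, not the code in this repository: the function name `context_features`, the feature key names and the `<PAD>` marker for positions outside the sentence are all assumptions made for the example.

```python
def context_features(tagged_tokens, index, window=2):
    """Build a feature dict for the token at `index`.

    `tagged_tokens` is a list of (word, pos_tag) pairs. The features are the
    `window` words on each side of the target and their POS tags. The key
    names and the "<PAD>" marker are hypothetical choices for this sketch.
    """
    features = {}
    for offset in range(-window, window + 1):
        if offset == 0:
            continue  # the target word itself is the label, not a feature
        pos = index + offset
        if 0 <= pos < len(tagged_tokens):
            word, tag = tagged_tokens[pos]
            word = word.lower()
        else:
            word, tag = "<PAD>", "<PAD>"  # position falls outside the sentence
        features[f"word[{offset:+d}]"] = word
        features[f"tag[{offset:+d}]"] = tag
    return features


# Hand-tagged example sentence; the target is the word at index 2.
sentence = [("I", "PRP"), ("like", "VBP"), ("there", "EX"),
            ("new", "JJ"), ("house", "NN")]
features = context_features(sentence, 2)
```

A dict of this shape, paired with the correct homophone as its label, matches the (features, label) format that NLTK classifiers train on. With MEGAM installed and configured, `nltk.MaxentClassifier.train(train_set, algorithm='megam')` would be the corresponding training call.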

Running the Code:

[1.] Training the Data Model:

- python homophonetrainer.py (will generate model file 'model.pickle')

[2.] Classifying Input Data:

- python homophonecorrector.py < hw3.test.err.txt > hw3.output.txt
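The command above reads the text to correct from standard input and writes the corrected text to standard output. A rough sketch of such a loop is shown below, under several assumptions: the confusion sets are hypothetical, `choose` is a stand-in for the trained classifier, and tokens are split on whitespace only. The actual script may differ on all three points.

```python
import sys

# Hypothetical confusion sets; the sets the project actually handles
# are not listed in this README.
CONFUSION_SETS = [{"their", "there"}, {"to", "too"}]


def candidates_for(word):
    """Return the confusion set containing `word`, or None if there is none."""
    for group in CONFUSION_SETS:
        if word.lower() in group:
            return group
    return None


def correct_line(line, choose):
    """Replace each confusable word with the option returned by `choose`.

    `choose(tokens, index, options)` is a placeholder for the trained model:
    it receives the tokens, the target position and the candidate words,
    and returns the word to use at that position.
    """
    tokens = line.split()
    for i, token in enumerate(tokens):
        options = candidates_for(token)
        if options:
            tokens[i] = choose(tokens, i, options)
    return " ".join(tokens)


def run(choose, stream_in=sys.stdin, stream_out=sys.stdout):
    """Correct every line from `stream_in` and write it to `stream_out`."""
    for line in stream_in:
        stream_out.write(correct_line(line, choose) + "\n")
```

Splitting on whitespace keeps the sketch short, but it does not preserve the original spacing and misses words with attached punctuation such as "there,". A real corrector would need a tokenizer that handles both.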

Third party software used:

- NLTK (Maximum Entropy classifier)
- MEGAM (Maxent training algorithm)
