diff --git a/nlp/abbreviation_spellchecker_english.ipynb b/nlp/abbreviation_spellchecker_english.ipynb
index 715800f..cfdb2d4 100644
--- a/nlp/abbreviation_spellchecker_english.ipynb
+++ b/nlp/abbreviation_spellchecker_english.ipynb
@@ -8,7 +8,7 @@
  "\n",
  "Recently I bumped into a [question](https://stackoverflow.com/questions/43510778) on Stackoverflow, how to recover phrases from abbreviations, e.g. turn *wtrbtl* into *water bottle*, and *bsktball* into *basketball*. The question had an additional complication: lack of comprehensive list of word. That means, we need an algorithm able to invent new likely words.\n",
  "\n",
- "I was intrigued, and started researching, which algorithms and math lie behind modern spell-checkers. It turned out that a good spell-checker can be made with a n-gram language model, a model of word distortions, and a greedy beam search algorithm. The whole construction is called a "noisy channel model" (see the "Spelling Correction and the Noisy Channel" section in http://web.stanford.edu/~jurafsky/slp3 for more details). \n",
+ "I was intrigued, and started researching, which algorithms and math lie behind modern spell-checkers. It turned out that a good spell-checker can be made with a n-gram language model, a model of word distortions, and a greedy beam search algorithm. The whole construction is called a 'noisy channel model' (see the 'Spelling Correction and the Noisy Channel' section in http://web.stanford.edu/~jurafsky/slp3 for more details). \n",
  "\n",
  "With this knowledge and Python, I wrote a model from scratch. After training on \"The Fellowship of the Ring\" text, it was able to recognize abbreviations of modern sports terms."
  ]