Essay-Classification-with-N-Gram-LM

Project for the Natural Language Processing course (COMS 4705) at Columbia University's School of Engineering and Applied Science, Sept 2022

In this project, I built a trigram language model in Python. The main component of the language model is implemented in the class TrigramModel. A key design choice is that the probability distributions are not precomputed: the model stores only the raw counts of n-gram occurrences and computes probabilities on demand. Keeping the counts is what makes smoothing possible, because a smoothed probability is simply a different formula applied to the same counts. The two datasets I worked with are available in the hw1_data zip file.
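The count-only design can be illustrated with a minimal sketch. This is not the project's TrigramModel; the class name `NGramCounts`, its method names, the `START`/`STOP` padding convention, and the choice of linear interpolation with equal weights as the smoothing method are all assumptions made for illustration. Only raw counts are stored, and both the raw (maximum-likelihood) and smoothed probabilities are computed when requested.

```python
from collections import Counter


class NGramCounts:
    """Hypothetical sketch: keep only raw counts; derive probabilities on demand."""

    def __init__(self, sentences):
        self.uni, self.bi, self.tri = Counter(), Counter(), Counter()
        self.bi_ctx, self.tri_ctx = Counter(), Counter()
        for s in sentences:
            # Assumed padding: two START markers before the sentence, one STOP after.
            tokens = ["START", "START"] + list(s) + ["STOP"]
            # Each predicted token w updates the counts for every n-gram order.
            for u, v, w in zip(tokens, tokens[1:], tokens[2:]):
                self.uni[w] += 1
                self.bi[(v, w)] += 1
                self.tri[(u, v, w)] += 1
                self.bi_ctx[v] += 1          # times v served as a bigram context
                self.tri_ctx[(u, v)] += 1    # times (u, v) served as a trigram context
        self.total = sum(self.uni.values())

    def raw_trigram(self, u, v, w):
        # Maximum-likelihood estimate count(u, v, w) / count(u, v); 0.0 for an unseen context.
        ctx = self.tri_ctx[(u, v)]
        return self.tri[(u, v, w)] / ctx if ctx else 0.0

    def raw_bigram(self, v, w):
        # count(v, w) / count(v as context); 0.0 for an unseen context.
        ctx = self.bi_ctx[v]
        return self.bi[(v, w)] / ctx if ctx else 0.0

    def raw_unigram(self, w):
        return self.uni[w] / self.total

    def interpolated(self, u, v, w, lambdas=(1 / 3, 1 / 3, 1 / 3)):
        # Smoothing is just another formula over the stored counts (assumed: linear interpolation).
        l1, l2, l3 = lambdas
        return (l1 * self.raw_trigram(u, v, w)
                + l2 * self.raw_bigram(v, w)
                + l3 * self.raw_unigram(w))


counts = NGramCounts([["the", "cat", "sat"], ["the", "dog", "sat"]])
print(counts.raw_trigram("the", "cat", "dog"))   # unseen trigram: raw estimate is 0.0
print(counts.interpolated("the", "cat", "dog"))  # smoothed estimate is still positive
```

Because nothing is precomputed, changing the interpolation weights (or swapping in another smoothing scheme) requires no retraining, only a different formula over the same counters.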

Project Parts:

Extracting n-grams from a sentence
Counting n-grams in a corpus
Raw n-gram probabilities
Smoothed probabilities
Computing Sentence Probability
Perplexity
Using the Model for Text Classification
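The parts listed above can be tied together in one compact sketch. It is an illustration under stated assumptions, not the author's implementation: the function and method names (`get_ngrams`, `smoothed_trigram_probability`, `sentence_logprob`, `perplexity`, `classify`), the `START`/`STOP`/`UNK` markers, the `min_count` threshold for mapping rare words to `UNK`, and equal-weight linear interpolation are all hypothetical choices. Sentence probability is accumulated as a sum of base-2 log probabilities, perplexity is computed as 2 raised to the negative average log probability per token (words plus `STOP`), and classification assumes one model per class, picking the class whose model gives the lowest perplexity.

```python
import math
from collections import Counter


def get_ngrams(sequence, n):
    """Pad with n-1 START markers and one STOP marker, then slide an n-wide window."""
    padded = ["START"] * (n - 1) + list(sequence) + ["STOP"]
    return [tuple(padded[i:i + n]) for i in range(len(padded) - n + 1)]


class TrigramModel:
    """Hypothetical sketch: raw counts only, probabilities computed on demand."""

    def __init__(self, sentences, min_count=2):
        word_counts = Counter(w for s in sentences for w in s)
        # Assumption: words seen fewer than min_count times become UNK, so UNK gets training mass.
        self.lexicon = {w for w, c in word_counts.items() if c >= min_count}
        self.unigrams, self.bigrams, self.trigrams = Counter(), Counter(), Counter()
        self.bi_ctx, self.tri_ctx = Counter(), Counter()
        for s in sentences:
            for u, v, w in get_ngrams(self._map_unk(s), 3):
                self.unigrams[w] += 1
                self.bigrams[(v, w)] += 1
                self.trigrams[(u, v, w)] += 1
                self.bi_ctx[v] += 1
                self.tri_ctx[(u, v)] += 1
        self.total = sum(self.unigrams.values())

    def _map_unk(self, sentence):
        return [w if w in self.lexicon else "UNK" for w in sentence]

    def smoothed_trigram_probability(self, u, v, w, lambdas=(1 / 3, 1 / 3, 1 / 3)):
        p_uni = self.unigrams[w] / self.total
        # Unseen contexts fall back to the next lower-order estimate (an assumption).
        p_bi = self.bigrams[(v, w)] / self.bi_ctx[v] if self.bi_ctx[v] else p_uni
        p_tri = (self.trigrams[(u, v, w)] / self.tri_ctx[(u, v)]
                 if self.tri_ctx[(u, v)] else p_bi)
        l1, l2, l3 = lambdas
        return l1 * p_tri + l2 * p_bi + l3 * p_uni

    def sentence_logprob(self, sentence):
        # Summing log2 probabilities avoids underflow from multiplying many small numbers.
        return sum(math.log2(self.smoothed_trigram_probability(*t))
                   for t in get_ngrams(self._map_unk(sentence), 3))

    def perplexity(self, corpus):
        """2 ** (-l), where l is the average log2 probability per token (words + STOP)."""
        log_sum, tokens = 0.0, 0
        for s in corpus:
            log_sum += self.sentence_logprob(s)
            tokens += len(s) + 1  # +1 for the STOP token
        return 2 ** (-log_sum / tokens)


def classify(essay, models):
    """Return the label whose model assigns the essay (a list of sentences) the lowest perplexity."""
    return min(models, key=lambda label: models[label].perplexity(essay))


cat_model = TrigramModel([["the", "cat", "sat"], ["the", "cat", "ran"], ["a", "cat", "sat"]])
dog_model = TrigramModel([["the", "dog", "barked"], ["the", "dog", "ran"], ["a", "dog", "barked"]])
print(classify([["the", "cat", "sat"]], {"cat": cat_model, "dog": dog_model}))
```

The sketch assumes every test-time token maps to something with a nonzero unigram count; that holds here because `UNK` receives counts from rare training words, but a training set with no rare words would leave `UNK` unseen and make `math.log2` fail on a zero probability.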
