Essay-Classification-with-N-Gram-LM

Project for Natural Language Processing Course (COMS 4705) at Columbia University's School of Engineering and Applied Science, Sept 2022

In this project, I built a trigram language model in Python. The core of the model is implemented in the class TrigramModel. A key design decision is that probability distributions are not precomputed: the model stores only the raw counts of n-gram occurrences and computes probabilities on demand, which is what makes smoothing at query time possible. The two datasets I worked with are included in the hw1_data zip file.
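The sketch below illustrates that count-then-compute-on-demand design under simple assumptions. The class name TrigramModel comes from the project, but the helper get_ngrams, the other method names, and the interpolation weights are illustrative choices, not the exact implementation in this repository.

```python
# Minimal sketch: store raw n-gram counts, derive probabilities on demand.
from collections import defaultdict

def get_ngrams(tokens, n):
    """Pad a tokenized sentence and return its n-grams as tuples."""
    padded = ["START"] * max(1, n - 1) + list(tokens) + ["STOP"]
    return [tuple(padded[i:i + n]) for i in range(len(padded) - n + 1)]

class TrigramModel:
    def __init__(self, corpus):
        # Only raw counts are stored; probabilities are computed later.
        self.unigram_counts = defaultdict(int)
        self.bigram_counts = defaultdict(int)
        self.trigram_counts = defaultdict(int)
        self.total_tokens = 0
        for sentence in corpus:                    # corpus: iterable of token lists
            for u in get_ngrams(sentence, 1):
                self.unigram_counts[u] += 1
                self.total_tokens += 1
            for b in get_ngrams(sentence, 2):
                self.bigram_counts[b] += 1
            for t in get_ngrams(sentence, 3):
                self.trigram_counts[t] += 1

    def raw_trigram_probability(self, trigram):
        """Maximum-likelihood estimate: count(u, v, w) / count(u, v)."""
        context = self.bigram_counts[trigram[:2]]
        return self.trigram_counts[trigram] / context if context else 0.0

    def raw_bigram_probability(self, bigram):
        context = self.unigram_counts[bigram[:1]]
        return self.bigram_counts[bigram] / context if context else 0.0

    def raw_unigram_probability(self, unigram):
        return self.unigram_counts[unigram] / self.total_tokens if self.total_tokens else 0.0

    def smoothed_trigram_probability(self, trigram, l1=1/3, l2=1/3, l3=1/3):
        """Linear interpolation of trigram, bigram, and unigram estimates."""
        return (l1 * self.raw_trigram_probability(trigram)
                + l2 * self.raw_bigram_probability(trigram[1:])
                + l3 * self.raw_unigram_probability(trigram[2:]))
```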

Project Parts:

  1. Extracting n-grams from a sentence
  2. Counting n-grams in a corpus
  3. Raw n-gram probabilities
  4. Smoothed probabilities
  5. Computing Sentence Probability
  6. Perplexity
  7. Using the Model for Text Classification (see the sketch after this list)
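Continuing the sketch above, the functions below outline parts 5 through 7 in the same hypothetical style: summing log probabilities of the trigrams in a sentence, converting the average per-token log probability into perplexity, and classifying an essay by which of two trained models assigns it the lower perplexity. The two-model setup and all names here are placeholders, not the actual hw1_data layout.

```python
# Sketch of parts 5-7: sentence log-probability, perplexity, classification.
# Assumes unseen words are mapped to an UNK token upstream so that smoothed
# probabilities stay nonzero and log2 is always defined.
import math

def sentence_logprob(model, sentence):
    """Sum of log2 smoothed trigram probabilities over a tokenized sentence."""
    return sum(math.log2(model.smoothed_trigram_probability(t))
               for t in get_ngrams(sentence, 3))

def perplexity(model, corpus):
    """2 ** (-l), where l is the average log2 probability per token."""
    log_prob = 0.0
    token_count = 0
    for sentence in corpus:
        log_prob += sentence_logprob(model, sentence)
        token_count += len(sentence) + 1   # +1 for the STOP token
    return 2 ** (-log_prob / token_count)

def classify(essay, model_a, model_b):
    """Assign the essay to whichever class's model gives it lower perplexity."""
    return "A" if perplexity(model_a, essay) < perplexity(model_b, essay) else "B"
```

In the classification setup, one TrigramModel is trained on each class of training essays, and a test essay is labeled with the class whose model finds it least surprising (lowest perplexity).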
