This project implements an n-gram language model for word sequences, using Laplace or Kneser-Ney smoothing.
These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.
- Create a directory inside Assignment/ and add all the data files to it
- Install the dependencies listed in requirements.txt
- Run app.py (for modeling, cross-validation and testing) or predictive_keyboard.py (for the predictive keyboard functionality)
To run the code in your local environment, make sure you have Python 3 or above and the required Python libraries installed. To install the libraries, run the following in your console:
pip install -r requirements.txt
In order to train the language model you will need to run the following command:
python Assignment1/app.py
In order to run the predictive keyboard you will need to run the following command:
python Assignment1/predictive_keyboard.py
The project consists of the following main classes:
This class is responsible for handling and fetching the dataset(s). It loads the data, splits it into subsets according to the user's needs and creates the folds for the cross-validation process.
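The folding step can be illustrated with a minimal sketch. The function name `k_fold_splits` and the round-robin assignment of sentences to folds are assumptions made for illustration; the project's own class may split the data differently.

```python
from typing import Iterator, List, Sequence, Tuple


def k_fold_splits(
    sentences: Sequence[str], k: int
) -> Iterator[Tuple[List[str], List[str]]]:
    """Yield (train, held_out) pairs for k-fold cross-validation.

    Hypothetical helper: sentences are assigned to folds round-robin.
    """
    if not 2 <= k <= len(sentences):
        raise ValueError("k must be between 2 and the number of sentences")
    for i in range(k):
        # Every k-th sentence, starting at offset i, goes to the held-out fold.
        held_out = [s for j, s in enumerate(sentences) if j % k == i]
        train = [s for j, s in enumerate(sentences) if j % k != i]
        yield train, held_out


corpus = ["a b", "c d", "e f", "g h", "i j", "k l"]
folds = list(k_fold_splits(corpus, k=3))
```

Each sentence lands in exactly one held-out fold, so every part of the data is used once for evaluation and k-1 times for training.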
This class is responsible for the text pre-processing of the data. It provides methods for sentence splitting, tokenization and n-gram creation.
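As a rough illustration of tokenization and n-gram creation, the sketch below uses a simple regular-expression tokenizer and pads each sentence with boundary markers. The function names, the `<s>` / `</s>` markers and the tokenization rule are assumptions, not necessarily what the project uses.

```python
import re
from typing import List, Tuple


def tokenize(sentence: str) -> List[str]:
    """Lowercase a sentence and split it into word tokens (assumed rule)."""
    return re.findall(r"[a-z0-9']+", sentence.lower())


def ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """Return all n-grams of a token list, padded with boundary markers."""
    # n-1 start markers give the first word a full context; one end marker
    # lets the model learn where sentences stop.
    padded = ["<s>"] * (n - 1) + tokens + ["</s>"]
    return [tuple(padded[i:i + n]) for i in range(len(padded) - n + 1)]


tokens = tokenize("The cat sat.")
bigrams = ngrams(tokens, 2)
```

Here `bigrams` contains four pairs, starting with `("<s>", "the")` and ending with `("sat", "</s>")`.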
This class is responsible for fitting the language model to the given data. It calculates the probabilities of the language model, performs smoothing (available implementations: Laplace and Kneser-Ney), runs linear interpolation on the n-gram probabilities and predicts the next word for a given sequence.
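To make the smoothing step concrete, here is a small sketch of a bigram model with Laplace (add-one) smoothing and next-word prediction. The class `LaplaceBigramModel` and its methods are hypothetical names for illustration; Kneser-Ney smoothing and linear interpolation are left out to keep the example short.

```python
from collections import Counter
from typing import List


class LaplaceBigramModel:
    """Toy bigram model with add-one (Laplace) smoothing (illustrative only)."""

    def fit(self, sentences: List[List[str]]) -> "LaplaceBigramModel":
        self.unigrams = Counter()
        self.bigrams = Counter()
        for tokens in sentences:
            padded = ["<s>"] + tokens + ["</s>"]
            self.unigrams.update(padded)
            self.bigrams.update(zip(padded, padded[1:]))
        self.vocab = set(self.unigrams)
        return self

    def prob(self, prev: str, word: str) -> float:
        # P(word | prev) = (c(prev, word) + 1) / (c(prev) + |V|)
        numerator = self.bigrams[(prev, word)] + 1
        denominator = self.unigrams[prev] + len(self.vocab)
        return numerator / denominator

    def predict_next(self, prev: str) -> str:
        # Most probable continuation; sorting makes tie-breaking deterministic.
        candidates = sorted(self.vocab - {"<s>"})
        return max(candidates, key=lambda w: self.prob(prev, w))


model = LaplaceBigramModel().fit(
    [["the", "cat", "sat"], ["the", "cat", "ran"], ["the", "dog", "sat"]]
)
p_cat = model.prob("the", "cat")   # (2 + 1) / (3 + 7)
guess = model.predict_next("the")  # "cat"
```

Adding one to every count guarantees that unseen bigrams such as ("the", "ran") still receive a non-zero probability, at the cost of taking some mass away from observed ones.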
This class is responsible for the evaluation of a given model. It calculates the cross-entropy and perplexity of the model.
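The relationship between the two metrics can be sketched as follows, assuming the model has already assigned a probability to each token of the test data. The function names are illustrative, and base-2 logarithms are an assumption (the project may use a different base).

```python
import math
from typing import Iterable


def cross_entropy(probabilities: Iterable[float]) -> float:
    """Average negative log2 probability per predicted token, in bits."""
    probs = list(probabilities)
    if not probs:
        raise ValueError("need at least one probability")
    return -sum(math.log2(p) for p in probs) / len(probs)


def perplexity(probabilities: Iterable[float]) -> float:
    """Perplexity is 2 raised to the cross-entropy (when measured in bits)."""
    return 2 ** cross_entropy(probabilities)


probs = [0.25, 0.25, 0.25, 0.25]
h = cross_entropy(probs)   # 2.0 bits per token
pp = perplexity(probs)     # 4.0
```

A perplexity of 4 means the model is, on average, as uncertain as if it were choosing uniformly among four words at each step; lower values indicate a better fit.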